Blog Scraping – a trend that hits photogs

by planetmitch

UPDATE 12pm CDT:

I've had several nice emails back and forth with the owner of seriouslyphotography (turns out he's also a fan of planet5D) and he's been contacted by several of you about this issue. He's changed his format to more closely match what I was asking for in terms of a short excerpt of the articles and then a link to our blogs. He's also posted a ‘code of ethics’ to let everyone know more about his intent and what the site's goals are.

He also corrected me that he doesn't do ‘scraping’ (“Scraping is a process that involves reading a sites html and parsing out the content whilst ignoring any rules in a robots.txt file”) – he reads the published RSS and posts that content. He said: “It may seem like a subtle distinction (crawling and scraping) but it is really very significant.” And I'll agree.
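For the curious, here is a minimal sketch in Python of the two things he is contrasting: checking a site's robots.txt rules before fetching a page, and reading the items a blog publishes in its RSS feed. It is only an illustration, not how his site actually works. The robots.txt text, the feed, the blog.example.com URLs, the "ExampleBot" user agent and both function names are made up for the example, and everything is inlined so it runs without touching the network.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt and RSS feed, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

RSS_FEED = """\
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Photo Blog</title>
    <item>
      <title>First post</title>
      <link>https://blog.example.com/first-post</link>
      <description>Full text of the first post.</description>
    </item>
  </channel>
</rss>
"""


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def read_feed_items(rss_xml):
    """Return title, link and description for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    # A well-behaved crawler asks this question before fetching a page.
    print(allowed_by_robots(ROBOTS_TXT, "ExampleBot",
                            "https://blog.example.com/private/draft"))
    # A feed reader only sees what the blog chose to publish in its feed.
    for entry in read_feed_items(RSS_FEED):
        print(entry["title"], entry["link"])
```

The point of the sketch is that a feed reader never parses the blog's HTML pages at all; it works only with what the publisher put in the feed.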

I have also suggested to him that he contact the blog owners up front, so there's no surprise and shock and anger in the discovery. I think that would be a better way for him to get content.

See, we're all learning something today!

planetMitch

Original post:

planet5D friend 1001noiseycameras alerted us last night that a new site is taking planet5D's content and re-hosting it without our permission. I've learned a lot in the last few hours and will probably update this post as I learn more. Turns out, of course, this has a name, “blog scraping”, and it is becoming more common.

But first, please check this list to see if your blog is on their site:

seriously

Then, you can send an email to their editor demanding that they stop hosting your content.

Now, why does this upset me? Mainly because it is wrong. Granted, they do provide links back to planet5D, but they're allowing the reader to never come to my site – pages that duplicate mine mess up my search engine rankings, hurt my advertisers, etc.

A few months ago, I was thrilled that photography.alltop.com had added us to their really nice site, and we still are. The difference, though, is that alltop is doing it right. They just post the headlines and a small snippet of the top of the article, and if the reader wants to read more, they click and visit our site. Our content remains intact and we get the traffic. That's ok with me.

As a result, I may need to change my RSS feeds so that the entire post isn't sent to those using RSS readers. I know that won't be popular with some of you, but if it protects our content then that's what we'll need to do.
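To make the idea of a partial feed concrete, here is a rough sketch in Python of turning a full post into a short excerpt plus a link back to the original. This is not what my blog software does under the hood; the function name make_excerpt, the 50-word default and the example URL are all assumptions for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def make_excerpt(post_html, post_url, max_words=50):
    """Return the first max_words words of a post, followed by a link to it.

    Hypothetical helper: the name, the word limit and the "Read more"
    wording are illustrative choices, not part of any feed standard.
    """
    extractor = _TextExtractor()
    extractor.feed(post_html)
    extractor.close()
    # Joining with a space keeps text from neighbouring tags apart;
    # split() then collapses any extra whitespace.
    words = " ".join(extractor.parts).split()
    if len(words) <= max_words:
        text = " ".join(words)
    else:
        text = " ".join(words[:max_words]) + "..."
    return f"{text} Read more: {post_url}"


if __name__ == "__main__":
    html = "<p>One two <b>three</b> four five</p>"
    print(make_excerpt(html, "https://blog.example.com/p", max_words=3))
```

With a feed built this way, anyone republishing it only ever gets the teaser, and the reader has to click through for the rest.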

I'll probably update this later… just wanted to get it out there for others.

Comments

  1. David

    For specialized blogs like yours, I think full post RSS is better for the blog’s growth in the long run than partial post RSS. In terms of actual numbers, you only alienate a minority of your readers when switching to partial-post RSS, but they are a vocal minority, the kind of influential techy types who their peers turn to for advice about good blogs to read, etc.

  2. John

    I agree it is wrong, but if you let people read all articles offline in RSS readers, why not through another site?

    I read the articles solely through RSS, and would *not* complain about having only a short description come up. This is how other sites (like DIYPhotography, for example) do their RSS. I have no problems linking through to the website.

    Keep up the good content!

    John

    1. Author
      planetMitch

      Thanks for the input David and John.

      John, my biggest concern is the search engines – if there’s duplication of the same content, then maybe the other site will get higher rankings and I want those rankings for planet5D. I’ve worked really hard getting my traffic and I want to keep it!

      The offline RSS readers are for the convenience of my readers, not for someone else to get traffic on their site. I don’t want to force our readers to come to the site for something they can get quickly in RSS – but at the same time, I do a few things differently to try to get those readers to see my sponsors and the other site features that they miss in RSS.

  3. John

    It is possible to force authentication for RSS feeds – maybe have users create an account before grabbing a full feed, with a snippet feed for non-authenticated users? Most RSS readers let you specify credentials to use, so you’d never have to enter them manually.

  4. AlainP

    Hi Mitch,

    Seems like my site is on the list too… Now that he is only posting partial posts, it is fine with me, but I have to agree that full-content posts used without asking are a big no-no!

    Creating original content takes more time than most people can imagine. So seeing others do a cut&paste can be quite frustrating.

    On the other hand Mitch, I have read that Google can figure out when people scrape sites so it is not as bad as it used to be.

    Still, we must protect our copyrights.
