So Your Content Has Been Copied, Now What?

Yep, it was that time again: time to do the random searches and see who has been lifting content from some of my other sites and my clients' sites.

With the proliferation of splogs and the ease of chewing up and spitting out RSS, it has become an almost monthly habit of mine to do a little searching for people who have been ripping off content on sites that I manage.

Here's how I find it, and what I do to deal with the issue.

Tools of the trade

Google of course works fairly well for finding duplicate content on the web, but the tool of choice for this task is Copyscape.

I like to go in and test a few random pages along with the money pages to see what I can find. Sadly I almost always find something. Happily, though, it can usually be resolved quite quickly, because the law is on your side - at least that has been my experience.
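If you want a rough do-it-yourself check to run alongside Copyscape, something like the following could work. It is only a minimal sketch: it assumes you have already fetched the plain text of your page and of the suspect page, the function names are made up for illustration, and it makes no claim about how Copyscape or Google detect duplicates. It simply measures how many five-word runs from your text reappear in the other text.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as a single run.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    mine = shingles(original, size)
    if not mine:
        return 0.0
    return len(mine & shingles(suspect, size)) / len(mine)
```

A score near 1.0 suggests a wholesale copy, while a score near 0.0 suggests the pages merely share a topic. Where to draw the line in between is a judgment call.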

Techniques

I find that flagrant reproduction of RSS feeds is an easy one to handle. A simple note explaining that "if the content is not removed in 48 hours, we will be advising your hosts, your registrar and the major search engines of the infraction" will usually get the ball rolling quite nicely.

Aside: I once called a guy after getting his whois info, and let me say that was very effective, though I don't recommend it as it is far easier to keep your cool in writing.

People tend not to put up a fight (often replying that their tech guy was responsible - sheesh), but if they do, fire them this link to help them get informed (though they likely know that they're on the dark side of the law). This part is rarely necessary, but it's a nice touch because they will take you very seriously if they understand that you know where you are coming from.

Plain theft of copy (as in not RSS republishing) can be a bit more difficult (whose copy is it?), but quite often, as Mike Davidson explains here, people who have been caught generally back down quite quickly. (Replace the word "you" in that comment with "ISP" and it is a decent description of how a DMCA complaint against a site works.)

In the end the thieves will usually back down - quite often the fear of losing their Adsense account is enough motivation.

Why fight the fight?

A lot of people think that this is a losing battle, and truth be told, it can be a difficult issue to keep tabs on.

The major issue for me is duplicate content in Google - I've had other sites ranking above mine where they are running my content - I'm not a big fan of that.

So for me it is worth keeping a lookout once in a while. This is especially easy with newer sites that get little traffic, but if you know your sites well enough, you can see dips in traffic to certain areas that you know should be higher. That can be a sign that it's time to do some research!
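Spotting those dips can be done by eye, but a small script can flag them too. The sketch below assumes you can export daily visit counts for a section of your site as a plain list, oldest first. The function name, the seven-day window and the 70% threshold are arbitrary choices for illustration, not recommendations.

```python
from statistics import mean

def has_dipped(daily_visits, recent_days=7, threshold=0.7):
    """True if the recent average falls below threshold times the earlier average."""
    if len(daily_visits) <= recent_days:
        return False  # not enough history to form a baseline
    baseline = mean(daily_visits[:-recent_days])
    recent = mean(daily_visits[-recent_days:])
    return baseline > 0 and recent < threshold * baseline
```

A flagged dip is only a prompt to go and search for copies. Plenty of other things, such as seasonality or a lost link, can cause the same pattern.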

Some extra details

For those of you who want to find out more, Copyscape provides Responding to Plagiarism, a Resource Center, and if you have a lot of time on your hands, the forums.

Comments and Feedback

Jens Meiert Mon, 20th of February, 2006

Well, theft can be fought, as you point out very nicely. But there are also legitimate copies (just remember site mirrors), so search engines must (should) be very careful about penalties, especially when they are automated.

Personally, I face the situation that I republish articles or interviews I do for online magazines or other people on my site (that's implicit if you want something written by/with me), and it's certainly not acceptable to be penalized for this since it's absolutely legitimate. Well, nothing has happened yet, and hopefully it stays that way.

Mike P. Mon, 20th of February, 2006

Hey Jens,

Yeah, mirroring sites can be a difficult issue. Not a bad idea to keep the bots out of mirrors with the robots.txt file.

I remember that Keith had a problem with mirroring his site a while back, but this is all I can find on it now...
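The robots.txt suggestion above can be sketched and sanity-checked with Python's standard library. This assumes the mirror lives on its own hostname (mirror.example.com here is a placeholder) and that you want every well-behaved crawler kept out of it entirely. The helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the mirror host only:
# it asks all crawlers to stay out of the whole mirror.
MIRROR_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def allowed(robots_txt, user_agent, url):
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With that file in place, allowed(MIRROR_ROBOTS_TXT, "Googlebot", "http://mirror.example.com/article.html") comes back False. Keep in mind that robots.txt is a request, not a lock: it only affects crawlers that choose to honour it.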

Jesse Skinner Mon, 20th of February, 2006

I have a problem with my site and newsisfree.com. newsisfree.com reproduces my RSS feed. As a result, my own site is ranked way lower on Google! Seriously, search on Google and newsisfree is 3rd, my site is on the second page!

I asked newsisfree to remove my site, and they said they are an RSS aggregator like Bloglines and others. They haven't yet removed it. Since they are only displaying my RSS feed, I'm not sure what I can do against that... I'm worried that Google is penalizing my site, thinking I'm the one ripping off content!

Arjan Mon, 20th of February, 2006

Ah, thanks for the tips 'n links! I've been on the lookout for something like this for a while.

I guess the best solution to 'protect' your RSS feed is to publish only an excerpt and let readers go to your site for the full post...

*heads off to check if his sites have copies*
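The excerpt-only idea in the comment above might look something like this when a feed is generated by hand. It is a minimal sketch assuming the post body is already plain text. The function name and the 50-word default are invented for illustration, and most blog software has its own setting for this.

```python
def excerpt(text, max_words=50):
    """Return the first max_words words of text, marking any truncation."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"
```

The trade-off is that readers who prefer full posts in their feed reader lose out along with the scrapers.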

Mike P. Mon, 20th of February, 2006

Have a read of this, Jesse. Pretty certain that they can't do that. I don't care what they are: just because you have RSS doesn't mean that people can reproduce it.

In the meantime, you could block their bot. Check RSS user agent identifiers for the UA string, and try banning them with robots.txt, or if you want to be sure, look for NIF/1.1 in the UA string and deal with it via htaccess or PHP etc.
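For a feed served by a Python application rather than through htaccess or PHP, the same user agent check could be sketched as a small WSGI wrapper. Only the NIF/1.1 token comes from the comment above. The function names and the stand-in feed app are hypothetical, and a scraper that changes or fakes its UA string will walk straight past a check like this.

```python
def block_user_agents(app, banned=("NIF/1.1",)):
    """Wrap a WSGI app; answer 403 when the User-Agent contains a banned token."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if any(token in user_agent for token in banned):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the real application untouched.
        return app(environ, start_response)
    return middleware

def feed_app(environ, start_response):
    """Stand-in for whatever actually serves the feed (hypothetical)."""
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b"<rss></rss>"]
```

Wrapping is a one-liner, block_user_agents(feed_app), and the result can be handed to any WSGI server, for example wsgiref.simple_server.make_server from the standard library.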

Matthijs Mon, 20th of February, 2006

Very interesting article, Mike. I did see Copyscape a long time ago but had forgotten about it. After checking a few pages I immediately found a copy of one of my pages somewhere. A PDF in some personal home directory, so it's probably meant as a personal backup. But a duplicate nonetheless.

Thanks.

Jesse Skinner Mon, 20th of February, 2006

Thanks for the tip, Mike. Unfortunately, I'm using Feedburner, so I can't easily block them. Makes me seriously consider keeping my feed on my own server, though. Until then, I'll just keep writing nasty emails...

Mike P. Mon, 20th of February, 2006

Jesse, are you using the direct Feedburner link or redirecting your feed transparently?

If it is #2, you can block it... (seems that you aren't, though).

Jesse Skinner Mon, 20th of February, 2006

Sigh, I'm using the direct link. I think I'll change this around, though. It makes more sense to have a permanent URL on my server, especially if I can still gain the benefits of Feedburner.

So long, newsisfree!

Mark Mon, 20th of February, 2006

It would be funny if someone lifted this post and credited it as their own.

Well, funny in an ironic way.

Mike P. Tue, 21st of February, 2006

Ha, that would be funny..

*mike makes a note to check this post in a week or so*

Steve Tue, 21st of February, 2006

You could do all that, or you could just produce some more content. You see, creativity existed long before copyright, and the way that 'artistic' types managed to survive then was simple. They kept working.

The model these days seems to be:

1) Produce

2) Copyright

3) Collect until you die

The model then was:

1) Produce

2) Be copied

3) Produce more, and be even more liked because everyone's heard of you from the copycats.

This is the fundamental flaw with copyright, and the current system of enforcing it. It breeds stagnation instead of advancement of the arts and sciences as it was intended to.

I digress though. It was a well thought out and useful article, I just hate the implications behind it.

My theory: let people 'steal' your content, just make more. The copycats obviously can't, so in the end you will win.

Mike P. Tue, 21st of February, 2006

Interesting point of view, Steve, and to be honest I do like that.

It is much easier to write new stuff and blog and be creative, but then there is the odd time you come across your content wrapped with Adsense on someone else's site, and if you've happened to have had a bad day...

Jesse Skinner Wed, 22nd of February, 2006

Newsisfree got back to me. They updated http://newsisfree.com/sources/info/28520/ so that users not logged in (Google included) only get the first sentence of each blog post. This should help out.

(As a little kick in the junk, they also updated my RSS feed to the new feedburner feed I made, which I had done just to evade them in the first place! Anyway, I asked them to point at the one on my server instead.)

Todd Carpenter Wed, 22nd of February, 2006

I fight this often. Most pull the feed right away, but if they don't, another way to get action is to post about it. "xxxxx.com is ripping me off" seems to work. Their aggregator automatically posts it on their own site. Then I send them the link to their own blog. It's good for a laugh anyway.

The latest from my personal website,
Mike Papageorge.com
