How to remove duplicate content
What is duplicate content?
"Duplicate content" is any content within a particular article that is substantially similar to the content of another article. This can include sentence structure, words, and phrasing. It can also extend to duplicated images. While there are cases in which duplicate content is acceptable (printer-only pages, for example), what concerns most content writers is the deliberate duplication of articles on several sites. This can happen either through permitted syndication via feeds or when a content thief lifts an article in its entirety.
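To make "substantially similar" a little more concrete, here is a minimal Python sketch that scores two texts by the overlap of their three-word sequences (Jaccard similarity over word "shingles"). This is only an illustration: the function names and the choice of three-word shingles are assumptions made for this example, and it is not a description of how any search engine or plagiarism checker actually measures duplication.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short to form a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all shingles: 0.0 (nothing shared) to 1.0 (identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score near 1.0 suggests a wholesale copy, while a low but non-zero score is more typical of a short quoted excerpt. Any cut-off between the two would be a judgment call.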
Why is duplicate content bad?
There is no real value in having the same article posted in multiple places across the internet. If there are multiple copies of any article available, how can search engines such as Google decide which is the "best" copy? While most of us aren't privy to the subtle nuances of Google's algorithms, we know that SEO (search engine optimization) factors such as a site's PageRank may come into play. This means that duplicated content may result in search traffic going to a copy of the article which is not of your choosing.
It is better to have one copy online with many valid backlinks to that one copy. In this way you can choose which copy of the article to show Bing, Yahoo, or Google, and you can decide which site will get the traffic to that particular article.
How to find duplicate content
It is easy to locate duplicate content if it exists for your article. Copy a snippet of your article into a plagiarism checker to see if repeated phrases can be found.
Feed aggregator sites will typically copy only the first few sentences, which makes them less of an issue than a site that steals your entire article. Be sure to do a line-by-line check of the entire article in the plagiarism checker to verify that you are in the clear before you move your content.
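If you already have the text of a suspected copy, the line-by-line check can also be roughed out locally. The following Python sketch is an illustration only: the function names and the five-word minimum are assumptions made for this example, and it is not how any particular plagiarism checker works. It reports which sentences of your article appear word for word in the other text, ignoring case and punctuation.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial edits do not hide a match."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def split_sentences(text: str) -> list:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def copied_sentences(article: str, suspect: str, min_words: int = 5) -> list:
    """Return the sentences of article that occur verbatim in suspect."""
    # Pad with spaces so matches always start and end on word boundaries.
    haystack = f" {normalize(suspect)} "
    hits = []
    for sentence in split_sentences(article):
        needle = normalize(sentence)
        # Skip very short sentences, which match by coincidence too easily.
        if len(needle.split()) >= min_words and f" {needle} " in haystack:
            hits.append(sentence)
    return hits
```

If only the first sentence or two come back, the copy is more likely a feed excerpt. If most of the article comes back, the whole piece has probably been lifted.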
How to remove duplicate content
Once you have located the URL of the duplicate content, file a DMCA take-down notice against that URL to claim the article as being under your copyright. This will require:
- Locating the host of the URL in question via a WHOIS search.
- Sending your take-down notice, which can be done by email if it includes a digital signature. Also include any screencaps needed to prove both that the article is yours and that it existed prior to the creation of the URL in question.
Find a plagiarist?
- Whois at Network Solutions : Get the offender's contact info to file a DMCA take-down notice.
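For those who prefer a script to a web form, the WHOIS lookup can be sketched in Python using only the standard library. WHOIS is a plain-text protocol over TCP port 43, so no special client is needed. The function names below are made up for this example, and starting at whois.iana.org and following its "refer:" line is an assumption about how to find the right registry server, so treat this as a rough sketch and not a definitive tool.

```python
import socket
from typing import Optional
from urllib.parse import urlparse


def host_from_url(url: str) -> str:
    """Return the bare hostname of a URL (no scheme, port, or path)."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no hostname found in {url!r}")
    return host


def whois_query(query: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send one WHOIS query (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(response: str) -> Optional[str]:
    """Find a 'refer:' line naming a more specific WHOIS server, if there is one."""
    for line in response.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "refer":
            return value.strip() or None
    return None
```

A typical use would be to call host_from_url() on the offending URL, pass the result to whois_query(), and, if referral_server() finds a server in the reply, query that server as well. A subdomain such as "www" may need trimming to the registered domain first, and the layout of the reply varies from one registry to another, so the contact details still have to be read by eye.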
Before moving your content
If you are moving your content from one site to another, the most important thing to do is to remove it from Google's cache. So long as it is cached, there is the potential for it to be flagged as duplicate content. Take a few minutes to remove it before re-posting the content elsewhere. Google has a special form which will allow you to submit your URL for removal.
Once submitted, it will show as "pending" in the Removal requests list until it is removed. This generally takes about 24 hours, which is much faster than waiting for Google to determine on its own that your content has moved. Once the status is updated to "removed", you are free to post your content elsewhere, whether on another content farm or on your own website.
It is, however, advisable to double-check that your content has in fact been removed from all caches before moving it. This can be done by re-visiting your favourite duplicate content checker and searching for your phrases again. If the search comes up clear, you should be safe.
Google search de-cache form
- Google Webmaster Tools : Remove your URL from Google's cache with this tool.