What is duplicate content, and why should you care? As Google’s Matt Cutts has said many times, there is no duplicate content “penalty” (you won’t be kicked out of the results), but duplication can substantially hurt your rankings. When you have duplicate content, search engines can’t determine which page should rank for a given phrase. Perhaps worse, if some people link using one address (URL) while others link using another, neither page will have the credibility it would have if all the links pointed to a single address.
This has long been a problem on large database-driven sites, where only a minor difference in the address (URL) of the content (perhaps only a session ID) makes it appear that the same information is on more than one page. However, this is also a common problem on blogs, where a category listing, tag listing, or archive listing can present the exact same content at differing addresses.
Here are some of the ways that duplicate content can occur.
1) True Duplicates – Any page that is 100% identical to another page other than the page’s address (URL).
This could result from the session ID issue I mentioned, where the same page is reachable at many addresses that differ only in the session ID.
2) Near Duplicates – Pages that differ only slightly; perhaps only an image or sidebar is different.
This could occur from differing sorts of the same information (category, tag, archive, …).
3) Cross-domain Duplicates – Occurs when two websites share the same piece of content.
This could occur from sheer theft (someone scraping your content) or from syndicating articles.
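To make the first two cases a little more concrete, here is a minimal Python sketch, not a definitive tool. It assumes session IDs travel in query parameters with names like `sid` or `sessionid` (hypothetical names; your site may use others), and it uses a simple word-shingle overlap as one rough way to spot near duplicates. The function names and example URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust for your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url):
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def shingles(text, size=3):
    """Split text into overlapping runs of `size` words."""
    words = text.lower().split()
    count = max(len(words) - size + 1, 1)
    return {tuple(words[i:i + size]) for i in range(count)}


def similarity(text_a, text_b):
    """Jaccard overlap of word shingles: 1.0 means identical wording."""
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b)


# Two addresses that differ only by session ID collapse to one address.
first = strip_session_params("http://example.com/page?id=5&sid=abc123")
second = strip_session_params("http://example.com/page?id=5&sid=xyz789")
print(first == second)
```

Pages whose stripped addresses match are candidates for true duplicates, while a high `similarity` score between the text of two different addresses hints at a near duplicate. What counts as "high" is a judgment call for your own content.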
While the SEO crowd has been preaching the pitfalls of duplicate content for some time, the “Farmer” or “Panda” updates that occurred throughout 2011 have pushed the importance of keeping duplicate content off your site to the forefront. Peter Meyers covers the topic in more detail here, along with some ways to address these problems.