There is a new canonical tag that is supported by Google, Yahoo, and Microsoft. This is important news, especially as people are working more and more with content management systems and database-driven websites (such as Joomla, WordPress, etc.).
The issue is that people can arrive at your content in a variety of ways. The simplest variation is the www versus non-www version of your site. What’s that, you say they are the same? Au contraire, mon frère: to a search engine, www is a separate subdomain. There are some easy programmatic ways to address the www versus non-www issue; however, the problem starts to get sticky when you mix in the wide range of variables, such as dates, tags, and categories, that can be included in the URL structure.
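As a rough illustration of one of those programmatic fixes, here is a minimal Python sketch that decides whether a request should be redirected to the preferred host. The function name `preferred_url` and the choice of `www.mywebsite.com` as the preferred host are assumptions made for this example; in practice this logic usually lives in the web server or CMS configuration, and the redirect would normally be a permanent (301) one.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: the www version is the one we want indexed.
PREFERRED_HOST = "www.mywebsite.com"


def preferred_url(url):
    """Return the URL to redirect to, or None if the host is already preferred."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Work out the bare (non-www) form of the preferred host.
    bare = PREFERRED_HOST[4:] if PREFERRED_HOST.startswith("www.") else PREFERRED_HOST

    # Only rewrite the two variants of our own site, never unrelated hosts.
    if host in (bare, "www." + bare) and host != PREFERRED_HOST:
        return urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
        )
    return None


print(preferred_url("http://mywebsite.com/search-engine-optimization"))
```

A request for the non-www address yields the www address to redirect to, while a request that already uses the preferred host yields `None` and needs no redirect.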
The search engine spider may find variations like these all pointing to the same content:
- www.mywebsite.com/january 2009/search-engine-optimization
Worse still are session IDs (used on larger sites), which can cause a single page of content to be indexed hundreds of times:
Add to this that you can’t control how others link to your content and you begin to understand why duplicate content is such a large issue.
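To make the session ID problem concrete, here is a small Python sketch that strips session parameters from a URL so that many variations collapse into a single address. The parameter names in `SESSION_PARAMS` and the sample URLs are hypothetical; a real site would need to use whatever names its own software generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names, chosen only for illustration.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url):
    """Drop session-style query parameters and keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


variants = [
    "http://www.mywebsite.com/search-engine-optimization?sid=abc123",
    "http://www.mywebsite.com/search-engine-optimization?sid=xyz789",
    "http://www.mywebsite.com/search-engine-optimization",
]

# All three variants reduce to the same address once the session ID is gone.
print({strip_session_params(u) for u in variants})
```

This only helps with the URLs your own site generates. It does nothing about links other people create, which is where the canonical tag comes in.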
Why should you care? From an SEO perspective, one page with 100 incoming links has far more relevance than 100 pages with identical content and one incoming link each.
To address these concerns, Google, Microsoft and Yahoo have agreed to take the content of the Canonical Link Tag as the preferred name for a page.
So adding a tag such as:
<link rel="canonical" href="http://www.mywebsite.com/search-engine-optimization" />
will indicate that the page should be indexed as www.mywebsite.com/search-engine-optimization, regardless of how the spider found the page.
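If your pages are built from templates, the tag can be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; the helper name `canonical_link_tag` is made up for this example, and the resulting tag belongs in the page's `<head>` section.

```python
from html import escape


def canonical_link_tag(preferred_url):
    """Build a canonical link tag for the given preferred URL."""
    # Escape the URL so characters such as & and " cannot break the attribute.
    return '<link rel="canonical" href="{}" />'.format(escape(preferred_url, quote=True))


print(canonical_link_tag("http://www.mywebsite.com/search-engine-optimization"))
```

Every variation of the page (dated, tagged, or carrying a session ID) would emit the same tag, so the spider sees one preferred address no matter which URL it followed.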
Here’s a link to more information on this topic by Vanessa Fox at Search Engine Land:
Google’s Matt Cutts posted this video on the Canonical Link tag.
The discussion starts getting into some of the finer points of this topic at about 12:05.