Demystifying Duplicate Content: What to Know About Google's Best Practices
Duplicate content is one area with an incredible number of myths, so much so that many people are more afraid of the “duplicate content penalty” than they are of recluse spiders. Duplicate content is also on the rise as the standard for quality content keeps getting higher by the day. It's now harder than ever to create content that truly stands out, which is one reason many marketers are duplicating content more.
In this Google-focused, in-depth dive, we'll discuss everything related to duplicate content, with detailed best practices based on Google's recommendations. Let's start with the big question (and perhaps the biggest myth surrounding duplicate content):
Does Google penalize sites for duplicate content?
It may come as a surprise, but the answer is NO.
In short, there's no such thing as Google’s “duplicate content penalty.”
The whole fuss about Google going after sites with duplicate content is nothing but a huge SEO myth.
In a video, Google's Search Quality Senior Strategist, Andrey Lipattsev, made it clear that Google does NOT have a duplicate content penalty. Google's recently updated Search Quality Evaluator Guidelines also make no mention of a duplicate content penalty.
So does that give you the go-ahead to publish duplicate content on your site?
The answer is simply no. While Google doesn't penalize sites for duplicate content, it doesn't encourage publishing it either. Why? Because of what might be termed “duplicate content issues,” which affect both you and Google's search bots.
For the bots, duplicates make it harder to determine which version is the original, which to rank higher, and which to drop, although in most cases they manage to sort this out.
On your side, the story is different. Duplicate content *issues* don't lead to a penalty from Google per se, but they may hurt your SEO.
Here's an explanation:
If Google finds duplicate content on your site, it often won't index or rank that page. And if it does, it usually won't rank the duplicate above the page with the original content. That's not a penalty; it's a longstanding way Google operates to favor its users. The search giant is keen on giving its users the best possible experience, and it does this by displaying original, unique, relevant, and rich results that answer their questions.
As a result, it displays mainly the content it finds to be of the greatest help to the USER.
Don't misunderstand this:
Since Google's main objective is to please the user, it might sometimes display your duplicate content IF it finds that it is the MOST RELEVANT result for the user's query. But that's no green light for you to deliberately duplicate content. Google's systems are sophisticated enough to recognize deliberately duplicated content, and once they do, they'll mark it as manipulative or even malicious.
That's when trouble starts!
So if you're trying to rank in competitive niches, you need original content that's not found on other pages, whether on your site or on other websites (the latter is more serious, though). Otherwise, your pages will struggle to gain traction in Google's SERPs. And if the competing site (the site where the same content is published) has higher domain authority, or its page has more backlinks, that could affect your ranking, too.
What is duplicate content?
Google has given its own definition of duplicate content: “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin…” Google
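Google doesn't publish how it decides that two blocks of content are "appreciably similar," but a common textbook way to measure near-duplication is to split each text into overlapping word sequences (shingles) and compute the Jaccard similarity of the two sets. The Python sketch below is illustrative only: the shingle size k=3, the helper names, and the sample product descriptions are assumptions, and this is not Google's algorithm.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts: treat all their words as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 = nothing shared, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(text_a: str, text_b: str, k: int = 3) -> float:
    return jaccard(shingles(text_a, k), shingles(text_b, k))


original = "Our handmade leather wallet is stitched by hand and ships free worldwide."
lightly_edited = "Our handmade leather wallet is stitched by hand and ships free to the US."
unrelated = "Read our guide to choosing hiking boots for wet and rocky trails."

print(round(similarity(original, lightly_edited), 2))  # high: most shingles are shared
print(round(similarity(original, unrelated), 2))       # no shingles in common
```

A lightly reworded product description still shares most of its shingles with the source, which is why "borrowed" text with a few words changed can remain appreciably similar.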
Google's definition is concise and easy to understand. In practice, duplicate content could be anything from a product description on your ecommerce site that you borrowed from the original seller, to a quote copied from an authority blog in your niche, to a simple piece of boilerplate text that appears on both your landing page and your homepage.
Can I get rid of ALL instances of duplicate content on my site?
The reality is that no matter how hard you try, you can't offer 100% unique content. Put differently, you can't remove every instance of duplicate content from your web pages, even if you use recommended fixes such as the rel="canonical" tag or URL parameter handling. In a study based on data from Raven Tools' site auditor, duplicate content was found to be among the top five SEO issues affecting sites.
Google's Matt Cutts has stated that duplicate content makes up about 25 to 30 percent of the web. The Raven Tools study supports this claim, reporting that 29 percent of pages were duplicate content. That's a whole lot of duplicate content.
However, for SEO, the big issue is not necessarily the amount of duplicate content on a website. It's the LACK of UNIQUE CONTENT that provides value to users. Knowing that the web is full of duplicates, Google doesn't focus on duplicate content as such. Instead, what matters more when it ranks your site is the balance between unique and duplicated content.
If duplicate content far outweighs unique content on your site, your pages' visibility in the SERPs could suffer, and vice versa.
Can duplicate content rank in Google?
Of course it can (we just said that!). But as stated earlier, this is not a recommended strategy for getting higher rankings, because more trusted pages with the same content can easily outrank yours. Hence, it shouldn't be used as a long-term SEO strategy. Too many low-quality pages might also cause sitewide issues in the future, not just page-level ones. So for better rankings, stick to original content that's unique and, ideally, found on only one page of your site.
Should I block Google from indexing my duplicate content?
When webmasters try to think up a solution for their duplicate content issues, one of the first ideas that usually comes to mind is to block Google from indexing the duplicate page. But that's not an approach Google recommends. Previously, Google's guidelines on this issue said:
“Consider blocking pages from indexing: Rather than letting Google’s algorithms determine the “best” version of a document, you may wish to help guide us to your preferred version. For instance, if you don’t want us to index the printer versions of your site’s articles, disallow those directories or make use of regular expressions in your robots.txt file.” Google
But Google has since changed that, as reflected in its new guidelines. In a newer Google forum post, Google has this to say: “Google does not recommend blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages…” Google
So essentially, there's no need to block Googlebot from crawling your duplicate content. That's the recommendation straight from the search giant itself.
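Python's standard-library urllib.robotparser makes the problem easy to see. In this sketch, a hypothetical robots.txt follows the old advice and disallows a /print/ directory of printer-friendly copies; a crawler that honors it never fetches the printer copy, so it has nothing to compare against the original. The paths are assumptions, and robotparser implements basic robots.txt matching rather than Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that follows the OLD advice: hide printer-friendly copies.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://yoursite.com/articles/duplicate-content",
    "https://yoursite.com/print/articles/duplicate-content",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "crawlable" if allowed else "blocked")
```

Because the printer copy is never fetched, the crawler can't tell that the two URLs hold the same article, which is exactly the situation the newer guidance warns about.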
What constitutes duplicate content?
Let's look at this in two instances:
Duplicate content on the same domain
Duplicate content on different domains
Duplicate content on the same domain
The first instance is duplicate content found within your site, meaning the same content appears on different pages of your site. The content may live at different locations (URLs) on your site, or it may be reachable in different ways that result in different URL parameters.
For example, if your posts can be accessed via your blog page (yoursite.com/blog/post-title-1) and also via other locations such as yoursite.com/category/post-title-1, search bots crawling your site will see the same piece of content multiple times and treat it as duplicate content. The good news is that this type of duplicate content generally doesn't harm your SEO: search engine bots are sophisticated enough to understand that the intention behind this duplication is not malicious.
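Here's a rough Python sketch of how a crawler could notice that the /blog/ and /category/ URLs serve the same article: reduce each page to a normalized text fingerprint and group URLs whose fingerprints match. The crawled dictionary, the regex-based tag stripping, and the function names are hypothetical simplifications, not how Googlebot actually works, and this approach only catches exact copies, not near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html_body: str) -> str:
    """Hash the page text after crude tag stripping and whitespace/case normalization."""
    text = re.sub(r"<[^>]+>", " ", html_body)  # crude tag stripping, for illustration only
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose bodies produce the same fingerprint."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        by_hash[content_fingerprint(body)].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]


# Hypothetical crawl results: the same post reachable from two sections of the site.
crawled = {
    "https://yoursite.com/blog/post-title-1": "<h1>Post Title 1</h1><p>Same body text.</p>",
    "https://yoursite.com/category/post-title-1": "<h1>Post Title 1</h1>\n<p>Same body text.</p>",
    "https://yoursite.com/blog/post-title-2": "<h1>Post Title 2</h1><p>Different text.</p>",
}

print(duplicate_groups(crawled))
```

Only the two post-title-1 URLs end up in a group; post-title-2 has its own fingerprint.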
Common examples of duplicate content on the same site are:
Boilerplate content: An example of boilerplate content is the nav menu, which usually appears on multiple pages (home, about us, etc.). Google's crawler sees this repeated text as duplicate content, but as stated earlier, this type of duplication hardly affects your SEO.
Content duplicated as a result of inconsistent URL structures: When search engine bots come across the same content at two different URLs (for example, http://yoursite.com and https://yoursite.com), they treat it as duplicate content.
Localized domains: If your business serves multiple countries and you create a localized domain for each one, the content will naturally overlap across those sites unless you translate it into the local language (more on that later).
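Variants like http/https multiply quickly. A minimal Python sketch of one common fix: pick a single preferred form for every URL and answer requests for any other form with a 301 redirect to it. The preferred scheme and host (https://www.yoursite.com), the rule that every page path ends in a slash, and the helper names are assumptions for illustration; many sites configure this kind of rule in their web server rather than in application code.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred form for this sketch: https://www.yoursite.com/<path>/
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.yoursite.com"
SITE_HOSTS = {"yoursite.com", "www.yoursite.com"}


def preferred_url(url: str) -> str:
    """Map scheme, host, and trailing-slash/index.htm variants onto one preferred URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in SITE_HOSTS:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Assumption: /page/index.htm is the same page as /page/.
    if path.endswith("/index.htm"):
        path = path[: -len("index.htm")]
    # Assumption: every page on this site uses a directory-style path ending in "/".
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


def redirect_for(url: str) -> Optional[tuple[int, str]]:
    """Return (301, preferred URL) for a non-preferred variant, or None if url is already preferred."""
    target = preferred_url(url)
    return (301, target) if target != url else None


for variant in (
    "http://yoursite.com/page",
    "https://www.yoursite.com/page/index.htm",
    "https://www.yoursite.com/page/",
):
    print(variant, "->", redirect_for(variant))
```

The same preferred_url helper can also be used when generating internal links, so the site only ever links to one version of each page.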
Duplicate content on different domains
Duplicate content on different domains is a lot riskier for SEO than the first type, so we'll look at each case in more detail, starting with the three most common ones:
Content curation
“In simple terms, the process of content curation is the act of sorting through large amounts of content on the web and presenting the best posts in a meaningful and organized way. The process can include sifting, sorting, arranging, and placing found content into specific themes, and then publishing that information” Hootsuite
Because content curation posts compile content pieces from around the web, it's natural for them to contain duplicate content. Even if your blog post only borrows a headline or quote from another post, that still counts as curated content.
As long as you provide some value, a fresh perspective, or explain things in your own style, Google will not view this content duplication as malicious.
Content syndication
Content syndication is the process of republishing your content on third-party sites, usually to reach a broader audience. It could be your blog post, infographic, video, etc., republished as a full article, snippet, link, or thumbnail.
This means that several copies of any syndicated piece typically exist, which is duplicate content. Big sites syndicate content all the time. The New York Times, for example, allows content syndication: every day it features stories from all over the web and republishes them with permission. Buffer also syndicates content; its posts get republished on sites like Fast Company, HuffPo, Business Insider, and more.
While these instances are counted as duplicate content, Google doesn’t penalize them.
Copied content
Copied content is treated differently, though. Google doesn't penalize the duplicate content types above, but what sets copied content apart is the intent and nature of the duplicated text. Duplicate content is usually not manipulative and is common among webmasters, but if you copy content with malicious intent, Google will frown on it. Simply put, copied content can be penalized algorithmically or manually.
What does not count as duplicate content?
Mobile site content: If you run a mobile version of your site and its content is the same as on the desktop version, that does not count as duplicate content. It's also good to know that Google crawls mobile pages with a different crawler from the one it uses for desktop pages.
Translated content: If you have localized sites for different countries alongside your main site (e.g., yoursite.co.uk + yoursite.com) and have TRANSLATED your main content into the local languages, that won't count as duplicate content.
Solutions to duplicate content
From URL parameters to canonical tags and session IDs, there are steps you can take to proactively address duplicate content issues and ensure that visitors see the content you want them to. The right solution depends on your particular situation. Below are some of Google's recommended steps for dealing with duplicate content issues:
Be consistent with your internal linking: For example, don't link to http://www.yoursite.com/page/, http://www.yoursite.com/page, and http://www.yoursite.com/page/index.htm interchangeably.
Be cautious with syndication: If you syndicate your content on third-party sites, Google will always show the version it thinks is most appropriate for users in each given query, which may or may not be the version you'd prefer. It helps to make sure that each site that syndicates your content includes a link back to your original article. You can also ask those who publish your syndicated material to keep their version out of Google's index by using the noindex meta tag.
Reduce boilerplate repetition: For instance, instead of placing lengthy “About” information in the sidebar of every page, why not include a very brief summary and link to a page with more details? In addition, you can use Google's Parameter Handling tool to specify how you would like Google to treat URL parameters (note that Google retired this tool from Search Console in 2022).
Use 301s: If your site has been restructured for any reason, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, Googlebot, and other spiders.
Control how your content management system displays content: For example, a blog entry may appear on the blog's homepage, on an archive page, and on a category page, among others. If possible, limit the duplicated entry on unnecessary pages so that it appears in full in only one place.
Expand or consolidate similar content: If you have multiple similar pages on one domain, you can decide to either expand each page or consolidate the pages into one. For instance, if you have a food blog with separate pages for two recipes but the same information on both pages, you could either merge the pages into one larger page about both recipes or you could expand each page to offer unique content about each recipe.
Use canonical tags (aka "rel canonical"): If you have multiple pages that contain the same content, you can use canonical tags to tell Google that a specific URL represents the master copy of a page, in other words, that certain similar URLs are actually one and the same.
Use rel="alternate": If your site runs alternate versions such as mobile or various country/language pages, you can use rel="alternate" to consolidate them. For country/language versions in particular, hreflang is used to show the correct country/language page in the search results, not necessarily to rank higher.
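For the canonical and hreflang items above, here's a minimal Python sketch that builds the link tags a page's head section might carry: one rel="canonical" pointing at the master URL, plus rel="alternate" hreflang entries for each language/country version. The URLs, language codes, x-default choice, and the head_link_tags helper are hypothetical. The sketch assumes it is generating tags for the en-us page; each localized page would list its own URL as canonical while repeating the same hreflang set.

```python
from html import escape

# Hypothetical en-us page and its language/country alternates (including itself).
CANONICAL = "https://www.yoursite.com/guide/"
ALTERNATES = {
    "en-us": "https://www.yoursite.com/guide/",
    "en-gb": "https://www.yoursite.co.uk/guide/",
    "de-de": "https://www.yoursite.de/leitfaden/",
}


def head_link_tags(canonical: str, alternates: dict[str, str]) -> str:
    """Build <link> tags: one rel="canonical" plus rel="alternate" hreflang entries."""
    tags = [f'<link rel="canonical" href="{escape(canonical)}">']
    for lang, href in sorted(alternates.items()):
        tags.append(f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(href)}">')
    # x-default marks the fallback version for users whose language matches none of the above.
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(canonical)}">')
    return "\n".join(tags)


print(head_link_tags(CANONICAL, ALTERNATES))
```

Generating these tags from one shared mapping helps keep every language version pointing at the same set of alternates.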
Wrapping it up…
So in this post, we've busted the biggest duplicate content myth there is: Google does NOT punish sites for duplicate content.
We've also provided some Google best practices you can use to handle duplicate content legitimately without hurting your SEO. Keep in mind, though, that “duplicate content issues” do exist and, if found on your site, could affect your rankings.
What to do?
Use the Google-recommended steps we've provided in this post to work your way out. Still, as the old proverb goes, prevention is better than cure, so you'd be better off staying away from duplicate content practices altogether. Google's quality guidelines are clear on this subject, and simply adhering to them will help.
Want to know if your site contains duplicate content or if your content is being duplicated somewhere on the Internet? Check out our array of tools for all things SEO, including duplicate content tools. It's FREE!