Duplicate content has been an issue in search engine optimization for many years now, yet there is still a lot of confusion around what you can and can’t do with it, in terms of staying on Google’s good side. In fact, even in 2013, Google’s head of webspam Matt Cutts has had to discuss the issue in several of his regular Webmaster Help videos because people keep asking questions and looking for clarification. Back in the summer, Cutts talked about duplicate content with regard to disclaimers and Terms and Conditions pages. “The answer is, I wouldn’t stress about this unless the content that you have is duplicated as spammy or keyword stuffing or something like that, you know, then we might be – an algorithm or a person might take action on – but if it’s legal boiler plate that’s sort of required to be there, we might, at most, might not want to count that, but it’s probably not going to cause you a big issue,”
Cutts said at the time. “We do understand that lots of different places across the web do need to have various disclaimers, legal information, terms and conditions, that sort of stuff, and so it’s the sort of thing where if we were to not rank that stuff well, then that would probably hurt our overall search quality, so I wouldn’t stress about it,” he said. The subject of duplicate content came up again in September, when Cutts took on a question about e-commerce sites that sell products with “ingredients lists” exactly like other sites selling the same product.
Cutts said, “Let’s consider an ingredients list, which is like food, and you’re listing the ingredients in that food and ingredients like, okay, it’s a product that a lot of affiliates have an affiliate feed for, and you’re just going to display that. If you’re listing something that’s vital, so you’ve got ingredients in food or something like that – specifications that aren’t 18 pages long, but are short specifications, that probably wouldn’t get you into too much of an issue. However, if you just have an affiliate feed, and you have the exact same paragraph or two or three of text that everybody else on the web has, that probably would be more problematic.”
“So what’s the difference between them?” he continued. “Well, hopefully an ingredients list, as you’re describing it as far as the number of components or something probably relatively small – hopefully you’ve got a different page from all the other affiliates in the world, and hopefully you have some original content – something that distinguishes you from the fly-by-night sites that just say, ‘Okay, here’s a product. I got the feed and I’m gonna put these two paragraphs of text that everybody else has.’ If that’s the only value add you have then you should ask yourself, ‘Why should my site rank higher than all these hundreds of other sites when they have the exact same content as well?’”
He went on to note that if the majority of your content is the same content that appears everywhere else, and there’s nothing else to say, that’s probably something you should avoid. It all comes down to whether or not there’s added value, which is something Google has pretty much always stood by, and is reaffirmed in a newer video. Cutts took on the subject once again this week. This time, it was in response to this question: How does Google handle duplicate content and what negative effects can it have on rankings from an SEO perspective? “It’s important to realize that if you look at content on the web, something like 25 or 30 percent of all of the web’s content is duplicate content,” said Cutts. “There’s man page for Linux, you know, all those sorts of things. So duplicate content does happen. People will quote a paragraph of a blog, and then link to the blog. That sort of thing. So it’s not the case that every single time there’s duplicate content, it’s spam. If we made that assumption, the changes that happened as a result would end up, probably, hurting our search quality rather than helping our search quality.”
“So the fact is, Google looks for duplicate content and where we can find it, we often try to group it all together and treat it as if it’s one piece of content,” he continued. “So most of the time, suppose we’re starting to return a set of search results, and we’ve got two pages that are actually kind of identical. Typically we would say, ‘Ok, you know what? Rather than show both of those pages (since they’re duplicates) let’s just show one of those pages, and we’ll crowd the other result out.’ And if you get to the bottom of the search results, and you really want to do an exhaustive search, you can change the filtering so that you can say, okay, I want to see every single page, and then you’d see that other page.”
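To make the grouping idea concrete, here is a minimal Python sketch of clustering pages with identical text and showing only one page per cluster. It is not Google's algorithm: the exact-match fingerprint, the function names and the example URLs are all assumptions made for illustration, and a real search engine would also need to catch near-duplicates.

```python
import hashlib
from collections import defaultdict

def fingerprint(text):
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cluster_duplicates(pages):
    """Group URLs whose page text produces the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return list(groups.values())

def pick_representatives(pages):
    """Keep the first URL of each cluster; the others are 'crowded out'."""
    return [cluster[0] for cluster in cluster_duplicates(pages)]

# Hypothetical pages: the first two differ only in case and spacing.
pages = {
    "https://example.com/terms": "Terms and  Conditions apply.",
    "https://example.co.uk/terms": "terms and conditions apply.",
    "https://example.com/review": "An original product review.",
}

print(pick_representatives(pages))
```

Here the .com and .co.uk pages fall into one cluster, so only the first of them is listed alongside the original review page.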
“But for the most part, duplicate content is not really treated as spam,” he said. “It’s just treated as something that we need to cluster appropriately. We need to make sure that it ranks correctly, but duplicate content does happen. Now, that said, it’s certainly the case that if you do nothing but duplicate content, and you’re doing it in an abusive, deceptive or malicious or a manipulative way, we do reserve the right to take action on spam.” He mentions that someone on Twitter was asking how to do an RSS autoblog to a blog site, and not have that be viewed as spam.
“The problem is that if you are automatically generating stuff that’s coming from nothing but an RSS feed, you’re not adding a lot of value,” said Cutts. “So that duplicate content might be a little more likely to be viewed as spam. But if you’re just making a regular website, and you’re worried about whether you have something on the .com and the .co.uk, or you might have two versions of your Terms and Conditions – an older version and a newer version – or something like that. That sort of duplicate content happens all the time on the web, and I really wouldn’t get stressed out about the notion that you might have a little bit of duplicate content. As long as you’re not trying to massively copy for every city and every state in the entire United States, show the same boiler plate text… for the most part, you should be in very good shape, and not really have to worry about it.”
In case you’re wondering, quoting is not considered duplicate content in Google’s eyes. Cutts spoke on that late last year. As long as you’re just quoting, using an excerpt from something, and linking to the original source in a fair use kind of way, you should be fine. Doing this with entire articles (which happens all the time) is of course a different story.
Google, as you know, designs its algorithms to abide by its quality guidelines, and duplicate content is part of that, so this is something you’re always going to have to consider. It says right in the guidelines, “Don’t create multiple pages, subdomains, or domains with substantially duplicate content.”
They do, however, offer steps you can take to address any duplicate content issues that you do have. These include using 301s, being consistent, using top-level domains, syndicating “carefully,” using Webmaster Tools to tell Google how you prefer your site to be indexed, minimizing boilerplate repetition, avoiding publishing stubs (empty pages, placeholders), understanding your content management system, and minimizing similar content.
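Of the steps listed above, the 301 redirect is probably the easiest to illustrate. The following is a rough Python sketch of deciding where a request should be permanently redirected so that every variant of a URL ends up at one preferred form. The preferred hostname, the choice of https and the function name are assumptions for the example, not anything Google prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: a hypothetical site that prefers the https www hostname.
PREFERRED_HOST = "www.example.com"

def redirect_target(url, preferred_host=PREFERRED_HOST):
    """Return the URL to send a 301 redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Rebuild the URL on the preferred scheme and host, keeping path and query.
    target = urlunsplit(("https", preferred_host, path, parts.query, ""))
    return None if target == url else target

print(redirect_target("http://example.com/shoes?id=3"))
print(redirect_target("https://www.example.com/shoes?id=3"))
```

A web server or framework would call something like this on each request and answer with status 301 and a Location header whenever the result is not None.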
Google advises against blocking it from crawling duplicate content though, so think about that too. This is because, if it can’t crawl those pages, it won’t be able to detect when URLs point to the same content, and will have to treat them as separate pages. Use the canonical link element instead to indicate which URL you prefer.
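As a rough illustration of the canonical link element, the Python sketch below strips query parameters that do not change the page and emits the tag that would go in the page's head. The list of ignored parameters and the function names are assumptions for a hypothetical site; which parameters are safe to drop depends entirely on how your own URLs work.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that never change the content on this hypothetical site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def canonical_url(url):
    """Lowercase the host, drop ignored query parameters and any fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

def canonical_link_tag(url):
    """Build the link element pointing at the canonical form of url."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)

print(canonical_link_tag("https://Example.com/shoes?utm_source=news&color=red"))
```

Every variant of the page would carry the same tag, so the duplicates stay crawlable while pointing at a single preferred URL.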