Google: “Our Algorithms Have Gotten Pretty Good at Recognizing Similar Content”

An interesting new post on the Google Webmaster Tools blog yesterday discusses duplicate content and explains that Google doesn’t look fondly upon websites that carry the same information as other websites, because they make for a poor user experience.

I hate taking large quotes from an article, but I think this one is important to know. Read the full post to get more information, but according to the blog post:

“Some less creative webmasters, or those short on time but with substantial resources on their hands, might be tempted to create a multitude of similar sites without necessarily adding unique information to any of these. From a user’s perspective, these sorts of repetitive sites can constitute a poor user experience when visible in search results. Luckily, over time our algorithms have gotten pretty good at recognizing similar content so as to serve users with a diverse range of information. We don’t recommend creating similar sites like that; it’s not a good use of your time and resources.”

This isn’t really new information, and it’s not surprising, but it’s something that domain owners need to take into consideration when developing their domain names. A lot of people have been asking about the issue of duplicate content lately, so this is certainly a good read.
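
Google has not published how its algorithms recognize similar content, so the following is only a minimal sketch of one textbook technique (word shingling with Jaccard similarity), not Google's method. The function names and the shingle size `k=3` are assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a, b, k=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Two pages that differ by a single word share half of their 3-word shingles.
print(jaccard_similarity("the quick brown fox jumps",
                         "the quick brown fox leaps"))  # 0.5
```

A score near 1.0 suggests two pages are close copies of each other, while a score near 0.0 suggests they share little wording. Where to draw the line between "similar" and "unique" is a judgment call this sketch does not settle.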

Elliot Silver
About The Author: Elliot Silver is an Internet entrepreneur and publisher of DomainInvesting.com. Elliot is also the founder and President of Top Notch Domains, LLC, a company that has closed eight figures in deals. Please read the DomainInvesting.com Terms of Use page for additional information about the publisher, website comment policy, disclosures, and conflicts of interest.

25 COMMENTS

  1. I totally agree, they have gotten really good at seeing duplicate titles and descriptions, especially on social media sites. So, if you’re just retweeting stuff then you might consider manually changing or rewriting the title or headline of what you’re tweeting or socializing on the social media sites.

  2. Google’s Matt Cutts spoke at a Domain Roundtable conference in San Francisco a couple of years ago and I asked him if, when people steal content from your site and put it on their own, Google is able to tell who had the content first so the original source is not penalized for duplicate content. He said, yes, Google can tell who published the content originally and the thief would be penalized accordingly. I was heartened by that since there are so many unscrupulous sites out there that swipe someone else’s work rather than do their own.

  3. You see, the problem is that the ability to determine who should be or prosper on the internet should not be Google’s business. We are creating a monster by letting Google make this determination. The Internet does not belong to Google. Something must be done about this.

    Google is predatory, Google is not this pretty logo-ed college kids and nerds business we always have in the back of our minds. They are out to corner the internet, and we should fight it – DOGGY DOG FIGHT to the end. Are you ready?

  4. I like the big G, they’ve always been fair with me and only slapped me when I broke their well-published guidelines – they didn’t delist me, just suspended an AdWords site. Because of Google my rent is paid and I get to fly away occasionally too. If they are trying to keep a high bar on the internet then I’m all for it.
    This original content thing is good, and timely for me since I just contracted a good writer at a preferential rate.
    @people who think Google wants to own, or owns, the web –
    um no, they have ZERO interest in whether you want to set up your own little website and do it your way, ignoring their guidelines which are essentially minimum best practice – it’s up to you. All they are saying is they prefer not to send THEIR customers (Google searchers) your way, seems reasonable to me.

  5. I support LindaM in her assertions.

    Google is not about owning the internet. Google NEEDS entrepreneurs like us that help bring it revenues.

    I would like to think that those here on this blog have the integrity to create ORIGINAL content. You will be rewarded in many ways other than what Google will do for you.

    For one, how about the good will and loyalty you will create with your visitors/customers that have nothing to do with Google.

    Makes sense, doesn’t it?

  6. Mark & LindaM,

    It’s not about content. Of course development of a domain with interesting and relevant content is paramount, and better than the alternative. The question is: who should make that decision? It should be made by the market, by the public, by the searcher. Google should not decide what is ‘good’ content for all. Your taste may not be good for me. When it comes to internet names, do you think Google is a good name? If they started today, their name would not be a generic. It is a brandable. How many people will type in Google as a natural word? Very few. Same goes for Yahoo; for Amazon; for GoDaddy. Besides, when you go to Google’s home page, there isn’t much there. As far as relevancy and development are concerned, it contains very little. But it does the job for what it is meant to do. It is not particularly nice looking, is it? They should not be in the business of saving the internet searcher time, or deciding what they should like. If a surfer doesn’t like a particular website, they will not bookmark it or return there. Also, why no transparency when it comes to paying PPC? Why abruptly decide to change the game in midstream?

    Finally, they give away so many things for free to strangle out the little guy. You both can go on praising Google thinking they will read this and put you on top page, and perhaps kill me. The point is that as an industry, we need to survive. You should work towards bringing equity and fairness to the process, and stop kissing the big boy’s ass all the time.

  7. @Uzoma

    Considering at this point I have ZERO (0) websites built out I have neither the desire nor fear nor inclination to kiss Google’s anything.

    Google is a tool, a service that exists at the mercy of every website built out. Whether it be a small site, mini site, landing page or website with 80,000 pages, every one of them has the upper hand over Google. In a way, you said this yourself. It is not a generic; it provides a service to searchers searching for info.

    There really is no need to be paranoid. The only constant is change. Ten years ago, very little of what we here do even existed. Ten years from now, the game will probably change.

    In my opinion, search engines are simply providing a service to help eliminate wasted duplications of content. That is their business. And I believe it benefits us all. I really don’t think the Big Brother paranoia is justified.

  8. If you want peace of mind, acquire high quality domains with lots of natural type-in traffic so you’re never dependent on any search engine. You’ll always have money coming in no matter what.

    🙂

  9. Mark, I thank you for the caution re: Paranoia.

    Go to Google, type in ‘Lamp Shade’, you will get About 1,390,000 results (0.15 seconds). Now, can you tell me what a searcher will do with over 1.3 Million results? So, Google produced that result in 0.15 seconds; how long will it take the surfer to get to, say, the 941,000th result? Talk about repetition! They are not above criticism. It would be wrong for the Government to step in and remove Google from the internet because their business model is inefficient. The same thing goes for the small company that is dealing in domain names and content. Google should spend more time explaining to the searcher what to do with 1,390,000 results. How long will it take to go through all those results? Mind you, some of their results are in the millions.

    It is not paranoia if the harm is real. If you owned some domains, and saw your PPC go from thousands of dollars a month to a few dollars, you would understand that something is wrong. Real WRONG. If you have noticed, the end user has not been drawn into the process of Domain Names, and there is a reason for that. Trust me, the end user both in business and the public in general is out there, and we can’t draw them in unless this entire process is opened up. Google, and the Parkers, and the big registrars have robots.txt files on most domains that have “disallow: PPC” commands. Try it: check your domains with good software to detect these robots.txt files. I’m not paranoid. I want this business structured.
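
For anyone who wants to check a robots.txt file as the comment above suggests, here is a minimal sketch using Python's standard `urllib.robotparser` module. In the robots.txt format a `Disallow` line takes a URL path prefix, so the sample file and its `/ppc/` path are hypothetical, and `is_blocked` is a helper name invented for this example.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents for illustration; not taken from a real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /ppc/
"""

print(is_blocked(SAMPLE_ROBOTS_TXT, "http://example.com/ppc/landing"))  # True
print(is_blocked(SAMPLE_ROBOTS_TXT, "http://example.com/about"))        # False
```

To inspect a live file instead of pasted text, the same parser offers `set_url()` followed by `read()`, which downloads the robots.txt from the address you give it.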

  10. Good stuff. I recall recently reading that nowadays we produce more DATA every two days than the entire composite sum of data produced from the beginning of time until 2003. The key word in that statistic is, of course, data. The reason why humanity’s pace of innovation isn’t thousands of times faster than it was in 2003 is that, despite the exponential increase in data production, the rate at which we produce INFORMATION (new, unique data rather than a regurgitation or aggregation of what came before, or intelligent analysis of existing data) has not kept pace. It continues to fascinate me that the majority of composers we listen to today lived in the 16th-19th centuries when the sum total of musical producers in the 21st century towers way over the number that existed hundreds of years ago, with one possible explanation being that the amount of innovation happening per capita is plummeting faster than population growth. I think Google’s algorithm upgrade is a step towards evolving from a data delivery engine into a knowledge engine that semantically understands the data being delivered, a transformation that will allow Google to better highlight valuable results and downgrade less useful ones.

  11. Consider content, but also consider what a Google spider sees when it crawls and how that relates to G’s larger philosophy about delivering quality content. It isn’t just about the ‘content’ à la articles and crowdsourced stuff. It’s also about the legitimacy of the site itself. That seems to be arbitrated in terms of unique code dynamics which don’t exist with garbage development.

    If the site only exists to splog, rank and get clicks (99% of “development” done by domainers), that is contrary to what G is trying to deliver to their users and their advertisers. As such, platform development is as much a danger for deindexing as enriching the page with duplicate content. We’ve seen numerous times, with BANS stores and recently with Epik, that ‘domain monetization solutions’ that feign meaningful development, yet deliver little to no unique user experience, are at significant risk of deindexing. For the sake of pennies, you’ve hung an albatross around the neck of that now-deindexed valuable domain property, the permanence of which is still debatable. Using great domains as a platform for system-gaming mass development is like owning a $5,000,000 oceanfront home and handing over the keys to Section 8 renters for $200 a week. Don’t cry when you return to find the place trashed.

    So, yes, unique content helps, but so does a unique website that’s actually doing something unique. Don’t think that G’s spiders don’t instantly recognize scalable mass development. They do, and unarguably they don’t like it. We had 5 good years of making money with crap and meaningless websites. Now, the game has changed. On-page content is one part of the equation, but so is the content behind the curtains.

  12. Unfortunately they are really vague at times as to what they consider duplicate content. I have had a real issue with an AOM site lately due to this, yet I have original content on the site.

  13. Google is like a magazine or TV program. It is a platform for advertisers to sell goods and services. Google is looking for the best website content to match up with its advertisers, thereby giving its advertisers the most bang for their buck and Google the most money that it can extract from its clients.

    Now, here is the magic, the publishers of the content (the websites), get very little money via Adsense. Google basically keeps it all. So the world has become free workers for Google. And, if you copy content (in other words you don’t put in a full day of free work for Google), they fire you! We live in remarkable times.

  14. LS Morgan – Thank you. Great summary of things people need to be aware of with respect to developing domains into successful websites.

  15. @Joe AOM is Associate-o-Matic. AOM sites got massively de-listed in the past year since they’re mostly thin affiliate sites.

    @Eric From the looks of it, your site is pretty light on original content. Add a blog with diet drink reviews, submit the XML site map of the blog to Google. Post content regularly and share it with social networks. Get some quality inbound links. Eventually, you’ll get back in if you make the site unique enough.

  16. What nobody seems to take into consideration are the hundreds of syndication outlets, rss feeds and other content generation sources that fill millions of websites with identical information. So, how does Google handle Reuters, AP, etc?

    My understanding is that you only need a few pages or paragraphs/media content to avoid being blacklisted if you’re featuring syndicated feeds.

    It wouldn’t be fair or reasonable for Google to punish smaller syndicates and websites for featuring similar content if the large websites get away with the same thing.

    Any opinions on this?

  17. @Stephen Hi, Stephen,

    Google definitely does not give any points for RSS feeds and other types of syndication; however, it does differentiate between “authority” sites and those that aren’t. So, for example, Yahoo may use feeds from Reuters, but Yahoo is recognized by Google as providing a huge amount of other services to its visitors besides these feeds.

    Google differentiates in a number of ways. Links from authority sites to other sites count a lot. If visitors are referred to a site from an authority site and visitors linger on the target site (these metrics are gathered from a number of sources including Analytics), this counts for a lot. If new content is published on a site and an authority site links to that new content, this counts.

    So, all content is not equal and all links are not equal. Google keeps track of all metrics for a specific site and a specific page on that site and gives it specific authority and that authority is used as a way to place ads and to determine SERP.

    Google is very sophisticated and more than ever before good content and authority links to that content are extremely important to Google. RSS alone is worth nothing and if that is all that is on a site it will probably get zero attention from Google.

  18. @ Rich,

    Hi, what you say makes sense to a certain point, but do Reuters or other syndicates have a “source” site that then gets the credit for original content? Where does the “authority” site start when dealing out syndicated content to millions of sites?

  19. @Stephen

    Hi Stephen,

    I am sure that by this time Google has hardcoded in a database certain sites that are known to have authority. Call them the Google 100 if you will. We all know which ones they are: Wikipedia, the large newspapers, the large multimedia websites, travel sites etc. Those guys are kings and anyone who gets links from those sites are princes and princesses. .gov sites and probably most of the major .edu sites (especially universities) are probably handled with equal authority.

    For the rest, there are many ways to rise to the top – e.g. links from authority sites, especially good metrics as measured by Google’s sources and algorithms, and probably just a sit down chat between a new site’s owners (those VC guys have pull) and Google – especially if there is lots of ad money at stake. 🙂

    You may have noticed that new sites are often given a chance at the top with good SERP as Google tests out their appeal. Then they drop to some other spot. It is like a debut for the site to see how well it does, and if it flops then so does its placement.

    I enjoy watching my site do the “Google Dance” as I try to maintain my SERP ranking. Sometimes it is so tiring I wonder why I bother, but I figure that displaying the ability to rank high with keywords is one way of augmenting value in the domain.

  20. You’d be surprised how petty Google is! Since I posted certain criticisms of Google on this blog, I have been banished from planet earth! Nobody even visits any of my vast holdings. Not even my family will talk to me, not even on the phone. I hope Congress is listening. Some entities need to be broken up. God have mercy…I’m in a twilight zone now. Nobody to visit my sites 🙁
