Remove Pages from Wayback Machine

In more than one UDRP proceeding, pages from the Wayback Machine (found on Archive.org) have been submitted as evidence against the domain owner. The Wayback Machine is an Internet archive that takes snapshots of a website at various points in time, and the pages are indexed on the site.

In the Cleveland Browns UDRP for Browns.com in May of 2011, the team submitted Wayback Machine results from 2005 that showed links to football-related content and merchandise. This seems to have been one major factor that doomed the domain owner.

People might ask why someone would want to have their site removed from the Wayback Machine, if not for nefarious reasons. Let’s say you purchase a descriptive domain name today, and the name may also be considered part of another company’s trademark. Perhaps you love apples and bought AppleStore.com (only an example).

A UDRP panelist might treat the change of registrant as a “new registration,” which could be one strike, since the registration wouldn’t pre-date the trademark. If the site previously carried links that may have infringed on the technology company’s mark, that could be strike number two. If you are planning to use the name in a way that does not infringe on another company’s marks, you shouldn’t have to worry that the company will come after you with evidence of a previous owner’s infringement.

As we all know, there aren’t necessarily three strikes in a UDRP proceeding, and these two facts might satisfy the three elements required in a UDRP. That being said, you should be able to remove those pages from the Wayback Machine to avoid future problems, and there’s an easy way to do it.

Full details can be found on Archive.org, but the gist of it is that you need to add a special directive to your robots.txt file, a file that most websites already have. According to the removal guide:

To exclude the Internet Archive’s crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /

It’s pretty easy to do, although I haven’t done it with any of my sites. I am not sure whether there are downsides, or what they might be, so you should look into it before making the change.
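If you want to confirm that the two lines above do what the removal guide says before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The domain example.com, the helper name is_allowed, and the Googlebot comparison are my own placeholders, not anything from the guide. It only checks how the rules read; it says nothing about whether a given crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted from the removal guide.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain (an assumption for illustration).
page = "https://example.com/any-page.html"

print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
print("Googlebot allowed:  ", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With these rules, the first check should come back False and the second True, which matches the guide's claim that only the Internet Archive's crawler is excluded.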

11 COMMENTS

Jp May 12, 2011 At 2:06 pm

All parking companies should do this. There is no value to domain holders in having an archive of a parked page anyway.

Abdu Tarabichi May 12, 2011 At 2:09 pm

The only drawback that I can see in blocking Archive.org from archiving your pages is that you won’t have evidence of using your domain in bona fide for a long period when you want to file arbitration against another domain registrant who is infringing on your brand.

Dan May 12, 2011 At 9:37 pm

Hi,

A large company, which I will not mention, loves to use this method and/or will just take “screen shots” if the site is still up.

And if they sue you under the “Lanham Act” in FEDERAL court… and win a judgment against you, you can be fined up to 100K per domain & 3x the income you earned with the ‘infringing domain’.

Under this “Act”, they can also go after the previous owner and get the same type judgment against them…if they can prove the same kind of case against them.

Best,
Dan

* Not legal advice ~ Just my understanding of the law (Act) mentioned above.*

ASN5 May 13, 2011 At 10:04 pm

WayBack finds a way back!

Hey Elliot,

Just a note for you and your readers… When one requests that materials collected from their website be removed from IA’s archive, that’s not actually what happens, at least not today. What happens instead is similar to a “no display” tag being attached to the files. Not very reassuring for intellectual property owners, but factually what is happening.

dag May 14, 2011 At 1:16 pm

All parking companies should do this. There is no value to domain holders in having an archive of a parked page anyway.

Karen August 12, 2011 At 1:49 pm

Hello, on the wayback Q&A there is no mention of excluding a single specific page. Is this possible?
Or does the wayback simply exclude the entire site?
Thank you,
Karen

Joseph Whitehead May 19, 2012 At 6:48 am

Blocking archives is kind of like rubbing salt in the wound when you’re trying to look up articles from back before a site expired. It defeats the entire purpose of the IA if you can’t use it when a site dies. I mean, if the site is still up, you can just view it, right? (Well, obviously) This kind of behavior surely doesn’t win anyone friends and goes a lot towards encouraging the wrong kind of thinking among policy makers/politicians.

Also, I don’t know about anyone else, but if I’m thinking about buying a domain and I can’t see what was on there before, I’m really hesitant to buy it. Who knows what kind of reputation it had/has and that I’m not aware of. No one wants to buy a dud! It also makes it trivially easy for Google to block your domains entirely if they all have that iaarchive tag. In theory, web browsers or firewalls could automatically block parked domains that way.

If the previous website owner authorized crawling, then they gave permission. The domain is just another type of phone number. The new person with that number doesn’t get to destroy voice mails I got previously from it or something like that just because the number’s owner changed. It doesn’t give the new owner any value but then again, it doesn’t really hurt them. Someone else used the analogy of a new renter expecting all the old business’s machinery, furniture, etc. to belong to them. Domain names/telephone numbers/addresses aren’t copyrights! The contents of the previously pointed-to server didn’t magically vanish but simply became inaccessible at that address. If the server was still up, you could just add the IP address to your HOSTS file and connect to it just like before. Entering it directly into the browser usually works but some sites look at the header before serving content (multihosted domains).

BTW, people now use NoScript and AdBlocker quite a bit. If everyone blocked Javascript and ads from domains they’ve never visited before, then those ads would become kind of useless. Just whitelist the sites you use all the time and don’t feel guilty, heh.

ronald April 3, 2013 At 9:33 am

Hi guys,

I bought an existing domain name about 2 weeks ago. The website was created in 2003, and the Wayback Machine has archived its pages since then, in 2003, 2004, 2006, 2009, and 2011. Can I, as the new domain owner, have these “old” archived snapshots removed as well? With the same robots.txt method or another one?
Thanks a lot.

- Elshad January 14, 2015 At 2:52 pm
  
  Hello Ronald!
  You can do it by contacting archive.org. I’m sure about it. Contact page: https://archive.org/about/contact.php. Contact email: info@archive.org.
  
- Henry January 29, 2016 At 2:42 pm
  
  Hi, Joseph:
  Did you send an email and receive any response from archive.org?
  
cxm322 January 21, 2019 At 2:44 pm

This no longer works. Archive.org has obviously been ignoring robots.txt files since late 2017 or so. All my sites have the following in their robots.txt file…

User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /

That used to work – all my sites used to NOT be archived in the wayback machine. But since sometime in late 2017 all of them are archived.