Goodbye MSNBot!

"All your insides fall to pieces; you just sit there wishing you could still make love."
Radiohead / High and Dry

Posted on September 24, 2004 8:55 AM in Web Development

As I've said before, the MSNBot crawler is extremely inefficient when it comes to providing bang for your buck. This month alone, MSN's bot has crawled my site 2,295 times and has sucked up over 18 MB of my allotted monthly bandwidth (compared to Googlebot's 622 visits and 5 MB of bandwidth). In return, it has brought 6 visitors to my site (1% of the number Google has sent)!
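To put those numbers side by side, here is a rough back-of-the-envelope sketch in Python. The MSN figures come straight from the paragraph above; Google's visitor count is not stated directly, so the 600 used here is an assumption derived from "6 visitors is 1% of Google's", and the function name is just for illustration.

```python
def mb_per_visitor(bandwidth_mb, visitors):
    """Bandwidth spent on a crawler for each visitor it refers."""
    return bandwidth_mb / visitors

# Figures from the post: MSNBot used over 18 MB and referred 6 visitors.
msn_cost = mb_per_visitor(18, 6)

# Assumption: if 6 visitors is 1% of Google's total, Google sent about 600.
google_visitors = 6 * 100
google_cost = mb_per_visitor(5, google_visitors)

# How many times more bandwidth each MSN-referred visitor costs.
ratio = round(msn_cost / google_cost)

print(f"MSNBot: {msn_cost:.2f} MB per visitor")
print(f"Googlebot: {google_cost:.4f} MB per visitor")
print(f"MSNBot costs roughly {ratio}x more per visitor")
```

Since the 18 MB is a lower bound ("over 18 MB"), the real gap is, if anything, a little wider than this estimate.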

I've had it with the MSNBot, so I am finally going to add a robots.txt file to my site to ensure that it never gets past the front door. If inefficiency irks you as much as it does me, feel free to follow along as I provide a step-by-step guide to excluding MSNBot from crawling a website.

My first step was to verify that MSNBot will actually obey my order to go away, especially since I saw evidence that perhaps it wouldn't. Luckily, that evidence seems to be out of date, and the current MSNBot site promises to obey. So far so good.

My next step was to brush up on the rules of a robots.txt file, so that I would correctly block out MSNBot but still allow all other spiders to access my site. The Web Server Administrator's Guide to the Robots Exclusion Protocol provides all the examples anyone could ever need, so learning the rules was extremely easy.

The only thing left to do was to create the actual robots.txt file and upload it to the root directory of my server. I now have my very own robots.txt file and it looks like the following:

User-agent: MSNBot
Disallow: /

Easy as pie, right? And it should keep MSNBot away from my site from now on. Good riddance, MSNBot!
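If you want to double-check the rules before uploading, one option is Python's standard `urllib.robotparser` module. The following is only a minimal sketch: the `can_crawl` helper, the `example.com` URLs, and the user-agent strings are placeholders for illustration, and a real crawler may interpret the file slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as the robots.txt file above.
ROBOTS_TXT = """\
User-agent: MSNBot
Disallow: /
"""

def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical user-agent strings and URLs, used only for this check.
print(can_crawl("msnbot/1.0", "http://example.com/"))
print(can_crawl("Googlebot/2.1", "http://example.com/index.html"))
```

The first check should report that MSNBot is blocked and the second that other spiders are still allowed, since no rule in the file applies to them.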

Comments

David Morgan on November 13, 2004 at 10:43 AM:

I have also had it with MSNbot. It was using more than 50% of my total traffic bandwidth. Since I don't care to support MS, I see no reason to pay for them to do this on my site.

I added the disallow text to my robots.txt file just before the end of Oct. It looks like MSNbot is properly obeying the request, and now I am back to a normal traffic load.

Rasmus on October 24, 2011 at 8:52 AM:

I wonder why msnbot keeps hitting the SAME HTML file on my site several times a day. How come?
