As I've said before, the MSNBot crawler gives you very little bang for your buck. This month alone, MSN's bot has crawled my site 2,295 times and has sucked up over 18 MB of my allotted monthly bandwidth (compared to 622 visits and 5 MB of bandwidth for Google's bot). In return, it has brought 6 visitors to my site (1% of the number Google has sent)!
I've had it with the MSNBot, so I am finally going to add a robots.txt file to my site to ensure that it never gets past the front door. If inefficiency irks you as much as it does me, feel free to follow along as I provide a step-by-step guide to excluding MSNBot from crawling a website.
My first step was to verify that MSNBot will actually obey my order to go away, especially since I saw evidence that perhaps it wouldn't. Luckily, that evidence seems to be out of date, and the current MSNBot site promises to obey. So far so good.
My next step was to brush up on the rules of a robots.txt file, so that I would correctly block out MSNBot but still allow all other spiders to access my site. The Web Server Administrator's Guide to the Robots Exclusion Protocol provides all the examples anyone could ever need, so learning the rules was extremely easy.
The only thing left to do was to create the actual robots.txt file and upload it to the root directory of my server. I now have my very own robots.txt file and it looks like the following:
User-agent: MSNBot
Disallow: /
Easy as pie, right? And it should keep MSNBot away from my site from now on. Good riddance, MSNBot!
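If you would like to double-check the rules before uploading the file, here is a minimal sketch using Python's standard `urllib.robotparser` module. It feeds the two lines above to the parser and asks whether a given crawler may fetch a page. The helper name `is_allowed` and the `www.example.com` URL are made up for illustration, and this only shows how a parser that follows the standard reads the file; whether a real bot actually obeys is up to the bot.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: MSNBot
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute a page from your own site.
    page = "http://www.example.com/index.html"
    for agent in ("MSNBot", "Googlebot"):
        verdict = "allowed" if is_allowed(agent, page) else "blocked"
        print(agent, "is", verdict)
```

Because the file has no record for any other user agent, every spider except MSNBot stays unrestricted, which is exactly what I wanted.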
Comments
I wonder why MSNBot keeps hitting the SAME HTML file on my site several times a day. How come?
I have also had it with MSNbot. It was using more than 50% of my total traffic bandwidth. Since I don't care to support MS, I see no reason to pay for them to do this on my site.
I added the disallow text to my robots.txt file just before the end of Oct. It looks like MSNbot is properly obeying the request, and now I am back to a normal traffic load.