Date: 2005-01-24 21:46:00
Tags: web
search engine spider gone awry
So I happened to look at the log file for one of my web servers today, and noticed lots of requests for the Apache manual, which is installed and served by the default configuration. The requests were all coming from "msnbot/0.3", a search engine spider for MSN. I changed the Apache config to remove the manual, and it seems that the requests are coming in more frequently now that they return 404. I wonder how much time it will take before msnbot gives up.
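For anyone wanting to check their own logs for the same pattern, here is a minimal sketch in Python. It assumes the access log uses Apache's combined log format (the one that records the User-Agent), that the manual is served under `/manual`, and that the spider can be recognised by the substring "msnbot" in its User-Agent. The function name `count_bot_manual_hits` is hypothetical. It tallies the bot's requests for the manual by HTTP status, so you can compare how many got 200 before the config change with how many get 404 afterwards.

```python
import re
import sys
from collections import Counter

# Assumed Combined Log Format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_manual_hits(lines, bot="msnbot", prefix="/manual"):
    """Count requests whose User-Agent contains `bot` and whose path
    starts with `prefix`, keyed by HTTP status code (as a string)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if bot in m.group("agent") and m.group("path").startswith(prefix):
            counts[m.group("status")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_hits.py /path/to/access_log
    with open(sys.argv[1]) as f:
        for status, n in sorted(count_bot_manual_hits(f).items()):
            print(status, n)
```

Matching on a User-Agent substring is the simplest approach, but it trusts whatever the client claims to be; it is good enough for spotting a pattern like this, not for verifying who is really crawling.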
thomasj
2005-01-24T21:18:44Z
I wonder how many other Apache operators that happens to.
pasketti
2005-01-24T21:56:06Z
Doesn't the default Apache root web page point to the documentation?

If it's trying to spider a broken link, it could take a while to figure it out. And Apache 2 went all i18n, and has a couple dozen versions of the default page for different languages, so maybe it's trying them all...
ghewgill
2005-01-24T22:23:10Z
Yes, the default page is how the search bot found the manual. And yes, this is Apache 2, so there are about 10 megs of documents in there.
Greg Hewgill <greg@hewgill.com>