How to detect search engine crawlers?
Today I was looking for a way to detect when a client is a search engine crawler. You could build a fancy solution yourself, but the .NET Framework already has one: the Request.Browser.Crawler property. Out of the box, though, this property always returns false, even when the site is visited by a search engine crawler, because no crawlers are configured in a default installation of .NET.
ASP.NET uses the <browserCaps> section in machine.config or web.config to determine whether the client browser is a crawler. In a default installation the crawler filter information is blank, which is why you always get false. To fix this, add search engine crawler filters to a <browserCaps> section in your web.config, like this:
<configuration>
  <!-- ...... -->
  <system.web>
    <browserCaps>
      <filter>
        <!-- Google Crawler -->
        <case match="Googlebot">
          browser=Googlebot
          crawler=true
        </case>
        <!-- Yahoo Crawler -->
        <case match="http\:\/\/help.yahoo.com\/help\/us\/ysearch\/slurp">
          browser=YahooCrawler
          crawler=true
        </case>
        <!-- MSN Crawler -->
        <case match="msnbot">
          browser=msnbot
          crawler=true
        </case>
        <!-- check Alta Vista (Mercator) -->
        <case match="Mercator">
          browser=AltaVista
          crawler=true
        </case>
        <!-- check Slurp (Yahoo uses this as well) -->
        <case match="Slurp">
          browser=Slurp
          crawler=true
        </case>
        <!-- Baidu Crawler -->
        <case match="Baiduspider">
          browser=Baiduspider
          crawler=true
        </case>
        <!-- check Excite -->
        <case match="ArchitextSpider">
          browser=Excite
          crawler=true
        </case>
        <!-- Lycos -->
        <case match="Lycos_Spider">
          browser=Lycos
          crawler=true
        </case>
        <!-- Ask Jeeves -->
        <case match="Ask Jeeves">
          browser=AskJeaves
          crawler=true
        </case>
        <!-- IBM Research Web Crawler -->
        <case match="http\:\/\/www\.almaden.ibm.com\/cs\/crawler">
          browser=IBMResearchWebCrawler
          crawler=true
        </case>
      </filter>
    </browserCaps>
  </system.web>
</configuration>
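Once the filters are in place, a page only needs to read Request.Browser.Crawler. If you want to sanity-check the match patterns before deploying them, here is a rough standalone sketch that runs the same regular expressions against a user agent string. The class name CrawlerPatterns and the method Match are made up for this example, and it assumes default Regex options and "first matching case wins"; the real browserCaps parser may behave differently in the details.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of ASP.NET: mirrors the <case match="..."> entries above.
public static class CrawlerPatterns
{
    // Key = regex copied from the match attribute, Value = the browser name it assigns.
    private static readonly KeyValuePair<string, string>[] Filters =
    {
        new KeyValuePair<string, string>(@"Googlebot", "Googlebot"),
        new KeyValuePair<string, string>(@"http\:\/\/help.yahoo.com\/help\/us\/ysearch\/slurp", "YahooCrawler"),
        new KeyValuePair<string, string>(@"msnbot", "msnbot"),
        new KeyValuePair<string, string>(@"Mercator", "AltaVista"),
        new KeyValuePair<string, string>(@"Slurp", "Slurp"),
        new KeyValuePair<string, string>(@"Baiduspider", "Baiduspider"),
        new KeyValuePair<string, string>(@"ArchitextSpider", "Excite"),
        new KeyValuePair<string, string>(@"Lycos_Spider", "Lycos"),
        new KeyValuePair<string, string>(@"Ask Jeeves", "AskJeaves"),
        new KeyValuePair<string, string>(@"http\:\/\/www\.almaden.ibm.com\/cs\/crawler", "IBMResearchWebCrawler"),
    };

    // Returns the browser name of the first pattern that matches, or null if none do.
    public static string Match(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return null;
        }

        foreach (KeyValuePair<string, string> filter in Filters)
        {
            if (Regex.IsMatch(userAgent, filter.Key))
            {
                return filter.Value;
            }
        }

        return null;
    }
}
```

Feed it user agent strings copied from your own logs: a non-null result means the corresponding filter should flag that client as a crawler, and null means none of the patterns above apply.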
Tip: you can find more crawler info in your IIS logs ([Windows Folder]\system32\LogFiles)
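To get a feel for which crawlers actually visit your site, you could tally the user agents in those logs. The sketch below assumes the logs are in the W3C extended format, with a #Fields: header line that includes cs(User-Agent); the class name IisLogAgents and the method CountUserAgents are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts how often each user agent appears in W3C-format log lines.
public static class IisLogAgents
{
    public static Dictionary<string, int> CountUserAgents(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        int agentIndex = -1; // column of cs(User-Agent), unknown until a #Fields: line is seen

        foreach (string line in lines)
        {
            if (line.StartsWith("#Fields:", StringComparison.Ordinal))
            {
                // The header lists the column names, separated by spaces.
                string[] names = line.Substring("#Fields:".Length).Trim().Split(' ');
                agentIndex = Array.IndexOf(names, "cs(User-Agent)");
                continue;
            }

            // Skip blank lines, other directives, and rows seen before the column is known.
            if (line.Length == 0 || line[0] == '#' || agentIndex < 0)
            {
                continue;
            }

            string[] fields = line.Split(' ');
            if (agentIndex >= fields.Length)
            {
                continue;
            }

            // The agent is kept exactly as logged; IIS writes spaces inside it as '+'.
            string agent = fields[agentIndex];
            int seen;
            counts.TryGetValue(agent, out seen);
            counts[agent] = seen + 1;
        }

        return counts;
    }
}
```

You could pass it the result of System.IO.File.ReadLines on one of the log files and then look through the most frequent agents for names that deserve their own filter.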