How to detect search engine crawlers?


Today I was looking for a way to detect whether a client is a search engine crawler. You could build a fancy solution of your own, but the .NET Framework already has one: the Request.Browser.Crawler property. Out of the box, though, this property always returns false, even when the site is visited by a search engine crawler, because crawler detection is not configured in a default installation of .NET.

ASP.NET uses the <browserCaps> section in machine.config or web.config to determine whether the client browser is a crawler. In the default installation the crawler filter information is blank, which is why you always get false. To fix this, add search engine crawler filters to a <browserCaps> section in your web.config, like this:

    <configuration>

      <!-- ...... -->

      <system.web>

        <browserCaps>

          <filter>

            <!-- Google Crawler -->
            <case match="Googlebot">
              browser=Googlebot
              crawler=true
            </case>

            <!-- Yahoo Crawler -->
            <case match="http\:\/\/help\.yahoo\.com\/help\/us\/ysearch\/slurp">
              browser=YahooCrawler
              crawler=true
            </case>

            <!-- MSN Crawler -->
            <case match="msnbot">
              browser=msnbot
              crawler=true
            </case>

            <!-- check Alta Vista (Mercator) -->
            <case match="Mercator">
              browser=AltaVista
              crawler=true
            </case>

            <!-- check Slurp (Yahoo uses this as well) -->
            <case match="Slurp">
              browser=Slurp
              crawler=true
            </case>

            <!-- Baidu Crawler -->
            <case match="Baiduspider">
              browser=Baiduspider
              crawler=true
            </case>

            <!-- check Excite -->
            <case match="ArchitextSpider">
              browser=Excite
              crawler=true
            </case>

            <!-- Lycos -->
            <case match="Lycos_Spider">
              browser=Lycos
              crawler=true
            </case>

            <!-- Ask Jeeves -->
            <case match="Ask Jeeves">
              browser=AskJeeves
              crawler=true
            </case>

            <!-- IBM Research Web Crawler -->
            <case match="http\:\/\/www\.almaden\.ibm\.com\/cs\/crawler">
              browser=IBMResearchWebCrawler
              crawler=true
            </case>

          </filter>

        </browserCaps>

      </system.web>

    </configuration>
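
With the filters in place, a page or handler can branch on the property. The following is only a minimal sketch, assuming classic ASP.NET (System.Web); the helper class CrawlerCheck and its method name are made up for illustration and are not part of the framework:

```csharp
using System;
using System.Web;

// Hypothetical helper (not a framework type): wraps the
// Request.Browser.Crawler check described in this post.
public static class CrawlerCheck
{
    // Returns true when ASP.NET's browser capabilities flag the
    // current request as coming from a crawler. This only reflects
    // what the <browserCaps> filters (or other browser definitions)
    // have been configured to match.
    public static bool IsCrawler(HttpRequest request)
    {
        if (request == null)
        {
            throw new ArgumentNullException("request");
        }

        HttpBrowserCapabilities caps = request.Browser;
        return caps != null && caps.Crawler;
    }
}
```

In a page's Page_Load you could then write something like `if (CrawlerCheck.IsCrawler(Request)) { /* skip visitor counters, session-heavy work, etc. */ }`.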


Tip: you can find the user-agent strings of other crawlers that visit your site in your IIS logs ([Windows Folder]\system32\LogFiles).
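
To pull candidate user agents out of those logs, something along these lines might help. It is a rough sketch that assumes the logs are in the W3C extended format and that the cs(User-Agent) field is being logged; the class name IisLogUserAgents is hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects the distinct user-agent values
// found in the lines of a W3C extended format log.
public static class IisLogUserAgents
{
    public static HashSet<string> Collect(IEnumerable<string> lines)
    {
        var agents = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        int uaIndex = -1;

        foreach (string line in lines)
        {
            // The "#Fields:" directive names the columns of the entries that follow.
            if (line.StartsWith("#Fields:", StringComparison.Ordinal))
            {
                string[] fields = line.Substring("#Fields:".Length).Trim().Split(' ');
                uaIndex = Array.IndexOf(fields, "cs(User-Agent)");
                continue;
            }

            // Skip blank lines, other directives, and entries seen
            // before any usable "#Fields:" line.
            if (line.Length == 0 || line[0] == '#' || uaIndex < 0)
            {
                continue;
            }

            string[] values = line.Split(' ');
            if (uaIndex < values.Length)
            {
                // Spaces inside a logged value are written as '+'.
                // A literal '+' cannot be told apart, so this is approximate.
                agents.Add(values[uaIndex].Replace('+', ' '));
            }
        }

        return agents;
    }
}
```

You could feed it the result of `System.IO.File.ReadAllLines(path)` for one log file (the path is yours to supply) and look through the returned set for names that deserve a `<case>` entry of their own.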
