WebRequest to some site URLs fails with: The remote server returned an error: (403)

Avoid (403) Forbidden errors when using HttpWebRequest

I got an error when I tried to open the page http://www.lycos.com/ using HtmlAgilityPack from my ASP.NET application.
The exception was System.Net.WebException: The underlying connection was closed: The server committed an HTTP protocol violation.
When I tried to reproduce the problem using the test function GetLycosUrl (see below) from a WinForms application, it threw a different error: (403) Forbidden. After some research I found that the HTTP protocol violation can be ignored if you specify useUnsafeHeaderParsing="true" in the configuration file. The HttpWebRequest.UseUnsafeHeaderParsing property is internal and read-only, so it can't be changed for a particular instance of HttpWebRequest.
It is also surprising that the WinForms application does not raise the HTTP protocol violation error, even though the debugger shows that UseUnsafeHeaderParsing=false.
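For reference, here is a minimal sketch of how the configuration setting mentioned above is usually written. It assumes the standard system.net settings section in app.config (WinForms) or web.config (ASP.NET); merge it into your existing file rather than replacing it. Note that the setting applies to the whole application, not to a single request.

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Relaxes HTTP header validation for every HttpWebRequest in this application -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```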
Now the question was why Lycos sent me back the (403) Forbidden error.
With the help of Fiddler I identified that the difference between IE and the call from my application is the User-Agent header. If the header is provided, a valid page is returned from HttpWebRequest.GetResponse.
To make HtmlAgilityPack more robust, the following change in HtmlWeb.Get(Uri uri, string method, string path, HtmlDocument doc) is required.
  //MNF 26/5/2005 Some web servers (e.g. http://www.lycos.com/) return a 403 Forbidden error unless UserAgent is specified,
  //so let's say it's IE6
  req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";

  //alternatively, it can be changed in a PreRequest event handler
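
If you would rather not modify the HtmlAgilityPack source, the PreRequest alternative could look roughly like the sketch below. It assumes the HtmlWeb.PreRequest member takes a handler that receives the HttpWebRequest and returns bool (true to continue), which is how I understand the library. The class name LycosLoader and the method name LoadWithUserAgent are hypothetical, made up for this example.

```csharp
using System.Net;
using HtmlAgilityPack;

public class LycosLoader
{
    // Hypothetical helper, not part of HtmlAgilityPack.
    public static HtmlDocument LoadWithUserAgent(string url)
    {
        HtmlWeb web = new HtmlWeb();

        // Assumption: PreRequest is invoked before the request is sent,
        // and returning true lets the request proceed.
        web.PreRequest = delegate(HttpWebRequest request)
        {
            request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
            return true;
        };

        return web.Load(url);
    }
}
```

Calling LycosLoader.LoadWithUserAgent("http://www.lycos.com/") should then send the same User-Agent header as the patched HtmlWeb.Get, without touching the library code.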

The function to test the www.lycos.com URL is the following:
        //Add using to the top of file
        //        using System;
        //        using System.Net;
        //        using System.Diagnostics;

        private void GetLycosUrl()
        {
            string url = @"http://www.lycos.com/";
            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
            //the following line is critical, otherwise a (403) Forbidden error will be returned
            request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
            try
            {
                HttpWebResponse response = (HttpWebResponse) request.GetResponse();
                //release the connection once the response has been received
                response.Close();
            }
            catch (Exception exc)
            {
                //show the full exception in debug builds, then rethrow it
                Debug.Assert(false, exc.ToString());
                throw;
            }
        }
