WebRequest to some site URLs fails: The remote server returned an error: (403)
Avoid (403) Forbidden errors when using HttpWebRequest
I got an error when I tried to open the page http://www.lycos.com/ using HtmlAgilityPack from my ASP.NET application.
The exception was System.Net.WebException: The underlying connection was closed: The server committed an HTTP protocol violation.
When I tried to reproduce the problem from a WinForms application using the test function GetLycosUrl (see below), it threw a different error: (403) Forbidden. After some research I found that the HTTP protocol violation can be ignored if you specify UseUnsafeHeaderParsing=true in the configuration file (the useUnsafeHeaderParsing attribute of the httpWebRequest element under system.net/settings). The HttpWebRequest.UseUnsafeHeaderParsing property is internal and read-only, so it can't be changed for a particular instance of HttpWebRequest.
It is also surprising that the WinForms application does not raise the HTTP protocol violation error, even though the debugger shows that UseUnsafeHeaderParsing = false.
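As a sketch of the configuration change mentioned above, assuming a standard web.config (ASP.NET) or app.config (WinForms) file, the setting would look roughly like this. Note that it applies to the whole application, not to a single request:

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Relaxes header validation for every HttpWebRequest in this application -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```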
Now the question was why Lycos sent me back the (403) Forbidden error.
With the help of Fiddler I identified that the difference between IE and the call from my application is the User-Agent header. If the header is provided, a valid page is returned from HttpWebRequest.GetResponse.
To make HtmlAgilityPack more robust, the following change in HtmlWeb.Get(Uri uri, string method, string path, HtmlDocument doc) is required:
// MNF 26/5/2005 Some web servers (e.g. http://www.lycos.com/) return a 403 Forbidden error unless UserAgent is specified,
// so let's say it's IE6.
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
// Alternatively, it can be changed in a PreRequest event handler.
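The PreRequest alternative avoids patching the HtmlAgilityPack source. The following is only a minimal sketch: it assumes that HtmlWeb exposes a PreRequest delegate that receives the HttpWebRequest and returns a bool (true to continue with the request), so check this against the version of the library you use. The class name PreRequestExample is made up for illustration.

```csharp
using System.Net;
using HtmlAgilityPack;

class PreRequestExample
{
    static void Main()
    {
        HtmlWeb web = new HtmlWeb();

        // Assumption: PreRequest is called with the request before it is sent.
        web.PreRequest = delegate(HttpWebRequest request)
        {
            // Pretend to be IE6, same string as in the patch above.
            request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
            return true; // continue with the request
        };

        HtmlDocument doc = web.Load("http://www.lycos.com/");
    }
}
```

With this approach the user agent is set by the calling code, so the library itself stays unmodified.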
The following function tests the www.lycos.com URL:
// Add these using directives to the top of the file:
// using System;
// using System.Net;
// using System.Diagnostics;
private void GetLycosUrl()
{
    string url = @"http://www.lycos.com/";
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
    // The following line is critical; otherwise a (403) Forbidden error is returned.
    request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)";
    try
    {
        // Dispose the response so the underlying connection is released.
        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        {
            Debug.WriteLine(response.StatusCode);
        }
    }
    catch (Exception exc)
    {
        // Surface the failure in debug builds, then rethrow to the caller.
        Debug.Assert(false, exc.ToString());
        throw;
    }
}