Using the Lightweight Test Automation Framework to parse a web page

Published Friday, February 13, 2009 12:35 AM

This post illustrates some capabilities of the Lightweight Test Automation Framework.

Suppose I want to create a small application that displays the latest posts made to our forum: http://forums.asp.net/1193.aspx. The application would issue a WebRequest to the forum, parse the returned HTML, and find the titles of all the posts on the main page. There are probably many libraries for parsing HTML, but I'll show how you can use our framework to do it.

1. First, become familiar with the HTML page you want to parse. In this case, I navigated to the forum and, using a DOM inspector, saw that all the links to the posts are inside table rows whose class attribute is set to "CommonListRow".

[Screenshot: Forum Page]

2. Next, create a new console application and add a reference to Microsoft.Web.Testing.Light.dll.

3. Make a request to the server (in my example, I use the System.Net.WebRequest class).

4. Use the static HtmlElement.Create(string html) method to parse the response into an HtmlElement.

5. Use the framework's element-finding API (ChildElements.Find and ChildElements.FindAll) to locate the elements you need.

Here is the source code:

using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using Microsoft.Web.Testing.Light;

class Program
{
    static void Main(string[] args)
    {
        // Create a request for the forum page.
        WebRequest request = WebRequest.Create("http://forums.asp.net/1193.aspx");

        string responseFromServer;

        // Get the response and read its content. The using blocks close the
        // reader (and its underlying stream) and the response, even if an
        // exception is thrown.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responseFromServer = reader.ReadToEnd();
        }

        // Remove the DOCTYPE so the markup has a single root tag.
        // Singleline lets '.' match newlines, so a DOCTYPE split across lines
        // is removed too; IgnoreCase also catches a lowercase <!doctype>.
        responseFromServer = Regex.Replace(responseFromServer, @"<!DOCTYPE.*?>", String.Empty,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // Load the response into an HtmlElement.
        HtmlElement rootElement = HtmlElement.Create(responseFromServer);

        // Find all the post rows.
        HtmlElementFindParams findParams = new HtmlElementFindParams();
        findParams.TagName = "tr";
        findParams.Attributes.Add("class", "CommonListRow");
        foreach (HtmlElement tableRow in rootElement.ChildElements.FindAll(findParams))
        {
            // Find the first link within the row.
            HtmlAnchorElement link = (HtmlAnchorElement)tableRow.ChildElements.Find("a", 0);

            // Display the title.
            Console.WriteLine("\"{0}\"", link.CachedInnerText);

            // Display the link; HRef holds a site-relative path, so prepend the host.
            Console.WriteLine("\thttp://forums.asp.net{0}\n", link.CachedAttributes.HRef);
        }
    }
}

A couple of things to notice:

  • The server's response contains a <!DOCTYPE> declaration before the main <html> tag. The markup passed to HtmlElement.Create must have a single root tag. With the DOCTYPE present, the parser sees two top-level tags (the DOCTYPE and <html>) and fails, which is why the code removes the DOCTYPE first.
  • Notice the use of HtmlElementFindParams to locate all table rows that have a specific class.
  • Notice how casting to the strongly typed HtmlAnchorElement gives direct access to the link's HRef through CachedAttributes.HRef.
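Removing the DOCTYPE and turning each HRef into a full URL are plain string handling that does not depend on the framework. The following is a minimal sketch using only the .NET Base Class Library, not part of the framework's API: the MarkupHelpers class, its method names, and the href value "/t/1234.aspx" are hypothetical. It strips a DOCTYPE that is lowercase or split across lines (XHTML doctypes are often written over two lines), and it resolves an href against the page URL with System.Uri rather than concatenating strings, which also leaves an already-absolute href unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration; not part of the
// Lightweight Test Automation Framework.
static class MarkupHelpers
{
    // Matches a DOCTYPE declaration even when it is lowercase or spans
    // several lines (Singleline lets '.' match newline characters).
    static readonly Regex Doctype = new Regex(@"<!DOCTYPE.*?>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Removes the DOCTYPE and any leading whitespace so the remaining
    // markup starts at the single root tag.
    public static string StripDoctype(string html)
    {
        return Doctype.Replace(html, String.Empty).TrimStart();
    }

    // Resolves an href (relative or absolute) against the page's URL.
    public static string ResolveLink(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href).ToString();
    }
}

class Demo
{
    static void Main()
    {
        // A two-line XHTML DOCTYPE followed by the root tag.
        string page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n" +
            "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n" +
            "<html><body></body></html>";

        Console.WriteLine(MarkupHelpers.StripDoctype(page));
        // prints: <html><body></body></html>

        Console.WriteLine(MarkupHelpers.ResolveLink("http://forums.asp.net/1193.aspx", "/t/1234.aspx"));
        // prints: http://forums.asp.net/t/1234.aspx
    }
}
```

Compiling the regex once as a static field avoids re-parsing the pattern when many pages are processed.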

 

Here is the console output when I run the program:

[Screenshot: console output]

Hopefully this post has shown you some of the not-so-obvious things that you can do with the Lightweight Test Automation Framework.

Federico Silva Armas
ASP.NET QA Team
