Using the Lightweight Test Automation Framework to parse a web page
This post is meant to illustrate some of the capabilities of the Lightweight
Test Automation Framework.
Suppose I want to create a small application that displays the latest posts
that were made to our forum: http://forums.asp.net/1193.aspx. I
would like to issue a WebRequest to the forum, parse the HTML, and find the
titles of all the posts on the main page. There are many libraries for
parsing HTML content, but I'll show how you can use our framework to accomplish
this.
1. The first thing to do is to become familiar with the HTML page that you
want to parse. In this case, I navigate to the forum and, using a DOM inspector, I
can see that all the links to the posts are inside table rows whose
class attribute is set to "CommonListRow".

2. Next, create a new Console application and add a reference to
Microsoft.Web.Testing.Light.dll.
3. Make a request to the server (in my example, I use the
System.Net.WebRequest class).
4. Use the static HtmlElement.Create(string html) method to parse
the response into an HtmlElement.
5. Use the framework's element-finding API (ChildElements.FindAll and
ChildElements.Find) to locate the elements that you need.
Here is the source code:
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using Microsoft.Web.Testing.Light;

class Program
{
    static void Main(string[] args)
    {
        // Create a request for the forum page.
        WebRequest request = WebRequest.Create("http://forums.asp.net/1193.aspx");
        string responseFromServer;
        // Get the response and read its content. The using blocks close the
        // reader, the stream and the response even if an exception is thrown.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream dataStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(dataStream))
        {
            responseFromServer = reader.ReadToEnd();
        }
        // Remove the DOCTYPE so the markup has a single root tag. [^>]* also
        // matches a DOCTYPE that spans several lines; IgnoreCase handles <!doctype>.
        responseFromServer = Regex.Replace(responseFromServer, @"<!DOCTYPE[^>]*>",
            String.Empty, RegexOptions.IgnoreCase);
        // Load the response into an HtmlElement.
        HtmlElement rootElement = HtmlElement.Create(responseFromServer);
        // Find all the post rows: <tr> elements whose class is "CommonListRow".
        HtmlElementFindParams findParams = new HtmlElementFindParams();
        findParams.TagName = "tr";
        findParams.Attributes.Add("class", "CommonListRow");
        foreach (HtmlElement tableRow in rootElement.ChildElements.FindAll(findParams))
        {
            // Find the first link (index 0) within the row; skip the row
            // if no anchor element comes back.
            HtmlAnchorElement link = tableRow.ChildElements.Find("a", 0) as HtmlAnchorElement;
            if (link == null)
            {
                continue;
            }
            // Display the title.
            Console.WriteLine("\"{0}\"", link.CachedInnerText);
            // Display the link; HRef is relative to the site root.
            Console.WriteLine("\thttp://forums.asp.net{0}\n", link.CachedAttributes.HRef);
        }
    }
}
A couple of things to notice:
- The original response from the server contains a <!DOCTYPE> directive
before the main <html> tag. The HTML passed to HtmlElement.Create must have
a single root tag. Here the parser would see two top-level tags (the DOCTYPE
and the <html> tag) and would fail if we didn't remove the DOCTYPE first.
- Notice the use of HtmlElementFindParams to locate all table rows that have a
specific class.
- Notice the use of the strongly typed HtmlAnchorElement to quickly access its
HRef property.
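The DOCTYPE removal and the link building can also be pulled out into a small helper. The following is a minimal sketch under my own assumptions: ForumHtmlHelper, StripDoctype and ToAbsoluteUrl are hypothetical names, not part of the Lightweight Test Automation Framework, and the code uses only System.Text.RegularExpressions and System.Uri. It resolves the relative HRef with System.Uri instead of string concatenation. Unlike a .*? pattern used without RegexOptions.Singleline, the [^>]* pattern also matches a DOCTYPE split across several lines; it assumes the DOCTYPE has no internal subset containing a '>' character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Lightweight Test Automation Framework):
// prepares raw HTML for HtmlElement.Create and turns relative links into absolute URLs.
public static class ForumHtmlHelper
{
    // Matches a <!DOCTYPE ...> declaration in any letter case, including one
    // that spans several lines, because [^>]* also matches newline characters.
    private static readonly Regex DoctypePattern =
        new Regex(@"<!DOCTYPE[^>]*>", RegexOptions.IgnoreCase);

    // Removes the DOCTYPE and any leading whitespace left behind, so that the
    // remaining markup starts with its single root tag (<html>).
    public static string StripDoctype(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return DoctypePattern.Replace(html, String.Empty).TrimStart();
    }

    // Resolves an href such as "/p/1234/5678.aspx" against the site root.
    // An href that is already absolute is returned unchanged.
    public static string ToAbsoluteUrl(string siteRoot, string href)
    {
        return new Uri(new Uri(siteRoot), href).ToString();
    }
}
```

In the program above you would then call HtmlElement.Create(ForumHtmlHelper.StripDoctype(responseFromServer)) and print ForumHtmlHelper.ToAbsoluteUrl("http://forums.asp.net", link.CachedAttributes.HRef).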
Here is the console output when I run the program:

Hopefully this post has shown you some of the not-so-obvious things that you
can do with the Lightweight Test Automation Framework.
Federico Silva Armas
ASP.NET QA Team