Using LINQ to XML (and how to build a custom RSS Feed Reader with it)
One of the big programming model improvements being made in .NET 3.5 is the work being done to make querying data a first-class programming concept. We call this overall querying programming model "LINQ", which stands for .NET Language Integrated Query.
LINQ supports a rich extensibility model that facilitates the creation of efficient domain-specific providers for data sources. .NET 3.5 ships with built-in libraries that enable LINQ support against Objects, XML, and Databases.
What is LINQ to XML?
LINQ to XML is a built-in LINQ data provider that is implemented within the "System.Xml.Linq" namespace in .NET 3.5.
LINQ to XML provides a clean programming model that enables you to read, construct and write XML data. You can use LINQ to XML to perform LINQ queries over XML that you retrieve from the file-system, from a remote HTTP URL or web-service, or from any in-memory XML content.
LINQ to XML provides much richer (and easier) querying and data shaping support than the low-level XmlReader/XmlWriter API in .NET today. It also ends up being much more efficient (and uses much less memory) than the DOM API that XmlDocument provides.
Using LINQ to XML to query a local XML File
To get a sense of how LINQ to XML works, we can create a simple XML file on our local file-system like below that uses a custom schema we've defined to store RSS feeds:
I could then use the new XDocument class within the System.Xml.Linq namespace to open and query the XML document above. Specifically, I want to filter the <Feed> elements in the XML file and return a sequence of the non-disabled RSS feeds (where a disabled feed is a <Feed> element with a "status" attribute whose value is "disabled"). I could accomplish this by writing the code below:
VB:
C#:
Notice in the code-snippets above how I'm loading the XML file using the XDocument.Load(path) static method - which returns an XDocument object. Because I'm running this code within ASP.NET, I'm using the Server.MapPath(path) helper method to resolve the correct path for my XML file relative to the page I'm running the code on.
Once I have an XDocument object for my XML file I can then write a LINQ query expression to retrieve the XML data I'm looking for. In the code above I'm querying over each of the <Feed> elements within the XML file. This is driven by this opening clause in the LINQ query expression:
from feed in feedXML.Descendants("Feed")
I'm then applying a filter that only returns back those "Feed" elements that either don't have a "status" attribute, or whose "status" attribute value is not set to "disabled":
Where (feed.Attribute("status") Is Nothing) OrElse (feed.Attribute("status").Value <> "disabled")
I am then using the select clause in our LINQ query expression to indicate what data I want returned. If I simply wrote "select feed", LINQ to XML would return a sequence of XElement objects that represent each of the XML element nodes that match my filter. In the code samples above, though, I am using the shaping/projection features of LINQ to instead define a new anonymous type on the fly, and I am defining two properties on it - Name and Url - that I want populated using the <Name> and <Url> sub-elements under each <Feed> element:
Select Name = feed.Element("Name").Value, Url = feed.Element("Url").Value
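As a rough C# sketch of how the from, where and select clauses above fit together, the sample below runs the same query against an in-memory XML string. The <Feeds> root element, the example.com entries, and the FeedQuerySample/DescribeActiveFeeds names are assumptions made for illustration; only the <Feed>, <Name>, <Url> elements and the "status" attribute come from the description above. It uses XDocument.Parse instead of XDocument.Load so that it runs without a file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FeedQuerySample
{
    // Hypothetical sample data: the <Feeds> root and the example.com feeds are
    // assumptions for this sketch, not the contents of the original file.
    public const string SampleXml =
@"<Feeds>
  <Feed>
    <Name>ScottGu</Name>
    <Url>http://weblogs.asp.net/scottgu/rss.aspx</Url>
  </Feed>
  <Feed status=""disabled"">
    <Name>Example Disabled Feed</Name>
    <Url>http://example.com/disabled/rss.xml</Url>
  </Feed>
  <Feed status=""enabled"">
    <Name>Example Enabled Feed</Name>
    <Url>http://example.com/enabled/rss.xml</Url>
  </Feed>
</Feeds>";

    public static List<string> DescribeActiveFeeds(string xml)
    {
        XDocument feedXML = XDocument.Parse(xml);

        // Keep feeds with no "status" attribute, or whose status is not "disabled",
        // and project each one into an anonymous type with Name and Url properties.
        var feeds = from feed in feedXML.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new
                    {
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        // The anonymous type is only visible inside this method, so it is
        // flattened to strings before being returned.
        var lines = new List<string>();
        foreach (var activeFeed in feeds)
        {
            lines.Add(activeFeed.Name + " -> " + activeFeed.Url);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in DescribeActiveFeeds(SampleXml))
        {
            Console.WriteLine(line);
        }
    }
}
```

In an ASP.NET page you would typically swap XDocument.Parse(xml) for XDocument.Load(Server.MapPath(...)) with the path to your own file.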
As you can see above (and below), I can then work against this returned sequence of data just like I would any collection or array in .NET. VS 2008 provides full intellisense and compilation checking support over this anonymous type sequence:
I can also data-bind the results against any UI control in ASP.NET, Windows Forms, or WPF. For example, assuming I had a dropdownlist control defined in my page like so:
I could use the below LINQ to XML code to databind the results to it:
This will then produce a nice drop-down list in our HTML page like so:
Hmm - What is this "anonymous type" thing?
In my code above I've taken advantage of a new language feature in VB and C# called "anonymous types". Anonymous types enable developers to concisely define inline CLR types within code, without having to explicitly define a formal class declaration of the type. You can learn more about them in my previous New "Orcas" Language Feature: Anonymous Types blog post.
While anonymous types can be super useful when you want to locally iterate and work with data, we'll often want/need to define a standard class when passing the results of our LINQ query between multiple classes, across class library assemblies, and over web-services.
To enable this, I could define a non-anonymous class called "FeedDefinition" to represent our Feed data like so:
Note above how I'm using the new "Automatic Properties" feature of C# to define the properties (and avoid having to define a field for them).
I could then write the method below to return a generic List<FeedDefinition> collection containing FeedDefinition objects:
Note above how the only change I've made to the LINQ to XML query we were using before is to change the "select" clause from "select new" (with no type-name) to "select new FeedDefinition". With this change I'm now returning a sequence of FeedDefinition objects that I can pass from class to class, assembly to assembly, and across web-services.
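A minimal sketch of this change might look like the following. The FeedDefinition name and its Name/Url properties come from the text above; the FeedRepository class, the GetFeeds method name, and the inline sample XML (including its <Feeds> root) are hypothetical and only there to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// A named class (using automatic properties) that can be passed between
// classes and assemblies, unlike an anonymous type.
public class FeedDefinition
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class FeedRepository
{
    // Hypothetical helper: the method name and signature are assumptions.
    public static List<FeedDefinition> GetFeeds(XDocument feedXML)
    {
        var feeds = from feed in feedXML.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new FeedDefinition
                    {
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        // ToList() runs the query once and materializes the results.
        return feeds.ToList();
    }

    public static void Main()
    {
        // Assumed sample content, inlined so the sketch runs without a file.
        XDocument feedXML = XDocument.Parse(
            "<Feeds>" +
            "<Feed><Name>ScottGu</Name><Url>http://weblogs.asp.net/scottgu/rss.aspx</Url></Feed>" +
            "<Feed status=\"disabled\"><Name>Old</Name><Url>http://example.com/old.xml</Url></Feed>" +
            "</Feeds>");

        foreach (FeedDefinition feed in GetFeeds(feedXML))
        {
            Console.WriteLine(feed.Name + ": " + feed.Url);
        }
    }
}
```

Calling ToList() before returning means callers get a snapshot rather than a deferred query that re-reads the XDocument each time it is enumerated; whether that trade-off is right depends on how the results are used.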
Using LINQ to XML to Retrieve a Remote RSS XML Feed
The XDocument.Load(path) static method can open both XML files from the file-system and remote XML feeds returned from an HTTP URL. This enables you to use it to access remote RSS feeds, REST APIs, as well as any other XML feed published on the web.
For an example of this in action, let's take a look at the XML of my blog's RSS feed (http://weblogs.asp.net/scottgu/rss.aspx):
I could write the LINQ to XML code below to retrieve the above blog post data from my RSS feed, and work with the individual feed items as .NET objects:
Note above how I am converting the "Published" field in the RSS feed - which is a string in the XML - to a .NET DateTime object. Notice also how LINQ to XML includes a built-in XNamespace type that provides a type-safe way to declare and work with XML Namespaces (which I need to do to retrieve the <slash:comments> element).
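To make the date conversion and the XNamespace usage concrete, here is a small C# sketch that works against an inline RSS fragment rather than the live feed. The BlogPost class, the RssSample/ParsePosts names, and the sample items are assumptions for illustration; the <pubDate> and <slash:comments> elements are standard RSS 2.0 and Slash module elements. With a real feed you would call XDocument.Load(url) instead of XDocument.Parse.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical shape for a feed item; property names are assumptions.
public class BlogPost
{
    public string Title { get; set; }
    public DateTime Published { get; set; }
    public int NumComments { get; set; }
}

public static class RssSample
{
    // Made-up sample items so the sketch runs without network access.
    public const string SampleRss =
@"<rss version=""2.0"" xmlns:slash=""http://purl.org/rss/1.0/modules/slash/"">
  <channel>
    <title>Sample Blog</title>
    <item>
      <title>First post</title>
      <pubDate>Mon, 06 Aug 2007 10:00:00 GMT</pubDate>
      <slash:comments>12</slash:comments>
    </item>
    <item>
      <title>Second post</title>
      <pubDate>Mon, 02 Jul 2007 09:00:00 GMT</pubDate>
      <slash:comments>3</slash:comments>
    </item>
  </channel>
</rss>";

    public static List<BlogPost> ParsePosts(string rssXml)
    {
        XDocument rss = XDocument.Parse(rssXml);

        // XNamespace + local name yields the qualified name for <slash:comments>.
        XNamespace slash = "http://purl.org/rss/1.0/modules/slash/";

        var posts = from item in rss.Descendants("item")
                    select new BlogPost
                    {
                        Title = item.Element("title").Value,
                        // RSS dates are RFC 822 style strings; parse them culture-
                        // independently and normalize to UTC.
                        Published = DateTime.Parse(
                            item.Element("pubDate").Value,
                            CultureInfo.InvariantCulture,
                            DateTimeStyles.AdjustToUniversal),
                        // XElement defines an explicit conversion to int.
                        NumComments = (int)item.Element(slash + "comments")
                    };

        return posts.ToList();
    }

    public static void Main()
    {
        foreach (BlogPost post in ParsePosts(SampleRss))
        {
            Console.WriteLine(post.Title + " | " + post.Published.ToString("u")
                              + " | comments: " + post.NumComments);
        }
    }
}
```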
I could then take advantage of the composition features of LINQ to perform a further sub-query on the result, so that I filter over only those RSS posts that were published within the last 7 days using the code below:
As you can see above, you can feed the results of one LINQ query expression to be the input of another LINQ expression. This enables you to write very clean, highly composable, code.
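The composition idea can be sketched in C# as two query expressions, where the second one queries the result of the first. The RecentPostsSample/RecentTitles names and the sample items are hypothetical, and the "current" time is passed in as a parameter (an assumption made here so the result does not depend on when the code runs).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class RecentPostsSample
{
    // Made-up sample items for this sketch.
    public const string SampleRss =
@"<rss version=""2.0"">
  <channel>
    <item>
      <title>Recent post</title>
      <pubDate>Mon, 06 Aug 2007 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Older post</title>
      <pubDate>Mon, 02 Jul 2007 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>";

    public static List<string> RecentTitles(XDocument rss, DateTime nowUtc)
    {
        // First query: shape each <item> into an anonymous type.
        var posts = from item in rss.Descendants("item")
                    select new
                    {
                        Title = item.Element("title").Value,
                        Published = DateTime.Parse(
                            item.Element("pubDate").Value,
                            CultureInfo.InvariantCulture,
                            DateTimeStyles.AdjustToUniversal)
                    };

        // Second query: uses the first query's result as its data source and
        // keeps only posts published less than 7 days before nowUtc.
        var recentPosts = from post in posts
                          where (nowUtc - post.Published).TotalDays < 7
                          select post.Title;

        return recentPosts.ToList();
    }

    public static void Main()
    {
        // A fixed reference time chosen to match the sample data above;
        // real code would more likely pass DateTime.UtcNow.
        DateTime now = new DateTime(2007, 8, 8, 0, 0, 0, DateTimeKind.Utc);

        foreach (string title in RecentTitles(XDocument.Parse(SampleRss), now))
        {
            Console.WriteLine(title);
        }
    }
}
```

Because LINQ queries are evaluated lazily, neither query touches the XML until ToList() enumerates the combined result.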
Using LINQ Sub-Queries within a LINQ to XML Query Expression
If you look at the raw XML of my RSS feed, you'll notice that the tags for each post are stored as repeated <category> elements directly below each <item> element:
When designing the object model for a "BlogEntry" class, I might want to represent these <category> values as a sub-collection of strings. For example, using a "Tags" property that is a generic list of type string:
You might be wondering - how do we take a flat collection of <category> elements under <item> and transform them into a nested sub-collection of strings? The nice thing about LINQ is that it makes this type of scenario easy by allowing us to use nested LINQ query expressions like so:
This "shaping" power of LINQ, and its ability to take flat data structures and make them hierarchical (and take hierarchical data structures and make them flat) is super powerful. You can use this feature with any type of data source - regardless of whether it is XML, SQL, or plain old objects/arrays/collections.
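Here is a small C# sketch of such a nested query, turning the repeated <category> elements under each <item> into a list of strings. The BlogEntry class name and its Tags property follow the description above; the Title property, the TagSample/ParseEntries names, and the sample items are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class BlogEntry
{
    public string Title { get; set; }
    public List<string> Tags { get; set; }
}

public static class TagSample
{
    // Made-up sample items for this sketch.
    public const string SampleRss =
@"<rss version=""2.0"">
  <channel>
    <item>
      <title>LINQ to XML</title>
      <category>LINQ</category>
      <category>XML</category>
    </item>
    <item>
      <title>Untagged post</title>
    </item>
  </channel>
</rss>";

    public static List<BlogEntry> ParseEntries(string rssXml)
    {
        XDocument rss = XDocument.Parse(rssXml);

        var entries = from item in rss.Descendants("item")
                      select new BlogEntry
                      {
                          Title = item.Element("title").Value,
                          // Nested query: the direct <category> children of this
                          // <item> become a sub-collection of strings.
                          Tags = (from category in item.Elements("category")
                                  select category.Value).ToList()
                      };

        return entries.ToList();
    }

    public static void Main()
    {
        foreach (BlogEntry entry in ParseEntries(SampleRss))
        {
            Console.WriteLine(entry.Title + " [" + string.Join(", ", entry.Tags.ToArray()) + "]");
        }
    }
}
```

An item with no <category> elements ends up with an empty Tags list rather than null, which tends to be easier for calling code to handle.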
Putting it all Together with a Simple RSS Feed Reader
The code snippets I've walked through above demonstrate how you can easily write LINQ to XML code to retrieve a list of RSS feeds from a local XML file, and how to remotely query an RSS feed to retrieve an individual feed's details and individual item post contents. I could obviously then take the resulting feed contents and data-bind them to an ASP.NET GridView or ListView control to provide a nice view of the blog feed:
I've built a simple sample application that puts all of these snippets together to deliver a simple RSS Reader with LINQ to XML and the new <asp:ListView> control. You can download it here. The download includes both VB and C# versions of the application.
Summary
LINQ to XML provides a really powerful way to efficiently query, filter, and shape/transform XML data. You can use it both against local XML content, as well as remote XML feeds. You can use it to easily transform XML data into .NET objects and collections that you can further manipulate and transfer across your application.
LINQ to XML uses the same core LINQ query syntax and concepts that LINQ to SQL, LINQ to Objects, LINQ to SharePoint, LINQ to Amazon, LINQ to NHibernate, etc. use when querying data. You can learn more about the LINQ query syntax and the supporting language features being added to VB and C# to support it from these previous blog posts of mine:
- Automatic Properties, Object Initializer and Collection Initializers
- Extension Methods
- Lambda Expressions
- Query Syntax
- Anonymous Types
You might also find these blog posts of mine useful to learn more about LINQ to SQL:
- Part 1: Introduction to LINQ to SQL
- Part 2: Defining our Data Model Classes
- Part 3: Querying our Database
- Part 4: Updating our Database
- Part 5: Binding UI using the ASP:LinqDataSource Control
In a future blog post I'll return to LINQ to XML and demonstrate how it can be used not just to query XML, but also to really cleanly generate XML output from a .NET data structure.
Hope this helps,
Scott