Using LINQ to XML (and how to build a custom RSS Feed Reader with it)

One of the big programming model improvements in .NET 3.5 is the work to make querying data a first-class programming concept.  We call this overall querying programming model "LINQ", which stands for .NET Language Integrated Query.

LINQ supports a rich extensibility model that facilitates the creation of efficient domain-specific providers for data sources.  .NET 3.5 ships with built-in libraries that enable LINQ support against Objects, XML, and Databases.

What is LINQ to XML?

LINQ to XML is a built-in LINQ data provider that is implemented within the "System.Xml.Linq" namespace in .NET 3.5.

LINQ to XML provides a clean programming model that enables you to read, construct and write XML data.  You can use LINQ to XML to perform LINQ queries over XML that you retrieve from the file-system, from a remote HTTP URL or web-service, or from any in-memory XML content. 

LINQ to XML provides much richer (and easier) querying and data shaping support than the low-level XmlReader/XmlWriter API in .NET today.  It also ends up being much more efficient (and uses much less memory) than the DOM API that XmlDocument provides. 

Using LINQ to XML to query a local XML File

To get a sense of how LINQ to XML works, we can create a simple XML file on our local file-system like below that uses a custom schema we've defined to store RSS feeds:

I could then use the new XDocument class within the System.Xml.Linq namespace to open and query the XML document above.  Specifically, I want to filter the <Feed> elements in the XML file and return a sequence of the non-disabled RSS feeds (where a disabled feed is a <Feed> element with a "status" attribute whose value is "disabled").  I could accomplish this by writing the code below:

VB:

C#:

Notice in the code-snippets above how I'm loading the XML file using the XDocument.Load(path) static method - which returns back an XDocument object.  Because I'm running this code within ASP.NET, I'm using the Server.MapPath(path) helper method to resolve the correct path for my XML file relative to the page I'm running the code on.

Once I have an XDocument object for my XML file I can then write a LINQ query expression to retrieve the XML data I'm looking for.  In the code above I'm querying over each of the <Feed> elements within the XML file.  This is driven by this opening clause in the LINQ query expression:

from feed in feedXML.Descendants("Feed")

I'm then applying a filter that only returns back those "Feed" elements that either don't have a "status" attribute, or whose "status" attribute value is not set to "disabled":

Where (feed.Attribute("status") Is Nothing) OrElse (feed.Attribute("status").Value <> "disabled")

I am then using the select clause in our LINQ expression to indicate what data I want returned.  If I simply wrote "select feed", LINQ to XML would return a sequence of XElement objects that represent each of the XML element nodes that match my filter.  In the code samples above, though, I am using the shaping/projection features of LINQ to instead define a new anonymous type on the fly, and I am defining two properties on it - Name and Url - that I want populated using the <Name> and <Url> sub-elements under each <Feed> element:

Select Name = feed.Element("Name").Value, Url = feed.Element("Url").Value
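Putting those three clauses together, a minimal C# sketch of the whole query might look like the following.  The <Feed>, <Name> and <Url> element names and the "status" attribute follow the description above, but the <Feeds> root element, the sample data, the FeedQuerySketch class name, and the use of XDocument.Parse on an inline string (instead of XDocument.Load with Server.MapPath, so that it runs outside ASP.NET) are assumptions made for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FeedQuerySketch
{
    // Hypothetical sample data standing in for the local feeds file.
    public const string SampleXml = @"<Feeds>
  <Feed>
    <Name>ScottGu</Name>
    <Url>http://weblogs.asp.net/scottgu/rss.aspx</Url>
  </Feed>
  <Feed status=""disabled"">
    <Name>Old Feed</Name>
    <Url>http://example.com/old/rss.aspx</Url>
  </Feed>
</Feeds>";

    // Returns one "Name -> Url" string per non-disabled <Feed> element.
    public static string[] ActiveFeeds(string xml)
    {
        // In an ASP.NET page this would be XDocument.Load(Server.MapPath(...)).
        XDocument feedXml = XDocument.Parse(xml);

        var feeds = from feed in feedXml.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new
                    {
                        // Anonymous type with two properties: Name and Url.
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        return feeds.Select(f => f.Name + " -> " + f.Url).ToArray();
    }

    public static void Main()
    {
        foreach (string line in ActiveFeeds(SampleXml))
        {
            Console.WriteLine(line);
        }
    }
}
```

The null check has to come first in the where clause: the || operator short-circuits, so .Value is only read when the "status" attribute actually exists.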

As you can see above (and below), I can then work against this returned sequence of data just like I would any collection or array in .NET.  VS 2008 provides full intellisense and compilation checking support over this anonymous type sequence:

I can also data-bind the results against any UI control in ASP.NET, Windows Forms, or WPF.  For example, assuming I had a dropdownlist control defined in my page like so:

I could use the below LINQ to XML code to databind the results to it:

This will then produce a nice drop-downlist in our HTML page like so:

Hmm - What is this "anonymous type" thing?

In my code above I've taken advantage of a new language feature in VB and C# called "anonymous types".  Anonymous types enable developers to concisely define inline CLR types within code, without having to explicitly write a formal class declaration for the type.  You can learn more about them in my previous New "Orcas" Language Feature: Anonymous Types blog post.

While anonymous types can be super useful when you want to locally iterate and work with data, we'll often want/need to define a standard class when passing the results of our LINQ query between multiple classes, across class library assemblies, and over web-services. 

To enable this, I could define a non-anonymous class called "FeedDefinition" to represent our Feed data like so:

Note above how I'm using the new "Automatic Properties" feature of C# to define the properties (and avoid having to define a field for them).

I could then write the below method to return back a generics based List<FeedDefinition> collection containing FeedDefinition objects:

Note above how the only change I've made to the LINQ to XML query we were using before is to change the "select" clause from "select new" (with no type-name) to "select new FeedDefinition".  With this change I'm now returning a sequence of FeedDefinition objects that I can pass from class to class, assembly to assembly, and across web-services.
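As a rough sketch of that change, assuming the same <Feed>/<Name>/<Url> layout described earlier: the FeedDefinition name and its two properties come from the text above, while the FeedRepositorySketch class, the GetFeeds method name and the sample data are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// A named class using C# automatic properties (no explicit backing fields).
public class FeedDefinition
{
    public string Name { get; set; }
    public string Url { get; set; }
}

public static class FeedRepositorySketch
{
    // Hypothetical helper: returns the non-disabled feeds as a List<FeedDefinition>.
    public static List<FeedDefinition> GetFeeds(XDocument feedXml)
    {
        var feeds = from feed in feedXml.Descendants("Feed")
                    where feed.Attribute("status") == null
                          || feed.Attribute("status").Value != "disabled"
                    select new FeedDefinition   // the only change: a type name after "new"
                    {
                        Name = feed.Element("Name").Value,
                        Url = feed.Element("Url").Value
                    };

        // ToList() runs the query and copies the results into a List<T>.
        return feeds.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data.
        XDocument doc = XDocument.Parse(
            "<Feeds><Feed><Name>ScottGu</Name>" +
            "<Url>http://weblogs.asp.net/scottgu/rss.aspx</Url></Feed></Feeds>");

        foreach (FeedDefinition feed in GetFeeds(doc))
        {
            Console.WriteLine(feed.Name + " -> " + feed.Url);
        }
    }
}
```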

Using LINQ to XML to Retrieve a Remote RSS XML Feed

The XDocument.Load(path) static method can open both XML files from the file-system and remote XML feeds returned from an HTTP URL.  This enables you to use it to access remote RSS feeds, REST APIs, as well as any other XML feed published on the web.

For an example of this in action, let's take a look at the XML of my blog's RSS feed (http://weblogs.asp.net/scottgu/rss.aspx):

I could write the LINQ to XML code below to retrieve the above blog post data from my RSS feed, and work with the individual feed items as .NET objects:

Note above how I am converting the "Published" field in the RSS feed - which is a string in the XML - to a .NET DateTime object.  Notice also how LINQ to XML includes a built-in XNamespace type that provides a type-safe way to declare and work with XML Namespaces (which I need to do to retrieve the <slash:comments> element).
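A minimal sketch of this kind of query is below.  It assumes a standard RSS 2.0 layout (<item>, <title>, <link>, <pubDate>) plus the slash module namespace for <slash:comments>; the BlogPost class shape, the RssSketch name and the inline sample feed are assumptions, and a real page would call XDocument.Load with the feed URL instead of XDocument.Parse:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public class BlogPost
{
    public string Title { get; set; }
    public string Link { get; set; }
    public DateTime Published { get; set; }
    public int NumComments { get; set; }
}

public static class RssSketch
{
    // Hypothetical sample feed so the sketch runs without network access.
    public const string SampleRss = @"<rss version=""2.0"" xmlns:slash=""http://purl.org/rss/1.0/modules/slash/"">
  <channel>
    <title>Sample Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Mon, 27 Aug 2007 06:13:00 GMT</pubDate>
      <slash:comments>35</slash:comments>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <pubDate>Tue, 14 Aug 2007 09:00:00 GMT</pubDate>
      <slash:comments>4</slash:comments>
    </item>
  </channel>
</rss>";

    public static List<BlogPost> ParsePosts(XDocument rssFeed)
    {
        // XNamespace + local name yields the qualified name for <slash:comments>.
        XNamespace slash = "http://purl.org/rss/1.0/modules/slash/";

        var posts = from item in rssFeed.Descendants("item")
                    select new BlogPost
                    {
                        Title = item.Element("title").Value,
                        Link = item.Element("link").Value,
                        // The date is a string in the XML; convert it to a DateTime (kept in UTC).
                        Published = DateTime.Parse(
                            item.Element("pubDate").Value,
                            CultureInfo.InvariantCulture,
                            DateTimeStyles.AdjustToUniversal),
                        // XElement defines an explicit conversion to int.
                        NumComments = (int)item.Element(slash + "comments")
                    };

        return posts.ToList();
    }

    public static void Main()
    {
        foreach (BlogPost post in ParsePosts(XDocument.Parse(SampleRss)))
        {
            Console.WriteLine(post.Published.ToString("u") + "  " + post.Title
                              + " (" + post.NumComments + " comments)");
        }
    }
}
```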

I could then take advantage of the composition features of LINQ to perform a further sub-query on the result, so that I filter over only those RSS posts that were published within the last 7 days using the code below:

As you can see above, you can feed the results of one LINQ query expression into another LINQ query expression as its input.   This enables you to write very clean, highly composable code.
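To illustrate the composition on its own, here is a small sketch that works over in-memory objects rather than a live feed.  The PostSummary class, the CompositionSketch name and the explicit "now" parameter (used so the result does not depend on the current date) are all assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PostSummary
{
    public string Title { get; set; }
    public DateTime Published { get; set; }
}

public static class CompositionSketch
{
    // Returns the titles of posts published within the 7 days before "now".
    public static List<string> RecentTitles(IEnumerable<PostSummary> posts, DateTime now)
    {
        // First query: order the posts, newest first.
        var ordered = from post in posts
                      orderby post.Published descending
                      select post;

        DateTime cutoff = now.AddDays(-7);

        // Second query: uses the first query as its input and narrows it further.
        var recent = from post in ordered
                     where post.Published > cutoff
                     select post.Title;

        // Nothing executes until the results are enumerated here.
        return recent.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data.
        var posts = new List<PostSummary>
        {
            new PostSummary { Title = "Older post", Published = new DateTime(2007, 8, 14) },
            new PostSummary { Title = "Recent post", Published = new DateTime(2007, 8, 27) }
        };

        foreach (string title in RecentTitles(posts, new DateTime(2007, 8, 28)))
        {
            Console.WriteLine(title);
        }
    }
}
```

Because query expressions are evaluated lazily, stacking one query on top of another does not run the first query twice; the combined pipeline executes once when ToList() is called.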

Using LINQ Sub-Queries within a LINQ to XML Query Expression

If you look at the raw XML of my RSS feed, you'll notice that the "tags" for each post are stored as repeated <category> elements directly below each <item> element:

When designing the object model for a "BlogEntry" class, I might want to represent these <category> values as a sub-collection of strings.  For example, using a "Tags" property that is a generic list of type string:

You might be wondering - how do we take a flat collection of <category> elements under <item> and transform them into a nested sub-collection of strings?  The nice thing about LINQ is that it makes this type of scenario easy by allowing us to use nested LINQ query expressions like so:

This "shaping" power of LINQ, and its ability to take flat data structures and make them hierarchical (and take hierarchical data structures and make them flat) is super powerful.  You can use this feature with any type of data source - regardless of whether it is XML, SQL, or plain old objects/arrays/collections.
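A minimal sketch of such a nested query, assuming repeated <category> elements under each RSS <item> as described above: the BlogEntry class with a Tags property follows the text, while the TagsSketch name and the sample feed are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class BlogEntry
{
    public string Title { get; set; }
    public List<string> Tags { get; set; }
}

public static class TagsSketch
{
    // Hypothetical sample feed with repeated <category> elements per <item>.
    public const string SampleRss = @"<rss version=""2.0"">
  <channel>
    <item>
      <title>Using LINQ to XML</title>
      <category>LINQ</category>
      <category>.NET</category>
    </item>
    <item>
      <title>Untagged post</title>
    </item>
  </channel>
</rss>";

    public static List<BlogEntry> ParseEntries(XDocument rssFeed)
    {
        var entries = from item in rssFeed.Descendants("item")
                      select new BlogEntry
                      {
                          Title = item.Element("title").Value,
                          // Nested query: only the <category> children of this <item>.
                          Tags = (from category in item.Elements("category")
                                  select category.Value).ToList()
                      };

        return entries.ToList();
    }

    public static void Main()
    {
        foreach (BlogEntry entry in ParseEntries(XDocument.Parse(SampleRss)))
        {
            Console.WriteLine(entry.Title + ": " + string.Join(", ", entry.Tags.ToArray()));
        }
    }
}
```

Note the use of Elements("category") rather than Descendants("category") in the inner query, so that only direct children of the current <item> are collected; an item with no categories simply gets an empty list.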

Putting it all Together with a Simple RSS Feed Reader

The code snippets I've walked through above demonstrate how you can easily write LINQ to XML code to retrieve a list of RSS feeds from a local XML file, and how to remotely query an RSS feed to retrieve an individual feed's details and individual item post contents.  I could obviously then take the resulting feed contents and data-bind it to an ASP.NET GridView or ListView control to provide a nice view of the blog feed:

I've built a simple sample application that puts all of these snippets together to deliver a simple RSS Reader with LINQ to XML and the new <asp:ListView> control.  You can download it here.  The download includes both VB and C# versions of the application.

Summary

LINQ to XML provides a really powerful way to efficiently query, filter, and shape/transform XML data.  You can use it both against local XML content, as well as remote XML feeds.  You can use it to easily transform XML data into .NET objects and collections that you can further manipulate and transfer across your application.

LINQ to XML uses the same core LINQ query syntax and concepts that LINQ to SQL, LINQ to Objects, LINQ to SharePoint, LINQ to Amazon, LINQ to NHibernate, etc. use when querying data. You can learn more about the LINQ query syntax and the supporting language features being added to VB and C# to support it from these previous blog posts of mine:

You might also find these blog posts of mine useful to learn more about LINQ to SQL:

In a future blog post I'll return to LINQ to XML and demonstrate how it can be used not just to query XML, but also to really cleanly generate XML output from a .NET data structure. 

Hope this helps,

Scott

35 Comments

  • You guys really did a great job, this will save me lots of time in the future.

    It would be great if you could tell us something about the performance of LINQ as I didn't have the chance yet to run LINQ under high load.

    Best regards,
    Andreas

  • Hi Andreas,

    >>>>>> You guys really did a great job, this will save me lots of time in the future. It would be great if you could tell us something about the performance of LINQ as I didn't have the chance yet to run LINQ under high load.

    I'm glad you are enjoying LINQ! In general the performance of LINQ will depend on the data provider you are going against.

    The LINQ to SQL provider is highly optimized and really delivers great performance (here is a nice blog post with more details: http://blogs.msdn.com/mattwar/archive/2007/07/05/linq-to-sql-rico-drops-the-other-shoe.aspx).

    The LINQ to XML provider under the covers uses the low-level XmlReader/Writer APIs which are very efficient. It *does not* load up a DOM tree (which tends to be expensive in terms of memory usage under high server load). I think you'll find LINQ to XML more than fast enough for your server scenarios.

    Hope this helps,

    Scott

  • Scott,

    Thanks for this post, it was exactly what I needed!

    One question though: how to access XML URLs that need authentication via REST (eg. Highrise's XML feeds)?

  • Two questions:

    1. How would you centralize populating BlogPost's properties? If I have many LINQ queries that instantiate BlogPost classes I would like to assign Title, Published, NumComments, etc. using maybe BlogPost's constructor; but what would I pass to BlogPost? What is item's type in "from item in rssFeed.Descendants("item") ..."?

    2. Not sure if possible, but will we have a proxy generator for XML like we have the data context for LINQ to SQL so that we could treat the XML as a tree of objects?

    Thank you!

  • this was really very helpful. I am going to try out something today itself

  • This looks very promising! I am working with a hectic XML file right now, LINQ sure would have made life a lot easier! Can't wait for the release...

    Thank you very much, Scott.

  • In the past, with very big xml files it was recommended to use an XML TextReader (or SAX parser back in the day) as opposed to loading it all into a DOM. I realize this is LINQ, so different, but I'm wondering how it works under the hood? Would it work well with very large xml files? Or is it recommended more for xml "snippets" or generally smaller config files etc?

  • Another great article Scott! Are there plans to include a way (or a method I'm not familiar with) to flatten a hierarchy without a predefined set of levels? I guess kind of like recursive LINQ?

    Thanks,
    Zach

  • Wow... doesn't load a DOM tree in memory! That solves a lot of issues, thx MS. My only question though is what happens to Xpath queries? Any use for them if I have the ability to use LINQ instead?

  • Dear Scott,
    Thanks for all your posts, VS2005 has a "view grid" function for *.xml files to edit, where is it in VS2008?

    Regards,

  • Hi PBZ,

    >>>>>> 1. How would you centralize populating BlogPost's properties? If I have many LINQ queries that instantiate BlogPost classes I would like to assign Title, Published, NumComments, etc. using maybe BlogPost's constructor; but what would I pass to BlogPost? What is item's type in "from item in rssFeed.Descendants("item") ..."?

    You could define a constructor on the BlogPost class I defined above - and instead of setting its values via properties, you could instead pass them in as constructor arguments. You could change the select clause to accomplish this:

    select new BlogPost(item.Element("link").Value, item.Element("title").Value, etc).

    >>>>>> 2. Not sure if possible, but will we have a proxy generator for XML like we have the data context for LINQ to SQL so that we could treat the XML as a tree of objects?

    The LINQ to XML query above is effectively allowing you to define a custom tree of objects out of your XML. You could alternatively use the LINQ to XSD proxy generator to perform strongly typed queries against XML data that has an XSD schema defined. You can learn more about this here: http://oakleafblog.blogspot.com/2007/06/linq-to-xsd-preview-alpha-02-for-orcas.html and here: http://weblogs.asp.net/fmarguerie/archive/2007/01/15/linq-to-xsd-typed-xml-programming-with-linq.aspx

    Hope this helps,

    Scott

  • Hi Ken,

    >>>>>> In the past, with very big xml files it was recommended to use an XML TextReader (or SAX parser back in the day) as opposed to loading it all into a DOM. I realize this is LINQ, so different, but I'm wondering how it works under the hood? Would it work well with very large xml files? Or is it recommended more for xml "snippets" or generally smaller config files etc?

    LINQ to XML is implemented using an XmlTextReader internally - which means that it doesn't need to materialize a DOM tree in order to perform its queries against the XML stream. This makes its implementation really efficient, and should be about as fast as writing your own XmlTextReader code from scratch (but a heck of a lot easier and more flexible!). I believe it should perform very well with large XML files.

    Hope this helps,

    Scott

  • Thanks for the nice post Scott! I've got my first LINQ application up running after reading this post. Very helpful!

    Thanks,
    - Bruce

  • Hi Zach,

    >>>>> Another great article Scott! Are there plans to include a way (or a method I'm not familiar with) to flatten a hierarchy without a predefined set of levels? I guess kind of like recursive LINQ?

    You should be able to handle this pretty well using LINQ to XML. Because queries can feed the results of other queries, you can create some nice nested queries that allow you to recursively search and act on results.

    Hope this helps,

    Scott

  • Scott is a wonderful person and a true mentor to all those seeking to learn new technology. Good work Scott!! Thank you for taking the time to write quality material

  • I just wanted to mention that while I love LINQ and LINQ to XML is actually my favorite part of LINQ, it is my understanding that in VS 2008 there is a new namespace and set of classes specifically for RSS. I believe (not on a machine right now with VS2008 beta 2) it was in System.Syndication. I know this post was just meant to be a cool LINQ to XML demo, but for someone actually wanting to write some RSS code, they might want to check out those Syndication classes. James Conard gave a lecture here in Atlanta a couple weeks back and mentioned it.

    I just thought some might want to know that did not.

    Thanks.

  • I was wondering if I could request a demo, warning it is probably rather large. I checked out the demo on silverlight.net which uses Silverlight 1.1, client side c# code and html controls. I am wondering if you could take that demo further and show us how to use LINQ and this silverlight\html front end to perform CRUD operations. As well as CRUD could you also make use of the partial class validation technique in your previous demos to perform both client and server side validation.

    Or if you know of an existing demo that would already do this could you please post a link. Thanks in Advance.

  • Hi Bill,

    >>>>> Another great article -- thanks. For the VB code and snippet, I would suggest that readers consider using VB 9.0's new XML literals and axis properties instead of the API approach you are using. For example:

    Good tip - thanks for posting it!

    Thanks,

    Scott

  • Hi Joe,

    >>>>>> I just wanted to mention that while I love LINQ and LINQ to XML is actually my favorite part of LINQ, it is my understanding that in VS 2008 there is a new namespace and set of classes specifically for RSS. I believe (not on a machine right now with VS2008 beta 2) it was in System.Syndication. I know this post was just meant to be a cool LINQ to XML demo, but for someone actually wanting to write some RSS code, they might want to check out those Syndication classes. James Conard gave a lecture here in Atlanta a couple weeks back and mentioned it.

    You are right about the new RSS feature - it lives in the System.ServiceModel.Syndication namespace and provides a built-in RSS subscription API. I will put it on my list of things to blog about in the future. :-)

    Thanks,

    Scott

  • Hi Bydia2,

    >>>>> I like the flattening or unflattening features... so does this mean that I can access self referencing SQL tables with Linq to SQL? It would be good to see an article on this. I use self referencing tables for everything from eBooks to DB file system.

    Yes - I believe you should be able to do this. I haven't done it myself yet, but will put it on the list of things to do.

    Thanks,

    Scott

  • Hi Steve,

    >>>>>> I was wondering if I could request a demo, warning it is probably rather large. I checked out the demo on silverlight.net which uses Silverlight 1.1, client side c# code and html controls. I am wondering if you could take that demo further and show us how to use LINQ and this silverlight\html front end to perform CRUD operations. As well as CRUD could you also make use of the partial class validation technique in your previous demos to perform both client and server side validation.

    I definitely have this on the list of things to-do, although it might be a few more months before I blog it (there are some additional features of Silverlight like databinding and layout support that will be coming online then).

    The good news is that LINQ to Objects and LINQ to XML will both be supported by Silverlight - which means the code above for querying remote XML feeds just works the same in Silverlight. We are also going to make it easy to transfer data using LINQ back and forth between Silverlight and a server.

    Hope this helps,

    Scott

  • Hi Jim,

    >>>>>> Scott, As always your posts are great. In case anyone wants to see the RSS with XML in action, I did a webcast of it which was recently posted at devauthority.com/.../66845.aspx. I would love your reaction to it.

    Great webcast! Thanks for sharing it!

    Scott

  • Hi Scott,

    Thanks for the tip. I switched from our automatic configuration script in IE to our proxy server. It works now.

  • Thanks for the article, very interesting.

    One question: does this "engine" work fine with complex RSS feeds? I have faced problems using DataSets with nested entries in the RSS XML (on RSS feeds like DevX and others, they tend to fail to read).

  • Hi Scott,

    Is there any quick way to select a single instance of your RSS entry via LINQ and then move that into your FeedDefinition class.

    eg. var feed = ???
    select new FeedDefinition
    {
    ....
    }

    By the way, love LINQ articles, keep 'em coming.

  • You should have a date on each of your blog posts.

  • >>I definitely have this on the list of things to-do, although it might be a few more months before I blog it (there are some additional features of Silverlight like databinding and layout support that will be coming online then).

    The good news is that LINQ to Objects and LINQ to XML will both be supported by Silverlight - which means the code above for querying remote XML feeds just works the same in Silverlight. We are also going to make it easy to transfer data using LINQ back and forth between Silverlight and a server.<<

    Hi Scott;
    I don't mean to repeat what you have already said, but just wanted to echo the importance and need to data binding in SL. I've been truly waiting for Microsoft to take Web development to the next level and I believe SL with full support for LINQ and data binding is the real key to go to the next level. I feel SL is one of the most important technologies MS has come out in recent years! Can't wait to get started on the real thing!

  • >>You should have a date on each of your blog posts.<<

    Hi Partha;
    There is a date for each blog, except it's located at the "end" of each blog right before the word "Comment". I was looking for that myself along with the title of the blog.

    Hope that helps!
    ..Ben

  • Hi Christian,

    >>>> using System.ServiceModel.Syndication;

    You got me on that one. ;-)

    I'm going to try and do a blog post that covers this as well.

    Thanks,

    Scott

  • Hi Braulio,

    >>>>> One question: does this "engine" work fine with complex RSS feeds? I have faced problems using DataSets with nested entries in the RSS XML (on RSS feeds like DevX and others, they tend to fail to read).

    The approach I use above should work with an RSS feed. Because it is just querying the XML directly, it should be fairly resilient to different schemas.

    You might also want to check out the new RSS engine that is built-in in .NET 3.5 with the System.ServiceModel.Syndication namespace.

    Hope this helps,

    Scott

  • Hi Steve,

    >>>>> Is there any quick way to select a single instance of your RSS entry via LINQ and then move that into your FeedDefinition class.

    Yep! The good news is that you can use the "where" clause with LINQ to XML just like you can with LINQ to SQL and LINQ to Objects. Just add a where clause to filter the query down. You can then use the Single() or First() LINQ operator to retrieve the first object that matches the filter clause:

    BlogEntry entry = (from post in RSS.BlogEntries
    where post.Published.Date == DateTime.Today
    select post).First();

    Or just:

    BlogEntry entry = RSS.BlogEntries.First(r => r.Published.Date == DateTime.Today);

    Hope this helps,

    Scott

  • Hi Scott
    Great post Scott, but I was wondering is it possible to use System.Xml.Linq in a Silverlight 1.0 RC project and publish it on silverlight.live.com? I was looking for that namespace in a Silverlight project and couldn't find it in the reference list when I click "add reference".

  • Hi Scott,

    This is related to my comment on part 3 - (querying our database). Is there a quick way, or what is the recommended method for going from a linq to sql query result to xml or JSON?

    donal

  • Hi discovia,

    >>>>>>> This is related to my comment on part 3 - (querying our database). Is there a quick way, or what is the recommended method for going from a linq to sql query result to xml or JSON?

    You can use LINQ to XML to both parse XML data as well as generate it. I'll do a blog post in the future that shows how to do this.

    Thanks,

    Scott

  • Excellent tutorial Scott.

    I am also VERY glad to hear LINQ to objects and LINQ to XML will be included in Silverlight. This will bring Silverlight's querying capabilities and XML capabilities on par.

    Thanks so much for the great article.

    I for one am overwhelmed by all the new data technologies; every year seems to pile on additional ways to query / use the same data. However, blogs like yours definitely make things more clear.
