How to get an XmlNodeList from an XPathNodeIterator (reloaded)

In a previous post I showed a possible approach to get an iterator for XmlNodes from an XPathNodeIterator. Please read that post, as it explains the problem in depth and gives the reasons why you should move to using XPathNodeIterator.

However, the solution I showed involved a new class that only had an IEnumerable implementation. It wasn't compatible at all with the built-in XmlNodeList (abstract) class. This time, for the Mvp.Xml project, I decided to do the right thing: inherit from XmlNodeList and implement the whole thing. What this means is that if you have a method that returns an XmlNodeList, as follows:

public void DoSomeStuff(XmlDocument document)
{
    XmlNodeList nodes = GetTheRelevantNodes(document);
    // Process the nodes.
}

private XmlNodeList GetTheRelevantNodes(XmlDocument document)
{
    return document.SelectNodes(someQuery);
}

You can now simply change the method's internal implementation to use cached XPathExpressions (as explained in Performant XML (I)) and keep the return value the same:

private XmlNodeList GetTheRelevantNodes(XmlDocument document)
{
    // Select with the precompiled expression instead of a query string.
    XPathNodeIterator it = document.CreateNavigator().Select(
        theCachedPerformantPreCompiledXPathExpression);
    return XmlNodeListFactory.CreateNodeList(it);
}
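
The cached expression used above could be declared along these lines. This is only a sketch, not code from the Mvp.Xml project: the class name RelevantNodesQuery and the query string "//item" are made up for illustration, and the static XPathExpression.Compile method assumes .NET 2.0 or later (on 1.x, calling Compile on any XPathNavigator should do the same job).

```csharp
using System.Xml;
using System.Xml.XPath;

public static class RelevantNodesQuery   // hypothetical name
{
    // Hypothetical query, compiled once when the class is first used.
    private static readonly XPathExpression Expression =
        XPathExpression.Compile("//item");

    public static XPathNodeIterator Select(XmlDocument document)
    {
        // As a precaution, run a clone so that concurrent callers never
        // share the single cached instance.
        return document.CreateNavigator().Select(Expression.Clone());
    }
}
```

The point of the design is that the query string is parsed once, instead of on every call as SelectNodes(string) would do.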

Now you can focus on the cursor-style XML processing approach (and be ready for Whidbey where it's the "blessed" API), while maintaining "backwards" compatibility for your methods. Note, however, that the factory will throw an exception if you query a non-XmlDocument store.
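
The restriction exists because only navigators created over an XmlDocument (or another XmlNode) can hand back the underlying node through IHasXmlNode. The check can be sketched as follows; the helper name TryGetNode is hypothetical, and since this post does not say which exception the real factory throws, the sketch simply returns null instead.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class NodeAccess   // hypothetical name
{
    // Returns the XmlNode behind the navigator, or null when the store
    // (for example an XPathDocument) has no XmlNode to offer.
    public static XmlNode TryGetNode(XPathNavigator navigator)
    {
        IHasXmlNode hasNode = navigator as IHasXmlNode;
        return hasNode == null ? null : hasNode.GetNode();
    }
}
```

A navigator from XmlDocument.CreateNavigator() yields a node here, while one from XPathDocument.CreateNavigator() yields null.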

The XmlNodeList class has the following signature:

public abstract class XmlNodeList : IEnumerable
{
    // Methods
    protected XmlNodeList();
    public abstract IEnumerator GetEnumerator();
    public abstract XmlNode Item(int index);

    // Properties
    public abstract int Count { get; }
    public virtual XmlNode this[int] { get; }
}

(Note: for some strange reason, Reflector shows the indexer property of this particular class as a property with the name ItemOf (?))
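
A likely explanation, offered here as an assumption rather than something this post confirms: C# lets a class rename the property that an indexer compiles to by applying IndexerNameAttribute, and a tool that reads metadata shows that name. The default name, Item, would clash with a method that is also called Item. The Shelf class below is purely illustrative.

```csharp
using System.Runtime.CompilerServices;

public class Shelf   // illustrative class, unrelated to System.Xml
{
    private readonly string[] books = { "A", "B" };

    // A plain method, in the same spirit as XmlNodeList.Item(int).
    public string Item(int index) { return books[index]; }

    // Without the attribute the indexer would compile to a property named
    // "Item" and conflict with the method above.
    [IndexerName("ItemOf")]
    public string this[int index] { get { return Item(index); } }
}
```

C# callers still write shelf[1]; only the name recorded in metadata changes, which is what typeof(Shelf).GetProperty("ItemOf") finds.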

This may seem trivial to implement, unless you know how the XPathNodeIterator works. When it's returned from a query, the query hasn't yet been evaluated against the full document. Rather, the query is advanced each time you move the iterator, thus reducing the initial performance impact of querying a potentially large document. Therefore, in order to maintain this performance advantage, I had to carefully implement the list so as to read from the iterator only the nodes actually needed. Of course, and just like with the XPathNodeIterator, retrieving the Count property requires the query to be evaluated against the whole document. Therefore:

Avoid retrieving the Count property on either an XmlNodeList or an XPathNodeIterator at all costs!
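
For instance, checking whether a query matched anything at all does not need Count; advancing the iterator once is enough. A minimal sketch, with the hypothetical helper name HasAny:

```csharp
using System.Xml.XPath;

public static class IteratorChecks   // hypothetical name
{
    // True if the query has at least one result. Typically only the first
    // match has to be located, whereas Count walks every match.
    public static bool HasAny(XPathNodeIterator iterator)
    {
        // Work on a clone so the caller's iterator position is untouched.
        return iterator.Clone().MoveNext();
    }
}
```

The same idea applies to an XmlNodeList built on top of an iterator: asking for the first item and testing it for null is cheaper than comparing Count to zero.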

So, the implementation basically advances the cursor when needed (for example when you access an item whose position hasn't been reached yet, or when you move the iterator), and caches the XmlNode instances that are taken from the iterator through the IHasXmlNode interface on the current node. This mechanism was explained in the post mentioned at the beginning.

The full code was already shown in the previous post, but is reproduced here for your convenience.

Note that in order to reduce the API surface, the only public class is the factory, and the implementations of the wrapper and the enumerator are completely hidden from you, so you can keep using the familiar XmlNodeList and let us change the implementation in the future at will ;).
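
To make the mechanism concrete, here is a rough sketch of a lazy, caching node list hidden behind a factory. It is not the Mvp.Xml source: the names LazyNodeListFactory, LazyNodeList and ReadTo are invented, the choice of ArgumentException is an assumption, and it uses C# 2.0 features (generics and yield) for brevity.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class LazyNodeListFactory   // hypothetical name
{
    public static XmlNodeList Create(XPathNodeIterator iterator)
    {
        return new LazyNodeList(iterator);
    }

    // Private: callers only ever see the XmlNodeList base type.
    private sealed class LazyNodeList : XmlNodeList
    {
        private readonly XPathNodeIterator iterator;
        private readonly List<XmlNode> cache = new List<XmlNode>();
        private bool done;

        public LazyNodeList(XPathNodeIterator iterator)
        {
            this.iterator = iterator.Clone();
        }

        // Advances the iterator until the cache holds the requested index
        // or the iterator is exhausted; returns whether the index exists.
        private bool ReadTo(int index)
        {
            while (!done && cache.Count <= index)
            {
                if (!iterator.MoveNext()) { done = true; break; }
                IHasXmlNode current = iterator.Current as IHasXmlNode;
                if (current == null)   // assumed exception type
                    throw new ArgumentException("Not an XmlNode-based store.");
                cache.Add(current.GetNode());
            }
            return cache.Count > index;
        }

        public override XmlNode Item(int index)
        {
            return index >= 0 && ReadTo(index) ? cache[index] : null;
        }

        // Forces the whole query to be evaluated, as the post warns.
        public override int Count
        {
            get { ReadTo(int.MaxValue); return cache.Count; }
        }

        public override IEnumerator GetEnumerator()
        {
            for (int i = 0; ReadTo(i); i++) yield return cache[i];
        }
    }
}
```

Accessing list[1] on such a list pulls only the first two matches from the iterator; later accesses to the same positions are served from the cache.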

The full Mvp.Xml project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

3 Comments

  • Well, if you already have the 'this' or 'Item' indexers, there's no reason why you need to implement ItemOf. In fact, there's no reason why I should be forced to implement Item AND this as they are both the same concept. VB automatically interprets the 'this' as the "default Item property", so it's useless.

  • Exactly what browser is this page supposed to be viewed in? The code runs on as a single line in FF 1, 2, IE 6, 7...

  • Updated the text.
    Damn community server broke my HTML when I did a batch update of my entries :(
