XmlNodes from XPathNodeIterator

Note: this entry has moved.

Every now and then I receive complaints about XPathNodeIterator. As you know, it allows iteration where each Current element is an XPathNavigator. That's not too useful if you're looking for OuterXml, or if you're too dependent on the XmlNode-based API (i.e. XmlDocument). The most worrying issue is that people use this as an argument against using compiled XPath expressions, which are known to significantly improve performance (see the Performant XML (I) and Performant XML (II) articles). The reason is that in order to get an XmlNodeList, you have to use the SelectNodes method of XmlNode (and therefore XmlDocument), whose signatures are as follows:

    public XmlNodeList SelectNodes(string xpath);
    public XmlNodeList SelectNodes(string xpath, XmlNamespaceManager nsmgr);

This means that most developers won't compile their expressions, simply because using an XPathExpression requires them to explicitly create a navigator for the node/document and work against the cursor-style API of XPathNodeIterator and XPathNavigator:

    // Statically compile and cache the expression.
    XPathExpression expr;
    // Init and load a document.
    XmlDocument document;
    // Create navigator, clone expression and execute query.
    XPathNodeIterator it = document.CreateNavigator().Select(expr.Clone());
    while (it.MoveNext())
    {
        // Do something with it.Current, which is an XPathNavigator.
    }
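To make the contrast concrete, here is a small self-contained sketch of both approaches side by side. The class name CompiledQueryDemo, its methods and the /orders/order query are hypothetical, and the sketch assumes .NET 2.0 or later for the static XPathExpression.Compile (on .NET 1.x you would call Compile on an XPathNavigator instead). The compiled expression is parsed once and cached in a static field, while SelectNodes receives a string that has to be parsed again on every call.

```csharp
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo class comparing a cached compiled expression with SelectNodes.
static class CompiledQueryDemo
{
    // Parsed and compiled once, when the class is first used; reused by every later call.
    static readonly XPathExpression OrdersExpr = XPathExpression.Compile("/orders/order");

    // Runs the cached expression. Clone() hands the query its own copy of the
    // expression, so the shared cached instance is never used directly.
    public static int CountWithCompiled(XmlDocument document)
    {
        XPathNodeIterator it = document.CreateNavigator().Select(OrdersExpr.Clone());
        int count = 0;
        while (it.MoveNext())
        {
            // it.Current is an XPathNavigator positioned on an <order> element.
            count++;
        }
        return count;
    }

    // SelectNodes only accepts a string, so the XPath is parsed again on each call.
    public static int CountWithSelectNodes(XmlDocument document)
    {
        return document.SelectNodes("/orders/order").Count;
    }
}
```

Both methods return the same count; the difference only shows up when the query runs many times, for example inside a tight loop.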

This approach generally means that in order to optimize the code by compiling expressions, you actually have to refactor significant pieces of it. And you have no other choice if you need to sort the query results using XPathExpression.AddSort(). As usual, there's a solution to this problem :).
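As an illustration of why sorting forces you onto XPathExpression, here is a minimal sketch that attaches a numeric sort key to a compiled expression with AddSort. The /items/item query, the price attribute and the SortedQueryDemo class are hypothetical, and the static XPathExpression.Compile again assumes .NET 2.0 or later. SelectNodes has no equivalent, since it only takes a string.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo: a compiled expression that carries its own sort key.
static class SortedQueryDemo
{
    // Compiled once; the sort key is stored on the compiled expression itself.
    static readonly XPathExpression ItemsByPrice = CreateItemsByPrice();

    static XPathExpression CreateItemsByPrice()
    {
        XPathExpression expr = XPathExpression.Compile("/items/item");
        // Sort the resulting node-set by the price attribute, compared as numbers, ascending.
        expr.AddSort("@price", XmlSortOrder.Ascending, XmlCaseOrder.None, string.Empty, XmlDataType.Number);
        return expr;
    }

    // Returns the price attribute of each <item>, in sorted order.
    public static List<string> PricesInOrder(XmlDocument document)
    {
        List<string> prices = new List<string>();
        XPathNodeIterator it = document.CreateNavigator().Select(ItemsByPrice.Clone());
        while (it.MoveNext())
        {
            prices.Add(it.Current.GetAttribute("price", string.Empty));
        }
        return prices;
    }
}
```

With XmlDataType.Number the values are compared numerically, so "5" comes before "12", which a text sort would not give you.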

You know that XPathNavigator is an abstract class that allows multiple underlying implementations to offer the same cursor-style API and gain the instant benefit of XPath querying. Aaron Skonnard has some interesting implementations showing this concept. So when you're iterating the results of a query and asking for the current element, you're actually using an object whose concrete type depends on the implementation. Besides being an XPathNavigator (that is, the XPathNodeIterator.Current property), this object can also implement other interfaces as part of the underlying implementation. Queries executed against an XmlNode-based element will have each Current element implementing IHasXmlNode, whereas XPathDocument-based ones will implement IXmlLineInfo. What is this useful for? It gives you access to additional information, beyond the standard XPathNavigator API, that depends on the concrete implementation. So, inside the while loop above, we can ask:

    while (it.MoveNext())
    {
        if (it.Current is IHasXmlNode)
        {
            XmlNode node = ((IHasXmlNode)it.Current).GetNode();
            // Work with your beloved DOM API ;)
        }
    }
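To see both cases, here is a small sketch that runs the same query against an XmlDocument and an XPathDocument and asks each Current navigator what extra interface it offers. The NavigatorInterfacesDemo class, its methods and the sample XML are hypothetical. Whether an XPathDocument navigator actually reports line information depends on how the document was loaded, so the sketch checks HasLineInfo() before using it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo: what lies behind XPathNodeIterator.Current for two stores.
static class NavigatorInterfacesDemo
{
    const string Xml = "<items><item id=\"1\"/><item id=\"2\"/></items>";

    // Current of a query over an XmlDocument: the navigator wraps a DOM node.
    public static XPathNavigator FirstItemFromXmlDocument()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XPathNodeIterator it = doc.CreateNavigator().Select("/items/item");
        it.MoveNext();
        return it.Current;
    }

    // Current of a query over an XPathDocument: there is no DOM node behind it.
    public static XPathNavigator FirstItemFromXPathDocument()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        XPathNodeIterator it = doc.CreateNavigator().Select("/items/item");
        it.MoveNext();
        return it.Current;
    }

    // Asks the navigator for whichever extra interface its implementation offers.
    public static string Describe(XPathNavigator current)
    {
        IHasXmlNode hasNode = current as IHasXmlNode;
        if (hasNode != null)
        {
            return "XmlNode: " + hasNode.GetNode().Name;
        }
        IXmlLineInfo lineInfo = current as IXmlLineInfo;
        if (lineInfo != null && lineInfo.HasLineInfo())
        {
            return "line " + lineInfo.LineNumber + ", position " + lineInfo.LinePosition;
        }
        return "no extra information";
    }
}
```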

Still, this doesn't solve the problem that you have to iterate differently than you're used to, and significant rewrites are still needed when you use XPathExpression for querying.
The solution is to use your knowledge of the underlying implementation (i.e. you KNOW you're querying against an XmlDocument) to get an easier API to it. This can be achieved by creating an IEnumerable class that iterates over the XPathNodeIterator but exposes the underlying XmlNode. A helper method returning an array of XmlNodes is also useful. It would be used as follows:

    XPathNodeIterator it = doc.CreateNavigator().Select(expr.Clone());
    XmlNodesEnumerable nodes = new XmlNodesEnumerable(it);
    foreach (XmlNode node in nodes)
    {
        Response.Write(node.OuterXml);
    }
    // Or use the array directly:
    XmlNode[] list = nodes.ToArray();
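A minimal sketch of such a class, not the author's listing, might look like the following. It assumes C# 2.0 or later and uses an iterator block (yield return) in place of a hand-written enumerator class; it also throws if a Current navigator doesn't implement IHasXmlNode, for example when the query ran against an XPathDocument.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Sketch: exposes the XmlNode behind each result of an XPathNodeIterator
// created from an XmlDocument (or other XmlNode-based) navigator.
public class XmlNodesEnumerable : IEnumerable
{
    private readonly XPathNodeIterator iterator;

    public XmlNodesEnumerable(XPathNodeIterator iterator)
    {
        if (iterator == null) throw new ArgumentNullException("iterator");
        this.iterator = iterator;
    }

    public IEnumerator GetEnumerator()
    {
        // Work on a clone so the caller's iterator is never advanced; every
        // enumeration starts from the position the original iterator had.
        XPathNodeIterator it = iterator.Clone();
        while (it.MoveNext())
        {
            IHasXmlNode hasNode = it.Current as IHasXmlNode;
            if (hasNode == null)
            {
                throw new InvalidOperationException(
                    "The iterator does not come from an XmlNode-based navigator.");
            }
            yield return hasNode.GetNode();
        }
    }

    // Helper that materializes all the nodes into an array.
    public XmlNode[] ToArray()
    {
        List<XmlNode> nodes = new List<XmlNode>();
        foreach (XmlNode node in this)
        {
            nodes.Add(node);
        }
        return nodes.ToArray();
    }
}
```

Because the sketch clones the iterator on each enumeration, calling ToArray() after a foreach over the same object still returns every node.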

Complete code for the custom enumerable object and its internal enumerator implementation follows.


Update: check an even better approach here.

Enjoy!

Check out the Roadmap to high performance XML.

5 Comments

  • Good stuff! I was just doing some DOM-related searching and thinking I should use the XPathNavigator instead. I immediately ran into the "hrm, how do I get back my XmlNode" quandary. Thanks for the tip.

  • Strangely, I had to add the IHasXmlNode interface in order to implement XmlNodeList. I didn't want any secret connection between XSLT/XPath and our implementation of the DOM. This allowed for the existence of the DOM and the XPathDocument, and for both to share the same implementation of XPath.

  • Yup, that's what I meant by "As such, queries executed against an XmlNode-based element will have each Current element implementing IHasXmlNode whereas XPathDocument-based ones will implement IXmlLineInfo."

    Maybe I wasn't so clear after all, hehe ;)

  • Thanks, yes, I read yer article again right after I posted and thought 'doh', I spoke too soon.

  • Aaron: my point still remains. If you call the SelectNodes without compiling the expression *ahead of time*, it will be re-compiled in each call!! This can be a real perf. issue if you're doing it in a tight loop, for example.

    Measure and see for yourself ;)
