XmlNodes from XPathNodeIterator

Tuesday, March 9, 2004

Note: this entry has moved.

Every now and then I receive complaints about XPathNodeIterator. You know, it allows iteration where each Current element is an XPathNavigator. That's not too useful if you're looking for OuterXml, or are too dependent on the XmlNode-based API (i.e. XmlDocument). The most worrying issue is that people use this argument against using compiled XPath expressions, which are known to significantly improve performance (see the Performant XML (I) and Performant XML (II) articles). The reason is that in order to get an XmlNodeList, you have to use the SelectNodes method of XmlNode (and therefore XmlDocument), whose overloads only accept the expression as a string:

    public XmlNodeList SelectNodes(string xpath);
    public XmlNodeList SelectNodes(string xpath, XmlNamespaceManager nsmgr);

This means that most developers won't compile their expressions simply because in order to use the XPathExpression, they have to explicitly create a navigator for the node/document and work against the cursor-style API of the XPathNodeIterator and XPathNavigator:

    // Statically compile and cache the expression.
    XPathExpression expr;
    // Init and load a document.
    XmlDocument document;
    // Create navigator, clone expression and execute query.
    XPathNodeIterator it = document.CreateNavigator().Select(expr.Clone());
    while (it.MoveNext())
    {
        // Do something with it.Current which is an XPathNavigator.
    }

This approach generally means that in order to optimize the code by compiling expressions, you actually have to refactor significant pieces of it. And you don't have any other choice if you need to sort the query results by using XPathExpression.AddSort(). There's a solution to this problem, as usual :).
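To make the sorting case concrete, here is a minimal sketch of a compiled expression with a sort key, consumed through the cursor-style API. The sample XML, the SortedQueryDemo class and the SortedNames helper are made up for illustration. It also assumes the static XPathExpression.Compile factory, which exists from .NET 2.0 onwards (on 1.x you would call Compile on a navigator instead).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class SortedQueryDemo
{
    // Hypothetical helper: returns the "name" attribute of every item, sorted.
    public static List<string> SortedNames(XmlDocument document)
    {
        // Compile once; real code would cache this and clone it per use.
        XPathExpression expr = XPathExpression.Compile("/items/item");

        // Sort by the name attribute, ascending, compared as text.
        expr.AddSort("@name", XmlSortOrder.Ascending, XmlCaseOrder.None, "", XmlDataType.Text);

        List<string> names = new List<string>();
        XPathNodeIterator it = document.CreateNavigator().Select(expr);
        while (it.MoveNext())
        {
            // it.Current is an XPathNavigator positioned on the matched item.
            names.Add(it.Current.GetAttribute("name", ""));
        }
        return names;
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<items><item name='pear'/><item name='apple'/><item name='fig'/></items>");

        foreach (string name in SortedNames(document))
        {
            Console.WriteLine(name);
        }
    }
}
```

With the sample document this should list apple, fig and pear, in that order. Note that everything inside the loop still talks to an XPathNavigator, not an XmlNode.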

You know that XPathNavigator is an abstract class that allows multiple underlying implementations to offer the same cursor-style API and gain the instant benefit of XPath querying. Aaron Skonnard has some interesting implementations showing this concept. So when you're iterating the results of a query and ask for the current element, you're actually using something that depends on the implementation. This object (that is, the XPathNodeIterator.Current property), besides being an XPathNavigator, can also implement other interfaces as part of the underlying implementation. As such, queries executed against an XmlNode-based element will have each Current element implementing IHasXmlNode whereas XPathDocument-based ones will implement IXmlLineInfo. And what is this useful for? Well, just to get access to additional information beyond the standard XPathNavigator API that depends on the concrete implementation. So, inside the while loop above, we can ask:

    while (it.MoveNext())
    {
        if (it.Current is IHasXmlNode)
        {
            XmlNode node = ((IHasXmlNode)it.Current).GetNode();
            // Work with your beloved DOM api ;)
        }
    }
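If you want to see the implementation dependency for yourself, the following small sketch runs the same query against an XmlDocument and an XPathDocument and checks whether the current navigator exposes an XmlNode. The NavigatorKindDemo class, the FirstResultHasXmlNode helper and the sample XML are hypothetical names invented for this illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorKindDemo
{
    // Hypothetical helper: true if the first query result exposes an XmlNode.
    public static bool FirstResultHasXmlNode(IXPathNavigable source, string xpath)
    {
        XPathNodeIterator it = source.CreateNavigator().Select(xpath);
        return it.MoveNext() && it.Current is IHasXmlNode;
    }

    public static void Main()
    {
        string xml = "<root><child/></root>";

        // DOM-based store: navigators are backed by XmlNode instances.
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(xml);

        // Read-only store optimized for XPath: no XmlNode behind the navigator.
        XPathDocument readOnly = new XPathDocument(new StringReader(xml));

        Console.WriteLine(FirstResultHasXmlNode(dom, "/root/child"));
        Console.WriteLine(FirstResultHasXmlNode(readOnly, "/root/child"));
    }
}
```

The first check should succeed and the second should not, which is why the cast to IHasXmlNode is only safe when you know the query ran against the DOM.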

Still, this doesn't solve the problem that you have to iterate differently than you're used to, and that significant rewrites are still needed when you use XPathExpression for querying.
The solution is to use the knowledge about the underlying implementation (i.e. you KNOW you're querying against an XmlDocument) and put an easier API on top of it. This can be achieved by creating an IEnumerable class that provides iteration over the XPathNodeIterator but exposes the underlying XmlNode. A helper method returning an array of XmlNodes is also useful. It would be used as follows:

    XPathNodeIterator it = doc.CreateNavigator().Select(expr.Clone());
    XmlNodesEnumerable nodes = new XmlNodesEnumerable(it);
    foreach (XmlNode node in nodes)
    {
        Response.Write(node.OuterXml);
    }

    // Or, instead of the loop, get the nodes as an array. The wrapped iterator
    // is forward-only, so use one or the other on a given iterator:
    XmlNode[] list = nodes.ToArray();

Complete code for the custom enumerable object and its internal enumerator implementation follows.

    using System;
    using System.Collections;
    using System.Xml;
    using System.Xml.XPath;

    /// <summary>
    /// Provides enumeration over an <see cref="XPathNodeIterator"/> but
    /// exposing the underlying <see cref="XmlNode"/> elements.
    /// </summary>
    public class XmlNodesEnumerable : IEnumerable
    {
        XPathNodeIterator _iterator;

        /// <summary>
        /// Constructs the enumerable.
        /// </summary>
        /// <param name="iterator">The instance containing the nodes to iterate.</param>
        public XmlNodesEnumerable(XPathNodeIterator iterator)
        {
            _iterator = iterator;
        }

        /// <summary>
        /// Returns all nodes in the underlying iterator as an array.
        /// </summary>
        /// <returns>An array with all nodes.</returns>
        public XmlNode[] ToArray()
        {
            ArrayList list = new ArrayList();
            IEnumerator en = new XmlNodesEnumerator(_iterator);
            while (en.MoveNext())
            {
                list.Add(en.Current);
            }
            return (XmlNode[]) list.ToArray(typeof(XmlNode));
        }

        #region IEnumerable Members

        IEnumerator IEnumerable.GetEnumerator()
        {
            return new XmlNodesEnumerator(_iterator);
        }

        #endregion

        #region Inner XmlNodesEnumerator class

        /// <summary>
        /// Provides iteration over an <see cref="XPathNodeIterator"/> but
        /// exposing the underlying <see cref="XmlNode"/> elements.
        /// </summary>
        private class XmlNodesEnumerator : IEnumerator
        {
            XPathNodeIterator _iterator;

            /// <summary>
            /// Constructs the enumerator.
            /// </summary>
            /// <param name="iterator">The instance containing the nodes to iterate.</param>
            public XmlNodesEnumerator(XPathNodeIterator iterator)
            {
                _iterator = iterator;
            }

            #region IEnumerator Members

            /// <summary>
            /// Not supported.
            /// </summary>
            void IEnumerator.Reset()
            {
                throw new NotSupportedException("Can't reset this enumerator.");
            }

            /// <summary>
            /// Returns the current <see cref="XmlNode"/>.
            /// </summary>
            /// <exception cref="ArgumentException">The current item in the
            /// underlying <see cref="XPathNodeIterator"/> doesn't point to an
            /// <see cref="XmlNode"/>.</exception>
            object IEnumerator.Current
            {
                get
                {
                    IHasXmlNode node = _iterator.Current as IHasXmlNode;
                    if (node == null)
                        throw new ArgumentException("Can only traverse XmlNode iterators.");
                    return node.GetNode();
                }
            }

            /// <summary>
            /// Advances the iteration cursor.
            /// </summary>
            /// <returns>True if the cursor moved to another node; false once the iterator is exhausted.</returns>
            bool IEnumerator.MoveNext()
            {
                return _iterator.MoveNext();
            }

            #endregion
        }

        #endregion
    }

Update: check an even better approach here.

Enjoy!

Check out the Roadmap to high performance XML.

Good stuff! I was just doing some DOM-related searching and thinking I should use the XPathNavigator instead. I immediately ran into the "hrm, how do I get back my XmlNode" quandary. Thanks for the tip.

Steve - Tuesday, March 9, 2004 5:58:00 PM

Strangely, I had to add the IHasXmlNode interface in order to implement XmlNodeList. I didn't want any secret connection between XSLT/XPath and our implementation of the DOM. This allowed for the existence of the DOM and the XPathDocument, and for both to share the same implementation of XPath.

Matt - Thursday, March 11, 2004 12:00:00 AM

Yup, that's what I meant by "As such, queries executed against an XmlNode-based element will have each Current element implementing IHasXmlNode whereas XPathDocument-based ones will implement IXmlLineInfo."

Maybe I wasn't so clear after all, hehe ;)

Daniel Cazzulino - Tuesday, March 23, 2004 12:02:00 PM

thanks, yes i read yer article again right after i posted and thought 'doh', i spoke too soon.

reuben - Wednesday, March 24, 2004 5:32:00 AM

Aaron: my point still remains. If you call the SelectNodes without compiling the expression *ahead of time*, it will be re-compiled in each call!! This can be a real perf. issue if you're doing it in a tight loop, for example.

Measure and see for yourself ;)

dcazzulino - Friday, December 15, 2006 6:51:07 PM
