How to get an XmlNodeList from an XPathNodeIterator (reloaded)

Wednesday, April 14, 2004

In a previous post I showed a possible approach to get an iterator for XmlNodes from an XPathNodeIterator. Please read that post, as it explains the problem in depth and gives the reasons why you should move to using XPathNodeIterator.

However, the solution I showed involved a new class that only had an IEnumerable implementation. It wasn't compatible at all with the built-in XmlNodeList (abstract) class. This time, for the Mvp.Xml project, I decided to do the right thing: inherit from XmlNodeList and implement the whole thing. What this means is that if you have a method that returns an XmlNodeList, as follows:

public void DoSomeStuff(XmlDocument document)
{
    XmlNodeList nodes = GetTheRelevantNodes(document);
    // Process the nodes.
}

private XmlNodeList GetTheRelevantNodes(XmlDocument document)
{
    return document.SelectNodes(someQuery);
}

You can now simply change the method's internal implementation to use cached XPathExpressions (as explained in Performant XML (I)) and keep the return type the same:

private XmlNodeList GetTheRelevantNodes(XmlDocument document)
{
    XPathNodeIterator it = document.CreateNavigator().Select(
        theCachedPerformantPreCompiledXPathExpression);
    return XmlNodeListFactory.CreateNodeList(it);
}

Now you can focus on the cursor-style XML processing approach (and be ready for Whidbey where it's the "blessed" API), while maintaining "backwards" compatibility for your methods. Note, however, that the factory will throw an exception if you query a non-XmlDocument store.
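As a rough illustration of that last caveat, the sketch below compiles an expression once, reuses it against two different stores, and checks whether the selected navigator exposes IHasXmlNode. The class name NodeStoreCheck, the method FirstNodeHasXmlNode and the sample XML are made up for this example. It relies only on System.Xml, not on Mvp.Xml, and assumes the static XPathExpression.Compile method (available from .NET 2.0 on) and single-threaded use of the cached expression.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NodeStoreCheck
{
    // Hypothetical cached expression: compiled once, reused on every call.
    private static readonly XPathExpression ItemsQuery =
        XPathExpression.Compile("/root/item");

    // Returns true when the first selected node can be unwrapped
    // to an XmlNode through the IHasXmlNode interface.
    public static bool FirstNodeHasXmlNode(IXPathNavigable store)
    {
        XPathNodeIterator it = store.CreateNavigator().Select(ItemsQuery);
        return it.MoveNext() && it.Current is IHasXmlNode;
    }

    public static void Main()
    {
        const string xml = "<root><item>a</item><item>b</item></root>";

        XmlDocument dom = new XmlDocument();
        dom.LoadXml(xml);

        // XPathDocument is a read-only store that is not built on XmlNode.
        XPathDocument readOnly = new XPathDocument(new StringReader(xml));

        Console.WriteLine(FirstNodeHasXmlNode(dom));      // True
        Console.WriteLine(FirstNodeHasXmlNode(readOnly)); // False
    }
}
```

A check like this could be used to decide up front whether an iterator is safe to hand to the factory, instead of catching the exception afterwards.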

The XmlNodeList class has the following signature:

public abstract class XmlNodeList : IEnumerable
{
    // Methods
    protected XmlNodeList();
    public abstract IEnumerator GetEnumerator();
    public abstract XmlNode Item(int index);

    // Properties
    public abstract int Count { get; }
    public virtual XmlNode this[int] { get; }
}

(Note: for some strange reason, Reflector shows the indexer property of this particular class as a property with the name ItemOf (?))

This may seem trivial to implement, unless you know how the XPathNodeIterator works. When it's returned from a query, the full document hasn't been evaluated. Rather, the query is advanced each time you move the iterator, which reduces the initial performance impact of querying a potentially large document. Therefore, in order to maintain this performance advantage, I had to carefully implement the list so that it reads from the iterator only the nodes actually needed. Of course, just as with the XPathNodeIterator itself, retrieving the Count property requires the query to be evaluated against the whole document. Therefore:

Avoid retrieving the Count property on either an XmlNodeList or an XPathNodeIterator at all costs!
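To make that advice concrete, here is a minimal sketch of two common cases where Count is tempting but not needed: testing whether a query matched anything, and taking only the first few results. The class LazyIteration and its methods HasAny and TakeValues are hypothetical helpers written for this example; they use only the standard XPathNodeIterator.MoveNext and Current members.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class LazyIteration
{
    // True if the query selects at least one node. Only the first
    // match is pulled from the iterator; Count is never touched.
    public static bool HasAny(XPathNavigator nav, string xpath)
    {
        return nav.Select(xpath).MoveNext();
    }

    // Collects at most 'max' node values, advancing the iterator
    // one node at a time and stopping as soon as enough were read.
    public static string[] TakeValues(XPathNavigator nav, string xpath, int max)
    {
        List<string> values = new List<string>();
        XPathNodeIterator it = nav.Select(xpath);
        while (values.Count < max && it.MoveNext())
        {
            values.Add(it.Current.Value);
        }
        return values.ToArray();
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item>a</item><item>b</item><item>c</item></root>");
        XPathNavigator nav = doc.CreateNavigator();

        Console.WriteLine(HasAny(nav, "/root/item"));                          // True
        Console.WriteLine(string.Join(",", TakeValues(nav, "/root/item", 2))); // a,b
    }
}
```

The same pattern applies to the list returned by the factory: indexing or enumerating it reads only as far as needed, while asking for Count drains the underlying iterator.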

So, the implementation basically advances the cursor when needed (for example when you access an item whose position hasn't been reached yet, or when you move the iterator), and caches the XmlNode instances that are taken from the iterator through the IHasXmlNode interface on the current node. This mechanism was explained in the post mentioned at the beginning.

The full code was already shown in the previous post, but is reproduced here for your convenience.

using System;
using System.Collections;
using System.Xml;
using System.Xml.XPath;

namespace Mvp.Xml
{
    /// <summary>
    /// Constructs <see cref="XmlNodeList"/> instances from
    /// <see cref="XPathNodeIterator"/> objects.
    /// </summary>
    public sealed class XmlNodeListFactory
    {
        private XmlNodeListFactory() {}

        #region Public members

        /// <summary>
        /// Creates an instance of a <see cref="XmlNodeList"/> that allows
        /// enumerating <see cref="XmlNode"/> elements in the iterator.
        /// </summary>
        /// <param name="iterator">The result of a previous node selection
        /// through an <see cref="XPathNavigator"/> query.</param>
        /// <returns>An initialized list ready to be enumerated.</returns>
        /// <remarks>The underlying XML store used to issue the query must be
        /// an object inheriting <see cref="XmlNode"/>, such as
        /// <see cref="XmlDocument"/>.</remarks>
        public static XmlNodeList CreateNodeList(XPathNodeIterator iterator)
        {
            return new XmlNodeListIterator(iterator);
        }

        #endregion Public members

        #region XmlNodeListIterator

        private class XmlNodeListIterator : XmlNodeList
        {
            XPathNodeIterator _iterator;
            ArrayList _nodes = new ArrayList();
            bool _done;

            public XmlNodeListIterator(XPathNodeIterator iterator)
            {
                _iterator = iterator.Clone();
                // Read first one to detect IHasXmlNode interface.
                ReadTo(0);
            }

            public override IEnumerator GetEnumerator()
            {
                return new XmlNodeListEnumerator(this);
            }

            public override XmlNode Item(int index)
            {
                return this[index];
            }

            public override int Count
            {
                get
                {
                    if (!_done) ReadToEnd();
                    return _nodes.Count;
                }
            }

            public override XmlNode this[int index]
            {
                get
                {
                    if (index >= _nodes.Count) ReadTo(index);
                    // Like the built-in lists, return null when the index
                    // is out of range instead of letting ArrayList throw.
                    if (index < 0 || index >= _nodes.Count) return null;
                    return (XmlNode) _nodes[index];
                }
            }

            /// <summary>
            /// Reads the entire iterator.
            /// </summary>
            private void ReadToEnd()
            {
                while (_iterator.MoveNext())
                {
                    AddCurrent();
                }
                _done = true;
            }

            /// <summary>
            /// Reads up to the specified index, or until the
            /// iterator is consumed.
            /// </summary>
            private void ReadTo(int to)
            {
                while (_nodes.Count <= to)
                {
                    if (_iterator.MoveNext())
                    {
                        AddCurrent();
                    }
                    else
                    {
                        _done = true;
                        return;
                    }
                }
            }

            /// <summary>
            /// Caches the XmlNode behind the iterator's current position.
            /// </summary>
            private void AddCurrent()
            {
                // The IHasXmlNode check belongs on the navigator
                // (_iterator.Current), not on the XmlNode taken from it.
                IHasXmlNode current = _iterator.Current as IHasXmlNode;
                if (current == null)
                    throw new ArgumentException(
                        SR.GetString(SR.XmlNodeListFactory_IHasXmlNodeMissing));
                _nodes.Add(current.GetNode());
            }

            /// <summary>
            /// Flags that the iterator has been consumed.
            /// </summary>
            private bool Done
            {
                get { return _done; }
            }

            #region XmlNodeListEnumerator

            private class XmlNodeListEnumerator : IEnumerator
            {
                XmlNodeListIterator _iterator;
                int _position = -1;

                public XmlNodeListEnumerator(XmlNodeListIterator iterator)
                {
                    _iterator = iterator;
                }

                #region IEnumerator Members

                void System.Collections.IEnumerator.Reset()
                {
                    _position = -1;
                }

                bool System.Collections.IEnumerator.MoveNext()
                {
                    _position++;
                    _iterator.ReadTo(_position);

                    // If we reached the end and our index is still
                    // bigger, there are no more items. The comparison is
                    // against the cached node count (the original compared
                    // _position with itself, which is always true).
                    if (_iterator.Done && _position >= _iterator._nodes.Count)
                        return false;

                    return true;
                }

                object System.Collections.IEnumerator.Current
                {
                    get { return _iterator[_position]; }
                }

                #endregion
            }

            #endregion XmlNodeListEnumerator
        }

        #endregion XmlNodeListIterator
    }
}

Note that in order to reduce the API surface, the only public class is the factory itself. The wrapper and its enumerator are completely hidden from you, so you can keep using the familiar XmlNodeList and let us change the implementation in the future at will ;).

The full Mvp.Xml project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

Well, if you already have the 'this' or 'Item' indexers, there's no reason why you need to implement ItemOf. In fact, there's no reason why I should be forced to implement Item AND this as they are both the same concept. VB automatically interprets the 'this' as the "default Item property", so it's useless.

Daniel Cazzulino - Tuesday, June 22, 2004 4:14:00 PM

Exactly what browser is this page supposed to be viewed in? The code runs on as a single line in FF 1, 2, IE 6, 7...

The Frog - Wednesday, March 21, 2007 5:17:33 PM

Updated the text.
Damn community server broke my HTML when I did a batch update of my entries :(

dcazzulino - Thursday, March 22, 2007 2:53:22 PM

3 Comments