How to get an XmlNodeList from an XPathNodeIterator (reloaded)
In a previous post I showed a possible approach to get an iterator for XmlNodes from an XPathNodeIterator. Please read that post, as it explains the problem in depth and gives the reasons why you should move to using XPathNodeIterator.
However, the solution I showed involved a new class that only had an IEnumerable implementation. It wasn't compatible at all with the built-in XmlNodeList (abstract) class. This time, for the Mvp.Xml project, I decided to do the right thing: inherit from XmlNodeList and implement the whole thing. What this means is that if you have a method that returns an XmlNodeList, as follows:
    public void DoSomeStuff(XmlDocument document)
    {
        XmlNodeList nodes = GetTheRelevantNodes(document);
        // Process the nodes.
    }

    private XmlNodeList GetTheRelevantNodes(XmlDocument document)
    {
        return document.SelectNodes(someQuery);
    }
You can now simply change the method's internal implementation to use cached XPathExpressions (as explained in Performant XML (I)) and keep the return type the same:
    private XmlNodeList GetTheRelevantNodes(XmlDocument document)
    {
        XPathNodeIterator it = document.CreateNavigator().Select(
            theCachedPerformantPreCompiledXPathExpression);
        return XmlNodeListFactory.CreateNodeList(it);
    }
Now you can focus on the cursor-style XML processing approach (and be ready for Whidbey, where it's the "blessed" API), while maintaining "backwards" compatibility for your methods. Note, however, that the factory will throw an exception if you query a non-XmlDocument store.
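To make the two points above a bit more concrete, here is a minimal sketch of caching a compiled expression and of checking up front whether a store can hand out XmlNode instances at all. The class name, the query text and the helper names are made up for illustration, and it assumes a framework version where the static XPathExpression.Compile method is available (on earlier versions you would compile through a navigator instead). It is not what the factory does internally, just a way to see why only XmlDocument-backed navigators qualify.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class CachedQueryExample
{
    // Compiled once and reused across calls. The query text is a made-up example.
    private static readonly XPathExpression ItemsQuery =
        XPathExpression.Compile("/order/item");

    // Runs the cached expression against any navigator.
    public static XPathNodeIterator SelectItems(XPathNavigator navigator)
    {
        return navigator.Select(ItemsQuery);
    }

    // Navigators created by an XmlDocument implement IHasXmlNode;
    // navigators created by an XPathDocument do not.
    public static bool SupportsXmlNodes(XPathNavigator navigator)
    {
        return navigator is IHasXmlNode;
    }

    public static void Main()
    {
        const string xml = "<order><item>a</item><item>b</item></order>";

        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);

        XPathDocument readOnly = new XPathDocument(new StringReader(xml));

        Console.WriteLine(SupportsXmlNodes(document.CreateNavigator())); // True
        Console.WriteLine(SupportsXmlNodes(readOnly.CreateNavigator())); // False
    }
}
```

A check like SupportsXmlNodes lets a caller fall back to plain cursor-style processing instead of hitting the exception.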
The XmlNodeList class has the following signature:
    public abstract class XmlNodeList : IEnumerable
    {
        // Methods
        protected XmlNodeList();
        public abstract IEnumerator GetEnumerator();
        public abstract XmlNode Item(int index);

        // Properties
        public abstract int Count { get; }
        public virtual XmlNode this[int index] { get; }
    }
(Note: Reflector shows the indexer of this particular class as a property named ItemOf. This is most likely because the class already has an Item(int) method, so the indexer cannot use the default name, Item.)
This may seem trivial to implement, unless you know how the XPathNodeIterator works. When it's returned from a query, the full document hasn't been evaluated. Rather, the query advances each time you move the iterator, which reduces the up-front cost of querying a potentially large document. Therefore, in order to maintain this performance advantage, I had to carefully implement the list so that it reads from the iterator only the nodes actually needed. Of course, just as with the XPathNodeIterator, retrieving the Count property requires the query to be evaluated against the whole document. Therefore:
Avoid retrieving the Count property on either an XmlNodeList or an XPathNodeIterator at all costs!
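In practice, the two most common reasons to reach for Count are "is there anything at all?" and "give me the first few". Both can be answered by moving the iterator only as far as needed. The following is a rough sketch under those assumptions; the class and method names are hypothetical, and how much work a single MoveNext really saves depends on the query (an expression that sorts its results, for instance, may still have to visit everything).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

public static class LazyIterationExample
{
    // True if the query matches at least one node.
    // Works on a clone so the caller's iterator position is left untouched.
    public static bool HasAny(XPathNodeIterator iterator)
    {
        return iterator.Clone().MoveNext();
    }

    // Collects at most 'limit' nodes and stops moving the iterator
    // as soon as the limit is reached, instead of asking for Count.
    public static List<XmlNode> TakeFirst(XPathNodeIterator iterator, int limit)
    {
        List<XmlNode> result = new List<XmlNode>();
        XPathNodeIterator cursor = iterator.Clone();

        while (result.Count < limit && cursor.MoveNext())
        {
            // Only XmlDocument-backed navigators can hand out an XmlNode.
            IHasXmlNode hasNode = cursor.Current as IHasXmlNode;
            if (hasNode == null)
            {
                throw new ArgumentException(
                    "The iterator must come from an XmlDocument.", "iterator");
            }
            result.Add(hasNode.GetNode());
        }

        return result;
    }
}
```

With these helpers, a check such as `nodes.Count > 0` becomes `HasAny(it)`, and the rest of the result set is left unevaluated.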
So, the implementation basically advances the cursor when needed (for example when you access an item whose position hasn't been reached yet, or when you move the iterator), and caches the XmlNode instances that are taken from the iterator through the IHasXmlNode interface on the current node. This mechanism was explained in the post mentioned at the beginning.
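To illustrate the idea, here is a simplified sketch of such a list. It is not the class that ships in Mvp.Xml (that one is hidden behind the factory); the name LazyXmlNodeList and its members are invented for this example, and it leans on generics and iterator blocks, so it assumes a C# 2.0 or later compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical name: a sketch of the technique, not the Mvp.Xml class.
public sealed class LazyXmlNodeList : XmlNodeList
{
    private readonly XPathNodeIterator iterator;
    private readonly List<XmlNode> cache = new List<XmlNode>();
    private bool done;

    public LazyXmlNodeList(XPathNodeIterator iterator)
    {
        if (iterator == null) throw new ArgumentNullException("iterator");
        // Clone so the caller's iterator is not moved behind its back.
        this.iterator = iterator.Clone();
    }

    // Pulls one more node from the iterator into the cache.
    // Returns false once the iterator is exhausted.
    private bool ReadNext()
    {
        if (done) return false;
        if (!iterator.MoveNext())
        {
            done = true;
            return false;
        }

        IHasXmlNode hasNode = iterator.Current as IHasXmlNode;
        if (hasNode == null)
        {
            throw new InvalidOperationException(
                "The iterator must come from an XmlDocument.");
        }
        cache.Add(hasNode.GetNode());
        return true;
    }

    // Reads only as far as the requested index; null when out of range.
    public override XmlNode Item(int index)
    {
        if (index < 0) return null;
        while (cache.Count <= index && ReadNext()) { }
        return index < cache.Count ? cache[index] : null;
    }

    // Forces the remainder of the query to be evaluated.
    public override int Count
    {
        get
        {
            while (ReadNext()) { }
            return cache.Count;
        }
    }

    // Yields already-cached nodes first, then keeps reading on demand.
    public override IEnumerator GetEnumerator()
    {
        int position = 0;
        while (position < cache.Count || ReadNext())
        {
            yield return cache[position++];
        }
    }
}
```

Because every enumerator and every call to Item shares the same cache, a node is pulled from the underlying iterator at most once, and nothing beyond the highest position requested is ever read unless Count is called.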
Note that in order to reduce the API surface, the only publicly available class is the factory. The implementations of the wrapper and of its enumerator are completely hidden from you, so you can keep using the familiar XmlNodeList and let us change the implementation in the future at will ;).
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.