XmlNodes from XPathNodeIterator
Every now and then I receive complaints about XPathNodeIterator. You know, it
allows iteration where each Current element is an XPathNavigator.
That's not too useful if you're looking for OuterXml, or are too dependent on
the XmlNode-based API (i.e. XmlDocument). The most worrying issue is that
people use this argument against using compiled XPath expressions, which are
known to significantly improve performance (see the Performant XML (I) and
Performant XML (II) articles). The reason is that in order to get an
XmlNodeList, you have to use the SelectNodes method of XmlNode (and therefore
XmlDocument), which only accepts the XPath expression as a string.
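For reference, the two overloads in question are `XmlNodeList SelectNodes(string xpath)` and `XmlNodeList SelectNodes(string xpath, XmlNamespaceManager nsmgr)`. A minimal sketch of the string-based style follows; the `SelectNodesDemo` class and the sample XML are made up for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of any library.
public static class SelectNodesDemo
{
    // Returns the text of every title selected with the string-based overload.
    public static string[] GetTitles(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // SelectNodes only takes the query as a string; there is no overload
        // that accepts a precompiled XPathExpression.
        XmlNodeList nodes = doc.SelectNodes("/books/book/title");

        string[] titles = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            titles[i] = nodes[i].InnerText;
        }
        return titles;
    }

    public static void Main()
    {
        string xml = "<books><book><title>A</title></book>" +
                     "<book><title>B</title></book></books>";
        foreach (string title in GetTitles(xml))
        {
            Console.WriteLine(title);
        }
    }
}
```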
This means that most developers won't compile their expressions, simply because
in order to use an XPathExpression they have to explicitly create a navigator
for the node/document and work against the cursor-style API of
XPathNodeIterator and XPathNavigator.
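The cursor-style code looks roughly like the sketch below. It assumes .NET 2.0 or later, where the static `XPathExpression.Compile` is available (on older versions `XPathNavigator.Compile` plays the same role). The `CompiledQueryDemo` class and the query are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo class, not part of any library.
public static class CompiledQueryDemo
{
    // Compiled once and reused for every call to GetTitles.
    private static readonly XPathExpression TitleQuery =
        XPathExpression.Compile("/books/book/title");

    public static List<string> GetTitles(XmlDocument doc)
    {
        // A compiled expression can only be run through a navigator.
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNodeIterator iterator = navigator.Select(TitleQuery);

        List<string> titles = new List<string>();
        while (iterator.MoveNext())
        {
            // Current is an XPathNavigator positioned on the matching node,
            // not an XmlNode.
            titles.Add(iterator.Current.Value);
        }
        return titles;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<books><book><title>A</title></book>" +
                    "<book><title>B</title></book></books>");
        foreach (string title in GetTitles(doc))
        {
            Console.WriteLine(title);
        }
    }
}
```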
This approach generally means that in order to optimize the code by compiling
expressions, you actually have to refactor significant pieces of it. And you
don't have any other choice if you need to sort the query results by using
XPathExpression.AddSort().
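As a rough illustration of the sorting case, the sketch below sorts `book` elements by their `title` child in descending order. The `SortedQueryDemo` class, the element names and the choice of sort options are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo class, not part of any library.
public static class SortedQueryDemo
{
    public static List<string> GetTitlesDescending(XmlDocument doc)
    {
        // AddSort modifies the expression, so compile a fresh one here
        // instead of sharing a cached instance.
        XPathExpression expr = XPathExpression.Compile("/books/book");

        // The sort key "title" is evaluated relative to each selected book.
        expr.AddSort("title", XmlSortOrder.Descending, XmlCaseOrder.None,
                     "", XmlDataType.Text);

        XPathNodeIterator iterator = doc.CreateNavigator().Select(expr);

        List<string> titles = new List<string>();
        while (iterator.MoveNext())
        {
            titles.Add(iterator.Current.SelectSingleNode("title").Value);
        }
        return titles;
    }
}
```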
There's a solution to this problem, as usual :).
You know that XPathNavigator is an abstract class that allows multiple
underlying implementations to offer the same cursor-style API and gain the
instant benefit of XPath querying. Aaron Skonnard has some interesting
implementations showing this concept. Therefore, when you iterate the results
of a query and ask for the current element, you're actually using something
that depends on the implementation. This object, besides being an
XPathNavigator (that is, the XPathNodeIterator.Current property), can also
implement other interfaces as part of the underlying implementation. As such,
queries executed against an XmlNode-based element will have each Current
element implementing IHasXmlNode, whereas XPathDocument-based ones will
implement IXmlLineInfo.
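And what is this useful for? Well, just to get access to additional information
beyond the standard XPathNavigator API that depends on the concrete
implementation. So, inside the while loop that walks the iterator, we can ask
whether Current implements one of these interfaces and, if so, get the
underlying XmlNode or the line information from it.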
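A minimal sketch of that check, for both kinds of store. `IHasXmlNode.GetNode()` and the `IXmlLineInfo` members are the real framework APIs; the `UnderlyingNodeDemo` class and its method names are invented for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical demo class, not part of any library.
public static class UnderlyingNodeDemo
{
    // XmlDocument-backed query: returns OuterXml of the first match, or null.
    public static string FirstOuterXml(XmlDocument doc, string xpath)
    {
        XPathNodeIterator iterator = doc.CreateNavigator().Select(xpath);
        while (iterator.MoveNext())
        {
            // Navigators created from an XmlNode expose the node itself.
            IHasXmlNode hasNode = iterator.Current as IHasXmlNode;
            if (hasNode != null)
            {
                return hasNode.GetNode().OuterXml;
            }
        }
        return null;
    }

    // XPathDocument-backed query: returns the line number of the first match,
    // or -1 when no match carries line information.
    public static int FirstLineNumber(string xml, string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNodeIterator iterator = doc.CreateNavigator().Select(xpath);
        while (iterator.MoveNext())
        {
            IXmlLineInfo lineInfo = iterator.Current as IXmlLineInfo;
            if (lineInfo != null && lineInfo.HasLineInfo())
            {
                return lineInfo.LineNumber;
            }
        }
        return -1;
    }
}
```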
Still, this doesn't solve the problem that you have to iterate differently than
you're used to, and that significant rewrites are still needed when you use
XPathExpression for querying.
The solution is to use what you know about the underlying implementation (i.e.
you KNOW you're querying against an XmlDocument) and wrap it in an easier API.
This can be achieved by creating an IEnumerable class that iterates over the
XPathNodeIterator but exposes the underlying XmlNode. A helper method
returning an array of XmlNodes is also useful. Client code can then go back to
a plain foreach over XmlNode instances.
Complete code for the custom enumerable object and its internal enumerator implementation follows.
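What follows is a minimal sketch of such an enumerable, not necessarily identical to the original listing. The names `XmlNodeEnumerable` and `ToArray` are hypothetical, and the code assumes the iterator comes from an XmlNode-based navigator, so that each Current implements IHasXmlNode.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical name: wraps an XPathNodeIterator and yields XmlNode instances.
public class XmlNodeEnumerable : IEnumerable
{
    private readonly XPathNodeIterator iterator;

    public XmlNodeEnumerable(XPathNodeIterator iterator)
    {
        if (iterator == null) throw new ArgumentNullException("iterator");
        this.iterator = iterator;
    }

    public IEnumerator GetEnumerator()
    {
        // Work on a clone so enumerating does not advance the caller's iterator.
        return new XmlNodeEnumerator(iterator.Clone());
    }

    // Helper returning all remaining nodes as an array.
    public static XmlNode[] ToArray(XPathNodeIterator iterator)
    {
        List<XmlNode> nodes = new List<XmlNode>();
        foreach (XmlNode node in new XmlNodeEnumerable(iterator))
        {
            nodes.Add(node);
        }
        return nodes.ToArray();
    }

    private sealed class XmlNodeEnumerator : IEnumerator
    {
        private readonly XPathNodeIterator iterator;
        private XmlNode current;

        public XmlNodeEnumerator(XPathNodeIterator iterator)
        {
            this.iterator = iterator;
        }

        public object Current
        {
            get
            {
                if (current == null)
                    throw new InvalidOperationException("Enumerator is not positioned on a node.");
                return current;
            }
        }

        public bool MoveNext()
        {
            if (!iterator.MoveNext())
            {
                current = null;
                return false;
            }
            // Only XmlNode-based navigators implement IHasXmlNode.
            IHasXmlNode hasNode = iterator.Current as IHasXmlNode;
            if (hasNode == null)
                throw new InvalidOperationException("The iterator was not created from an XmlNode-based navigator.");
            current = hasNode.GetNode();
            return true;
        }

        public void Reset()
        {
            throw new NotSupportedException();
        }
    }
}

public static class XmlNodeEnumerableDemo
{
    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<books><book id='1'/><book id='2'/></books>");

        XPathExpression expr = XPathExpression.Compile("/books/book");
        XPathNodeIterator it = doc.CreateNavigator().Select(expr);

        // Compiled query, but the loop body works with plain XmlNodes again.
        foreach (XmlNode node in new XmlNodeEnumerable(it))
        {
            Console.WriteLine(node.OuterXml);
        }
    }
}
```

Cloning the iterator in GetEnumerator is a design choice in this sketch: it lets the same enumerable be walked more than once, at the cost of one extra clone per enumeration.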
Update: check an even better approach here.
Enjoy!
Check out the Roadmap to high performance XML.