XmlNodes from XPathNodeIterator
Note: this entry has moved.
Every now and then I receive complaints about XPathNodeIterator. You know, it lets you iterate the results of a query, but each Current element is an XPathNavigator. That is not too useful if you're looking for OuterXml, or if your code depends heavily on the XmlNode-based API (i.e. XmlDocument). The most worrying issue is that people use this argument against compiled XPath expressions, which can significantly improve performance (see the Performant XML (I) and Performant XML (II) articles). The reason is that the only way to get an XmlNodeList is the SelectNodes method of XmlNode (and therefore XmlDocument), and its overloads take the XPath expression as a string (optionally with an XmlNamespaceManager), never as a compiled XPathExpression.
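For reference, here is roughly what the two SelectNodes overloads look like in use. This is a minimal C# sketch (top-level statements, so C# 9 or later); the orders document and its element names are made up for illustration.

```csharp
using System;
using System.Xml;

// Made-up sample document.
var doc = new XmlDocument();
doc.LoadXml("<orders><order id='1' total='30'/><order id='2' total='10'/></orders>");

// XmlNode.SelectNodes(string xpath): the expression is passed as a plain string.
XmlNodeList orders = doc.SelectNodes("/orders/order");

foreach (XmlNode order in orders)
{
    // The full XmlNode API is available here, e.g. OuterXml.
    Console.WriteLine(order.OuterXml);
}

// The other overload only adds an XmlNamespaceManager for prefixed names;
// neither overload accepts a compiled XPathExpression.
var ns = new XmlNamespaceManager(doc.NameTable);
XmlNodeList sameOrders = doc.SelectNodes("/orders/order", ns);
```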
This means that most developers won't compile their expressions, simply because using an XPathExpression requires explicitly creating a navigator for the node or document and working against the cursor-style API of XPathNodeIterator and XPathNavigator.
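A minimal sketch of that cursor-style alternative, using the same made-up orders document: the expression is compiled once with XPathNavigator.Compile and executed through Select, and each result has to be read through the XPathNavigator returned by Current rather than as an XmlNode.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

var doc = new XmlDocument();
doc.LoadXml("<orders><order id='1' total='30'/><order id='2' total='10'/></orders>");

// Compiled expressions are executed through a navigator, not the XmlNode itself.
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr = nav.Compile("/orders/order");   // compile once, reuse many times

XPathNodeIterator it = nav.Select(expr);
var ids = new List<string>();
while (it.MoveNext())
{
    // it.Current is an XPathNavigator positioned on the matched element.
    string id = it.Current.GetAttribute("id", string.Empty);
    ids.Add(id);
    Console.WriteLine(id);
}
```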
In practice, this means that optimizing your code by compiling expressions forces you to refactor significant pieces of it. And if you need to sort the results with XPathExpression.AddSort(), you have no other choice, since SelectNodes offers no way to sort. There's a solution to this problem, as usual :).
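As an illustration of the sorting case, the sketch below (again with a made-up document) sorts the orders numerically by a hypothetical total attribute using AddSort on the compiled expression; SelectNodes has nothing comparable.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

var doc = new XmlDocument();
doc.LoadXml("<orders><order id='1' total='30'/><order id='2' total='10'/><order id='3' total='20'/></orders>");

XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr = nav.Compile("/orders/order");

// Sort key: the numeric value of @total, ascending.
expr.AddSort("@total", XmlSortOrder.Ascending, XmlCaseOrder.None, string.Empty, XmlDataType.Number);

var ids = new List<string>();
XPathNodeIterator it = nav.Select(expr);
while (it.MoveNext())
{
    ids.Add(it.Current.GetAttribute("id", string.Empty));
}
Console.WriteLine(string.Join(",", ids)); // prints "2,3,1"
```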
As you know, XPathNavigator is an abstract class, which allows multiple underlying implementations to offer the same cursor-style API and instantly gain XPath querying.
Aaron Skonnard has some interesting implementations showing this concept.
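To see the abstract-class point in practice, this small sketch creates one navigator over an XmlDocument and another over an XPathDocument for the same made-up XML. Both answer the same query through the XPathNavigator API, but their runtime types differ; the type names printed are internal framework details and may vary between .NET versions.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

const string xml = "<orders><order id='1'/></orders>";

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);
XPathNavigator domNav = xmlDoc.CreateNavigator();

var xpathDoc = new XPathDocument(new StringReader(xml));
XPathNavigator storeNav = xpathDoc.CreateNavigator();

// Same abstract API, same query...
int domCount = domNav.Select("/orders/order").Count;
int storeCount = storeNav.Select("/orders/order").Count;
Console.WriteLine($"{domCount} / {storeCount}");

// ...but different concrete implementations behind it.
Console.WriteLine(domNav.GetType().FullName);
Console.WriteLine(storeNav.GetType().FullName);
```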
So, when you're iterating the results of a query and asking for the current element, you're actually using an object whose concrete type depends on the underlying implementation. Besides being an XPathNavigator (that is, the value of the XPathNodeIterator.Current property), this object can also implement other interfaces as part of that implementation. As such, queries executed against an XmlNode (such as an XmlDocument) will have each Current element implementing IHasXmlNode, whereas XPathDocument-based ones will implement IXmlLineInfo. And what is this useful for? Well, it gives access to additional information, beyond the standard XPathNavigator API, that depends on the concrete implementation. So, inside the MoveNext() loop, we can ask whether Current implements IHasXmlNode and, if it does, call its GetNode() method to retrieve the underlying XmlNode.
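A sketch of that check, assuming a made-up orders document: with an XmlDocument, each Current can be cast to IHasXmlNode and GetNode() returns the matching XmlNode; with an XPathDocument, Current implements IXmlLineInfo instead. Whether line numbers are actually recorded depends on how the XPathDocument was loaded, so the code checks HasLineInfo() before printing them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

const string xml = "<orders>\n  <order id='1'/>\n  <order id='2'/>\n</orders>";

// XmlDocument-backed results: Current implements IHasXmlNode.
var doc = new XmlDocument();
doc.LoadXml(xml);
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator it = nav.Select(nav.Compile("/orders/order"));

var nodes = new List<XmlNode>();
while (it.MoveNext())
{
    if (it.Current is IHasXmlNode hasNode)
    {
        XmlNode node = hasNode.GetNode();   // back to the familiar DOM node
        nodes.Add(node);
        Console.WriteLine(node.OuterXml);
    }
}

// XPathDocument-backed results: Current implements IXmlLineInfo instead.
var xpathDoc = new XPathDocument(new StringReader(xml));
XPathNodeIterator it2 = xpathDoc.CreateNavigator().Select("/orders/order");
bool allHaveLineInfoInterface = true;
while (it2.MoveNext())
{
    if (it2.Current is IXmlLineInfo info && info.HasLineInfo())
    {
        Console.WriteLine($"line {info.LineNumber}, position {info.LinePosition}");
    }
    allHaveLineInfoInterface &= it2.Current is IXmlLineInfo;
}
```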
Still, this doesn't solve the problem that you have to iterate differently than you're used to, and that significant rewrites are still needed when you use XPathExpression for querying.
The solution is to use the knowledge about the underlying implementation (i.e. you KNOW you're querying against an XmlDocument) to get an easier API to it. This can be achieved by creating an IEnumerable class that iterates over the XPathNodeIterator but exposes the underlying XmlNode for each result. A helper method returning an array of XmlNodes is also useful, so that calling code can keep working with plain XmlNode objects, in a foreach loop or by index, much as it did with XmlNodeList.
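A minimal sketch of the array-returning helper, assuming the expression is always evaluated against an XmlNode-based store. The name SelectNodesCompiled is hypothetical (it is not a framework API), and the cast to IHasXmlNode will throw an InvalidCastException if the navigator comes from something else, such as an XPathDocument.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: runs a compiled expression against an XmlNode and
// returns the matching XmlNodes as an array.
static XmlNode[] SelectNodesCompiled(XmlNode context, XPathExpression compiled)
{
    XPathNodeIterator it = context.CreateNavigator().Select(compiled);
    var result = new List<XmlNode>();
    while (it.MoveNext())
    {
        // Assumes an XmlNode-based navigator, so Current implements IHasXmlNode.
        result.Add(((IHasXmlNode)it.Current).GetNode());
    }
    return result.ToArray();
}

var doc = new XmlDocument();
doc.LoadXml("<orders><order id='1' total='30'/><order id='2' total='10'/></orders>");

// Compile (and optionally sort) once, then reuse the expression.
XPathExpression byTotal = doc.CreateNavigator().Compile("/orders/order");
byTotal.AddSort("@total", XmlSortOrder.Ascending, XmlCaseOrder.None, string.Empty, XmlDataType.Number);

XmlNode[] orders = SelectNodesCompiled(doc, byTotal);
foreach (XmlNode order in orders)
{
    Console.WriteLine(order.OuterXml);
}
```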
Complete code for the custom enumerable object and its internal enumerator implementation follows.
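A hedged sketch of such an enumerable, not the original listing: instead of a hand-written IEnumerable class with an internal IEnumerator, it uses a C# iterator (yield return), so the compiler generates the enumerator. The function name EnumerateXmlNodes is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: wraps an XPathNodeIterator and yields the XmlNode behind
// each result. MoveNext is called lazily, one step per foreach iteration.
static IEnumerable<XmlNode> EnumerateXmlNodes(XPathNodeIterator iterator)
{
    while (iterator.MoveNext())
    {
        // Only XmlNode-based navigators implement IHasXmlNode; fail loudly otherwise
        // (for example, when the iterator comes from an XPathDocument).
        IHasXmlNode hasNode = iterator.Current as IHasXmlNode;
        if (hasNode == null)
        {
            throw new InvalidOperationException("The iterator is not over an XmlNode-based store.");
        }
        yield return hasNode.GetNode();
    }
}

var doc = new XmlDocument();
doc.LoadXml("<orders><order id='1'/><order id='2'/></orders>");

XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator results = nav.Select(nav.Compile("/orders/order"));

// A familiar foreach over XmlNode, while the query itself is compiled.
var ids = new List<string>();
foreach (XmlNode node in EnumerateXmlNodes(results))
{
    ids.Add(node.Attributes["id"].Value);
}
Console.WriteLine(string.Join(",", ids));
```

Like the XPathNodeIterator it wraps, this sequence can be consumed only once; to iterate again, call Select again (or use XPathNodeIterator.Clone() before the first pass) to get a fresh iterator.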
Update: check an even better approach here.
Enjoy!
Check out the Roadmap to high performance XML.