High-performance XML (II): XPath execution tips
While programming Schematron.NET, an XPath-only implementation of the
Schematron specification (soon to be an ISO standard, and a very cool XML
validation language, incredibly flexible and powerful) that is part of the
NMatrix project, I learned many interesting things about the internals of
XPath execution.
I had to dig deep into it because my implementation needed to outperform
the reference implementation, which is based on XSLT. It ended up being,
on average, 50% faster than the reference version run on the fastest XSLT
engine. Along the way, I found the following useful tips:
At first I was worried about the amount of navigator cloning
that goes on during execution. Further research showed that the Clone()
method only creates a new object and saves references to the document, the
node and the parentOfNs variable (I don't know where that one is used). So it's
really fast and doesn't have any real performance impact. So, clone the navigator at will!
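To illustrate what cloning gives you, here is a minimal sketch. The class and method names, and the tiny sample document, are made up for illustration. It only shows that a clone is an independent cursor over the same document; it does not measure the cost of Clone().

```csharp
using System.IO;
using System.Xml.XPath;

public static class CloneSketch
{
    // Hypothetical helper: moves only the clone and reports where both cursors are.
    public static string[] MoveCloneOnly()
    {
        var doc = new XPathDocument(new StringReader("<root><a/><b/></root>"));
        XPathNavigator nav = doc.CreateNavigator();
        nav.MoveToFirstChild();              // original is now on <root>

        XPathNavigator copy = nav.Clone();   // independent cursor over the same document
        copy.MoveToFirstChild();             // only the clone moves, to <a>

        return new[] { nav.Name, copy.Name };
    }
}
```

The original navigator stays on the root element while the clone moves on, which is why a clone is safe to hand to code that navigates.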
The only way to get at the XML contents of a navigator (i.e. its underlying node) is to
check whether it implements
IHasXmlNode, which is only true if the
navigator was constructed from an
XmlDocument. If it does, you can
access the underlying
XmlNode with the following code:
// nav is the XPathNavigator; node is an XmlNode variable
if (nav is IHasXmlNode)
    node = ((IHasXmlNode) nav).GetNode();
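As a slightly fuller sketch of the same check, the hypothetical helper below (the names are mine) returns null when the navigator does not implement IHasXmlNode. The second method uses a navigator created from an XPathDocument to show that case.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NodeAccessSketch
{
    // Hypothetical helper: returns the underlying XmlNode, or null if there is none.
    public static XmlNode TryGetNode(XPathNavigator nav)
    {
        IHasXmlNode hasNode = nav as IHasXmlNode;
        return hasNode != null ? hasNode.GetNode() : null;
    }

    // True when the navigator comes from an XmlDocument.
    public static bool WorksForXmlDocument()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root/>");
        return TryGetNode(doc.CreateNavigator()) != null;
    }

    // An XPathDocument navigator is not expected to expose an XmlNode.
    public static bool WorksForXPathDocument()
    {
        var doc = new XPathDocument(new StringReader("<root/>"));
        return TryGetNode(doc.CreateNavigator()) != null;
    }
}
```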
When we use an XPathNodeIterator, its
Current object is always the same; that is,
a single object is created, and its internal values are changed to reflect the underlying current node. Therefore,
if we want to track already-processed nodes, we can't use its hash code or reference. The only (standard) way to compare
navigators is through their
IsSamePosition(XPathNavigator other) method. So, if you need such a
mechanism (to process each node only once), your only option (in principle) is to iterate through a collection of previously
saved navigators and compare them one by one with the current one. Note that you must clone the navigator
before saving it (the iterator's Current, or the XPathNavigator itself), or its position will change as you move on in the iteration.
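The tracking mechanism described above could look roughly like the following sketch. The class name, the sample document and the two overlapping queries are assumptions chosen only to force a duplicate. The linear scan is the straightforward approach from the text, not an optimized one.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class VisitedSketch
{
    // Hypothetical example: counts the distinct nodes hit by two overlapping queries.
    public static int CountDistinct()
    {
        var doc = new XPathDocument(new StringReader("<root><a/><b/></root>"));
        XPathNavigator nav = doc.CreateNavigator();
        var seen = new List<XPathNavigator>();

        foreach (string xpath in new[] { "/root/*", "/root/a" })
        {
            XPathNodeIterator it = nav.Select(xpath);
            while (it.MoveNext())
            {
                if (AlreadySeen(seen, it.Current)) continue;
                // Save a clone: it.Current is repositioned on every MoveNext().
                seen.Add(it.Current.Clone());
            }
        }
        return seen.Count;
    }

    // Compares the current navigator against every saved one by position.
    static bool AlreadySeen(List<XPathNavigator> seen, XPathNavigator current)
    {
        foreach (XPathNavigator saved in seen)
        {
            if (saved.IsSamePosition(current)) return true;
        }
        return false;
    }
}
```

The second query returns the a element again, and IsSamePosition recognizes it as already processed, so only two nodes end up in the list.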
XPathNavigator.Evaluate() moves the cursor position! So
always remember to clone before doing anything against a navigator,
or clone once, and later use
MoveTo(XPathNavigator original) to reposition to the original node.
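Both defensive patterns can be sketched as follows. The names and the sample document are illustrative, and whether Evaluate() actually moves the cursor may depend on the framework version, so treat this as a precaution rather than a demonstration of the movement itself.

```csharp
using System.IO;
using System.Xml.XPath;

public static class EvaluateSketch
{
    // Hypothetical example: evaluates two expressions without losing the position.
    public static bool PositionPreserved()
    {
        var doc = new XPathDocument(new StringReader("<root><a>1</a><a>2</a></root>"));
        XPathNavigator nav = doc.CreateNavigator();
        nav.MoveToFirstChild();                          // positioned on <root>

        // Option 1: evaluate against a throwaway clone.
        double count = (double)nav.Clone().Evaluate("count(a)");

        // Option 2: clone once, evaluate on the original, then reposition.
        XPathNavigator original = nav.Clone();
        double sum = (double)nav.Evaluate("sum(a)");
        nav.MoveTo(original);

        return count == 2 && sum == 3
            && nav.IsSamePosition(original) && nav.Name == "root";
    }
}
```

The second option avoids one clone per evaluation when many expressions are evaluated from the same starting node.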
For all but the smallest documents (or very few child nodes from the current position),
are 35-45% slower than
XPathNavigator.Select with an equivalent precompiled expression.
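For reference, this is roughly what XPathNavigator.Select with a precompiled expression looks like. The class name, document and query are assumptions for illustration; the sketch shows only the compile-once, reuse-many pattern and makes no timing claim of its own.

```csharp
using System.IO;
using System.Xml.XPath;

public static class CompiledSelectSketch
{
    // Hypothetical example: compiles the expression once and reuses it three times.
    public static int CountItems()
    {
        var doc = new XPathDocument(
            new StringReader("<root><item/><item/><other/></root>"));
        XPathNavigator nav = doc.CreateNavigator();

        XPathExpression expr = nav.Compile("/root/item"); // parsed a single time

        int total = 0;
        for (int i = 0; i < 3; i++)
        {
            XPathNodeIterator it = nav.Select(expr);      // no re-parsing here
            while (it.MoveNext()) total++;
        }
        return total;
    }
}
```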
- Adding the string values (tokens, such as element and attribute names) that are expected in the
instance document to the navigator's
NameTable property, prior to executing the queries,
offers a marginal performance gain of 4-8%.
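A minimal sketch of the NameTable tip might look like this. The element and attribute names are made up, and I have not measured the gain here; the code only shows where the names would be added before the queries run.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class NameTableSketch
{
    // Hypothetical example: adds the expected names to the NameTable, then queries.
    public static int CountBooks()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<library><book id='1'/><book id='2'/></library>");
        XPathNavigator nav = doc.CreateNavigator();

        // Names we expect to find in the instance document (assumed for this example).
        foreach (string name in new[] { "library", "book", "id" })
        {
            nav.NameTable.Add(name);
        }

        return nav.Select("/library/book[@id]").Count;
    }
}
```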
Check out the Roadmap to high performance XML.