XML Performance Checklist, and some issues on XPath evaluation
DonXML pointed out some issues with the Checklist: XML Performance article. I believe the checklist (and the corresponding "full-length" explanations) could have benefited from more space to cover the topic. I agree with most of Don's comments. The only one I'm not so sure about is his assertion:
By implementing #1 (Use XPathDocument to process XPath statements), it forces you to break #2 (Avoid the // operator by reducing the search scope), since XPathNavigator.Select() always evaluates from the root, not from the context of the current cursor location.
This observation is only partially true, because you can reduce the scope of a search by explicitly addressing the full hierarchy of nodes instead of using "//", which is a shortcut for the descendant-or-self axis. The real cost of "//" is that nodes matched more than once must not be duplicated in the resulting node-set, and eliminating those duplicates incurs an additional calculation cost. For example, let's say you have an XHTML document and you want to process all links that appear inside a paragraph, with an XPath like //p//a. As you know, a <p> can be nested inside other <p> elements, so a single <a> can initially satisfy "//a" once for each of two <p> elements that happen to be parent and child. At that point the XPath evaluator must skip the <a> elements it has already matched, and this is what makes the process much slower.
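To make that concrete, here is an illustrative fragment (my own example, not from the article) where one link sits under two nested paragraphs:

```xml
<body>
  <p>outer paragraph
    <p>inner paragraph <a href="#x">one link</a></p>
  </p>
</body>
```

The single <a> is a descendant of both <p> elements, so //p//a reaches it twice, and the evaluator has to detect and discard the duplicate to return a proper node-set.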
So, if you positively know that every <a> appears as a direct child of a <p>, and that the <p> elements you want to process always appear inside <body>, you can get an amazing speed boost by replacing the query with "/html/body/p/a". And I really mean *amazing*. Try it yourself with an XHTML version of a long spec, for example XML Schema Part 2.
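The scope-reduction idea is easy to see outside .NET as well. Here is a sketch using Python's `xml.etree.ElementTree` (a different XPath engine than XPathNavigator, and the document shape is my own invention), contrasting a whole-tree descendant search with an explicit path:

```python
# Sketch: prefer explicit paths over "//" when you know the structure.
# Uses Python's stdlib ElementTree; the document here is illustrative.
import xml.etree.ElementTree as ET

# Build a small XHTML-like document: body > p > a, no nesting.
body = "".join(f"<p><a href='#{i}'>link {i}</a></p>" for i in range(1000))
root = ET.fromstring(f"<html><body>{body}</body></html>")

# Descendant search: the engine must walk every node in the tree.
broad = root.findall(".//a")

# Explicit path: only body's p children are examined -- far less work.
narrow = root.findall("./body/p/a")

print(len(broad), len(narrow))  # 1000 1000 -- same nodes, same order
```

Both queries return the same node list here; the explicit path simply gives the engine far fewer nodes to visit, which is where the speedup in the "/html/body/p/a" rewrite above comes from.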
But, going back to the main point Don raised (which reminded me of the ugly days when I first stumbled into this myself), the core issue is that there was a conscious design decision that makes the Evaluate overload receiving an XPathNodeIterator as the context absolutely useless. Let me explain (what follows is exactly the use case I described in the public newsgroup).
Let's say you have the Pubs database as XML, and you have selected (for whatever reason) all titles with "//publishers/titles". This gives you an XPathNodeIterator positioned over the matching nodes.
At some point, let's say you need to work with all prices from that set of nodes. The navigator exposes an overload of the Evaluate method that receives an XPathNodeIterator object as the context to execute the evaluation on. It seems natural, then, to think that calling Evaluate("price", nodes), with nodes being the iterator from the previous selection, would yield the results we expect.
The result I expect is a node-set (XPathNodeIterator) containing the price children of the titles I passed as the second argument to Evaluate. Well, that isn't what happens, because the "price" expression is evaluated from the document root instead. So, what is this overload useful for?
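For contrast, here is what context-relative evaluation is supposed to look like, sketched in Python's `xml.etree.ElementTree` rather than .NET (the element names, including the dsPubs wrapper, are my own illustration of the Pubs shape described above):

```python
# Sketch of context-relative evaluation: a relative expression ("price")
# is evaluated from each node of a previously selected set, which is
# what one would expect the Evaluate(expr, iterator) overload to do.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<dsPubs><publishers>"
    "<titles><price>19.99</price></titles>"
    "<titles><price>11.95</price></titles>"
    "</publishers></dsPubs>"
)

titles = doc.findall(".//publishers/titles")   # the "//publishers/titles" step
prices = [t.findtext("price") for t in titles]  # "price" relative to each node
print(prices)  # ['19.99', '11.95']
```

Because each lookup starts from a title node, "price" here means "price child of this title", not "price at the document root", which is exactly the behavior the .NET overload fails to deliver.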
The code Oleg used doesn't exercise the problem: he iterates over each node (i.e. the nodes variable above) and evaluates against each of them individually, without using the overload in question. That works, just as the regular Select method does.