High-performance XML (I): Dynamic XPath expressions compilation


"Everybody" knows that precompiling your XPath expressions ahead of time, usually in static constructors, improves execution performance. But what actually happens at "compilation" time? Contrary to what many believe, no IL is generated, and no on-the-fly assemblies or anything like them are created. Sadly, it's NOT like RegEx compilation. What does happen still matters for performance, because it involves parsing the expression, building an AST (abstract syntax tree), etc.

Executing an XPath expression involves the following:

  • Creating an XPathNavigator (if it's not executed directly against one)
  • Compiling the expression, which in turn involves:
    • Building an IQuery which consists of:
      • Parsing the expression: XPathParser.ParseXPathPattern (using a parser and a scanner)
      • Processing the AstNode built in the previous step: nodes, axes, operators, etc. An object is created for each of them.
    • Initializing a CompiledXPathExpr object from the previous ones.
  • Constructing an XPathSelectionIterator with the expression, which involves:
    • Cloning the source navigator
    • Retrieving the query from the compiled expression
    • Setting the query context to another clone of the source navigator
  • On iteration through the list, the query is advanced until it reaches the end. Each advance returns a new XPathNavigator node which is used afterwards (or null if nothing else is found).

If the expression is executed through Evaluate and its ReturnType is not XPathResultType.NodeSet, the last two steps are omitted and the expression is evaluated directly to an object (the IQuery built in this case can be an AndExpr, LogicalExpr, NumberFunctions, NumericExpr, etc.).
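
To make the two execution paths concrete, here is a small sketch using only the public API. The class name ReturnTypeSketch and the sample XML are made up for illustration; the internal steps listed above happen behind Compile, Select and Evaluate.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper, for illustration only.
public static class ReturnTypeSketch
{
    // Runs a compiled expression according to its ReturnType.
    public static object Run(XPathNavigator nav, XPathExpression expr)
    {
        if (expr.ReturnType == XPathResultType.NodeSet)
        {
            // Node-set path: an iterator is built and advanced node by node.
            XPathNodeIterator it = nav.Select(expr);
            int count = 0;
            while (it.MoveNext())
            {
                count++;
            }
            return count;
        }

        // Any other return type: evaluated directly to a double, string or bool.
        return nav.Evaluate(expr);
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<dsPubs><publishers><titles/><titles/></publishers></dsPubs>");
        XPathNavigator nav = doc.CreateNavigator();

        XPathExpression nodes = nav.Compile("/dsPubs/publishers/titles");
        XPathExpression number = nav.Compile("count(/dsPubs/publishers/titles)");

        Console.WriteLine("{0}: {1}", nodes.ReturnType, Run(nav, nodes));
        Console.WriteLine("{0}: {1}", number.ReturnType, Run(nav, number));
    }
}
```

Checking ReturnType up front is a design choice of this sketch: calling Select with an expression that does not return a node-set fails at run-time, so the check keeps both kinds of expression behind one entry point.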

It's important to realize that whether you use XmlNode.SelectNodes or XmlNode.SelectSingleNode on an XmlDocument, or you use an XPathNavigator, if you don't use a "precompiled" XPathExpression you will pay the cost of validating and parsing the expression each time you execute it.

As you can see from the process described above, the compilation step has nothing to do with the validity of the expression with respect to an instance document (the actual XML to be queried). It's all about the expression itself, irrespective of the document. So, if your argument against precompiling the expression is that you need the document in order to call XmlDocument.CreateNavigator().Compile(expression), you're wrong. You don't need it. In fact, not only is the document irrelevant, but so are namespaces, custom function and variable resolution, etc. So, you CAN ALWAYS precompile by doing something like this:

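using System.Xml;
using System.Xml.XPath;

public class ExpressionsCache
{
    // Compiled once, when the type is first used
    private static readonly XPathExpression _cachedExpr;

    static ExpressionsCache()
    {
        // An empty document is enough: compilation only looks at the expression
        XmlDocument doc = new XmlDocument();
        XPathNavigator nav = doc.CreateNavigator();

        _cachedExpr = nav.Compile("/dsPubs/publishers/titles");
    }
}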
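
If you want to see what the parsing cost looks like on your own machine, a rough timing sketch along these lines may help. The class name, the sample XML and the iteration count are arbitrary choices for illustration, and the numbers will vary with the runtime version and the complexity of the expression, so treat it as a starting point rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Xml;
using System.Xml.XPath;

// Hypothetical timing harness, for illustration only.
public static class CompileTimingSketch
{
    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<dsPubs><publishers><titles><price>10</price></titles></publishers></dsPubs>");
        XPathNavigator nav = doc.CreateNavigator();

        const string path = "/dsPubs/publishers/titles";
        const int iterations = 100000;

        // String overload: the expression text is handed over on every call.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            nav.Select(path).MoveNext();
        }
        sw.Stop();
        Console.WriteLine("string expression:   {0} ms", sw.ElapsedMilliseconds);

        // Precompiled overload: parsing happens once, outside the loop.
        XPathExpression compiled = nav.Compile(path);
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            nav.Select(compiled).MoveNext();
        }
        sw.Stop();
        Console.WriteLine("compiled expression: {0} ms", sw.ElapsedMilliseconds);
    }
}
```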

The other usual excuse for not compiling the expressions is that dynamic values are appended at run-time to build the expression. For example:

private void DoSomethingWithExpensivePublisherBooks(
    XPathNavigator document, string publisherId, double price)
{
    // The expression text changes on every call, so it is parsed on every call
    string path = String.Concat("/dsPubs/publishers/titles[pub_id = ",
        publisherId, " and price < ", price.ToString(), "]");

    XPathNodeIterator titles = document.Select(path);
}

(Note: incredibly long function names seem to be in vogue lately, haven't you noticed? The internal System.Web.FileChangesMonitor class has a method named StartMonitoringDirectoryRenamesAndBinDirectory. It clearly wins the prize AFAIK!)

It seems unavoidable to build the string representing the XPath expression at run-time, right? WRONG. Extensibility in XPath classes and evaluation allows you to write expressions like the following:

"/dsPubs/publishers/titles[pub_id = $id and price < $price]"

Where the $id and $price variables are provided and resolved at evaluation time. This is achieved by building a custom XsltContext-derived class and providing the support for variables. This is explained in MSDN and in a rather brief how-to MS document. The catch, then, is having a programmer-friendly custom context that allows us to add variables to it, set it as the context of the precompiled expression (after cloning it to avoid threading issues) and execute the expression, having the variables evaluated at run-time:

// Expression precompiled somewhere else
XPathExpression expr =
    DynamicContext.Compile("/dsPubs/publishers/titles[pub_id = $id and price < $price]");

// Using the expression with variables supplied
DynamicContext ctx = new DynamicContext();
ctx.AddVariable("id", id);
ctx.AddVariable("price", price);

// Clone expression for thread-safety
XPathExpression cloned = expr.Clone();
cloned.SetContext(ctx);

// Execute expression with variables!
XPathNodeIterator it = nav.Select(cloned);

The new code involves creating the DynamicContext class and adding the variables, and potentially the namespaces in use (just as you would normally do with an XmlNamespaceManager). With the new approach, you get a performance boost of between 1.5x and 2x!
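
If you are curious about what a variable-aware context has to provide, here is a minimal sketch of an XsltContext-derived class. It is NOT the Mvp.Xml DynamicContext: SimpleVariableContext, SimpleVariable and the demo helper are hypothetical names, and the sketch assumes variable values are strings, doubles or booleans, ignores variable prefixes and offers no custom functions.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical context, for illustration only (not the Mvp.Xml implementation).
public class SimpleVariableContext : XsltContext
{
    private readonly Dictionary<string, IXsltContextVariable> _variables =
        new Dictionary<string, IXsltContextVariable>();

    public SimpleVariableContext() : base(new NameTable()) { }

    // Assumption: value is a string, double or bool.
    public void AddVariable(string name, object value)
    {
        _variables[name] = new SimpleVariable(value);
    }

    // Called by the XPath engine for each $name found in the expression.
    public override IXsltContextVariable ResolveVariable(string prefix, string name)
    {
        IXsltContextVariable variable;
        if (!_variables.TryGetValue(name, out variable))
        {
            throw new ArgumentException("Undefined XPath variable: " + name);
        }
        return variable;
    }

    // No custom functions in this sketch.
    public override IXsltContextFunction ResolveFunction(
        string prefix, string name, XPathResultType[] argTypes)
    {
        return null;
    }

    public override bool Whitespace { get { return true; } }
    public override bool PreserveWhitespace(XPathNavigator node) { return true; }
    public override int CompareDocument(string baseUri, string nextBaseUri) { return 0; }

    private sealed class SimpleVariable : IXsltContextVariable
    {
        private readonly object _value;
        public SimpleVariable(object value) { _value = value; }
        public bool IsLocal { get { return false; } }
        public bool IsParam { get { return false; } }
        public XPathResultType VariableType { get { return XPathResultType.Any; } }
        public object Evaluate(XsltContext context) { return _value; }
    }
}

public static class SimpleVariableContextDemo
{
    // Counts the titles matching a precompiled expression that uses $id and $price.
    public static int CountMatches(
        XPathNavigator nav, XPathExpression compiled, string id, double price)
    {
        SimpleVariableContext ctx = new SimpleVariableContext();
        ctx.AddVariable("id", id);
        ctx.AddVariable("price", price);

        // Clone so the shared, precompiled expression is never modified.
        XPathExpression cloned = compiled.Clone();
        cloned.SetContext(ctx);

        return nav.Select(cloned).Count;
    }
}
```

The expression itself can still be compiled once, for example with nav.Compile("/dsPubs/publishers/titles[pub_id = $id and price < $price]"), because the variables are only looked up when the cloned expression is executed.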

Check out the Roadmap to high performance XML.

The full project source code belongs to the Mvp.Xml project and can be downloaded from SourceForge. Enjoy!
