High-performance XML (I): Dynamic XPath expressions compilation
Note: this entry has moved.
"Everybody" knows that precompiling your XPath expressions
ahead of time, usually in static .ctors, improves execution
performance. But what's actually happening at that
"compilation" time? Unlike what many believe, compiled IL is
NOT generated, and neither are on-the-fly assemblies or
anything like them. It's NOT like
Regex compilation (RegexOptions.Compiled), sadly. But what does go on
matters for performance, because it involves parsing,
AST (abstract syntax tree) building, etc.
Executing an XPath expression involves the following:
- Creating an XPathNavigator (if it's not executed directly against one).
- Compiling the expression, which in turn involves:
  - Building an IQuery, which consists of:
    - Parsing the expression: XPathParser.ParseXPathPattern (using a parser and a scanner).
    - Processing the AstNode built in the previous step (nodes, axes, operators, etc.), creating objects for each of them.
  - Initializing a CompiledXPathExpr object from the previous ones.
- Constructing an XPathSelectionIterator with the expression, which involves:
  - Cloning the source navigator.
  - Retrieving the query from the compiled expression.
  - Setting the query context to another clone of the source navigator.
- On iteration through the list, advancing the query until it reaches the end. Each advance returns a new XPathNavigator node, which is used afterwards (or null if nothing else is found).
The exception is when the expression is executed with
Evaluate and its ReturnType is not
XPathResultType.NodeSet. In that case, the last 2
steps are omitted, and the expression is evaluated directly to
an object (the IQuery built in this case can be an AndExpr,
LogicalExpr, NumberFunctions, NumericExpr, etc.).
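To make the two execution paths concrete, here is a small sketch. It assumes .NET 2.0 or later, where the static XPathExpression.Compile(string) method exists (on 1.x, XPathNavigator.Compile does the same job), and the class and method names are made up for illustration. A count() expression has a ReturnType of Number, so Evaluate hands back a boxed double and no iterator is involved; a location path has a ReturnType of NodeSet and goes through Select and an XPathNodeIterator.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper: both expressions are parsed and compiled once,
// when the type is initialized, without any document at hand.
public static class CompileOnceDemo
{
    private static readonly XPathExpression TitleCount =
        XPathExpression.Compile("count(/dsPubs/publishers/titles)");
    private static readonly XPathExpression Titles =
        XPathExpression.Compile("/dsPubs/publishers/titles");

    public static double CountTitles(XPathNavigator nav)
    {
        // ReturnType is XPathResultType.Number: Evaluate returns a double
        // directly, without building a node iterator.
        return (double)nav.Evaluate(TitleCount);
    }

    public static int IterateTitles(XPathNavigator nav)
    {
        // ReturnType is XPathResultType.NodeSet: Select returns an iterator
        // that advances the compiled query on each MoveNext call.
        XPathNodeIterator it = nav.Select(Titles);
        int count = 0;
        while (it.MoveNext())
            count++;
        return count;
    }
}
```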
It's important to realize that whether you use
XmlDocument.SelectNodes or
XmlDocument.SelectSingleNode, or you use an
XPathNavigator, if you don't use a
"precompiled" XPathExpression you will pay for
expression parsing and validation each time you execute
it.
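XmlNode.SelectNodes and SelectSingleNode only accept strings, so a precompiled expression has to go through a navigator. That doesn't mean giving up XmlNode results: the navigator created over an XmlDocument implements System.Xml.IHasXmlNode, whose GetNode method returns the underlying DOM node. A minimal sketch, with a hypothetical class name and query:

```csharp
using System.Xml;
using System.Xml.XPath;

public static class PrecompiledSelect
{
    // Compiled once and reused for every call.
    private static readonly XPathExpression FirstTitle =
        XPathExpression.Compile("/dsPubs/publishers/titles[1]");

    // Parses and compiles the string on every call.
    public static XmlNode ByString(XmlDocument doc)
    {
        return doc.SelectSingleNode("/dsPubs/publishers/titles[1]");
    }

    // Reuses the compiled expression, then maps the resulting navigator
    // back to its XmlNode through IHasXmlNode.
    public static XmlNode ByExpression(XmlDocument doc)
    {
        XPathNodeIterator it = doc.CreateNavigator().Select(FirstTitle);
        if (!it.MoveNext())
            return null;
        return ((IHasXmlNode)it.Current).GetNode();
    }
}
```

Both methods return the very same XmlNode instance; only the second one skips the per-call parsing.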
As you can see from the process described above, the
compilation step has nothing to do with the validity of the
expression with respect to an instance document (the actual
XML to be queried). It's all about the expression itself,
irrespective of the document. So, if your argument against
precompiling the expression is that you need the document to
call
XmlDocument.CreateNavigator().Compile(expression), you're wrong. You don't need it. In fact, not only is the
document irrelevant, but so are namespace, custom function
and variable resolution. So you
CAN
ALWAYS precompile, by doing
something like this:
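using System.Xml;
using System.Xml.XPath;

public class ExpressionsCache
{
    private static readonly XPathExpression _cachedExpr;

    static ExpressionsCache()
    {
        // An empty document is enough: compilation never reads any XML.
        XmlDocument doc = new XmlDocument();
        XPathNavigator nav = doc.CreateNavigator();
        _cachedExpr = nav.Compile("/dsPubs/publishers/titles");
    }

    public static XPathExpression Titles { get { return _cachedExpr; } }
}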
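The same document-less trick can double as a syntax check. In the sketch below (ExpressionChecker and IsValid are hypothetical names), a malformed expression fails at Compile time with an XPathException, while an expression that references a variable such as $price compiles fine: variables, like namespace prefixes, are only resolved later, once a context is supplied for evaluation.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class ExpressionChecker
{
    // Compiles against an empty document's navigator, as ExpressionsCache
    // does; no XML content is ever read.
    public static bool IsValid(string xpath, out string error)
    {
        XPathNavigator nav = new XmlDocument().CreateNavigator();
        try
        {
            nav.Compile(xpath);
            error = null;
            return true;
        }
        catch (XPathException ex)
        {
            // Syntax errors surface here, at compilation time.
            error = ex.Message;
            return false;
        }
    }
}
```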
The other usual excuse for not compiling the expressions is that the expression is built at run-time from dynamic values. For example:
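private void DoSomethingWithExpensivePublisherBooks(
    XPathNavigator document, string publisherId, double price)
{
    // The expression text differs on every call, so it is parsed
    // and compiled again each time Select runs.
    string path = String.Concat("/dsPubs/publishers/titles[pub_id = '",
        publisherId, "' and price < ",
        price.ToString(System.Globalization.CultureInfo.InvariantCulture), "]");
    XPathNodeIterator titles = document.Select(path);
    // ... work with titles ...
}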
(And if you think that method name is long, the
System.Web.FileChangesMonitor class has a
method named
StartMonitoringDirectoryRenamesAndBinDirectory.
It clearly wins the prize, AFAIK!)
It seems unavoidable to build the string representing the XPath expression at run-time, right? WRONG. The extensibility of the XPath classes and of their evaluation allows you to write expressions like the following:
"/dsPubs/publishers/titles[pub_id = $id and price < $price]"
Here, the $id and $price variables
are provided and resolved at evaluation time. This is
achieved by building a custom
XsltContext-derived class that provides
support for variables. This is explained in
MSDN
and, rather briefly, in a
Microsoft how-to document. The trick, then, is to have a
programmer-friendly custom context that lets us add
variables to it, set it as the context of the precompiled
expression (after cloning the expression to avoid threading issues), and
execute the expression, with the variables evaluated at
run-time:
// Expression precompiled somewhere else
XPathExpression expr =
DynamicContext.Compile("/dsPubs/publishers/titles[pub_id = $id and price < $price]");
// Using the expression with variables supplied
DynamicContext ctx = new DynamicContext();
ctx.AddVariable("id", id);
ctx.AddVariable("price", price);
// Clone expression for thread-safety
XPathExpression cloned = expr.Clone();
cloned.SetContext(ctx);
// Execute expression with variables!
XPathNodeIterator it = nav.Select(cloned);
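If you're curious about what a context like DynamicContext has to implement, here's a minimal sketch. It is not the Mvp.Xml source: VariableContext, its AddVariable method and the nested Variable class are hypothetical names, and only variables are supported (no extension functions). It derives from System.Xml.Xsl.XsltContext, which itself derives from XmlNamespaceManager, so AddNamespace is inherited for any prefixes you need. Variables are looked up through ResolveVariable, so add them before calling SetContext on the cloned expression.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical minimal context (not Mvp.Xml's DynamicContext): it resolves
// $name references from a dictionary filled through AddVariable.
public class VariableContext : XsltContext
{
    private readonly Dictionary<string, IXsltContextVariable> _variables =
        new Dictionary<string, IXsltContextVariable>();

    public VariableContext() : base(new NameTable()) { }

    // Add variables before calling SetContext on the expression.
    public void AddVariable(string name, object value)
    {
        // XPath 1.0 only knows strings, booleans, numbers (doubles) and
        // node-sets, so other numeric types are widened to double.
        if (!(value is string || value is bool || value is XPathNodeIterator))
            value = Convert.ToDouble(value, CultureInfo.InvariantCulture);
        _variables[name] = new Variable(value);
    }

    public override IXsltContextVariable ResolveVariable(string prefix, string name)
    {
        // Prefixes are ignored for simplicity.
        IXsltContextVariable variable;
        if (_variables.TryGetValue(name, out variable))
            return variable;
        throw new XPathException("Undefined variable: $" + name);
    }

    // No extension functions in this sketch.
    public override IXsltContextFunction ResolveFunction(
        string prefix, string name, XPathResultType[] argTypes)
    {
        return null;
    }

    public override bool Whitespace { get { return true; } }

    public override bool PreserveWhitespace(XPathNavigator node) { return true; }

    public override int CompareDocument(string baseUri, string nextBaseUri)
    {
        return string.CompareOrdinal(baseUri, nextBaseUri);
    }

    // Immutable holder for a single variable value.
    private sealed class Variable : IXsltContextVariable
    {
        private readonly object _value;

        public Variable(object value) { _value = value; }

        public bool IsLocal { get { return false; } }
        public bool IsParam { get { return false; } }

        public XPathResultType VariableType
        {
            get
            {
                if (_value is string) return XPathResultType.String;
                if (_value is bool) return XPathResultType.Boolean;
                if (_value is XPathNodeIterator) return XPathResultType.NodeSet;
                return XPathResultType.Number;
            }
        }

        // Called by the engine when the expression is evaluated.
        public object Evaluate(XsltContext xsltContext) { return _value; }
    }
}
```

It's used just like the snippet above: compile the expression once, then on each call create a VariableContext, add id and price, clone the expression, call SetContext and Select.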
The new code involves creating a
DynamicContext instance and adding the variables,
and potentially the namespaces in use (just as you would normally do
with an XmlNamespaceManager). With the new
approach, you get a performance boost of between
1.5X and 2X!
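If you want to get a feel for the compilation cost on your own documents, a rough harness like the following runs the same query repeatedly, once passing the string (parsed on every call) and once passing a precompiled XPathExpression. It's only a sketch: XPathTiming is a hypothetical name, it uses System.Diagnostics.Stopwatch with no statistics, and it measures the string-versus-precompiled difference only, not the variables approach. Call it once to warm up the JIT before trusting the figures.

```csharp
using System;
using System.Diagnostics;
using System.Xml;
using System.Xml.XPath;

public static class XPathTiming
{
    // Returns the total number of nodes matched across both loops, so the
    // queries are really executed; elapsed times come back as out values.
    public static int Compare(XPathNavigator nav, string xpath, int iterations,
        out TimeSpan asString, out TimeSpan precompiled)
    {
        int matches = 0;

        // String overload: the expression is parsed on every iteration.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            matches += Drain(nav.Select(xpath));
        asString = sw.Elapsed;

        // Precompiled overload: parsing happens once, outside the loop.
        XPathExpression expr = nav.Compile(xpath);
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            matches += Drain(nav.Select(expr));
        precompiled = sw.Elapsed;

        return matches;
    }

    // Walks the iterator to the end and counts the nodes it returns.
    private static int Drain(XPathNodeIterator it)
    {
        int count = 0;
        while (it.MoveNext())
            count++;
        return count;
    }
}
```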
Check out the Roadmap to high performance XML.
The full project source code belongs to the Mvp.Xml project and can be downloaded from SourceForge. Enjoy!