High-performance XML (IV): subtree transformations without re-parsing
In a previous post I showed how to load and transform subsets of a document with the XPathNavigatorReader. In the example I used, which follows the one in the MSDN documentation (in the section "Transforming a Section of an XML Document"), the XML is parsed only once, but an in-memory document is built for each subtree being transformed, so those fragments are effectively loaded into memory twice. The relevant piece of code is:
XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
// There's no XML re-parsing, but a new XPathDocument is loaded!
XPathDocument tmpDoc = new XPathDocument(
new XPathNavigatorReader(books.Current));
...
As Oleg appropriately pointed out, the definitive solution (and the one he used for the Mvp.Xml project's XmlNodeNavigator) is to have a wrapper navigator that doesn't allow an XPathNavigator to move outside a certain scope.
Now the Mvp.Xml project has that solution for all XPathNavigator implementations: the SubtreeXPathNavigator. This class is very similar in nature and implementation to Oleg's. Usage is straightforward: you just pass a newly constructed SubtreeXPathNavigator to the XslTransform class, and it will work on the subtree starting at the navigator received in the constructor, which is considered the new root.
Again, I'll follow the MSDN documentation example. Check my previous post for the original code. In the new version, only a single line of code is changed. Inside the while loop, instead of loading a new XPathDocument to perform the transformation, a new SubtreeXPathNavigator instance is constructed:
XslTransform xslt = new XslTransform();
// Always pass evidence!
xslt.Load("print_root.xsl", null, this.GetType().Assembly.Evidence);
// Load the entire doc.
XPathDocument doc = new XPathDocument("library.xml");
// Select each book; no new document is created per child.
XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
while (books.MoveNext())
{
// Transform the subtree defined by the current navigator scope.
xslt.Transform(new SubtreeXPathNavigator(books.Current),
null, Console.Out, null);
}
Ignoring the time it takes to load the stylesheet and compile the XPath expression (both of which should be cached), this yields an amazing 3.5X performance boost for this simple example. And that is with an XML input of only 200 bytes and a really trivial transformation!
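As a rough illustration of the XPath half of that caching, the sketch below compiles the expression once and reuses it for every document. The CachedExpressionDemo class, the booksExpr field and the SelectBooks helper are hypothetical names, the XML is inlined so the snippet is self-contained, and single-threaded use is assumed. A loaded XslTransform instance could be kept in a field in the same way.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class CachedExpressionDemo
{
    // Hypothetical cache: compiled on first use, reused afterwards.
    // Assumes single-threaded access.
    static XPathExpression booksExpr;

    static XPathNodeIterator SelectBooks(XPathNavigator nav)
    {
        if (booksExpr == null)
        {
            // Compile the expression only once.
            booksExpr = nav.Compile("/library/book");
        }
        return nav.Select(booksExpr);
    }

    static void Main()
    {
        string xml = "<library><book/><book/></library>";
        for (int i = 0; i < 3; i++)
        {
            // A fresh document on each pass, but the same compiled expression.
            XPathDocument doc = new XPathDocument(new StringReader(xml));
            XPathNodeIterator books = SelectBooks(doc.CreateNavigator());
            Console.WriteLine(books.Count); // 2
        }
    }
}
```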
Transforming subtrees may also be useful to reduce the complexity of your stylesheets, and help the XSLT processor in .NET (which is not the fastest in the world) to perform better.
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.