High-performance XML (IV): subtree transformations without re-parsing
In a previous post I showed how to load and transform subsets of a document with the XPathNavigatorReader. The example I used follows the one in the MSDN documentation (under the section "Transforming a Section of an XML Document"). In it, XML parsing happens only once, but an in-memory document is built again for each subtree being transformed, so those fragments are effectively loaded into memory twice. The relevant piece of code is:
    XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
    while (books.MoveNext())
    {
        // There's no XML re-parsing, but a new XPathDocument is loaded!
        XPathDocument tmpDoc = new XPathDocument(
            new XPathNavigatorReader(books.Current));
        ...
As Oleg appropriately pointed out, the definitive solution (and the one he used for the Mvp.Xml project XmlNodeNavigator) is a wrapper navigator that doesn't allow an XPathNavigator to move outside a certain scope. The Mvp.Xml project now offers that solution for all XPathNavigator implementations: the SubtreeXPathNavigator. This class is very similar in nature and implementation to Oleg's. Usage is straightforward: you pass a newly constructed SubtreeXPathNavigator to the XslTransform class, and the transformation works on the subtree starting at the navigator received in the constructor, which is treated as the new root.
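To make the scoping idea concrete, here is a minimal sketch of a cursor that refuses to leave its subtree. It is not the Mvp.Xml implementation and does not derive from XPathNavigator (which has many more members to override). The Node and SubtreeCursor types are hypothetical, invented only for this illustration, and the toy tree ignores details such as attributes and the document root node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node; it stands in for the real XML tree.
class Node
{
    public readonly string Name;
    public Node Parent;
    public readonly List<Node> Children = new List<Node>();

    public Node(string name) { Name = name; }

    // Appends a child and returns this node so calls can be chained.
    public Node Add(Node child)
    {
        child.Parent = this;
        Children.Add(child);
        return this;
    }
}

// Hypothetical cursor that cannot move outside the subtree it was created on.
class SubtreeCursor
{
    private readonly Node scopeRoot;
    private Node current;

    public SubtreeCursor(Node scopeRoot)
    {
        this.scopeRoot = scopeRoot;
        this.current = scopeRoot;
    }

    public string Name { get { return current.Name; } }

    public bool MoveToFirstChild()
    {
        if (current.Children.Count == 0) return false;
        current = current.Children[0];
        return true;
    }

    public bool MoveToParent()
    {
        // At the scope root, behave as if there were no parent at all.
        if (current == scopeRoot) return false;
        current = current.Parent;
        return true;
    }

    public bool MoveToNext()
    {
        // From the cursor's point of view, the scope root has no siblings.
        if (current == scopeRoot) return false;
        List<Node> siblings = current.Parent.Children;
        int index = siblings.IndexOf(current);
        if (index + 1 >= siblings.Count) return false;
        current = siblings[index + 1];
        return true;
    }

    // "Root" means the root of the scope, not of the whole document.
    public void MoveToRoot()
    {
        current = scopeRoot;
    }
}

static class Demo
{
    static void Main()
    {
        Node library = new Node("library")
            .Add(new Node("book").Add(new Node("title")))
            .Add(new Node("book").Add(new Node("title")));

        // Scope the cursor to the first book only.
        SubtreeCursor cursor = new SubtreeCursor(library.Children[0]);
        cursor.MoveToFirstChild();                 // now on "title"
        cursor.MoveToParent();                     // back on the first "book"
        Console.WriteLine(cursor.MoveToParent());  // False: "library" is out of scope
        Console.WriteLine(cursor.MoveToNext());    // False: the second "book" is out of scope
        cursor.MoveToRoot();
        Console.WriteLine(cursor.Name);            // book
    }
}
```

The underlying tree is never copied: the cursor only remembers where its scope starts and rejects any move that would cross that boundary, which is why no second in-memory document is needed.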
Again, I'll follow the MSDN documentation example. Check my previous post for the original code. In the new version, only a single line of code changes. Inside the while loop, instead of loading a new XPathDocument to perform the transformation, a new SubtreeXPathNavigator instance is constructed:
    XslTransform xslt = new XslTransform();
    // Always pass evidence!
    xslt.Load("print_root.xsl", null, this.GetType().Assembly.Evidence);
    // Load the entire doc.
    XPathDocument doc = new XPathDocument("library.xml");
    // Select each book; no new document is created per child.
    XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
    while (books.MoveNext())
    {
        // Transform the subtree defined by the current navigator scope.
        xslt.Transform(new SubtreeXPathNavigator(books.Current),
            null, Console.Out, null);
    }
Ignoring the time it takes to load the stylesheet and compile the XPath expression (both of which should be cached), this yields a 3.5x performance boost for this simple example. And that is with an XML input of only 200 bytes and a really trivial transformation!
Transforming subtrees may also be useful to reduce the complexity of your stylesheets, and help the XSLT processor in .NET (which is not the fastest in the world) to perform better.
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.