Subset document loading and transformation with XPathNavigatorReader


Thanks to Tom Smalley, who pointed it out, I fixed a bug that prevented the XPathNavigatorReader from being used to load a new XmlDocument or XPathDocument. This feature is very useful if you have to apply a transformation to a subset of a document. For this purpose, the MSDN documentation suggests using XmlDocument to load both the entire document and the subset, which is the least efficient way of performing transformations in .NET.

The code suggested for this scenario is as follows (I modified it to print each child of the root, which is more useful):

    XslTransform xslt = new XslTransform();
    xslt.Load("print_root.xsl");
    // Load the entire doc.
    XmlDocument doc = new XmlDocument();
    doc.Load("library.xml");

    // Create a new document for each child
    foreach (XmlNode testNode in doc.DocumentElement.ChildNodes)
    {
        XmlDocument tmpDoc = new XmlDocument();
        tmpDoc.LoadXml(testNode.OuterXml);

        // Transform the subset.
        xslt.Transform(tmpDoc, null, Console.Out, null);
    }

Note that each node to be transformed is parsed twice, because the temporary document is loaded from the raw string returned by the OuterXml property. With the XPathNavigatorReader you can avoid this parsing cost altogether, and work with the XSLT-optimized XPathDocument, using the following code:

    XslTransform xslt = new XslTransform();
    // Always pass evidence!
    xslt.Load("print_root.xsl", null, this.GetType().Assembly.Evidence);
    // Load the entire doc.
    XPathDocument doc = new XPathDocument("library.xml");

    // Create a new document for each child
    XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
    while (books.MoveNext())
    {
        // Load a doc from the current navigator using a reader over it.
        XPathDocument tmpDoc = new XPathDocument(
            new XPathNavigatorReader(books.Current));

        // Transform the subset.
        xslt.Transform(tmpDoc, null, Console.Out, null);
    }

Note that XML parsing happens only once, when the full doc is loaded. For a relatively large (300Kb) dump of the dsPubs database and a slightly less trivial stylesheet, the latter approach yields a 2x performance increase (you already know you gain about 30% from using XPathDocument alone).
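If you want to try the idea without referencing Mvp.Xml, here is a minimal self-contained sketch of the same loop. It is not the XPathNavigatorReader itself: it swaps in the built-in XPathNavigator.ReadSubtree() (available from .NET 2.0 on) as the reader over the current navigator, and XslCompiledTransform in place of XslTransform. The inline XML and stylesheet are made-up stand-ins for library.xml and print_root.xsl, and the class and method names are only illustrative.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class SubtreeTransformSketch
{
    // Hypothetical sample data standing in for library.xml.
    const string LibraryXml =
        "<library><book><title>A</title></book><book><title>B</title></book></library>";

    // Hypothetical stylesheet standing in for print_root.xsl:
    // prints the name of the root element and its title.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:value-of select='name(*)'/>:<xsl:value-of select='*/title'/>;" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string TransformEachBook()
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        // Parse the XML text only once, into the read-only XPathDocument.
        XPathDocument doc;
        using (XmlReader docReader = XmlReader.Create(new StringReader(LibraryXml)))
        {
            doc = new XPathDocument(docReader);
        }

        StringWriter output = new StringWriter();
        XPathNodeIterator books = doc.CreateNavigator().Select("/library/book");
        while (books.MoveNext())
        {
            // Built-in substitute for XPathNavigatorReader (an assumption of this
            // sketch): a reader over the current node and its descendants.
            using (XmlReader subtree = books.Current.ReadSubtree())
            {
                // Each book becomes the root of its own small document.
                XPathDocument tmpDoc = new XPathDocument(subtree);
                xslt.Transform(tmpDoc, null, output);
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(TransformEachBook());
    }
}
```

As in the code above, no XML text is re-parsed for the subsets, but each temporary XPathDocument is still rebuilt node by node from the reader, so the loop is not free.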

XPathNavigatorReader is part of the open-source Mvp.Xml project. The full project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Update: this technique does incur the cost of an additional parse step. Check High-performance XML (III): subtree transformations without re-parsing for a better approach.

Check out the Roadmap to high performance XML.
