XPathNavigatorReader: reading, validating and serializing! (XmlReader/XmlTextReader over XPathNavigator)

There are many reasons why developers don't use the XPathDocument and XPathNavigator APIs and resort to XmlDocument instead. I outlined some of them, with regard to querying functionality, in my posts about how to take advantage of XPath expression precompilation and how to get an XmlNodeList from an XPathNodeIterator (reloaded).

XPathNavigator is a far superior way of accessing and querying data. It offers built-in support for XPath querying independently of the store, so every store implementation gains that feature automatically. More importantly, it abstracts the underlying storage mechanism, which allows multiple data formats to be accessed consistently. The XML WebData team has seriously optimized the internal storage of XPathDocument, which results in important improvements in loading time, memory footprint and general performance. This was possible because the underlying store is completely hidden from the developer behind the XPathNavigator class, so even the most drastic change in internal representation does not affect current applications.

However, some useful features of the XmlDocument and XmlReader classes are not available when you work with an XPathNavigator. Basically, I've created an XmlReader facade over the XPathNavigator class, which allows you to work against either a streaming or a cursor API. I'll discuss how the missing features are enabled by the new XPathNavigatorReader class, part of the open source Mvp.Xml project.

Examples use an XML document with the structure of the Pubs database.

Serialization as XML

Both the XmlDocument (more properly, the XmlNode) and the XmlReader offer built-in support to get a raw string representing the entire content of any node. XmlNode exposes the InnerXml and OuterXml properties, whereas the XmlReader offers the ReadInnerXml and ReadOuterXml methods.

Once you go the XPathDocument route, however, you completely lose this feature. The new XPathNavigatorReader is an XmlReader implementation over an XPathNavigator, thus providing the aforementioned ReadInnerXml and ReadOuterXml methods. Basically, you work with the XPathNavigator object, and at the point where you need to serialize it as XML, you simply construct this new reader over it and use it as you would any XmlReader:

    XPathDocument doc = new XPathDocument(input);
    XPathNavigator nav = doc.CreateNavigator();
    // Move navigator, select with XPath, whatever.

    XmlReader reader = new XPathNavigatorReader(nav);
    // Initialize it.
    if (reader.Read())
    {
        Console.WriteLine(reader.ReadOuterXml());
        // We can also use reader.ReadInnerXml();
    }

Another useful scenario is directly writing a fragment of the document by means of the XmlWriter.WriteNode method:

    // Will select the title id.
    XPathExpression idexpr = navigator.Compile("string(title_id/text())");

    XPathNodeIterator it = navigator.Select("//titles[price > 10]");
    while (it.MoveNext())
    {
        XmlReader reader = new XPathNavigatorReader(it.Current);

        // Save to a file with the title ID as the name.
        XmlTextWriter writer = new XmlTextWriter(
            (string) it.Current.Evaluate(idexpr) + ".xml",
            System.Text.Encoding.UTF8);

        // Dump it!
        writer.WriteNode(reader, false);
        writer.Close();
    }

This code saves each book with a price greater than 10 to a file named after its title id. Note that the reader works within the scope defined by the navigator passed to its constructor, effectively providing a view over a fragment of the entire document. It's also important to observe that even if an evaluation moves the cursor of the navigator in it.Current, the reader we're using is not affected, because its constructor clones the navigator up front. Also, it's always a good idea to precompile an expression that is going to be executed repeatedly (ideally, application-wide).
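To make the cloning point concrete, here is a minimal sketch using only the standard System.Xml.XPath API. It does not reproduce the XPathNavigatorReader constructor; the sample XML and the CloneDemo class name are made up for illustration. It only shows that a cloned navigator keeps its own cursor, which is the property the reader relies on.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class CloneDemo
{
    // Returns "<original node name>|<clone node name>|<same position?>".
    public static string Run()
    {
        // Hypothetical sample, loosely shaped like a Pubs titles row.
        string xml =
            "<dataroot><titles><title_id>BU2075</title_id>" +
            "<price>2.99</price></titles></dataroot>";

        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator(); // starts at the root node
        nav.MoveToFirstChild();                     // <dataroot>
        nav.MoveToFirstChild();                     // <titles>

        // The clone starts at the same position but has its own cursor.
        XPathNavigator copy = nav.Clone();
        copy.MoveToFirstChild();                    // <title_id>

        return nav.Name + "|" + copy.Name + "|" + nav.IsSamePosition(copy);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

After the clone moves to title_id, the original navigator should still be positioned on titles, so whoever handed over the navigator never sees its cursor change.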

XmlSerializer-ready

The reader implements IXmlSerializable, so you can return it directly from web services, for example. You could have a web service returning the result of an XPath query without resorting to hacks like loading XmlDocuments or returning an XML string that will be escaped. XPathDocument is not XML-serializable either. Now you can simply use code like the following:

    [WebMethod]
    public XPathNavigatorReader GetData()
    {
        XPathDocument doc = GetDocument();
        XPathNodeIterator it = doc.CreateNavigator().Select("//titles[title_id='BU2075']");
        if (it.MoveNext())
            return new XPathNavigatorReader(it.Current);

        return null;
    }

This web service response will be:

<XPathNavigatorReader>
<titles>
<title_id>BU2075</title_id>
<title>You Can Combat Computer Stress!</title>
<type>business </type>
<pub_id>0736</pub_id>
<price>2.99</price>
<advance>10125</advance>
<royalty>24</royalty>
<ytd_sales>18722</ytd_sales>
<notes>The latest medical and psychological techniques for living with the electronic office. Easy-to-understand explanations.</notes>
<pubdate>1991-06-30T00:00:00.0000000-03:00</pubdate>
</titles>
</XPathNavigatorReader>

XML Schema Validation

Imagine the following scenario: you are processing a document, where only certain elements and their content need to be validated against an XML Schema, such as the contents of an element inside a soap:Body. If you're working with an XmlDocument, a known bug in XmlValidatingReader prevents you from doing the following:

    XmlDocument doc = GetDocument(); // Get the doc somehow.
    XmlNode node = doc.SelectSingleNode("//titles[title_id='BU2075']");
    // Create a validating reader for XSD validation.
    XmlValidatingReader vr = new XmlValidatingReader(new XmlNodeReader(node));

The validating reader will throw an exception because it expects an instance of an XmlTextReader object. This will be fixed in Whidbey, but no luck for v1.x. You're forced to do this:

    XmlDocument doc = GetDocument(); // Get the doc somehow.
    XmlNode node = doc.SelectSingleNode("//titles[title_id='BU2075']");

    // Build the reader directly from the XML string taken through OuterXml.
    XmlValidatingReader vr = new XmlValidatingReader(
        new XmlTextReader(new StringReader(node.OuterXml)));

Of course, you're paying the parsing cost twice here. The XPathNavigatorReader, unlike the XmlNodeReader, derives directly from XmlTextReader and therefore fully supports fragment validation. You can validate against XML Schemas that only define the node where you're standing. The following code validates all expensive books with a narrow schema, instead of a full-blown Pubs schema:

    // Load the narrow schema. XmlSchema.Read takes a reader or stream, so this
    // assumes expensiveBooksSchemaLocation is a file path or URL string.
    XmlSchema sch = XmlSchema.Read(new XmlTextReader(expensiveBooksSchemaLocation), null);
    // Select expensive books.
    XPathNodeIterator it = navigator.Select("//titles[price > 10]");
    while (it.MoveNext())
    {
        // Create a validating reader over an XPathNavigatorReader for the current node.
        XmlValidatingReader vr = new XmlValidatingReader(new XPathNavigatorReader(it.Current));

        // Add the schema for the current node.
        vr.Schemas.Add(sch);

        // Validate it! With no ValidationEventHandler attached, errors surface as exceptions.
        while (vr.Read()) {}
    }

This opens the possibility of modular validation of documents, which is especially useful when you have generic XML processing layers that validate selectively depending on namespaces, for example. What's more, this feature really starts making the XPathDocument/XPathNavigator combination a more feature-complete alternative to XmlDocument when you only need read-only access to the document.

As usual, the full class code is available if you just want to copy-paste it into your project. I strongly encourage you to take a look at the Mvp.Xml project, as there are other cool goodies there!

Finally, I imagine you could even think about loading an XmlDocument from an XPathNavigator using the XPathNavigatorReader... although I can't think of any good reason why you would want to do such a thing :S...
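For the curious, a rough sketch of what that could look like. To keep it runnable without the Mvp.Xml assembly, it swaps in the framework's own XPathNavigator.ReadSubtree() (available from .NET 2.0 on, so not in v1.x) as a stand-in for XPathNavigatorReader. The sample XML and the LoadDemo class name are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class LoadDemo
{
    // Copies the node selected in a read-only XPathDocument into an XmlDocument.
    public static string Run()
    {
        // Hypothetical sample, loosely shaped like the Pubs titles table.
        string xml =
            "<dataroot>" +
            "<titles><title_id>BU2075</title_id></titles>" +
            "<titles><title_id>PC1035</title_id></titles>" +
            "</dataroot>";

        XPathDocument source = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = source.CreateNavigator()
            .SelectSingleNode("//titles[title_id='BU2075']");

        XmlDocument doc = new XmlDocument();

        // Stand-in for the article's reader: ReadSubtree() returns an XmlReader
        // scoped to the current node. With Mvp.Xml referenced, the assumption is
        // that doc.Load(new XPathNavigatorReader(nav)) would play the same role.
        using (XmlReader reader = nav.ReadSubtree())
        {
            doc.Load(reader);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

The resulting XmlDocument contains only the selected titles element, since the reader is scoped to the navigator's current node rather than the whole source document.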

The full project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Special credits: the idea of a reader over a navigator isn't new. Aaron Skonnard did an implementation quite some time ago, as did Don Box (you'll need to search the page for "XPathNavigatorReader"). Mine is not based on theirs, and has features that theirs lack, but they came first, that's for sure ;).

Check out the Roadmap to high performance XML.

6 Comments

  • Wow, Daniel, you rock!

    I thought XmlValidatingReader bug is insurmountable one and no workaround exists!

    Empty string constructor, hmmm, what I stupid was thinking about, it's so easy.



    It really seems to be working. Good stuff. I'll move the rest of my stuff to the repository so we can release something cool soon.

  • Hello Daniel,



    The Mvp.Xml project looks more interesting every time ! I think I'm going to use it.



    There's one thing I've been wanting to ask you, though: you keep saying that Evaluate()/Select() can move an XPathNavigator; do you have a repro case, since the doc explicitly says it's not the case, and Dare Obasanjo commented here that it must be a bug if it does. Ever since hearing about this bug I've been wary of Evaluate(); but I can't just Clone() all of my navigators - even if it were the cheapest operation - because somebody else is bound to come after me, read the doc and remove that "unnecessary" Clone(), see what I mean?



    I hope this makes sense.

    --Jonathan

  • Hi Jonathan,

    I guess I'll have to stop saying that. When I implemented my Schematron.NET (back in v1.0 days) that was one of the things I had to care the most, because it happened.

    Thanks to your question, though, I wrote an NUnit test to check that with v1.1, and it's no longer the case. Therefore, no more need to clone anything! :D this is good news!

  • Coolness.

    And thanks for the quick reaction.

    Cheers,

    --Jonathan

  • Daniel,



    Does that mean you're going to be removing the Clone() call from your constructor in XPathNavigatorReader? - and edit the article to reflect the result of your testing? :)

  • Nope Jiho, I won't. And the reason is that this reader DOES move the navigator cursor, and I think it's best to leave the incoming navigator as-is. Otherwise, the external code constructing the reader could break my code by moving the cursor to unexpected positions.

    Remember that cloning is a really cheap operation, anyway.
