XmlReader/XmlTextReader from XPathNodeIterator - working with subsets of nodes


As Dare noticed, this is the month of the XmlReader. Here's a new player: the XPathIteratorReader. This time, the scenario is the following: you have an XPathDocument or XPathNavigator, and need to get a reader containing the subset of nodes that results from an XPath query. For example, you may need to return all feed items from an XML file that contain a certain word in the title:

    public XmlReader GetFeedsContaining(string theWord)
    {
        XPathDocument doc = new XPathDocument(theFeed);
        XPathNodeIterator it = doc.CreateNavigator().Select(
            "/rss/channel/item[contains(title,'" + theWord + "')]");

        return new XPathIteratorReader(it);
    }

Just like the XmlFragmentStream, this reader fakes a root node for the iterator. Additional constructor overloads allow you to change the default root node which is <root>:

    public XPathIteratorReader(XPathNodeIterator iterator, string rootName)
    public XPathIteratorReader(XPathNodeIterator iterator, string rootName, string ns)
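To make the faked root concrete, here is a minimal sketch of the same idea using only System.Xml. It is not the XPathIteratorReader implementation: the WrapInRoot helper and the sample feed are hypothetical, and it assumes .NET 2.0 or later for XPathNavigator.WriteSubtree. It simply writes each selected node under a wrapper element whose name and namespace the caller chooses.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;

static class FakedRootSketch
{
    // Hypothetical helper: serializes every node in the iterator
    // under a single wrapper ("faked root") element.
    public static string WrapInRoot(XPathNodeIterator iterator, string rootName, string ns)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(rootName, ns);
            while (iterator.MoveNext())
            {
                // Copies the current node and all of its descendants.
                iterator.Current.WriteSubtree(writer);
            }
            writer.WriteEndElement();
        }

        return sb.ToString();
    }

    static void Main()
    {
        // Made-up sample feed, only for illustration.
        string xml =
            "<rss><channel>" +
            "<item><title>XML news</title></item>" +
            "<item><title>Other</title></item>" +
            "</channel></rss>";

        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNodeIterator it = doc.CreateNavigator().Select(
            "/rss/channel/item[contains(title,'XML')]");

        Console.WriteLine(WrapInRoot(it, "items", ""));
    }
}
```

The difference from the real reader is that this sketch produces a string up front, while XPathIteratorReader exposes the same wrapped structure lazily through the XmlReader API.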

This reader also inherits from XmlTextReader, using the same technique as the XPathNavigatorReader. This means you can validate subsets of nodes against specific XML Schemas too. It also implements IXmlSerializable, so you can, for example, return this subset of nodes directly from a web service.

Subsets can easily be written to disk using the XmlWriter.WriteNode() method:

    public void SaveFeedsContaining(string theWord, string toFile)
    {
        XPathDocument doc = new XPathDocument(theFeed);
        XPathNodeIterator it = doc.CreateNavigator().Select(
            "/rss/channel/item[contains(title,'" + theWord + "')]");

        using (StreamWriter sw = new StreamWriter(toFile, false))
        {
            XmlTextWriter tw = new XmlTextWriter(sw);
            tw.WriteNode(new XPathIteratorReader(it), false);
            tw.Close();
        }
    }

There are a couple of interesting things inside this class:

  • It leverages the XPathNavigatorReader for each item in the iterator. So it basically passes through property and method calls to it.
  • Depth is always reported as one more than the depth of the current inner reader, except for the faked root element.
  • Instead of having ifs in all XmlTextReader overrides to check whether the reader is at the faked root or not, I decided to go for the more elegant approach of creating a FakedRootReader class. The code in XPathIteratorReader becomes drastically simpler this way: it mostly passes calls down to the current reader and that's it. The only branching code lives in the Read method, and it's really trivial. It basically checks the current ReadState and creates the FakedRootReader if necessary:

        public override bool Read()
        {
            // Return fast if the state is not appropriate.
            if (_current.ReadState == ReadState.Closed || _current.ReadState == ReadState.EndOfFile)
                return false;

            bool read = _current.Read();
            if (!read)
            {
                read = _iterator.MoveNext();
                if (read)
                {
                    // Just move to the next node and create the reader.
                    _current = new XPathNavigatorReader(_iterator.Current);
                    return _current.Read();
                }
                else
                {
                    if (_current is FakedRootReader && _current.NodeType == XmlNodeType.EndElement)
                    {
                        // We're done!
                        return false;
                    }
                    else
                    {
                        // We read all nodes in the iterator. Return to faked root end element.
                        _current = new FakedRootReader(_rootname.Name, _rootname.Namespace, XmlNodeType.EndElement);
                        return true;
                    }
                }
            }

            return read;
        }
  • The IXmlSerializable implementation uses the following trick: it loads the incoming document, moves to the root and makes this the new "faked" root, and uses an iterator over all root node children as its new internal state :D. Here it is:

        void IXmlSerializable.ReadXml(XmlReader reader)
        {
            XPathDocument doc = new XPathDocument(reader);
            XPathNavigator nav = doc.CreateNavigator();

            // Pull the faked root out.
            nav.MoveToFirstChild();
            _rootname = new XmlQualifiedName(nav.LocalName, nav.NamespaceURI);

            // Get iterator for all child nodes.
            _iterator = nav.SelectChildren(XPathNodeType.All);
        }
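The Read logic and the depth rule from the list above can be approximated without subclassing XmlReader. The following sketch is only an illustration under stated assumptions, not the actual class: it uses XPathNavigator.ReadSubtree (.NET 2.0 or later) in place of XPathNavigatorReader, the Walk helper and the sample feed are hypothetical, and it assumes every selected node is an element, since ReadSubtree only works on element or root nodes. It lists the sequence of nodes a consumer would see: the faked root start, each item's nodes one level deeper, then the faked root end.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

static class IteratorWalkSketch
{
    // Hypothetical helper: yields "depth nodeType name" for the faked root
    // and for every node read from each item in the iterator.
    public static IEnumerable<string> Walk(XPathNodeIterator iterator, string rootName)
    {
        // The faked root start element sits at depth 0.
        yield return "0 Element " + rootName;

        while (iterator.MoveNext())
        {
            // One inner reader per selected node, playing the role of _current.
            using (XmlReader current = iterator.Current.ReadSubtree())
            {
                while (current.Read())
                {
                    // Shift the inner depth by one so nodes sit under the faked root.
                    yield return (current.Depth + 1) + " " + current.NodeType + " " + current.Name;
                }
            }
        }

        // Iterator exhausted: emit the faked root end element.
        yield return "0 EndElement " + rootName;
    }

    static void Main()
    {
        // Made-up sample feed, only for illustration.
        string xml =
            "<rss><channel>" +
            "<item><title>A</title></item>" +
            "<item><title>B</title></item>" +
            "</channel></rss>";

        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNodeIterator it = doc.CreateNavigator().Select("/rss/channel/item");

        foreach (string line in Walk(it, "root"))
        {
            Console.WriteLine(line);
        }
    }
}
```

The real reader has to keep this state across separate Read() calls, which is why it swaps the _current reader instead of looping; the order of nodes it surfaces follows the same pattern.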
As usual, if you just want the full class code to copy and paste into your project, here it is. As usual too, though, I strongly encourage you to take a look at the Mvp.Xml project ;)

The full Mvp.Xml project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

2 Comments

  • I looked over your implementation in the MVP XML Library, but it seems like you took a kind of shortcut with the XPathNavigatorReader.

    Instead of relying on XmlReader, which is the memory-efficient way to handle XPath, you used the original .NET XPathNavigator, which is a memory-costly parser.

  • Well, XmlReader doesn't handle XPath at all, so I'm not sure what you mean.

    In .NET, the most efficient way of using XPath is going through XPathDocument/XPathNavigator. There's no way you can execute XPath with an XmlReader...
