Reading XML fragments with XmlTextReader - XmlFragmentStream

Back at the 2004 MVP Global Summit, I met fellow XML fan Kirk, who was seeking a solution to the following problem: you have a file of several megabytes containing multiple XML fragments, and you want to read it (in his case, specifically through the SgmlReader). The problem is, of course, that the XmlTextReader will throw an exception as soon as it finds the second fragment, unless you use the special constructor overload that takes an XmlParserContext. Dare shows an alternate solution based on XML inclusion techniques, either DTD external entities or XInclude.

These techniques effectively expose a fully well-formed document to your application, which has a number of benefits, such as the ability to transform it if you need to. But I was thinking more along the lines of providing a class that could actually read the fragments without resorting to those mechanisms. I couldn't cheat the XmlTextReader, so I decided to go one step lower. The result is the XmlFragmentStream, a class that wraps any System.IO.Stream and fakes the missing root element, so that an XmlTextReader layered on top of it will think the document is well-formed. Here's how to use it:

Given the following XML fragments:

<event>
  <ip>127.0.0.1</ip>
  <http_method>GET</http_method>
  ...
</event>
<event>
  <ip>127.0.0.1</ip>
  <http_method>POST</http_method>
  ...
</event>
...

You can read them (and even validate them with an XmlValidatingReader) using this code:

using System.IO;
using System.Xml;

using (Stream stm = File.OpenRead("events.xml"))
{
    XmlTextReader tr = new XmlTextReader(new XmlFragmentStream(stm));
    // Atomize the name once so it can be compared by reference below
    string ev = tr.NameTable.Add("event");
    while (tr.Read())
    {
        // Performant ref comparison against the atomized name; start tags only
        if (tr.NodeType == XmlNodeType.Element && (object)tr.LocalName == (object)ev)
        {
            // Process it!
        }
    }
}

The XmlFragmentStream class also contains two constructor overloads that allow you to specify the name and namespace of the enclosing root element (by default <root>):

public XmlFragmentStream(Stream innerStream, string rootName)
public XmlFragmentStream(Stream innerStream, string rootName, string namespaceURI)

This technique has been proven in the real world by a (surely happy) customer Kirk helped ;). What's more, he even contributed a fix for a bug he found while using it.
The performance impact of this approach is negligible because the class is basically an intermediary with minimal processing.
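
To give a feel for why the overhead is so small, here is a minimal sketch of the general idea. It is not the actual XmlFragmentStream source from Mvp.Xml: the RootWrappingStream name is made up for illustration, and the sketch assumes UTF-8 input with no XML declaration or byte order mark, and ignores namespaces entirely.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical illustration only; not the Mvp.Xml XmlFragmentStream source.
// Assumes UTF-8 input with no XML declaration or byte order mark.
public class RootWrappingStream : Stream
{
    // The fake start tag, the wrapped stream, and the fake end tag, read in order.
    private readonly Stream[] parts;
    private int current;

    public RootWrappingStream(Stream inner, string rootName)
    {
        parts = new Stream[]
        {
            new MemoryStream(Encoding.UTF8.GetBytes("<" + rootName + ">")),
            inner,
            new MemoryStream(Encoding.UTF8.GetBytes("</" + rootName + ">"))
        };
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        // Drain the current part; move on to the next one once it is exhausted.
        while (current < parts.Length)
        {
            int read = parts[current].Read(buffer, offset, count);
            if (read > 0) return read;
            current++;
        }
        return 0; // All three parts consumed: end of stream.
    }

    // Forward-only, read-only stream: everything else is unsupported.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

A reader would be layered on top of it with something like new XmlTextReader(new RootWrappingStream(stm, "root")). The only extra work per Read call is deciding which of the three parts to pull bytes from; the fragment bytes themselves pass through untouched.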

As Oleg pointed out in a comment (which motivated a slight edit to this post), and also showed in his weblog, you can do this with the aforementioned special XmlTextReader constructor overload, passing an XmlParserContext. This is more cumbersome, in my opinion, and still leaves you with the problem of not having a well-formed XML document.
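
For comparison, here is a minimal sketch of that parser-context route, using the XmlTextReader(Stream, XmlNodeType, XmlParserContext) overload. The FragmentReaderDemo class and its CountEvents method are hypothetical names used only for this example, and the event element name is taken from the sample fragments above.

```csharp
using System;
using System.IO;
using System.Xml;

public static class FragmentReaderDemo
{
    // Counts the <event> start tags in a stream holding several top-level fragments.
    public static int CountEvents(Stream fragments)
    {
        NameTable nt = new NameTable();
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(nt);
        // No xml:lang and no xml:space scope: the context only supplies the name table.
        XmlParserContext context = new XmlParserContext(nt, nsMgr, null, XmlSpace.None);

        // XmlNodeType.Element tells the reader to parse the input as element
        // content, so more than one top-level element is allowed.
        XmlTextReader tr = new XmlTextReader(fragments, XmlNodeType.Element, context);
        int count = 0;
        while (tr.Read())
        {
            if (tr.NodeType == XmlNodeType.Element && tr.LocalName == "event")
                count++;
        }
        return count;
    }
}
```

It works, but you have to build a name table, a namespace manager and a context just to get started, and what you end up with is still a reader over fragments rather than over a single document.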

As usual, if you just want the full class code to copy-paste into your project, here it is. I strongly encourage you to take a look at the Mvp.Xml project, as there are other really cool goodies there!

The full Mvp.Xml project source code can be downloaded from SourceForge.

Enjoy and please give us feedback on the project!

Check out the Roadmap to high performance XML.

3 Comments

  • Well, it's 3:51AM at my place, so excuse my stupidity.

    AFAIR XmlTextReader reads XML fragments just fine. There are special ctors for that.

  • Well, it's 1:11 AM at my place, and Oleg is definitely right :).

    You have a special constructor in XmlTextReader. However, using the XmlParserContext is a bit cumbersome, and that technique doesn't play nice with the rest, such as XSLT transformations. I think I still prefer the XmlFragmentStream. It's easier to use, you don't have to know what a doctypeName or xmlLang is.

    If you google for XmlParserContext, it's obvious there are very few people using it, and I think it's mainly because of its poor interface. (I'd say it's because it exposes too many really low-level details about XML parsing.)

  • Thanks for the entry. I had been struggling with a problem similar to this and didn't know just how to proceed next.
