Reading XML fragments with XmlTextReader - XmlFragmentStream
Note: this entry has moved.
Back at the 2004 MVP Global Summit, I met fellow XML fan Kirk, who was looking for a solution to the following problem: you have a file of several megabytes containing multiple XML fragments, and you want to read it (in his case, specifically through the SgmlReader). The problem, of course, is that the XmlTextReader will throw an exception as soon as it finds the second fragment, unless you use the special constructor overload that takes an XmlParserContext.
Dare shows an alternative solution based on XML inclusion techniques, either DTD external entities or XInclude. These techniques effectively expose a fully well-formed document to your application, which has a number of benefits, such as the ability to transform it if you need to. But I was thinking more along the lines of providing a class that could actually read the fragments without resorting to those mechanisms. I couldn't cheat the XmlTextReader, so I decided to go one level lower. The result is XmlFragmentStream, a class that wraps any System.IO.Stream and fakes the missing root element, so that an XmlTextReader layered on top of it will think the document is well-formed. Here's how to use it:
Given the following XML fragments:
...
You can read them (and even validate them with an XmlValidatingReader) using this code:
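using System.IO;
using System.Xml;
// ...plus the Mvp.Xml namespace that contains XmlFragmentStream

using (Stream stm = File.OpenRead("fragments.xml")) // hypothetical path
{
    XmlTextReader tr = new XmlTextReader(new XmlFragmentStream(stm));
    // Atomize the name once for a performant reference comparison
    object ev = tr.NameTable.Add("event");
    while (tr.Read())
    {
        // The reader's names come from its NameTable, so compare references
        if (tr.NodeType == XmlNodeType.Element && (object)tr.LocalName == ev)
        {
            // Process it!
        }
    }
}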
The XmlFragmentStream class also contains two constructor overloads that let you specify the name and namespace of the enclosing root element (<root> by default):
public XmlFragmentStream(Stream innerStream, string rootName)
public XmlFragmentStream(Stream innerStream, string rootName, string namespaceURI)
This technique has been proven in the real world by a (surely happy) customer Kirk helped ;). What's more, he even contributed a fix for a bug he found while using it.
The performance impact of this approach is negligible, because the class is basically a pass-through intermediary with minimal processing.
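To make the "intermediary" idea concrete, here is a minimal sketch of a root-faking stream. It is not the Mvp.Xml source: the class name FragmentWrappingStream is hypothetical, and it assumes UTF-8 input whose fragments file has no BOM or XML declaration (either would end up after the fake root tag and break parsing). It serves the fake start tag, passes the inner stream's bytes straight into the caller's buffer, then serves the end tag.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sketch, not the Mvp.Xml implementation.
public class FragmentWrappingStream : Stream
{
    private readonly Stream inner;
    private readonly byte[] header; // fake root start tag
    private readonly byte[] footer; // fake root end tag
    private int headerPos, footerPos;
    private bool innerDone;

    public FragmentWrappingStream(Stream innerStream)
        : this(innerStream, "root", null) { }

    public FragmentWrappingStream(Stream innerStream, string rootName)
        : this(innerStream, rootName, null) { }

    public FragmentWrappingStream(Stream innerStream, string rootName, string namespaceURI)
    {
        inner = innerStream ?? throw new ArgumentNullException(nameof(innerStream));
        // Assumption: names are not escaped, and the namespace is declared as the
        // default namespace, so unprefixed fragment elements inherit it.
        string start = namespaceURI == null
            ? "<" + rootName + ">"
            : "<" + rootName + " xmlns=\"" + namespaceURI + "\">";
        header = Encoding.UTF8.GetBytes(start);
        footer = Encoding.UTF8.GetBytes("</" + rootName + ">");
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        // 1. Serve the fake start tag first
        if (headerPos < header.Length)
        {
            int n = Math.Min(count, header.Length - headerPos);
            Array.Copy(header, headerPos, buffer, offset, n);
            headerPos += n;
            return n;
        }
        // 2. Pass the real fragments through untouched
        if (!innerDone)
        {
            int n = inner.Read(buffer, offset, count);
            if (n > 0) return n;
            innerDone = true;
        }
        // 3. Serve the fake end tag; returns 0 (end of stream) once it is exhausted
        int m = Math.Min(count, footer.Length - footerPos);
        Array.Copy(footer, footerPos, buffer, offset, m);
        footerPos += m;
        return m;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }

    // Read-only, forward-only stream
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

It plugs into the reading loop the same way: new XmlTextReader(new FragmentWrappingStream(stm, "events")). Apart from two tiny array copies at the very start and end, every byte goes straight from the inner stream into the reader's buffer, which is why the overhead stays negligible.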
As Oleg pointed out in a comment (which prompted a slight edit to this post) and showed in his weblog, you can also do this with the aforementioned special XmlTextReader constructor overload, passing an XmlParserContext. In my opinion this is more cumbersome, and it still leaves you without a well-formed XML document.
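For comparison, here is a minimal sketch of the XmlParserContext route. The XmlTextReader(Stream, XmlNodeType, XmlParserContext) overload with XmlNodeType.Element reads the stream as element content, so sibling top-level elements are accepted without a fake root. The class ParserContextFragments and its CountEvents method are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ParserContextFragments
{
    // Hypothetical helper: counts <event> elements in a stream of sibling
    // fragments, parsed as element content under an XmlParserContext.
    public static int CountEvents(Stream stm)
    {
        // The context needs a name table and namespace manager of its own
        NameTable nt = new NameTable();
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(nt);
        XmlParserContext context = new XmlParserContext(nt, nsMgr, null, XmlSpace.None);

        XmlTextReader tr = new XmlTextReader(stm, XmlNodeType.Element, context);
        // Atomize through the reader's own name table for reference comparison
        object ev = tr.NameTable.Add("event");
        int count = 0;
        while (tr.Read())
        {
            if (tr.NodeType == XmlNodeType.Element && (object)tr.LocalName == ev)
                count++;
        }
        return count;
    }
}
```

The extra name table, namespace manager and context setup is the ceremony the wrapping stream avoids, and the reader still never sees a single enclosing document element.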
The full Mvp.Xml project source code can be downloaded from SourceForge.
Enjoy and please give us feedback on the project!
Check out the Roadmap to high performance XML.