Typed XmlReaders: bridging the gap between streaming and object model APIs.
Note: this entry has moved.
When dealing with XML in .NET, you're mostly faced with two options:
- Streaming API: the XmlReader.
- Object model API: either XmlDocument, XPathDocument or an XmlSerializer-aware custom object model.
Several reasons can lean you towards one of the latter, such as strong typing (XmlSerializer) or flexibility and XPath querying (XmlDocument and XPathDocument). All three object model approaches, however, require the entire XML input to be parsed and loaded into memory. Therefore, when you're faced with large documents, or need the fastest processing, all you're left with is the XmlReader. If you've used it for anything but the most trivial XML processing, you know how ugly it can become: lots of string comparisons, endless switch statements, ifs, loops, and so on.
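To make that concrete, here is a minimal sketch in Python rather than .NET. `xml.etree.ElementTree.iterparse` plays the role of the XmlReader, and the input is a trimmed-down purchase order modeled on the XML Schema Primer example (it omits the required shipTo and billTo, so it would not validate). The function name `order_total` is hypothetical. Even for a single total, the logic is already a tangle of tag-name comparisons and per-item state.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed-down purchase order in the style of the XML Schema Primer example.
PO_XML = b"""<purchaseOrder orderDate="1999-10-20">
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
    </item>
  </items>
</purchaseOrder>"""


def order_total(source) -> Decimal:
    """Sum quantity * USPrice over all items using raw streaming events."""
    total = Decimal("0")
    quantity = price = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        # Every decision hinges on comparing event kinds and tag names as strings.
        if event == "start" and elem.tag == "item":
            quantity = price = None
        elif event == "end":
            if elem.tag == "quantity":
                quantity = int(elem.text)
            elif elem.tag == "USPrice":
                price = Decimal(elem.text)
            elif elem.tag == "item":
                if quantity is None or price is None:
                    raise ValueError("item without quantity or USPrice")
                total += quantity * price
                elem.clear()  # drop the finished subtree to keep memory roughly flat
    return total


if __name__ == "__main__":
    print(order_total(io.BytesIO(PO_XML)))  # prints 188.93
```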
From my point of view, working against a custom object model is best, as it gives you a level of abstraction from the wire format, and you get to work with OO classes and properties, which is far more comfortable than dealing with InnerXml, Value, etc. If you haven't tried the XmlSerializer approach before, you definitely should.
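For contrast, a hedged Python sketch of the object model approach: the whole document is parsed with `ElementTree.fromstring` and then mapped by hand onto dataclasses. XmlSerializer generates this kind of mapping for you; here the classes (`Item`, `PurchaseOrder`) and the `load_purchase_order` function are illustrative assumptions, not an existing API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional


@dataclass
class Item:
    part_num: str
    product_name: str
    quantity: int
    us_price: Decimal


@dataclass
class PurchaseOrder:
    order_date: Optional[date]
    items: List[Item]


def load_purchase_order(xml_text: str) -> PurchaseOrder:
    """Object model style: parse the whole document, then map it to typed classes."""
    root = ET.fromstring(xml_text)  # the entire tree is now in memory
    items = [
        Item(
            part_num=item.get("partNum"),
            product_name=item.findtext("productName"),
            quantity=int(item.findtext("quantity")),
            us_price=Decimal(item.findtext("USPrice")),
        )
        for item in root.iter("item")
    ]
    raw_date = root.get("orderDate")  # assumes a plain YYYY-MM-DD value
    return PurchaseOrder(
        order_date=date.fromisoformat(raw_date) if raw_date else None,
        items=items,
    )


SAMPLE = """<purchaseOrder orderDate="1999-10-20">
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
    </item>
  </items>
</purchaseOrder>"""

if __name__ == "__main__":
    po = load_purchase_order(SAMPLE)
    print(po.order_date, po.items[0].product_name, po.items[0].us_price)
```

Once loaded, the code works with properties and real types instead of tag names, which is exactly what gets lost when moving to streaming.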
When you move to streaming processing, you lose all that.
And you don't lose it because the abstractions of your entities have disappeared; you most probably still have an XML Schema defining what the XML must look like. You lose it purely because of the API. You can still use the XML Schema to validate as you read, and get some (very little) extra functionality from the XmlValidatingReader.ReadTypedValue() method. If you're like me, you may be asking: given that I know the schema at design time, isn't there a way to use it to make things easier for me?
And that's not the only issue. Validating against an XML Schema, even though it's a really good idea for keeping your application data consistent and considerably reducing your own validation code, doesn't come for free. According to tests I've done with the (fairly simple) purchase order schema and instance document in XML Schema Part 0: Primer, XmlValidatingReader is between 10 and 12 times slower than XmlTextReader. Not that this is a bad number; it's just something you need to keep in mind. And why is it so costly? Mostly because it's a generic XML Schema validator: as it parses, it checks valid transitions between states, data types, facets, etc. And again, given that I know the schema at design time, isn't there a way to use it to make things easier for the parser?
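One way to picture "using the schema to make things easier for the parser" is a validator whose state machine is fixed ahead of time. This Python sketch (hypothetical names) hard-codes the content model of PurchaseOrderType from the Primer, a sequence of shipTo, billTo, an optional comment and items, as a transition table, so nothing about the schema has to be interpreted while reading. It only checks element order, not data types or facets, and it makes no claim about speed; whether such specialization actually beats a generic validator is what would need measuring.

```python
# Content model of PurchaseOrderType in the XML Schema Primer:
#   sequence(shipTo, billTo, comment?, items)
# Because the schema is fixed at design time, this table can be written
# (or generated) once instead of being derived from the schema while parsing.
_TRANSITIONS = {
    "start": {"shipTo": "afterShipTo"},
    "afterShipTo": {"billTo": "afterBillTo"},
    "afterBillTo": {"comment": "afterComment", "items": "done"},
    "afterComment": {"items": "done"},
    "done": {},
}


def check_purchase_order_children(child_names) -> None:
    """Raise ValueError unless the child element names match the content model.

    Names are consumed one at a time, so this works on a stream of names.
    """
    state = "start"
    for name in child_names:
        next_state = _TRANSITIONS[state].get(name)
        if next_state is None:
            raise ValueError(f"unexpected <{name}> in state {state!r}")
        state = next_state
    if state != "done":
        raise ValueError(f"incomplete content, stopped in state {state!r}")
```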
Typed readers
Just as typed datasets build upon the generic DataSet to bring strong typing and validation to the game, based on an XML Schema, wouldn't it be great if the same existed for readers?
A typed reader should be built upon the XmlReader and provide the same validation capabilities as XmlValidatingReader, but at a fraction of the cost, because it would be built for one specific schema and would therefore already know all its elements, attributes and types.
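A rough Python sketch of what such a typed reader could look like for the purchase order schema. `PurchaseOrderReader` and everything in it is hypothetical: it streams the input with `iterparse`, exposes the order date as a `date` and each item as a typed `Item`, and enforces a few facets it knows in advance from the Primer schema (the SKU pattern for partNum, quantity below 100). It is not a complete validator, and it assumes orderDate is a plain YYYY-MM-DD value.

```python
import io
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterator, Optional

SKU = re.compile(r"\d{3}-[A-Z]{2}")  # SKU pattern facet from the Primer schema


@dataclass
class Item:
    part_num: str
    product_name: str
    quantity: int
    us_price: Decimal


def _required_text(elem: ET.Element, tag: str) -> str:
    text = elem.findtext(tag)
    if text is None:
        raise ValueError(f"<{elem.tag}> is missing required <{tag}>")
    return text


class PurchaseOrderReader:
    """Hypothetical typed reader: streams the input and exposes typed values."""

    def __init__(self, source) -> None:
        self._events = ET.iterparse(source, events=("start", "end"))
        self.order_date: Optional[date] = None

    def items(self) -> Iterator[Item]:
        for event, elem in self._events:
            if event == "start" and elem.tag == "purchaseOrder":
                raw = elem.get("orderDate")  # xsd:date, optional in the schema
                self.order_date = date.fromisoformat(raw) if raw else None
            elif event == "end" and elem.tag == "item":
                yield self._to_item(elem)
                elem.clear()  # only the current item is held in memory

    @staticmethod
    def _to_item(elem: ET.Element) -> Item:
        part_num = elem.get("partNum", "")
        if not SKU.fullmatch(part_num):
            raise ValueError(f"invalid partNum {part_num!r}")
        quantity = int(_required_text(elem, "quantity"))
        if not 0 < quantity < 100:  # positiveInteger with maxExclusive 100
            raise ValueError(f"quantity out of range: {quantity}")
        return Item(
            part_num=part_num,
            product_name=_required_text(elem, "productName"),
            quantity=quantity,
            us_price=Decimal(_required_text(elem, "USPrice")),
        )


PO_XML = b"""<purchaseOrder orderDate="1999-10-20"><items>
  <item partNum="872-AA"><productName>Lawnmower</productName>
    <quantity>1</quantity><USPrice>148.95</USPrice></item>
  <item partNum="926-AA"><productName>Baby Monitor</productName>
    <quantity>1</quantity><USPrice>39.98</USPrice></item>
</items></purchaseOrder>"""

if __name__ == "__main__":
    reader = PurchaseOrderReader(io.BytesIO(PO_XML))
    for item in reader.items():
        print(reader.order_date, item.part_num, item.quantity * item.us_price)
```

Calling code never compares tag names; the schema knowledge lives in the reader, which is the whole point of making it typed.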
Given a purchase order document, I could write code as follows:
Maybe it should be something more like this:
I sort of prefer the latter. The TypedReader property would contain the instance used to read (and validate) the current element content model, which would be the current strategy being applied. With the advent of generics, maybe I should even be allowed to pass the typed reader I want...
I guess in Whidbey that would be the way to implement it internally, anyway...
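A hedged Python sketch of the strategy idea: an outer streaming reader looks up a per-element strategy, applies it, and exposes it through a `typed_reader` attribute, loosely standing in for the proposed TypedReader property. `typing.Protocol` with a type variable plays the role generics would play in .NET. All names (`ContentReader`, `ItemReader`, `StrategyReader`) are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Any, Dict, Optional, Protocol, Tuple, TypeVar

T = TypeVar("T", covariant=True)


class ContentReader(Protocol[T]):
    """A strategy that reads (and could validate) one element's content model."""

    def read(self, elem: ET.Element) -> T: ...


class ItemReader:
    """Hypothetical strategy for <item>; returns (partNum, quantity, USPrice)."""

    def read(self, elem: ET.Element) -> Tuple[str, int, Decimal]:
        return (
            elem.get("partNum"),
            int(elem.findtext("quantity")),
            Decimal(elem.findtext("USPrice")),
        )


class StrategyReader:
    """Outer streaming reader that switches strategy per element.

    `typed_reader` holds the strategy applied to the element just read,
    loosely mirroring the proposed TypedReader property.
    """

    def __init__(self, source, strategies: Dict[str, ContentReader[Any]]) -> None:
        self._events = ET.iterparse(source, events=("end",))
        self._strategies = strategies
        self.typed_reader: Optional[ContentReader[Any]] = None
        self.value: Any = None

    def read(self) -> bool:
        """Advance to the next element that has a strategy; False at end of input."""
        for _, elem in self._events:
            strategy = self._strategies.get(elem.tag)
            if strategy is not None:
                self.typed_reader = strategy
                self.value = strategy.read(elem)
                elem.clear()  # the strategy consumed the whole subtree
                return True
        return False


PO_XML = b"""<purchaseOrder><items>
  <item partNum="872-AA"><quantity>1</quantity><USPrice>148.95</USPrice></item>
  <item partNum="926-AA"><quantity>1</quantity><USPrice>39.98</USPrice></item>
</items></purchaseOrder>"""

if __name__ == "__main__":
    reader = StrategyReader(io.BytesIO(PO_XML), {"item": ItemReader()})
    while reader.read():
        print(type(reader.typed_reader).__name__, reader.value)
```

The `read()` loop deliberately echoes XmlReader's `Read()` pattern, so the outer reader stays familiar while the typed work is delegated.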
Another possible use is dynamic, runtime generation of these typed readers for a given schema. If we can prove that performance improves, we could use typed readers not to gain usability but to gain speed. This could be a specialized factory that emits the code (the same you would get at design time) to execute:
The factory itself would keep cached versions of the Types it has already generated from a certain schema...
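A minimal sketch of such a factory, again in Python and with hypothetical names. Python can build a class at runtime with `type()`, which stands in here for emitting code; in .NET this would presumably mean generating and compiling source or IL. The "schema" is reduced to a repeating element name plus a converter per child, and generated classes are cached by a schema key so each schema is only processed once.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Callable, Dict

Converter = Callable[[str], object]


class TypedReaderFactory:
    """Hypothetical factory that builds one reader class per schema and caches it.

    The "schema" is a simplified stand-in: a repeating element name plus a
    converter per child element. A real factory would derive this from an XSD.
    """

    def __init__(self) -> None:
        self._cache: Dict[str, type] = {}
        self.classes_built = 0  # counts cache misses, to make caching observable

    def reader_type(self, schema_key: str, element: str,
                    fields: Dict[str, Converter]) -> type:
        cached = self._cache.get(schema_key)
        if cached is not None:
            return cached  # same schema key: reuse the class generated earlier

        def read_all(reader, source):
            # Stream the input, yielding typed values for each matching element.
            for _, elem in ET.iterparse(source, events=("end",)):
                if elem.tag == element:
                    yield {name: convert(elem.findtext(name))
                           for name, convert in fields.items()}
                    elem.clear()

        cls = type(f"{element.capitalize()}Reader", (), {"read_all": read_all})
        self._cache[schema_key] = cls
        self.classes_built += 1
        return cls


PO_XML = b"""<purchaseOrder><items>
  <item><quantity>1</quantity><USPrice>148.95</USPrice></item>
  <item><quantity>1</quantity><USPrice>39.98</USPrice></item>
</items></purchaseOrder>"""

if __name__ == "__main__":
    factory = TypedReaderFactory()
    ItemReader = factory.reader_type("po.xsd", "item", {"quantity": int, "USPrice": Decimal})
    for row in ItemReader().read_all(io.BytesIO(PO_XML)):
        print(row)
```

Keying the cache on the schema alone means a later call with different field converters for the same key gets the cached class; that matches the "one generated type per schema" idea but would need a richer key in practice.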
So, what do you think about such an idea? Is it useful? Would you use it? What should the API look like?
This may become part of the new Mvp.Xml project that most XML MVPs (including me, of course) are working on.