Typed XmlReaders: bridging the gap between streaming and object model APIs.
When dealing with XML in .NET, you're mostly faced with two options:
- Streaming API: the XmlReader.
- Object model API: either XmlDocument, XPathDocument or an XmlSerializer-aware custom object model.
Several reasons can lean you towards one of the latter, such as strong typing (XmlSerializer) or flexibility and XPath querying (XmlDocument and XPathDocument). All three object model approaches, however, require the entire XML input to be parsed and loaded into memory. Therefore, when you're faced with large documents, or need the fastest processing, all you're left with is the XmlReader. If you have used it for anything beyond the most trivial XML processing, you know how ugly it can get: lots of string comparisons, endless switch and if statements, nested loops, and so on.
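To make that concrete, here is a minimal illustrative sketch of such hand-written XmlReader processing: it sums quantity * USPrice over the items of a simplified purchase order loosely based on the XML Schema Primer example. The class name and sample document are hypothetical.

```csharp
using System.Globalization;
using System.IO;
using System.Xml;

// Illustrative only: hypothetical class and sample; element names are
// borrowed from the XML Schema Primer's purchase order.
public static class ManualPoParser
{
    public const string Sample =
        @"<purchaseOrder orderDate='1999-10-20'>
            <shipTo country='US'><name>Alice Smith</name></shipTo>
            <items>
              <item partNum='872-AA'>
                <productName>Lawnmower</productName>
                <quantity>1</quantity>
                <USPrice>148.95</USPrice>
              </item>
              <item partNum='926-AA'>
                <productName>Baby Monitor</productName>
                <quantity>1</quantity>
                <USPrice>39.98</USPrice>
              </item>
            </items>
          </purchaseOrder>";

    // Sums quantity * USPrice over every <item>, tracking parser state by hand.
    public static decimal Total(string xml)
    {
        decimal total = 0m, price = 0m;
        int quantity = 0;
        string current = null; // element whose text node we expect next

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        current = reader.LocalName;
                        if (current == "item") { quantity = 0; price = 0m; }
                        break;
                    case XmlNodeType.Text:
                        if (current == "quantity")
                            quantity = int.Parse(reader.Value, CultureInfo.InvariantCulture);
                        else if (current == "USPrice")
                            price = decimal.Parse(reader.Value, CultureInfo.InvariantCulture);
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.LocalName == "item") total += quantity * price;
                        current = null;
                        break;
                }
            }
        }
        return total;
    }
}
```

Every new element means another string comparison, and nothing here checks element order, required elements or value ranges.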
From my point of view, working against a custom object model is best, as it gives you a level of abstraction from the wire format, and you get to work with OO classes and properties, which is far more comfortable than dealing with InnerXml, Value, etc. If you haven't tried the XmlSerializer approach before, you definitely should.
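For comparison, a minimal sketch of the XmlSerializer approach over the same kind of document, assuming hypothetical PurchaseOrder and Item classes that map only the items part. Unmapped content such as shipTo is simply skipped by the serializer.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes mapping a simplified purchase order (Primer names).
[XmlRoot("purchaseOrder")]
public class PurchaseOrder
{
    [XmlArray("items")]
    [XmlArrayItem("item")]
    public Item[] Items { get; set; }
}

public class Item
{
    [XmlAttribute("partNum")]
    public string PartNum { get; set; }

    [XmlElement("productName")]
    public string ProductName { get; set; }

    [XmlElement("quantity")]
    public int Quantity { get; set; }

    [XmlElement("USPrice")]
    public decimal USPrice { get; set; }
}

public static class PurchaseOrderLoader
{
    public const string Sample =
        @"<purchaseOrder orderDate='1999-10-20'>
            <shipTo country='US'><name>Alice Smith</name></shipTo>
            <items>
              <item partNum='872-AA'>
                <productName>Lawnmower</productName>
                <quantity>1</quantity>
                <USPrice>148.95</USPrice>
              </item>
              <item partNum='926-AA'>
                <productName>Baby Monitor</productName>
                <quantity>1</quantity>
                <USPrice>39.98</USPrice>
              </item>
            </items>
          </purchaseOrder>";

    // Materializes the whole document as objects before returning.
    public static PurchaseOrder Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(PurchaseOrder));
        using (var input = new StringReader(xml))
        {
            return (PurchaseOrder)serializer.Deserialize(input);
        }
    }
}
```

The convenience comes at the price mentioned above: Deserialize builds the complete object graph in memory before you can touch any of it.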
When you move to streaming processing, you lose all that. And you don't lose it because the abstractions of your entities have disappeared: you most probably still have an XML Schema defining what the XML must look like. You lose it only because of the API. You can still use the XML Schema to validate as you read, and get some (very little) extra functionality from the XmlValidatingReader.ReadTypedValue() method. If you're like me, you may be asking: given that I know the schema at design time, isn't there a way to use it to make things easier for me?
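XmlValidatingReader was marked obsolete in .NET 2.0, so as a rough modern equivalent, here is a sketch that enables schema validation through XmlReaderSettings and reads a schema-typed value. The helper class is hypothetical, and the inline schema is a one-element extract mirroring the Primer's quantity type (positiveInteger restricted with maxExclusive 100).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper: validates <quantity> against a one-element schema
// that mirrors the Primer's quantity restriction (positiveInteger < 100).
public static class ValidatedQuantity
{
    const string Xsd =
        @"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
            <xsd:element name='quantity'>
              <xsd:simpleType>
                <xsd:restriction base='xsd:positiveInteger'>
                  <xsd:maxExclusive value='100'/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
          </xsd:schema>";

    // With no ValidationEventHandler registered, a validation error is
    // thrown as an exception instead of being reported through an event.
    public static int Read(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader xsd = XmlReader.Create(new StringReader(Xsd)))
        {
            schemas.Add(null, xsd); // null: use the schema's own targetNamespace
        }

        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();                  // position on <quantity>
            return reader.ReadElementContentAsInt(); // typed read, validated at the end tag
        }
    }
}
```

You get a validated int back and an out-of-range value is rejected, but you are still walking nodes by hand; the schema has not made the navigation code any simpler.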
And that's not the only issue. Validating against an XML Schema is a really good idea, as it keeps your application data consistent and considerably reduces your own validation code, but it doesn't come for free. According to tests I've done with the (fairly simple) purchase order schema and instance document in XML Schema Part 0: Primer, XmlValidatingReader is between 10X and 12X slower than the XmlTextReader. Not that this is a bad number; you just need to keep it in mind. And why is it so costly? Mostly because it's a generic XML Schema validator, which means that as it parses, it checks state transitions, data types, facets, and so on. And again, given that I know the schema at design time, isn't there a way to use it to make things easier for the parser?
Typed readers
Just as typed datasets build upon the generic DataSet to bring strong typing and schema-based validation to the game, wouldn't it be great if the same existed for readers?
A typed reader should be built upon the XmlReader and provide the same validation capabilities as XmlValidatingReader, but at a fraction of the cost, because it would be built to read and validate one specific schema and would therefore already know all its elements, attributes and types.
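To illustrate the idea, here is a hand-written sketch of what one typed reader for the Primer's item element might look like. It is hypothetical throughout (class, members and sample are made up) and assumes each item contains exactly productName, quantity and USPrice in that order, leaving out the Primer's optional comment and shipDate. It enforces only what it hard-codes: the element sequence, the SKU pattern on partNum and the 1 to 99 range on quantity.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical typed reader for <item>: the content model and facets are
// hard-coded instead of being interpreted from a schema at run time.
public sealed class ItemReader : IDisposable
{
    public const string Sample =
        @"<items>
            <item partNum='872-AA'>
              <productName>Lawnmower</productName>
              <quantity>1</quantity>
              <USPrice>148.95</USPrice>
            </item>
            <item partNum='926-AA'>
              <productName>Baby Monitor</productName>
              <quantity>1</quantity>
              <USPrice>39.98</USPrice>
            </item>
          </items>";

    // SKU pattern as in the Primer's schema.
    static readonly Regex Sku = new Regex(@"^\d{3}-[A-Z]{2}\z");

    readonly XmlReader reader;

    public ItemReader(TextReader input)
    {
        reader = XmlReader.Create(input, new XmlReaderSettings { IgnoreWhitespace = true });
    }

    public string PartNum { get; private set; }
    public string ProductName { get; private set; }
    public int Quantity { get; private set; }
    public decimal USPrice { get; private set; }

    // Moves to the next <item> and reads it; returns false when none are left.
    public bool Read()
    {
        bool onItem = reader.NodeType == XmlNodeType.Element && reader.LocalName == "item";
        if (!onItem && !reader.ReadToFollowing("item"))
            return false;

        string partNum = reader.GetAttribute("partNum");
        if (partNum == null || !Sku.IsMatch(partNum))
            throw new XmlException("Invalid partNum: " + partNum);

        reader.ReadStartElement("item");
        // Each call checks the element name, so an out-of-sequence element throws.
        string productName = reader.ReadElementContentAsString("productName", "");
        int quantity = reader.ReadElementContentAsInt("quantity", "");
        if (quantity < 1 || quantity >= 100)
            throw new XmlException("quantity out of range: " + quantity);
        decimal price = reader.ReadElementContentAsDecimal("USPrice", "");
        reader.ReadEndElement(); // </item>

        PartNum = partNum;
        ProductName = productName;
        Quantity = quantity;
        USPrice = price;
        return true;
    }

    public void Dispose() => reader.Dispose();
}
```

Nothing here consults a schema object model at run time, which is where the hoped-for savings would come from; whether it actually beats XmlValidatingReader, and by how much, would need to be measured.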
Given a purchase order document, I could write code as follows:
Maybe it should be something more like this:
I sort of prefer the latter. The TypedReader property would contain the instance used to read (and validate) the current element's content model, that is, the strategy currently being applied. With the advent of generics, maybe I should even be allowed to pass the typed reader I want...
I guess in Whidbey that would be the way to implement it internally, anyway...
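One way to sketch the TypedReader property is as a strategy chosen per element, with a generic accessor standing in for the generics idea. Everything in it (ITypedReader, the strategy classes, PurchaseOrderReader and GetTypedReader<T>) is hypothetical, and each strategy reads only a few fields to keep it short. XmlReader.ReadSubtree, available since .NET 2.0, confines each strategy to its own element.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical strategy: one implementation per element content model.
public interface ITypedReader
{
    // Reads the element the given reader is positioned on.
    void ReadFrom(XmlReader reader);
}

public sealed class ShipToTypedReader : ITypedReader
{
    public string Country { get; private set; }
    public string Name { get; private set; }

    public void ReadFrom(XmlReader reader)
    {
        Country = reader.GetAttribute("country");
        reader.ReadStartElement("shipTo");
        Name = reader.ReadElementContentAsString("name", "");
    }
}

public sealed class ItemTypedReader : ITypedReader
{
    public string PartNum { get; private set; }
    public string ProductName { get; private set; }
    public int Quantity { get; private set; }

    public void ReadFrom(XmlReader reader)
    {
        PartNum = reader.GetAttribute("partNum");
        reader.ReadStartElement("item");
        ProductName = reader.ReadElementContentAsString("productName", "");
        Quantity = reader.ReadElementContentAsInt("quantity", "");
    }
}

public sealed class PurchaseOrderReader : IDisposable
{
    public const string Sample =
        @"<purchaseOrder>
            <shipTo country='US'><name>Alice Smith</name></shipTo>
            <items>
              <item partNum='872-AA'><productName>Lawnmower</productName><quantity>1</quantity></item>
              <item partNum='926-AA'><productName>Baby Monitor</productName><quantity>1</quantity></item>
            </items>
          </purchaseOrder>";

    readonly XmlReader reader;

    public PurchaseOrderReader(TextReader input)
    {
        reader = XmlReader.Create(input, new XmlReaderSettings { IgnoreWhitespace = true });
    }

    // Strategy that read the current element; null once the input is exhausted.
    public ITypedReader TypedReader { get; private set; }

    // Generic accessor: TypedReader as T, or null when it is another kind.
    public T GetTypedReader<T>() where T : class, ITypedReader => TypedReader as T;

    public bool Read()
    {
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            ITypedReader strategy = CreateStrategy(reader.LocalName);
            if (strategy == null) continue; // container such as <items>: step inside

            // Disposing the subtree reader leaves the outer reader on the
            // element's end tag, so unread children are skipped safely.
            using (XmlReader element = reader.ReadSubtree())
            {
                element.Read(); // move onto the element itself
                strategy.ReadFrom(element);
            }
            TypedReader = strategy;
            return true;
        }
        TypedReader = null;
        return false;
    }

    static ITypedReader CreateStrategy(string localName)
    {
        switch (localName)
        {
            case "shipTo": return new ShipToTypedReader();
            case "item": return new ItemTypedReader();
            default: return null;
        }
    }

    public void Dispose() => reader.Dispose();
}
```

Calling code loops on Read() and asks for the strategy it cares about, for example po.GetTypedReader<ItemTypedReader>(), which returns null when the current element is something else.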
Another possible use is dynamic run-time generation of these typed readers for a schema. If we can prove that performance increases, we could use typed readers not to gain usability but to gain speed. This could be a specialized factory that emits the code (the same you would get at design time) to execute:
The factory itself would keep cached versions of the types it has already generated for a given schema...
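The caching half of such a factory is easy to sketch; the code-emitting half is not shown. In this hypothetical sketch, a caller-supplied delegate stands in for whatever would actually generate the reader type from a schema (CodeDOM compilation or Reflection.Emit, for instance), and the cache is keyed by a string identifying the schema, such as its target namespace or location.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical factory: generates a reader Type once per schema and caches it.
public sealed class TypedReaderFactory
{
    // Lazy<Type> makes 'generate' run at most once per key, even when
    // several threads request the same schema concurrently.
    readonly ConcurrentDictionary<string, Lazy<Type>> cache =
        new ConcurrentDictionary<string, Lazy<Type>>();
    readonly Func<string, Type> generate;

    // 'generate' stands in for the real code generation step (not shown).
    public TypedReaderFactory(Func<string, Type> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Type GetReaderType(string schemaKey) =>
        cache.GetOrAdd(schemaKey, key => new Lazy<Type>(() => generate(key))).Value;

    // Each call returns a fresh instance of the cached type.
    public object CreateReader(string schemaKey) =>
        Activator.CreateInstance(GetReaderType(schemaKey));
}
```

Generation cost is paid once per schema; whether the generated readers are worth it at all still depends on proving the speed gain first.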
So, what do you think about such an idea? Is it useful? Would you use it? What should the API look like?
This may become part of the new Mvp.Xml project that most of the XML MVPs (including me, of course) are leading.