XML extensibility, xsi:type, XmlSerializer and configuration (or how to leverage XmlSerializer + OO extensibility)
Consider the following XML:
We can use the XmlSerializer to reconstruct an instance of the following class from it:
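As a minimal sketch of this step, assume a Person with two public string members and a matching document. The member names, the sample values and the PersonDemo helper are assumptions made for illustration, not necessarily the shapes used in this post:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape: the member names are assumed for illustration.
public class Person
{
    public string FirstName;
    public string LastName;
}

public static class PersonDemo
{
    // Sample document with made-up values.
    public const string Xml =
        "<Person><FirstName>Jane</FirstName><LastName>Doe</LastName></Person>";

    // Rebuilds a Person instance from its XML representation.
    public static Person Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Person));
        using (StringReader reader = new StringReader(xml))
        {
            return (Person)serializer.Deserialize(reader);
        }
    }
}
```

Calling PersonDemo.Load(PersonDemo.Xml) should give back a Person whose members carry the values from the document.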
So far so good. Now, if we use the XmlSerializer to serialize an instance of Person to XML, you may be surprised to get the following XML (declaration aside):
Now, those namespace declarations weren't there in the original XML! One way to get rid of them is to create a custom XmlTextWriter to use with the XmlSerializer:
The skip flag works because every time an attribute is written, three methods are called in sequence: WriteStartAttribute, WriteString and WriteEndAttribute. Now, our writer will omit the xsd and xsi namespace declarations and preserve full fidelity with regard to the original XML. We just need to pass our writer to the XmlSerializer:
Update: there's another way to omit these namespace declarations, as pointed out by Jiho Han. But we will still need the specialized writer below.
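A minimal sketch of such a writer and of how it could be handed to the serializer follows. The Person shape and the WriterDemo helper are assumptions for illustration; the three overrides implement the skip-flag idea described above:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical shape, assumed for illustration.
public class Person
{
    public string FirstName;
    public string LastName;
}

public class NonXsiTextWriter : XmlTextWriter
{
    // True while an attribute we decided to drop is being written.
    private bool skip;

    public NonXsiTextWriter(TextWriter writer) : base(writer) { }

    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        // Drop the xmlns:xsd and xmlns:xsi declarations.
        if (prefix == "xmlns" && (localName == "xsd" || localName == "xsi"))
        {
            skip = true;
            return;
        }
        base.WriteStartAttribute(prefix, localName, ns);
    }

    public override void WriteString(string text)
    {
        // Swallow the value of a skipped attribute.
        if (skip) return;
        base.WriteString(text);
    }

    public override void WriteEndAttribute()
    {
        // Reset the flag so the next attribute is written normally.
        if (skip)
        {
            skip = false;
            return;
        }
        base.WriteEndAttribute();
    }
}

public static class WriterDemo
{
    // Serializes a Person through the custom writer.
    public static string Serialize(Person person)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Person));
        StringWriter output = new StringWriter();
        using (NonXsiTextWriter writer = new NonXsiTextWriter(output))
        {
            serializer.Serialize(writer, person);
        }
        return output.ToString();
    }
}
```

The flag is needed because the attribute value reaches the writer through WriteString, which is also used for element content, so the writer has to remember that it is inside a dropped attribute.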
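One commonly used alternative, which I assume is the kind of approach referred to here, is to pass an XmlSerializerNamespaces instance containing a single empty prefix/namespace pair. The Person shape and the NamespacesDemo helper below are hypothetical:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape, assumed for illustration.
public class Person
{
    public string FirstName;
    public string LastName;
}

public static class NamespacesDemo
{
    public static string Serialize(Person person)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Person));

        // Supplying our own (empty) namespace set replaces the default
        // xsd/xsi declarations the serializer would otherwise emit.
        XmlSerializerNamespaces namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");

        StringWriter output = new StringWriter();
        serializer.Serialize(output, person, namespaces);
        return output.ToString();
    }
}
```

This only removes the root declarations; it does not stop the serializer from emitting an xsi:type attribute on derived types.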
Let's go a bit further and say we have an Employee class that inherits from Person:
This is only natural in most OO apps. Now, serializing an instance of Employee will result in the following XML (using our NonXsiTextWriter):
Well, Houston, we have a problem. Even though Employee inherits from Person, the XmlSerializer will no longer be able to deserialize this XML into a Person object, because it expects a root <Person> element. So, what we can do is make the Employee class expect/render the root element of a Person object:
The XmlSerializer will be able to deserialize the following XML into a Person or an Employee, depending on the Type passed to its ctor:
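Both steps can be sketched as follows, assuming an Employee that adds a single EmployeeId member (a hypothetical name) and uses XmlRootAttribute to take over the Person root element. RootDemo and the sample document are likewise made up for illustration:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical shapes, assumed for illustration.
public class Person
{
    public string FirstName;
    public string LastName;
}

// Reads and writes a <Person> root element instead of <Employee>.
[XmlRoot("Person")]
public class Employee : Person
{
    public string EmployeeId;
}

public static class RootDemo
{
    // Sample document with made-up values, including the Employee-only element.
    public const string Xml =
        "<Person><FirstName>Jane</FirstName><LastName>Doe</LastName>" +
        "<EmployeeId>E42</EmployeeId></Person>";

    // The type argument decides which class the same XML is loaded into.
    public static T Load<T>(string xml) where T : Person
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

RootDemo.Load<Person>(RootDemo.Xml) skips the element it does not know about, while RootDemo.Load<Employee>(RootDemo.Xml) also populates EmployeeId.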
So far so good. Now, if we have an XML containing a bunch of <Person> elements, let's say <People>, and its corresponding class:
The XmlSerializer will be perfectly capable of deserializing the following XML:
And it will be able to generate exactly the same document from the following object:
Now, since an Employee is a Person, we may want to populate the People class with employees too. Let's say VGA becomes an Employee while I remain an independent person:
The XmlSerializer will no longer know how to serialize the People type unless we tell it to expect an Employee too. The exception will say something like "Use the XmlInclude or SoapInclude attribute to specify types that are not known statically.". Adding these attributes to the base class, pointing to derived classes is not a good idea. Furthermore, new derived classes may appear that we may not know ahead of time. So, instead of adding those attributes, we can just pass the additional types to the XmlSerializer ctor:
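A sketch of that constructor call follows. The People shape (an array rendered as repeated Person elements), the member names and the PeopleDemo helper are assumptions for illustration:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical shapes, assumed for illustration.
public class Person
{
    public string FirstName;
    public string LastName;
}

[XmlRoot("Person")]
public class Employee : Person
{
    public string EmployeeId;
}

[XmlRoot("People")]
public class People
{
    // Each array item becomes a <Person> child of <People>.
    [XmlElement("Person")]
    public Person[] Persons;
}

public static class PeopleDemo
{
    // Builds a People instance mixing a plain Person and an Employee.
    public static People CreateSample()
    {
        Person person = new Person();
        person.FirstName = "Jane";
        Employee employee = new Employee();
        employee.FirstName = "John";
        employee.EmployeeId = "E42";

        People people = new People();
        people.Persons = new Person[] { person, employee };
        return people;
    }

    // extraTypes lists the derived types the serializer should also know about.
    public static string Serialize(People people, params Type[] extraTypes)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(People), extraTypes);
        StringWriter output = new StringWriter();
        serializer.Serialize(output, people);
        return output.ToString();
    }
}
```

Calling PeopleDemo.Serialize(people, typeof(Employee)) succeeds, whereas calling it without the extra type throws an InvalidOperationException for the sample above.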
Cool. Now the serializer will be able to serialize a People object containing a mix of Person and Employee instances. However, the XML will not look like what you expected:
What's more, we have effectively broken the deserialization of regular People. A piece of code that only knows how to deal with Person objects, and therefore constructs its XmlSerializer without the extra Employee type parameter, will fail to deserialize this XML with the following error:
So, even though the XML containing the extra data for an Employee could be successfully deserialized into a Person, as we saw above, the type="Employee" attribute is breaking it. Note that the namespace prefix is "d2p1" instead of the regular "xsi" because I used the NonXsiTextWriter, which prevented the namespace from being mapped to "xsi" at the root element. Therefore, a "random" new prefix is being created.
What we need is a way to completely avoid emitting the xsi:type attribute. We can further modify the NonXsiTextWriter to skip all "xsi" attributes it finds as they're being written:
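The extended writer could look roughly like the sketch below; the only change from the skip-flag writer is the extra comparison against XmlSchema.InstanceNamespace. The Person, Employee and People shapes and the RoundTripDemo helper are assumptions for illustration:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical shapes, assumed for illustration.
public class Person { public string FirstName; public string LastName; }

[XmlRoot("Person")]
public class Employee : Person { public string EmployeeId; }

[XmlRoot("People")]
public class People
{
    [XmlElement("Person")]
    public Person[] Persons;
}

public class NonXsiTextWriter : XmlTextWriter
{
    private bool skip;

    public NonXsiTextWriter(TextWriter writer) : base(writer) { }

    public override void WriteStartAttribute(string prefix, string localName, string ns)
    {
        // Drop the xsd/xsi declarations and any attribute that lives in
        // the XML Schema instance namespace (such as xsi:type).
        if ((prefix == "xmlns" && (localName == "xsd" || localName == "xsi"))
            || ns == XmlSchema.InstanceNamespace)
        {
            skip = true;
            return;
        }
        base.WriteStartAttribute(prefix, localName, ns);
    }

    public override void WriteString(string text)
    {
        if (skip) return;
        base.WriteString(text);
    }

    public override void WriteEndAttribute()
    {
        if (skip) { skip = false; return; }
        base.WriteEndAttribute();
    }
}

public static class RoundTripDemo
{
    // Writer side: knows about Employee, but emits no xsi:type.
    public static string Serialize(People people)
    {
        XmlSerializer serializer =
            new XmlSerializer(typeof(People), new Type[] { typeof(Employee) });
        StringWriter output = new StringWriter();
        using (NonXsiTextWriter writer = new NonXsiTextWriter(output))
        {
            serializer.Serialize(writer, people);
        }
        return output.ToString();
    }

    // Reader side: only knows about Person.
    public static People LoadAsPlainPeople(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(People));
        using (StringReader reader = new StringReader(xml))
        {
            return (People)serializer.Deserialize(reader);
        }
    }
}
```

The reader side never references Employee, which is the property we are after.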
With the new check for ns == XmlSchema.InstanceNamespace we're effectively bypassing the attribute writing. Now the part of the program that works against Person instances can simply deserialize the People class without knowing that there is Employee data too, or that an Employee class exists at all. The extra data will simply be ignored by the XmlSerializer. This is especially useful in configuration scenarios, where there may be extensibility points such as providers that need to be handled generically by your custom configuration handler, but need to be instantiated and initialized with custom configuration. Your generic provider class could simply specify the Type as an attribute, and the custom handler would deserialize the entire node into the Provider-derived class:
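The core of such a handler might be sketched as below. The "type" attribute name and the ProviderLoader helper are assumptions, and the sketch deliberately leaves out the configuration section handler plumbing to stay self-contained:

```csharp
using System;
using System.Xml;
using System.Xml.Serialization;

public static class ProviderLoader
{
    // Hypothetical helper: deserializes a configuration element into the
    // class named by its "type" attribute (an assumed attribute name).
    public static object Load(XmlElement element)
    {
        string typeName = element.GetAttribute("type");
        if (typeName.Length == 0)
        {
            throw new ArgumentException(
                "The element does not specify a type attribute.", "element");
        }

        // Throws if the type cannot be resolved. Types outside the calling
        // assembly need an assembly-qualified name.
        Type providerType = Type.GetType(typeName, true);

        // Accept whatever name the configuration element has as the root.
        XmlSerializer serializer =
            new XmlSerializer(providerType, new XmlRootAttribute(element.LocalName));

        using (XmlNodeReader reader = new XmlNodeReader(element))
        {
            return serializer.Deserialize(reader);
        }
    }
}
```

Passing an XmlRootAttribute here avoids requiring every provider class to declare the element name itself. In long-running code the serializer instances created this way are probably worth caching.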
Our Provider class would be all too simple:
Afterwards, a derived provider, for example a DbStorageProvider, would be:
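As an illustration, the two classes might be shaped like this. Every member name, the "provider" element name and the ProviderDemo helper are assumptions; only the DbStorageProvider class name comes from the text:

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical base class: it only carries what the generic handler needs.
public class Provider
{
    // Name of the concrete class to instantiate.
    [XmlAttribute("type")]
    public string TypeName;

    [XmlAttribute("name")]
    public string Name;
}

// Hypothetical derived provider with its own configuration data.
public class DbStorageProvider : Provider
{
    public string ConnectionString;
}

public static class ProviderDemo
{
    // Sample configuration node with made-up values.
    public const string Xml =
        "<provider type=\"DbStorageProvider\" name=\"db\">" +
        "<ConnectionString>placeholder</ConnectionString></provider>";

    // Loads the node straight into the derived class.
    public static DbStorageProvider Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(
            typeof(DbStorageProvider), new XmlRootAttribute("provider"));
        using (StringReader reader = new StringReader(xml))
        {
            return (DbStorageProvider)serializer.Deserialize(reader);
        }
    }
}
```

The derived class declares nothing but its own data. The attributes inherited from Provider are read along with it.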
I'm sure you appreciate the power and flexibility of this approach. You no longer need to worry about "parsing" the XmlNode in search of your properties, loading them, etc. You can just rely on the generic XmlSerializer-based configuration handler above. The configuration for the provider can be as complex as you like, and it integrates well with the base functionality of the handler:
Loading is not the only thing that gets easier: you can also modify your provider instance and serialize it back to the configuration file, preserving the format thanks to the NonXsiTextWriter we wrote. Now you can build a flexible configuration API based on the XmlSerializer features, letting developers programmatically configure your application. For example, let's say some admin interface allows adding new providers. The code could do something like the following:
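One possible shape for that code is sketched below. To keep it short it swaps the custom writer for the XmlSerializerNamespaces route; because the serializer is created for the provider's runtime type, no xsi:type attribute is emitted here. The class shapes, the "provider" element name and the ProviderConfigWriter helper are assumptions for illustration:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical shapes, assumed for illustration.
public class Provider
{
    [XmlAttribute("type")]
    public string TypeName;

    [XmlAttribute("name")]
    public string Name;
}

public class DbStorageProvider : Provider
{
    public string ConnectionString;
}

public static class ProviderConfigWriter
{
    // Appends the serialized provider as a child of the given element.
    public static void AddProvider(XmlElement providersElement, Provider provider)
    {
        // Serialize against the runtime type so derived data is included.
        XmlSerializer serializer = new XmlSerializer(
            provider.GetType(), new XmlRootAttribute("provider"));

        // Suppress the default xsd/xsi namespace declarations.
        XmlSerializerNamespaces noNamespaces = new XmlSerializerNamespaces();
        noNamespaces.Add("", "");

        StringWriter buffer = new StringWriter();
        serializer.Serialize(buffer, provider, noNamespaces);

        // Load the result and graft it into the configuration document.
        XmlDocument fragment = new XmlDocument();
        fragment.LoadXml(buffer.ToString());
        XmlNode imported = providersElement.OwnerDocument.ImportNode(
            fragment.DocumentElement, true);
        providersElement.AppendChild(imported);
    }
}
```

After the call, saving the owning XmlDocument persists the new provider in the same format the handler reads.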
I believe this is a far more straightforward way of handling extensible configuration. Instead of implementing a sort of IProvider.Init(XmlNode config) feature, providers only need to care about the serialization format they want. I've seen that in many places in ASP.NET 2, providers receive some kind of NameValueCollection. This is clearly a step in the wrong direction. Complex configuration simply can't be handled by key-value pairs (or it's too ugly/cumbersome to do so). Imagine a provider with lots of attributes because that's the only configuration format supported... ugh.