XML extensibility, xsi:type, XmlSerializer and configuration (or how to leverage XmlSerializer + OO extensibility)

Friday, January 23, 2004

.NET XML

Consider the following XML:

<Person>
  <FirstName>Daniel</FirstName>
  <LastName>Cazzulino</LastName>
</Person>

We can use the XmlSerializer to reconstruct an instance of the following class from it:

public class Person
{
    public string FirstName
    {
        get { return _first; }
        set { _first = value; }
    }
    string _first;

    public string LastName
    {
        get { return _last; }
        set { _last = value; }
    }
    string _last;
}
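To make the deserialization step concrete, here is a minimal sketch of loading the XML above into a Person. The DeserializeDemo class and its Load helper are names I made up for illustration, and I use auto-properties only to keep the listing short (they need a newer C# compiler than the one this post was written against).

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class DeserializeDemo
{
    // Hypothetical helper: reads a Person from an XML string.
    public static Person Load(string xml)
    {
        // The serializer expects a root element named after the type (<Person>).
        XmlSerializer ser = new XmlSerializer(typeof(Person));
        using (StringReader reader = new StringReader(xml))
        {
            return (Person)ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Person p = Load(
            "<Person><FirstName>Daniel</FirstName><LastName>Cazzulino</LastName></Person>");
        Console.WriteLine(p.FirstName + " " + p.LastName);
    }
}
```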

So far so good. Now, if you use the XmlSerializer to serialize an instance of Person back to XML, you may be surprised to get the following (declaration aside):

<Person xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FirstName>Daniel</FirstName>
  <LastName>Cazzulino</LastName>
</Person>

Those namespace declarations weren't there in the original XML! One way to get rid of them is to create a custom XmlTextWriter to use with the XmlSerializer:

// Requires: using System.IO; using System.Text; using System.Xml;
public class NonXsiTextWriter : XmlTextWriter
{
    public NonXsiTextWriter( TextWriter w ) : base( w ) {}
    public NonXsiTextWriter( Stream w, Encoding encoding ) : base( w, encoding ) {}
    public NonXsiTextWriter( string filename, Encoding encoding ) : base( filename, encoding ) {}

    // True while the attribute currently being written must be dropped.
    bool _skip = false;

    public override void WriteStartAttribute( string prefix, string localName, string ns )
    {
        if ( prefix == "xmlns" && ( localName == "xsd" || localName == "xsi" ) ) // Omits XSD and XSI declarations.
        {
            _skip = true;
            return;
        }
        base.WriteStartAttribute( prefix, localName, ns );
    }

    public override void WriteString( string text )
    {
        if ( _skip ) return;
        base.WriteString( text );
    }

    public override void WriteEndAttribute()
    {
        if ( _skip )
        {
            // Reset the flag, so we keep writing.
            _skip = false;
            return;
        }
        base.WriteEndAttribute();
    }
}

The skip flag works because every time an attribute is written, the three methods are called in sequence: WriteStartAttribute, WriteString and WriteEndAttribute. Our writer will now omit the xsd and xsi namespace declarations and preserve full fidelity with the original XML. We just need to pass our writer to the XmlSerializer:

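// person is the Person instance we want to serialize.
XmlSerializer ser = new XmlSerializer( typeof( Person ) );
StringWriter sw = new StringWriter();
ser.Serialize( new NonXsiTextWriter( sw ), person );
Console.WriteLine( sw.ToString() );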

Update: there's another way to omit these namespace declarations, as pointed out by Jiho Han. We will still need the specialized writer for what follows, though.
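Jiho's comment isn't reproduced in this entry, so I'm assuming the alternative is the usual XmlSerializerNamespaces trick: pass the serializer a namespace collection holding a single empty prefix/namespace pair, which replaces the default xsd and xsi declarations. A minimal sketch (NamespaceDemo and Save are made-up names, and auto-properties are used for brevity):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class NamespaceDemo
{
    // Hypothetical helper: serializes without the default xsd/xsi declarations.
    public static string Save(Person person)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Person));

        // An empty prefix mapped to an empty namespace takes the place of
        // the declarations the serializer would otherwise add to the root.
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("", "");

        StringWriter sw = new StringWriter();
        ser.Serialize(sw, person, ns);
        return sw.ToString();
    }

    public static void Main()
    {
        Person p = new Person();
        p.FirstName = "Daniel";
        p.LastName = "Cazzulino";
        Console.WriteLine(Save(p));
    }
}
```

This only covers the declarations on the root element. It does nothing about the xsi:type attribute discussed further down, which is why the custom writer is still needed.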

Let's go a bit further and say we have an Employee class that inherits from Person:

public class Employee : Person
{
    public string EmployeeID
    {
        get { return _id; }
        set { _id = value; }
    }
    string _id;
}

This is only natural in most OO apps. Now, serializing an instance of Employee will result in the following XML (using our NonXsiTextWriter):

<Employee>
  <FirstName>Daniel</FirstName>
  <LastName>Cazzulino</LastName>
  <EmployeeID>1234</EmployeeID>
</Employee>

Well, Houston, we have a problem. Even though Employee inherits from Person, the XmlSerializer will no longer be able to deserialize this XML into a Person object, because it expects a root <Person> element. So, what we can do is make the Employee class expect/render the root element of a Person object:

[XmlRoot("Person")] public class Employee : Person

The XmlSerializer will be able to deserialize the following XML into a Person or an Employee, depending on the Type passed to its ctor:

<Person>
  <FirstName>Daniel</FirstName>
  <LastName>Cazzulino</LastName>
  <EmployeeID>1234</EmployeeID>
</Person>
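Here is a small sketch of that "same XML, two target types" idea. RootDemo and Load are names I invented for the example, and auto-properties stand in for the explicit fields used in the post. When the target type is Person, the serializer simply skips the <EmployeeID> element it doesn't know about.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Employee reads and writes the same root element as Person.
[XmlRoot("Person")]
public class Employee : Person
{
    public string EmployeeID { get; set; }
}

public static class RootDemo
{
    public const string Xml =
        "<Person><FirstName>Daniel</FirstName><LastName>Cazzulino</LastName>" +
        "<EmployeeID>1234</EmployeeID></Person>";

    // Hypothetical helper: the Type passed in decides what comes back.
    public static object Load(Type type, string xml)
    {
        XmlSerializer ser = new XmlSerializer(type);
        using (StringReader reader = new StringReader(xml))
        {
            return ser.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Person person = (Person)Load(typeof(Person), Xml);
        Employee employee = (Employee)Load(typeof(Employee), Xml);
        Console.WriteLine(person.GetType().Name + ": " + person.FirstName);
        Console.WriteLine(employee.GetType().Name + ": " + employee.EmployeeID);
    }
}
```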

So far so good. Now suppose we have an XML document containing a bunch of <Person> elements under a <People> root, and its corresponding class:

public class People
{
    [XmlElement("Person", typeof(Person))]
    public Person[] AllPeople
    {
        get { return _people; }
        set { _people = value; }
    }
    Person[] _people;
}

The XmlSerializer will be perfectly capable of deserializing the following XML:

<People>
  <Person>
    <FirstName>Daniel</FirstName>
    <LastName>Cazzulino</LastName>
  </Person>
  <Person>
    <FirstName>Victor</FirstName>
    <LastName>Garcia Aprea</LastName>
  </Person>
</People>

And it will be able to generate exactly the same document from the following object:

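// Assumes Person also has a ( first, last ) convenience constructor, in
// addition to the parameterless one the XmlSerializer requires.
People p = new People();
p.AllPeople = new Person[] {
    new Person("Daniel", "Cazzulino"),
    new Person("Victor", "Garcia Aprea") };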

Now, since an Employee is a Person, we may want to populate the People class with Employees too. Let's say VGA becomes an Employee while I remain an independent person:

// Assumes an Employee( first, last, id ) convenience constructor.
People p = new People();
p.AllPeople = new Person[] {
    new Person("Daniel", "Cazzulino"),
    new Employee("Victor", "Garcia Aprea", "9999") };

The XmlSerializer will no longer know how to serialize the People type unless we tell it to expect an Employee too. The exception will say something like "Use the XmlInclude or SoapInclude attribute to specify types that are not known statically.". Adding these attributes to the base class, pointing to derived classes, is not a good idea, because it couples the base class to every class that derives from it. Furthermore, new derived classes may appear that we don't know about ahead of time. So, instead of adding those attributes, we can just pass the additional types to the XmlSerializer ctor:

XmlSerializer ser = new XmlSerializer( typeof( People ), new Type[] { typeof( Employee ) } );
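To see the difference the extra types make, here is a rough sketch that serializes a mixed array once with and once without them. ExtraTypesDemo, BuildMixed and Save are invented names, auto-properties replace the explicit fields, and I left the [XmlRoot] attribute off Employee only to keep the example focused on the extra-types parameter.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Employee : Person
{
    public string EmployeeID { get; set; }
}

public class People
{
    [XmlElement("Person", typeof(Person))]
    public Person[] AllPeople { get; set; }
}

public static class ExtraTypesDemo
{
    // Hypothetical helper: one plain Person and one Employee.
    public static People BuildMixed()
    {
        Person daniel = new Person();
        daniel.FirstName = "Daniel";
        daniel.LastName = "Cazzulino";

        Employee victor = new Employee();
        victor.FirstName = "Victor";
        victor.LastName = "Garcia Aprea";
        victor.EmployeeID = "9999";

        People people = new People();
        people.AllPeople = new Person[] { daniel, victor };
        return people;
    }

    // Hypothetical helper: extraTypes lists the derived types to accept.
    public static string Save(People people, Type[] extraTypes)
    {
        XmlSerializer ser = new XmlSerializer(typeof(People), extraTypes);
        StringWriter sw = new StringWriter();
        ser.Serialize(sw, people);
        return sw.ToString();
    }

    public static void Main()
    {
        People mixed = BuildMixed();
        Console.WriteLine(Save(mixed, new Type[] { typeof(Employee) }));
        try
        {
            Save(mixed, new Type[0]);
        }
        catch (InvalidOperationException ex)
        {
            // Without the extra type the serializer rejects the Employee.
            Console.WriteLine("Failed: " + ex.Message);
        }
    }
}
```

One thing to keep in mind: as far as I know, serializers created through this constructor overload are not cached by the framework the way new XmlSerializer( type ) ones are, so it's worth creating the instance once and reusing it.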

Cool. Now the serializer will be able to serialize a People object containing a mix of Person and Employee instances. However, the XML will not look like what you expected:

<People>
  <Person>
    <FirstName>Daniel</FirstName>
    <LastName>Cazzulino</LastName>
  </Person>
  <Person d2p1:type="Employee" xmlns:d2p1="http://www.w3.org/2001/XMLSchema-instance">
    <FirstName>Victor</FirstName>
    <LastName>Garcia Aprea</LastName>
    <EmployeeID>9999</EmployeeID>
  </Person>
</People>

What's more, we have effectively broken the deserialization of regular People. A piece of code that only knows how to deal with Person objects, and therefore deserializes the XML with an XmlSerializer constructed without the extra Employee type, will fail with the following error:

"The specified type was not recognized: name='Employee', namespace='', at <Person xmlns=''>"

So, even though the XML containing the extra data for an Employee could be successfully deserialized into a Person, as we saw above, the type="Employee" attribute is breaking it. Note that the namespace prefix is "d2p1" instead of the regular "xsi" because I used the NonXsiTextWriter, which prevented the namespace from being mapped to "xsi" on the root element. Therefore, a "random" new prefix is created.
What we need is a way to completely avoid emitting the xsi:type attribute. We can further modify the NonXsiTextWriter to skip all "xsi" attributes it finds as they're being written:

// XmlSchema lives in System.Xml.Schema.
public override void WriteStartAttribute( string prefix, string localName, string ns )
{
    if ( ( prefix == "xmlns" && ( localName == "xsd" || localName == "xsi" ) ) || // Omits XSD and XSI declarations.
        ns == XmlSchema.InstanceNamespace ) // Omits all XSI attributes.
    {
        _skip = true;
        return;
    }
    base.WriteStartAttribute( prefix, localName, ns );
}

With the new check for ns == XmlSchema.InstanceNamespace we're effectively bypassing the attribute writing. Now, the part of the program that works against Person instances can simply deserialize the People class without knowing that there is Employee data too, or that an Employee class exists at all. The extra data will simply be ignored by the XmlSerializer.

This is especially useful in configuration scenarios, where there may be extensibility points like providers that need to be handled generically by your custom configuration handler, but need to be instantiated and initialized with custom configuration. Your generic provider class could simply specify the Type as an attribute, and the custom handler would deserialize the entire node into the Provider-derived class:
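A quick sketch of what that means for a consumer that only knows about Person. PlainConsumerDemo and TryLoad are names I made up, the two XML strings are hand-written stand-ins for the serializer output, and auto-properties are used for brevity. The document that still carries xsi:type is rejected, while the one without it loads fine and the <EmployeeID> element is skipped.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class People
{
    [XmlElement("Person", typeof(Person))]
    public Person[] AllPeople { get; set; }
}

public static class PlainConsumerDemo
{
    // Hand-written sample: the Employee entry still carries xsi:type.
    public const string WithXsiType =
        "<People><Person xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' " +
        "xsi:type='Employee'><FirstName>Victor</FirstName>" +
        "<LastName>Garcia Aprea</LastName><EmployeeID>9999</EmployeeID></Person></People>";

    // Hand-written sample: same data, xsi:type stripped.
    public const string WithoutXsiType =
        "<People><Person><FirstName>Victor</FirstName>" +
        "<LastName>Garcia Aprea</LastName><EmployeeID>9999</EmployeeID></Person></People>";

    // Hypothetical helper: returns null when the serializer rejects the document.
    public static People TryLoad(string xml)
    {
        // Note: no extra types here, this code has never heard of Employee.
        XmlSerializer ser = new XmlSerializer(typeof(People));
        try
        {
            using (StringReader reader = new StringReader(xml))
            {
                return (People)ser.Deserialize(reader);
            }
        }
        catch (InvalidOperationException)
        {
            return null;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryLoad(WithXsiType) == null ? "rejected" : "loaded");
        Console.WriteLine(TryLoad(WithoutXsiType) == null ? "rejected" : "loaded");
    }
}
```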

object IConfigurationSectionHandler.Create( object parent, object configContext, XmlNode section )
{
    XmlSerializer ser = new XmlSerializer( typeof( MyConfigurationWithProviders ) );
    MyConfigurationWithProviders cfg = (MyConfigurationWithProviders)
        ser.Deserialize( new XmlNodeReader( section ) );

    // Iterate providers.
    XmlNodeList providers = section.SelectNodes( "Provider" );
    foreach ( XmlNode p in providers )
    {
        // The Type attribute names the Provider-derived class to create.
        Type t = Type.GetType( p.Attributes["Type"].Value );
        XmlSerializer ps = new XmlSerializer( t );
        object instance = ps.Deserialize( new XmlNodeReader( p ) );
        // We have a derived type fully initialized!!
    }

    // The snippet originally had no return statement; handing back the
    // generic configuration object is assumed here.
    return cfg;
}

Our Provider class would be all too simple:

public class Provider
{
    [XmlAttribute]
    public string Type
    {
        get { return _type; }
        set { _type = value; }
    }
    string _type;
}

Afterwards, a derived provider, for example a DbStorageProvider, would be:

[XmlRoot("Provider")]
public class DbStorageProvider : Provider
{
    public int Timeout
    {
        get { return _timeout; }
        set { _timeout = value; }
    }
    int _timeout; // Must be int to match the property type.
}

I'm sure you appreciate the power and flexibility of this approach. You no longer need to worry about "parsing" the XmlNode in search of your properties, loading them, etc. You can just rely on the generic XmlSerializer-based configuration handler above. The configuration for the provider can be as complex as you like, and it integrates well with the base functionality of the handler.
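As an illustration only, a section carrying the DbStorageProvider above might look something like the following. The MyCoolSection name is borrowed from the save example in this post, while the assembly-qualified type name and the Timeout value are made up. Keep in mind that Type.GetType generally needs an assembly-qualified name unless the type lives in the calling assembly.

```xml
<MyCoolSection>
  <!-- Hypothetical type and assembly names. -->
  <Provider Type="MyApp.Providers.DbStorageProvider, MyApp.Providers">
    <Timeout>30</Timeout>
  </Provider>
</MyCoolSection>
```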

But you don't only gain ease of loading: you can also modify your provider instance and serialize it back to the configuration file, preserving the format thanks to the NonXsiTextWriter we wrote. Now you can build a flexible configuration API based on the XmlSerializer features, letting developers programmatically configure your application. For example, let's say some admin interface allows adding new providers. The code could do something like the following:

AnotherProvider ap = new AnotherProvider();
// Set all properties

// Pass to configuration API
MyConfig cfg = (MyConfig) ConfigurationSection.GetSection( "MyCoolSection" );
cfg.Providers.Add( ap );

// Save passing the extra types array to use with the XmlSerializer.
MyConfigManager.Save( cfg, new Type[] { typeof (AnotherProvider) } );

I believe this is a far more straightforward way of handling extensible configuration. Instead of implementing a sort of IProvider.Init(XmlNode config) feature, providers only need to care about the serialization format they want. I've seen that in many places in ASP.NET 2, providers receive some kind of NameValueCollection. This is clearly a step in the wrong direction. Complex configuration simply can't be handled by key-value pairs (or it's too ugly/cumbersome to do so). Imagine a provider with lots of attributes because that's the only configuration format supported... ugh.

Very cool stuff. Looks like I'll have to reconsider how I've implemented some of my provider stuff.

Steve - Friday, January 23, 2004 3:44:00 PM

Thanks Jiho. I've been looking for that info before!

Anyway, I still need the custom writer because of the xsi:type attribute.

Section handlers have the advantage that in web scenarios the appdomain will be automatically recycled, so you don't need to handle the complexity of updating config for a number of components that may have already copied config into their internal state.

You can also use custom section handlers in web.config, no need to stick to name-value stuff.

For add-ons, I'd still use web.config for the main "add-on loader". Each add-on should use a config file named after the add-on assembly (i.e. MyAddon.dll and MyAddon.config) and your loader can use the same mechanism as .NET. In this case, I'd use a file format and inner workings that exactly mimic what web.config does, to minimize the impact on the developer. They would use a class you provide, similar to ConfigurationSettings, that loads config from there, initializes the section handlers and so on.

Thanks for your feedback across my weblog!

Daniel Cazzulino - Friday, April 30, 2004 1:03:00 PM

Hi Jiho, thanks for your comments and I'm glad you find my blog useful!

Most addons will use configuration the first time they are accessed to initialize themselves, for example opening ports, loading some information from somewhere, and so on. When reconfiguration is performed, you would need to notify everyone that a change happened, and each of these components would need to refresh their "internal state" taking into account the new values.

Using web.config and section handlers, every modification causes an application restart (AppDomain recycling). Therefore, all components are given a fresh start again: ergo, you don't need to care about reconfiguration at all!

So, having the main addon-loader configurable through the web.config has that benefit. In your loader (let's say it receives a path where to look for addons), you load each assembly and provide it with a way to retrieve settings from a file named after the assembly name (so that each addon has its own file).

This file should use the same .NET syntax. In your class you load a custom section handler configured by the addon author in its own config file, call IConfigurationSectionHandler.Create() to load the config, and return it.

If at any point the web.config is touched, all components just go away, so you don't need to care about monitoring those config files.

BTW, you may want to take a look at the Configuration Application Block from PAG ;)

Daniel Cazzulino - Tuesday, May 11, 2004 5:33:00 PM

Thanks Daniel.

It looks like CMAB is the best route. It implements everything that I am looking for and it also uses XmlSerializer (the approach you mention in this blog).

I'll let you know how the implementation went once I'm done.

Thanks again!

Jiho Han - Wednesday, May 12, 2004 5:53:00 PM

Daniel,

While implementing the provider model myself, I read through the two articles by Rob Howard on the recently published Nothing but ASP.NET column. In the part 2 article, the implementation for ASP.NET 1.1 creates and stores the constructor of the provider rather than the provider object itself, which I'd think to do myself.

Do you see why they might have done that?

I don't see the point of creating the Provider class to hold the name, type, and attributes either. Why not create the real thing instead?

Thanks

Jiho Han - Wednesday, May 12, 2004 9:21:00 PM

nice idea, helped a lot ;)

Andy - Tuesday, August 28, 2007 5:05:57 PM