XmlSerializer and XSD Type Inheritance: does it work?


One of the most powerful XML Schema features is its ability to validate documents based on element types instead of element names. That is, no matter which element name an instance document uses (say Customer, customer or CRMCustomer), as long as our XSD schema declares all of them with a type that is, or derives from, say, CustomerDef, the document will be valid. This is very important in interoperability scenarios, of course.
That said, one of the most versatile and performant ways to handle XML in .NET (forget about XmlDocument) is the XmlSerializer class. Coupled with XSD.EXE or the technique I described in a previous post, you can easily autogenerate the classes from that schema. So far so good.

For the curious, here's such a schema (a trivial one, of course).

And here are the different instance documents.
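
The original schema and instance documents aren't reproduced here. As a stand-in, here's a minimal sketch of what they could look like: three element names, all declared with type CustomerDef, each checked with a schema-aware reader. The schema, the sample values and the TypeValidationDemo helper are hypothetical, and the code uses the XmlReaderSettings API from .NET 2.0 onwards rather than the XmlValidatingReader used later in this post.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class TypeValidationDemo
{
    public const string Ns = "http://www.lagash.com/schemas/customers";

    // Hypothetical schema: three element names, all declared with the CustomerDef type.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:c='" + Ns + "'" +
        "  targetNamespace='" + Ns + "' elementFormDefault='qualified'>" +
        "  <xs:complexType name='CustomerDef'>" +
        "    <xs:sequence>" +
        "      <xs:element name='Name' type='xs:string'/>" +
        "      <xs:element name='EMail' type='xs:string'/>" +
        "    </xs:sequence>" +
        "  </xs:complexType>" +
        "  <xs:element name='customer' type='c:CustomerDef'/>" +
        "  <xs:element name='Customer' type='c:CustomerDef'/>" +
        "  <xs:element name='CRMCustomer' type='c:CustomerDef'/>" +
        "</xs:schema>";

    // Builds an instance document with the given root element name (sample values are made up).
    public static string Instance(string rootName)
    {
        return "<" + rootName + " xmlns='" + Ns + "'>" +
               "<Name>Jane Doe</Name><EMail>jane@example.com</EMail>" +
               "</" + rootName + ">";
    }

    // Returns true if the document is valid against the schema above.
    public static bool IsValid(string xml)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(Ns, XmlReader.Create(new StringReader(Xsd)));

        bool valid = true;
        // Any validation error (e.g. an undeclared root element) clears the flag.
        settings.ValidationEventHandler += (sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error) valid = false;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return valid;
    }
}
```

All three root names validate because the schema only cares that the element is declared with the CustomerDef type; a root name the schema doesn't declare (say, Client) is rejected.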

However, only one of the three versions will work: the one with the element "customer". The other versions, which are equally valid according to the schema and are of the desired type CustomerDef, fail with an exception saying the element was not expected. As I explained while discussing XmlSerializer speed, the serializer creates a temporary assembly for reading and writing the serialized version of a type. We're interested in the reader now.
When the XSD shown above is used to generate the serializable class, we get a class definition like the following:

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://www.lagash.com/schemas/customers")]
[System.Xml.Serialization.XmlRootAttribute("customer", Namespace="http://www.lagash.com/schemas/customers", IsNullable=false)]
public class CustomerDef 
{
  /// <remarks/>
  public string Name;
  
  /// <remarks/>
  public string EMail;
}
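
The failure is easy to reproduce with a copy of that class. This sketch (RootNameDemo and its sample document are hypothetical) deserializes the same content under each root name and reports whether XmlSerializer accepted it; a rejected name surfaces as an InvalidOperationException whose inner exception says the element was not expected.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Same shape and attributes as the generated class shown above.
[XmlType(Namespace = "http://www.lagash.com/schemas/customers")]
[XmlRoot("customer", Namespace = "http://www.lagash.com/schemas/customers", IsNullable = false)]
public class CustomerDef
{
    public string Name;
    public string EMail;
}

public static class RootNameDemo
{
    const string Ns = "http://www.lagash.com/schemas/customers";

    // Tries to deserialize the same content under the given root element name.
    // Returns false when the serializer rejects the root name.
    public static bool TryLoad(string rootName, out CustomerDef customer)
    {
        string xml = "<" + rootName + " xmlns='" + Ns + "'>" +
                     "<Name>Jane Doe</Name><EMail>jane@example.com</EMail>" +
                     "</" + rootName + ">";
        var serializer = new XmlSerializer(typeof(CustomerDef));
        try
        {
            customer = (CustomerDef)serializer.Deserialize(new StringReader(xml));
            return true;
        }
        catch (InvalidOperationException ex)
        {
            // The inner exception carries the "... was not expected." message.
            Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
            customer = null;
            return false;
        }
    }
}
```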

From the definition above, the XmlSerializer creates the temporary reader, which contains a set of Read methods derived from those serialization attributes. Using the technique explained in the aforementioned post, I got hold of the generated class. The reader contains a Read4_customer method, which is the one that tries to load the XML. The problem is that this method compares the element name and namespace against strings taken from the serialization attributes. Therefore, it will always fail with the other two valid root elements.

I found something very interesting while digging inside the generated reader, though. It has a method with the signature CustomerDef Read1_CustomerDef(bool isNullable, bool checkType), which is perfectly capable of loading the object. Getting that far was very difficult, however. First, I had to add the temporary class to my project and make that method public, as it's private by default. Second, there's no public way of initializing this reader: you have to call an internal Init method on the base XmlSerializationReader class. Thank God we still have reflection to test these things!

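using System;
using System.IO;
using System.Reflection;
using System.Xml;
using System.Xml.Serialization;

// 'xsd' holds the customers XmlSchema, loaded beforehand.
// CustomerDefReader is the serializer-generated class, copied into the project
// with Read1_CustomerDef made public.

// Init is internal on XmlSerializationReader, so get hold of it via reflection
MethodInfo m = typeof(XmlSerializationReader).GetMethod(
  "Init", BindingFlags.Instance | BindingFlags.NonPublic);

using (FileStream fs = new FileStream(@"C:\CustomerCRM.xml", FileMode.Open))
{
  XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader(fs));
  vr.Schemas.Add(xsd);
  
  // Create the temp. reader manually
  Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader cr = 
    new Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader();
  
  // Call Init through reflection, passing the validating reader
  m.Invoke(cr, new object[] { vr, null, null, null } );
  
  // Read with the type-based method, bypassing the root element name check
  // (isNullable: false, checkType: false)
  object cust = cr.Read1_CustomerDef( false, false );
  Console.WriteLine(cust);
}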

That method successfully loads any of the three root element versions, whether or not they carry the xsi:type attribute; when they do, Read1_CustomerDef could be called with true for the second parameter (checkType). Another generated method that could work is Read2_Object, provided it receives checkType=true and the instance document uses xsi:type to state that it's a CustomerDef instance (which is not always possible if you're receiving the document from a third party). Unfortunately, as I said above, the code that calls Read1_CustomerDef, which is the one the serializer uses to load the XML, only checks names:

public object Read4_customer() {
    object o = null;
    Reader.MoveToContent();
    if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
        if (((object) Reader.LocalName == (object)id5_customer && 
            (object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
            o = Read1_CustomerDef(false, true);
        }
        else {
            throw CreateUnknownNodeException();
        }
    }
    else {
        UnknownNode(null);
    }
    return (object)o;
}

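Note the very efficient use of reference comparison: the strings are cast to object, which works because the generated reader atomizes its names in the reader's XmlNameTable, so equal names share the same string instance.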
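
The trick only works when both strings come from the same name table. This sketch (AtomizationDemo is a hypothetical helper) adds a name to an XmlNameTable, hands that table to the reader, and compares the root element's LocalName by reference; a value-equal string that didn't come from the table fails the same comparison.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AtomizationDemo
{
    // True if the root element's LocalName is the very same string instance
    // that was added to the NameTable beforehand (reference comparison).
    public static bool RootMatchesByReference(string xml, string name)
    {
        var nameTable = new NameTable();
        // Add returns the table's canonical (atomized) instance for this string.
        string atom = nameTable.Add(name);

        var settings = new XmlReaderSettings();
        settings.NameTable = nameTable; // the reader atomizes names into this table
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return (object)reader.LocalName == (object)atom;
        }
    }

    // Same reference check against a value-equal string that is not atomized.
    public static bool RootMatchesUnatomized(string xml, string name)
    {
        string copy = new string(name.ToCharArray()); // fresh instance, not from any table
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return (object)reader.LocalName == (object)copy;
        }
    }
}
```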

One way to solve this would be if the XmlRootAttribute could be specified multiple times, so that the generated code checks for multiple names.
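
Until something like that exists, the first idea can be approximated with the XmlSerializer(Type, XmlRootAttribute) constructor, which overrides the root name without touching the class: keep one serializer per accepted name and pick it by peeking at the root element. A minimal sketch, with a hypothetical RootOverrideDemo helper and a hardcoded list of names. The serializers are cached because the .NET documentation warns that this constructor does not reuse its generated assembly across calls. Note that it still needs every element name known up front.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Same shape and attributes as the generated class shown earlier.
[XmlType(Namespace = "http://www.lagash.com/schemas/customers")]
[XmlRoot("customer", Namespace = "http://www.lagash.com/schemas/customers", IsNullable = false)]
public class CustomerDef
{
    public string Name;
    public string EMail;
}

public static class RootOverrideDemo
{
    const string Ns = "http://www.lagash.com/schemas/customers";

    // Hypothetical list of accepted root names; extending it doesn't require regenerating CustomerDef.
    static readonly string[] RootNames = { "customer", "Customer", "CRMCustomer" };

    // One serializer per root name, built once and cached.
    static readonly Dictionary<string, XmlSerializer> Serializers = CreateSerializers();

    static Dictionary<string, XmlSerializer> CreateSerializers()
    {
        var map = new Dictionary<string, XmlSerializer>();
        foreach (string name in RootNames)
        {
            // Overrides the [XmlRoot("customer")] declared on the class.
            var root = new XmlRootAttribute(name);
            root.Namespace = Ns;
            map[name] = new XmlSerializer(typeof(CustomerDef), root);
        }
        return map;
    }

    // Peeks at the root element and deserializes with the matching serializer.
    public static CustomerDef Load(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            XmlSerializer serializer;
            if (reader.NamespaceURI != Ns || !Serializers.TryGetValue(reader.LocalName, out serializer))
                throw new InvalidOperationException("Unexpected root element: " + reader.Name);
            return (CustomerDef)serializer.Deserialize(reader);
        }
    }
}
```
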
The other approach would be more XSD-compliant and certainly more flexible, as it wouldn't require regenerating the serializable class (CustomerDef in this case) to reflect new element names: check whether the current Reader is actually an XmlValidatingReader and read the customer if the schema type matches. The previous code can be modified as follows to make this work:

public object Read4_customer() {
  object o = null;
  Reader.MoveToContent();
  if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
    // Check for a validating reader that determined the element's schema type.
    // SchemaType can also be an XmlSchemaDatatype (built-in simple types), hence 'as'.
    System.Xml.XmlValidatingReader vr = Reader as System.Xml.XmlValidatingReader;
    System.Xml.Schema.XmlSchemaType type = (vr == null) ? null :
      vr.SchemaType as System.Xml.Schema.XmlSchemaType;
    // Schema names aren't guaranteed to be atomized in Reader.NameTable, so compare
    // by value here. We would have to check the inheritance chain too.
    if (type != null &&
      type.QualifiedName.Name == id1_CustomerDef && 
      type.QualifiedName.Namespace == id2_httpwwwlagashcomschemascustomers) {
      o = Read1_CustomerDef(false, true);
    }
    // Otherwise, fall back to the original element name check.
    else if (((object) Reader.LocalName == (object)id5_customer && 
      (object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
      o = Read1_CustomerDef(false, true);
    }
    else {
      throw CreateUnknownNodeException();
    }
  }
  else {
    UnknownNode(null);
  }
  return (object)o;
}
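
The comment in that code leaves the inheritance chain out. With the validating reader created through XmlReaderSettings (.NET 2.0 onwards), the current element's schema type is exposed through SchemaInfo.SchemaType, and XmlSchemaType.BaseXmlSchemaType lets you walk up to CustomerDef. A sketch under those assumptions; the CRMCustomerDef extension type, the 'note' element and the SchemaTypeDemo helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaTypeDemo
{
    public const string Ns = "http://www.lagash.com/schemas/customers";

    // Hypothetical schema: CRMCustomerDef extends CustomerDef; 'note' is a simple-typed element.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns:c='" + Ns + "'" +
        "  targetNamespace='" + Ns + "' elementFormDefault='qualified'>" +
        "  <xs:complexType name='CustomerDef'><xs:sequence>" +
        "    <xs:element name='Name' type='xs:string'/>" +
        "    <xs:element name='EMail' type='xs:string'/>" +
        "  </xs:sequence></xs:complexType>" +
        "  <xs:complexType name='CRMCustomerDef'><xs:complexContent>" +
        "    <xs:extension base='c:CustomerDef'><xs:sequence>" +
        "      <xs:element name='AccountId' type='xs:string' minOccurs='0'/>" +
        "    </xs:sequence></xs:extension>" +
        "  </xs:complexContent></xs:complexType>" +
        "  <xs:element name='customer' type='c:CustomerDef'/>" +
        "  <xs:element name='CRMCustomer' type='c:CRMCustomerDef'/>" +
        "  <xs:element name='note' type='xs:string'/>" +
        "</xs:schema>";

    static readonly XmlQualifiedName CustomerDefName = new XmlQualifiedName("CustomerDef", Ns);

    // True if the root element's schema type is CustomerDef or derives from it.
    public static bool RootIsCustomerDef(string xml)
    {
        var settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(Ns, XmlReader.Create(new StringReader(Xsd)));

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            // The validating reader exposes the schema type of the current element.
            XmlSchemaType type = reader.SchemaInfo != null ? reader.SchemaInfo.SchemaType : null;
            // Walk up the inheritance chain until BaseXmlSchemaType returns null.
            for (; type != null; type = type.BaseXmlSchemaType)
            {
                if (type.QualifiedName == CustomerDefName) return true;
            }
            return false;
        }
    }
}
```

Comparing XmlQualifiedName values rather than schema object references keeps the check independent of which XmlSchemaSet instance produced the type.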

Of course, this would require a validating reader with the appropriate schema loaded, but why would you create an XSD otherwise? Would you give up all those powerful validation capabilities and use the schema only to save a few lines of class definition code and XML serialization attributes? If so, I urge you to think twice: you're really missing something that can greatly improve your code (no more hand-coded validation of ranges, patterns, etc.).

Another interesting thing I found is that XmlSerializer can be inherited, and it has a protected method that deserializes directly from an XmlSerializationReader. That would have solved my reflection problems, as I could simply inherit the serializer and add a public method that receives my modified reader and passes it through. It could even enable more efficient custom deserialization, for example one where the temporary assemblies are not generated each time the application starts but are stored in a permanent location for reuse across AppDomains (maybe a DB?). Remember that the initial generation and compilation performance hit is significant. Unfortunately, that method's implementation throws a NotImplementedException :(. But that makes me wonder whether it will be possible in Whidbey... :)
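
For what it's worth, as far as I can tell from .NET 2.0 onwards the base XmlSerializer.Deserialize routes through an overridable CreateReader method and then the protected Deserialize(XmlSerializationReader) overload when the serializer was built with the protected parameterless constructor; that's how sgen-generated serializers plug in. The sketch below (all class names hypothetical) uses that hook with a hand-written reader that accepts any root element in the customers namespace and reads Name and EMail by local name. It does no schema validation, so it only illustrates the extension point, not a replacement for the type check.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class CustomerDef
{
    public string Name;
    public string EMail;
}

// Hand-written reader: accepts any root element in the customers namespace.
public class AnyRootCustomerReader : XmlSerializationReader
{
    const string Ns = "http://www.lagash.com/schemas/customers";

    protected override void InitIDs() { }
    protected override void InitCallbacks() { }

    public CustomerDef ReadCustomer()
    {
        Reader.MoveToContent();
        if (Reader.NodeType != XmlNodeType.Element || Reader.NamespaceURI != Ns)
            throw CreateUnknownNodeException();

        var customer = new CustomerDef();
        if (Reader.IsEmptyElement) { Reader.Skip(); return customer; }

        Reader.ReadStartElement();
        Reader.MoveToContent();
        while (Reader.NodeType == XmlNodeType.Element)
        {
            // Children are matched by local name only, to keep the sketch short.
            switch (Reader.LocalName)
            {
                case "Name": customer.Name = Reader.ReadElementString(); break;
                case "EMail": customer.EMail = Reader.ReadElementString(); break;
                default: Reader.Skip(); break;
            }
            Reader.MoveToContent();
        }
        Reader.ReadEndElement();
        return customer;
    }
}

// Uses the protected base constructor, so no temporary assembly is generated and
// Deserialize goes through CreateReader() and the protected Deserialize overload.
public class AnyRootCustomerSerializer : XmlSerializer
{
    public AnyRootCustomerSerializer() { }

    protected override XmlSerializationReader CreateReader()
    {
        return new AnyRootCustomerReader();
    }

    protected override object Deserialize(XmlSerializationReader reader)
    {
        return ((AnyRootCustomerReader)reader).ReadCustomer();
    }
}
```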

In a future post, maybe here, maybe on MSDN online, I will explain how to take advantage of IXmlSerializable interface to implement custom serialization but also to gain automatic XSD validation right from your assembly-embedded schema.

For the curious, the complete XmlSerializer-generated file for the schema.
