XmlSerializer and XSD Type Inheritance: does it work?
Note: this entry has moved.
One of the most powerful XML Schema features is its hability
to validate documents based on element types, instead of
element names. That is no matter which element name is used
in an instance document, say Customer,
customer and CRMCustomer, as far
as our XSD Schema makes them all inherit from say
CustomerDef, the document will be valid. This
is very important in interoperability scenarios, of course.
That said, one of the most versatile and performat ways to
handle XML in .NET (forget about XmlDocument)
is using the XmlSerializer class. Coupled with
XSD.EXE or the technique I exposed
in a previous post, you can easily autogenerate the classes from that schema.
So far so good.
However, only one of the three versions will work, the one
with the element "customer". The other versions, which are
equally valid according to the schema, and which are of the
desired type CustomerDef will fail with an
exception saying the element was not expected. As I
explained while
discussing XmlSerializer speed, it creates a temporary assembly for reading and writing
the serialized version of a type. We're interested in the
reader now.
When the XSD shown above is used to generate the
XmlSerializable class, we get a class definition like the
following:
/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://www.lagash.com/schemas/customers")]
[System.Xml.Serialization.XmlRootAttribute("customer", Namespace="http://www.lagash.com/schemas/customers", IsNullable=false)]
public class CustomerDef
{
/// <remarks/>
public string Name;
/// <remarks/>
public string EMail;
}
From the definition above, the
XmlSerializer will create the temporary reader.
The reader will contain a set of Read methods
according to those serialization attributes. Using the
technique explained in the
aforemetioned post, I got the temporary generated class. The reader contains
a Read4_customer
method which is the one that tries to load the XML. The
problem is that this method uses a stored string (taken from
the serialization attributes) and performs an element
name/namespace string comparison. Therefore, it will always
fail with the other two valid root elements.
I found a very interesting thing though, while digging
inside the generated reader. It has a method with the
signature
CustomerDef Read1_CustomerDef(bool isNullable, bool
checkType)
which is perfectly capable of loading the object. However,
getting this far was very difficult. First, I had to add
this temporary class to my project and make that method
public, as it's private by default, and second, there's no
"public" way of initializing this reader. You have to call
an internal Init method on the base
XmlSerializationReader
class. Thanks GOD we still have reflection to test these
things!
MethodInfo m = typeof(XmlSerializationReader).GetMethod(
"Init", BindingFlags.Instance | BindingFlags.NonPublic);
using (FileStream c = new FileStream(@"C:\CustomerCRM.xml", FileMode.Open))
{
XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader(c));
vr.Schemas.Add(xsd);
// Create the temp. reader manually
Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader cr =
new Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader();
// Call Init through reflection
m.Invoke(cr, new object[] { vr, null, null, null } );
// Read with the method that checks the type
object cust = cr.Read1_CustomerDef( false, false );
Console.WriteLine(cust);
}
That method will sucessfully load any of the three versions
for the root element, either if they have the
xsi:type attribute set, in which case the
Read1_CustomerDef could use a
true for the second parameter (checkType), or
not. Another method that is generated and could work is
Read2_Object, if it receives
checkType=true and the instance document uses
xsi:type to specify that it's a
CustomerDef instance (which is not always
possible if you're receiving the document from a third
party). Unfortunately, like I said above, the code that
calls Read1_CustomerDef, and which is the one
called by the serializer to load the XML, only checks for
names:
public object Read4_customer() {
object o = null;
Reader.MoveToContent();
if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
if (((object) Reader.LocalName == (object)id5_customer &&
(object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
o = Read1_CustomerDef(false, true);
}
else {
throw CreateUnknownNodeException();
}
}
else {
UnknownNode(null);
}
return (object)o;
}
Note the very efficient use of string reference comparison,
by casting them to Object.
One way to solve this would be if the
XmlRootAttribute could be specified multiple
times, so that the generated code checks for multiple names.
The other, more XSD-compliant and certainly more flexible as
it wouldn't require regeneration of the serializable class
(CustomerDef in this case) to reflect new element names,
would be to check if the current Reader is
actually an XmlValidatingReader and read the
customer if the type matches. The previous code can be
modified as follows to make this work:
public object Read4_customer() {
object o = null;
Reader.MoveToContent();
if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
// Check for validating reader with schema type determined
if (Reader is System.Xml.XmlValidatingReader &&
((System.Xml.XmlValidatingReader) Reader).SchemaType != null) {
System.Xml.Schema.XmlSchemaType type = (System.Xml.Schema.XmlSchemaType)
((System.Xml.XmlValidatingReader) Reader).SchemaType;
// We would have to check the inheritance chain too.
if (((object) type.Name == (object)id1_CustomerDef &&
(object) type.QualifiedName.Namespace == (object)id2_httpwwwlagashcomschemascustomers)) {
o = Read1_CustomerDef(false, true);
}
}
else if (((object) Reader.LocalName == (object)id5_customer &&
(object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
o = Read1_CustomerDef(false, true);
}
else {
throw CreateUnknownNodeException();
}
}
else {
UnknownNode(null);
}
return (object)o;
}
Of course this would require a validating reader with the appropriate schema loaded, by why would you create an XSD otherwise? Would you loose all those powerful validation capabilities and instead use it only to save you some lines of class definition code and Xml serialization attributes? If you do, I urge you to think twice, you're really missing something that can greatly improve your code (no more validation of ranges, patterns, etc.).
Another interesting thing I found is that the
XmlSerializer can be inherited, and it has a
protected method that allows you to deserialize directly
from an XmlSerializationReader. That would have
solve my previous reflection problems, as I could simply
inherit the serializer and make a public method receiving my
modified reader and pass it through to it. That would even
make for maybe more efficient custom deserialization, for
example one where the temporary assemblies are not generated
each time the application starts but rather stored in a
permanent location for reuse across AppDomains (maybe a
DB?). Remember the initial generation and compilation
performance hit is significant. Unfortunately, that method
implementation throws a NotImplementedException
:((((((. But that makes me wonder if in Whidbey it's
possible.... :)
In a future post, maybe here, maybe on MSDN online, I will
explain how to take advantage of
IXmlSerializable interface to implement custom
serialization but also to gain automatic XSD validation
right from your assembly-embedded schema.