XmlSerializer and IXmlSerializable: automatic XSD validation please!

The IXmlSerializable interface lets you take full control of how your class is serialized to XML whenever the XmlSerializer is used. The serializer checks for this interface and, if your class implements it, calls its methods instead of reading the serialization attributes on your members. The interface has the following definition:

public interface IXmlSerializable
{
  XmlSchema GetSchema();
  void ReadXml(XmlReader reader);
  void WriteXml(XmlWriter writer);
}
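
As a minimal sketch of what implementing the interface looks like (the Customer class and its name attribute are made up for illustration, and GetSchema simply returns null here):

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical class, used only to illustrate the three interface methods.
public class Customer : IXmlSerializable
{
    public string Name = "";

    // No schema is supplied in this sketch.
    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the element that wraps this object.
        Name = reader.GetAttribute("name");
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (!isEmpty) reader.ReadEndElement(); // consume the wrapper's end tag
    }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the wrapper's start tag.
        writer.WriteAttributeString("name", Name);
    }
}

public static class RoundTripDemo
{
    public static void Main()
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));

        StringWriter output = new StringWriter();
        Customer original = new Customer();
        original.Name = "Ada";
        serializer.Serialize(output, original);

        // Read the same XML back through ReadXml.
        Customer copy = (Customer)serializer.Deserialize(new StringReader(output.ToString()));
        Console.WriteLine(copy.Name);
    }
}
```

The serializer writes and reads the wrapper element itself; the class is only responsible for what goes inside it.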

Having the GetSchema method, you would think it performs automatic XSD validation of incoming XML, right? That the XmlReader passed to the ReadXml method is actually an instance of XmlValidatingReader created automatically by the serializer when it detects you implement IXmlSerializable, right? WRONG. It doesn't.

Basically, the schema is retrieved during the initial temporary assembly generation step (explained in a previous post), for reasons I still haven't been able to find. But it's not used at all after that. Why?
The XmlSerializer.Deserialize method takes an XmlReader, TextReader or Stream. It would be all too easy for the serializer to wrap them with an XmlValidatingReader before deserializing the object. Keep in mind that the schema you have to return from the GetSchema method can be loaded from an embedded resource...
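
Until the serializer does this for you, the wrapping can be done by hand. The following is only a sketch under a few assumptions: ValidatedCustomer, its inline schema and the ValidatingLoader helper are hypothetical names, and it uses XmlReaderSettings (the replacement for XmlValidatingReader from .NET 2.0 onwards) rather than XmlValidatingReader itself.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical class; the schema is inlined but could come from an embedded resource.
public class ValidatedCustomer : IXmlSerializable
{
    // The id attribute is set defensively, on the assumption that the
    // serializer may reject a GetSchema() result that lacks one.
    private const string Xsd =
        "<xs:schema id='ValidatedCustomer' xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "  <xs:element name='ValidatedCustomer'>" +
        "    <xs:complexType>" +
        "      <xs:attribute name='name' type='xs:string' use='required' />" +
        "    </xs:complexType>" +
        "  </xs:element>" +
        "</xs:schema>";

    public string Name = "";

    public XmlSchema GetSchema()
    {
        // A null handler makes schema parsing errors throw.
        return XmlSchema.Read(new StringReader(Xsd), null);
    }

    public void ReadXml(XmlReader reader)
    {
        Name = reader.GetAttribute("name");
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (!isEmpty) reader.ReadEndElement();
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("name", Name);
    }
}

// Hypothetical helper: validates against the type's own schema while deserializing.
public static class ValidatingLoader
{
    public static T Load<T>(TextReader input) where T : IXmlSerializable, new()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        // Report warnings too, so an undeclared root element is not silently accepted.
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, e) =>
        {
            throw new XmlSchemaValidationException(e.Message, e.Exception);
        };

        XmlSchema schema = new T().GetSchema();
        if (schema != null) settings.Schemas.Add(schema);

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            return (T)new XmlSerializer(typeof(T)).Deserialize(reader);
        }
    }
}

public static class ValidationDemo
{
    public static void Main()
    {
        ValidatedCustomer ok = ValidatingLoader.Load<ValidatedCustomer>(
            new StringReader("<ValidatedCustomer name='Ada' />"));
        Console.WriteLine("Loaded: " + ok.Name);

        try
        {
            // The required 'name' attribute is missing, so validation should fail.
            ValidatingLoader.Load<ValidatedCustomer>(new StringReader("<ValidatedCustomer />"));
            Console.WriteLine("Accepted");
        }
        catch (InvalidOperationException ex)
        {
            // XmlSerializer wraps failures; the validation error is the inner exception.
            Console.WriteLine("Rejected: " + (ex.InnerException ?? ex).Message);
        }
    }
}
```

Because the helper only needs a TextReader, the same call works whether the XML arrives from a file, a message queue or a socket, which is the scenario a SoapExtension cannot cover.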

I think this would be a great addition to Whidbey.

5 Comments

  • Don Box explained this in his session today at PDC. Basically, this is a performance hit that was not necessary given the proliferation of IXmlSerializable throughout the System.Data and System.Web.Services stacks. However, nothing prevents you from implementing your own validation within GetSchema, where you might add a SoapExtension to enhance web services through serialized body structures to validate based on schema content model.

    BTW - Great stuff... keep it up!

  • Kirk is right, this is all about the perf cost.

  • Kirk: just have a look at Reflector and try to find that "proliferation" of IXmlSerializable... only the damn DataSet implements it!! Not even WSE 2.0 uses it! And no, I don't want a SoapExtension, because I want my class validated whenever it's loaded: from a stream, from an MSMQ message, a mail message body, a socket, etc.

    Dare: I don't think it's fair to make us pay the price for the DataSet perf cost (it's the only class using the interface, as far as I can see) when we want the validation to happen automatically. Maybe a boolean parameter (performValidation) at XmlSerializer construction time would suffice?

    From what I see, the perf cost on the DataSet is that GetSchema causes the XSD to be generated each time... Or is it the validation itself? If it's the former, maybe the DataSet could be more "intelligent": track changes to its structure (columns and tables) and invalidate a cached version of its own schema...

  • Validation during serialization scenarios is not a feature most people want. Since we try to ensure that the many don't end up paying the cost of a feature wanted by a few, it is off by default. This decision was made by the folks who own the XmlSerializer (used to be Doug Purdy, now it's Matt Tavis) and I understand their position.

    If you want validation on load, you can wrap the input in a validating XML reader yourself. In Whidbey it is no longer limited to just taking an XmlTextReader as input.

  • Well, we're not talking about generic (de)serialization, but for an object for which I explicitly implemented IXmlSerializable.GetSchema() to return a schema. It looks quite natural to me as the class developer to think that it's going to be used, otherwise, why am I forced to implement that method? This is misleading. I could easily assume that validation is taking place where it's not, and it's not stated that way in the documentation either.

    Maybe there should be a separate IXmlValidable interface with that GetSchema method, and we would all be happy.
