October 2003 - Posts

Note: this entry has moved.

Finally, RssBandit no longer freezes. It works flawlessly, is full-featured, and is a real pleasure to use. It's my new default reader (goodbye SharpReader!!!). And Roy, I must say you're wrong: you need to have a look at the latest version. Well done Dare!!
Posted by Daniel Cazzulino

The IXmlSerializable interface allows you to take full control of the XML serialization of your class whenever the XmlSerializer is used. The serializer checks for this interface and, if it finds it, uses its methods for serialization instead of reading the serialization attributes, etc. The interface has the following definition:

public interface IXmlSerializable
{
  XmlSchema GetSchema();
  void ReadXml(XmlReader reader);
  void WriteXml(XmlWriter writer);
}

Having the GetSchema method, you would think it performs automatic XSD validation of incoming XML, right? That the XmlReader passed to the ReadXml method is actually an instance of XmlValidatingReader created automatically by the serializer when it detects you implement IXmlSerializable, right? WRONG. It doesn't.

Basically, the schema is retrieved during the initial temporary assembly generation step (explained in a previous post), for reasons I still haven't figured out. But it's not used at all after that. Why?
The XmlSerializer.Deserialize method takes an XmlReader, TextReader or Stream. It would be all too easy to wrap them with an XmlValidatingReader before deserializing the object. Remember that the schema you have to return from the GetSchema method can be taken from an embedded resource...

I think this would be a great addition to Whidbey.
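
The wrapping described above can be done by hand. What follows is only a sketch under stated assumptions: it uses XmlReaderSettings, which arrived in .NET Framework 2.0 and replaces XmlValidatingReader there, and the Person class, the ValidatingDeserializer helper and its parameters are hypothetical names made up for illustration. The schema is passed in by the caller; it could just as well come from GetSchema or an embedded resource.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type, used only to make the sketch self-contained.
public class Person
{
    public string Name;
}

public static class ValidatingDeserializer
{
    // Deserializes 'xml' as 'type' while validating it against 'schema'.
    public static object Deserialize(Type type, Stream xml, XmlSchema schema)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(schema);

        // No ValidationEventHandler is registered, so a validation error
        // is raised as an exception while the serializer is reading.
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            return new XmlSerializer(type).Deserialize(reader);
        }
    }
}
```

Because the serializer only sees an XmlReader, it does not need to know that validation is happening underneath it. A validation failure should surface from Deserialize as an exception (typically an InvalidOperationException wrapping the schema error).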

Posted by Daniel Cazzulino | 5 comment(s)

I've been reading about blogging formats, specs in progress, and the like, and I eventually ended up at Tim Bray's weblog and his multi-version schema for PIE/ECHO/ATOM/Whatever (PEAW). You can look at the RelaxNG Compact Syntax version as well as the XML Schema version.
Boy, 2 KB vs 6 KB!! And the RelaxNG version is actually FAR more readable! And it uses the built-in XSD types! Please, you NEED to have a look at both.

I've implemented the Schematron XML validation language for .NET... maybe I should start having a look at RelaxNG now :). However, I saw a RelaxNG folder in the Mono source, so it seems those guys are beating MS this time... or will we have a surprise in Whidbey?
As a developer, I'd love to have the choice in .NET to use one or the other, especially because XSD is not easy to tackle at first, and many developers completely ignore its capabilities and design plain wrong schemas. RelaxNG looks so much simpler that it would be quite easy to get up to speed designing complex schemas with it. And of course, there's a non-compact XML version too.
Posted by Daniel Cazzulino | 1 comment(s)

One of the most powerful XML Schema features is its ability to validate documents based on element types instead of element names. That is, no matter which element name is used in an instance document, say Customer, customer or CRMCustomer, as long as our XSD schema declares them all as being of type, say, CustomerDef, the document will be valid. This is very important in interoperability scenarios, of course.
That said, one of the most versatile and performant ways to handle XML in .NET (forget about XmlDocument) is using the XmlSerializer class. Coupled with XSD.EXE or the technique I described in a previous post, you can easily autogenerate the classes from that schema. So far so good.

+ For the curious, here's such a schema (a trivial one here of course).

+ And here are the different instance documents.

However, only one of the three versions will work, the one with the element "customer". The other versions, which are equally valid according to the schema, and which are of the desired type CustomerDef will fail with an exception saying the element was not expected. As I explained while discussing XmlSerializer speed, it creates a temporary assembly for reading and writing the serialized version of a type. We're interested in the reader now.
When the XSD shown above is used to generate the XmlSerializable class, we get a class definition like the following:

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://www.lagash.com/schemas/customers")]
[System.Xml.Serialization.XmlRootAttribute("customer", Namespace="http://www.lagash.com/schemas/customers", IsNullable=false)]
public class CustomerDef 
{
  /// <remarks/>
  public string Name;
  
  /// <remarks/>
  public string EMail;
}

From the definition above, the XmlSerializer will create the temporary reader. The reader will contain a set of Read methods according to those serialization attributes. Using the technique explained in the aforementioned post, I got the temporary generated class. The reader contains a Read4_customer method, which is the one that tries to load the XML. The problem is that this method uses a stored string (taken from the serialization attributes) and performs an element name/namespace string comparison. Therefore, it will always fail with the other two valid root elements.
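
To see the name check in action, here is a small sketch. It is not part of the original investigation: the instance document in the usage note is an assumption (the real ones are not reproduced here), and RootNameDemo and Load are hypothetical names. It relies on the public XmlSerializer(Type, XmlRootAttribute) constructor, which lets the caller override the expected root element, so it only helps when the root name is known before deserializing.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlType(Namespace = "http://www.lagash.com/schemas/customers")]
[XmlRoot("customer", Namespace = "http://www.lagash.com/schemas/customers", IsNullable = false)]
public class CustomerDef
{
    public string Name;
    public string EMail;
}

public static class RootNameDemo
{
    const string Ns = "http://www.lagash.com/schemas/customers";

    // Deserializes a CustomerDef whose root element is called 'rootName'
    // instead of the "customer" name baked into the attributes.
    public static CustomerDef Load(string xml, string rootName)
    {
        XmlRootAttribute root = new XmlRootAttribute(rootName);
        root.Namespace = Ns;

        XmlSerializer serializer = new XmlSerializer(typeof(CustomerDef), root);
        using (StringReader reader = new StringReader(xml))
        {
            return (CustomerDef)serializer.Deserialize(reader);
        }
    }
}
```

Calling RootNameDemo.Load(xml, "CRMCustomer") on a document whose root is a CRMCustomer element in that namespace should load the object, while a plain new XmlSerializer(typeof(CustomerDef)) is expected to reject the same document with an InvalidOperationException. As far as I know, serializers created through this constructor are not cached by the framework, so in real code it would be worth keeping one instance per root name.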

I found a very interesting thing though, while digging inside the generated reader. It has a method with the signature CustomerDef Read1_CustomerDef(bool isNullable, bool checkType) which is perfectly capable of loading the object. However, getting this far was very difficult. First, I had to add this temporary class to my project and make that method public, as it's private by default. Second, there's no "public" way of initializing this reader: you have to call an internal Init method on the base XmlSerializationReader class. Thank GOD we still have reflection to test these things!

MethodInfo m = typeof(XmlSerializationReader).GetMethod(
  "Init", BindingFlags.Instance | BindingFlags.NonPublic);

using (FileStream c = new FileStream(@"C:\CustomerCRM.xml", FileMode.Open))
{
  XmlValidatingReader vr = new XmlValidatingReader(new XmlTextReader(c));
  vr.Schemas.Add(xsd);
  
  // Create the temp. reader manually
  Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader cr = 
    new Microsoft.Xml.Serialization.GeneratedAssembly.CustomerDefReader();
  
  // Call Init through reflection
  m.Invoke(cr, new object[] { vr, null, null, null } );
  
  // Read with the method that checks the type
  object cust = cr.Read1_CustomerDef( false, false );
  Console.WriteLine(cust);
}

That method will successfully load any of the three versions of the root element, whether or not they have the xsi:type attribute set (if they do, Read1_CustomerDef can be passed true for the second parameter, checkType). Another generated method that could work is Read2_Object, provided it receives checkType=true and the instance document uses xsi:type to specify that it's a CustomerDef instance (which is not always possible if you're receiving the document from a third party). Unfortunately, as I said above, the code that calls Read1_CustomerDef, which is the one the serializer calls to load the XML, only checks for names:

public object Read4_customer() {
    object o = null;
    Reader.MoveToContent();
    if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
        if (((object) Reader.LocalName == (object)id5_customer && 
            (object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
            o = Read1_CustomerDef(false, true);
        }
        else {
            throw CreateUnknownNodeException();
        }
    }
    else {
        UnknownNode(null);
    }
    return (object)o;
}

Note the very efficient use of string reference comparison, by casting them to Object.
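
Why a reference comparison is safe here deserves a quick illustration. As far as I can tell from the generated code, the id fields are filled by adding the expected names to the reader's NameTable, and an XmlReader returns element names atomized through that same table, so equal names are the very same string instance. The sketch below shows the idea in isolation; AtomizedNames and RootIsCustomer are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AtomizedNames
{
    // Returns true when the root element of 'xml' is named "customer",
    // using a reference comparison instead of a character-by-character one.
    public static bool RootIsCustomer(string xml)
    {
        NameTable names = new NameTable();

        // Atomize the expected name once, up front.
        object idCustomer = names.Add("customer");

        // The reader shares the same NameTable, so the names it returns
        // are the atomized instances stored in that table.
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), names);
        try
        {
            reader.MoveToContent();
            return (object)reader.LocalName == idCustomer;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The comparison costs a single pointer check per element, which adds up when a reader has to test every incoming element against many expected names.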

One way to solve this would be if the XmlRootAttribute could be specified multiple times, so that the generated code checks for multiple names.
The other, more XSD-compliant and certainly more flexible as it wouldn't require regeneration of the serializable class (CustomerDef in this case) to reflect new element names, would be to check if the current Reader is actually an XmlValidatingReader and read the customer if the type matches. The previous code can be modified as follows to make this work:

public object Read4_customer() {
  object o = null;
  Reader.MoveToContent();
  if (Reader.NodeType == System.Xml.XmlNodeType.Element) {
    // Check for validating reader with schema type determined
    if (Reader is System.Xml.XmlValidatingReader && 
      ((System.Xml.XmlValidatingReader) Reader).SchemaType != null) {
      System.Xml.Schema.XmlSchemaType type = (System.Xml.Schema.XmlSchemaType)
        ((System.Xml.XmlValidatingReader) Reader).SchemaType;
      // We would have to check the inheritance chain too.
      if (((object) type.Name == (object)id1_CustomerDef && 
        (object) type.QualifiedName.Namespace == (object)id2_httpwwwlagashcomschemascustomers)) {
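        o = Read1_CustomerDef(false, true);
      }
      else {
        // Schema type doesn't match: fail just like the name-based branch
        throw CreateUnknownNodeException();
      }
    }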
    else if (((object) Reader.LocalName == (object)id5_customer && 
      (object) Reader.NamespaceURI == (object)id2_httpwwwlagashcomschemascustomers)) {
      o = Read1_CustomerDef(false, true);
    }
    else {
      throw CreateUnknownNodeException();
    }
  }
  else {
    UnknownNode(null);
  }
  return (object)o;
}

Of course this would require a validating reader with the appropriate schema loaded, but why would you create an XSD otherwise? Would you lose all those powerful validation capabilities and use it only to save yourself some lines of class definition code and XML serialization attributes? If you do, I urge you to think twice: you're really missing something that can greatly improve your code (no more hand-written validation of ranges, patterns, etc.).

Another interesting thing I found is that the XmlSerializer can be inherited, and it has a protected method that allows you to deserialize directly from an XmlSerializationReader. That would have solved my previous reflection problems, as I could simply inherit from the serializer and add a public method that receives my modified reader and passes it through. That might even allow for more efficient custom deserialization, for example one where the temporary assemblies are not generated each time the application starts but are stored in a permanent location for reuse across AppDomains (maybe a DB?). Remember that the initial generation and compilation performance hit is significant. Unfortunately, that method's implementation throws a NotImplementedException :((((((. But that makes me wonder if it's possible in Whidbey.... :)

In a future post, maybe here, maybe on MSDN online, I will explain how to take advantage of the IXmlSerializable interface to implement custom serialization and also gain automatic XSD validation right from your assembly-embedded schema.

+ For the curious, the complete XmlSerializer-generated file for the schema.
Posted by Daniel Cazzulino | 2 comment(s)

Update: check a more thorough explanation of this technique in the Code Generation in the .NET Framework Using XML Schema article published in the MSDN XML DevCenter, and the companion post on the VS.NET custom tool for it.

I've always been annoyed by the impossibility of customizing the XSD.EXE tool. I've even thought about some workaround for the all-public-fields issue. However, I was WRONG. It's perfectly possible to generate fully customized classes from an XSD schema, if not by calling XSD.EXE, then by reusing the very same classes it uses.

This is achievable without any reflection hack! Only public (although certainly undocumented) classes and methods are used.

The "trick" involves using two key classes: XmlSchemaImporter and XmlCodeExporter, both from the System.Xml.Serialization namespace.

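// Namespaces used: System.IO, System.Xml, System.Xml.Schema, System.Xml.Serialization,
// System.CodeDom, System.CodeDom.Compiler and Microsoft.CSharp.
// Load the schema to process (stm is a stream or reader opened over the XSD).
XmlSchema xsd = XmlSchema.Read( stm, null );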

// Collection of schemas for the XmlSchemaImporter
XmlSchemas xsds = new XmlSchemas();
xsds.Add( xsd );
XmlSchemaImporter imp = new XmlSchemaImporter( xsds );

// System.CodeDom namespace for the XmlCodeExporter to put classes in
CodeNamespace ns = new CodeNamespace( "Generated" );
XmlCodeExporter exp = new XmlCodeExporter( ns );

// Iterate schema items (top-level elements only) and generate code for each
foreach ( XmlSchemaObject item in xsd.Items )
{
  if ( item is XmlSchemaElement )
  {
    // Import the mapping first
    XmlTypeMapping map = imp.ImportTypeMapping(
      new XmlQualifiedName( ( ( XmlSchemaElement ) item ).Name, 
      xsd.TargetNamespace ) );
    // Export the code finally
    exp.ExportTypeMapping( map );
  }
}

// Code generator to build code with.
ICodeGenerator generator = new CSharpCodeProvider().CreateGenerator();

// Generate untouched version
using ( StreamWriter sw = new StreamWriter( @"E:\Generated.Full.cs", false ) )
{
  generator.GenerateCodeFromNamespace(
    ns, sw, new CodeGeneratorOptions() );
}

The CodeNamespace variable ns contains a full CodeDom hierarchy with all the types that were generated. Therefore, we can easily customize their definitions by adding attributes, custom methods, etc. We can even convert those annoying public fields to properties, which is now much more robust than the find-and-replace method I used in a previous life :):

+ FieldsToProperties method
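
The idea behind such a conversion can be sketched as follows. This is a minimal illustration written against the public System.CodeDom API, not the original listing: the CodeDomTweaks class name and the underscore prefix for backing fields are assumptions. It makes every public field private, renames it, and adds a public property with the original name that carries the field's serialization attributes.

```csharp
using System.CodeDom;

public static class CodeDomTweaks
{
    // Replaces each public field in the namespace's classes with a
    // private backing field plus a public property of the same name.
    public static void FieldsToProperties(CodeNamespace ns)
    {
        foreach (CodeTypeDeclaration type in ns.Types)
        {
            if (!type.IsClass) continue;

            // Work on a copy, since properties are added while iterating.
            CodeTypeMember[] members = new CodeTypeMember[type.Members.Count];
            type.Members.CopyTo(members, 0);

            foreach (CodeTypeMember member in members)
            {
                CodeMemberField field = member as CodeMemberField;
                if (field == null) continue;
                if ((field.Attributes & MemberAttributes.AccessMask) != MemberAttributes.Public) continue;

                CodeMemberProperty prop = new CodeMemberProperty();
                prop.Name = field.Name;
                prop.Type = field.Type;
                prop.Attributes = MemberAttributes.Public | MemberAttributes.Final;

                // Move the XML serialization attributes to the property.
                prop.CustomAttributes.AddRange(field.CustomAttributes);
                field.CustomAttributes.Clear();

                // Assumed naming convention for the backing field.
                field.Name = "_" + field.Name;
                field.Attributes = MemberAttributes.Private;

                CodeFieldReferenceExpression backing = new CodeFieldReferenceExpression(
                    new CodeThisReferenceExpression(), field.Name);
                prop.GetStatements.Add(new CodeMethodReturnStatement(backing));
                prop.SetStatements.Add(new CodeAssignStatement(
                    backing, new CodePropertySetValueReferenceExpression()));

                type.Members.Add(prop);
            }
        }
    }
}
```

It would be called as CodeDomTweaks.FieldsToProperties( ns ) right before handing the namespace to the code generator.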

Now, simply passing the namespace generated by the previous code will result in custom classes with properties instead of fields, with the appropriate XmlSerialization attributes as generated initially. Below is the customized complete schema for the Pubs database:

+ Pubs XSD schema customized.

+ Complete Pubs XSD
Posted by Daniel Cazzulino | 8 comment(s)

Have you ever wondered how the XmlSerializer really works? Well, it creates a temporary assembly that is built by reflecting the type you pass to the constructor. Wait! Don't panic because of the word "reflecting"!
It does so only once per type, and it builds an extremely efficient pair of Reader/Writer classes that will handle serialization/ deserialization during the life of the AppDomain.
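
In application code, a simple way to make sure the generation cost is paid only once is to create the serializer a single time and reuse it. The following is just a sketch; the Settings class and the SettingsXml helper are hypothetical names used for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, used only for illustration.
public class Settings
{
    public string Name;
    public int Retries;
}

public static class SettingsXml
{
    // Created once; every call below reuses the same serializer
    // (and therefore the same generated reader/writer pair).
    static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Settings));

    public static string Save(Settings settings)
    {
        StringWriter writer = new StringWriter();
        Serializer.Serialize(writer, settings);
        return writer.ToString();
    }

    public static Settings Load(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            return (Settings)Serializer.Deserialize(reader);
        }
    }
}
```

After the first use, Save and Load only run the already compiled reader and writer code.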

These classes inherit the public XmlSerializationReader and XmlSerializationWriter classes in the System.Xml.Serialization namespace. If you want to take a look at the generated code, add the following setting to the application configuration file (web.config for a web application):

<system.diagnostics>
  <switches>
    <add name="XmlSerialization.Compilation" value="4"/>
  </switches>
</system.diagnostics>

Now the serializer won't delete the temporary files generated in the process. For a web application, the files will be located in C:\Documents and Settings\[YourMachineName]\ASPNET\Local Settings\Temp, otherwise, they will be located in your current user Local Settings\Temp folder.
You will see code that is exactly what you would have to do if you wanted to efficiently load Xml in .NET: use nested while/if as you read, use XmlReader methods to move down the stream, etc. All the ugly code is there to make it really fast.

Special thanks to Chris Sells for his XmlSerializerCompiler utility.

Posted by Daniel Cazzulino | 11 comment(s)

Lately I've been digging inside the XmlTextWriter class. I'm working on an alternative to the traditional array-based state machine implementation, one based on a mix of hierarchical state machines and DOM-like event propagation, for an XmlWriter-derived class.
During this investigation, I found several places where string manipulation is not optimal in the aforementioned class. Specifically, even though it uses the StringBuilder class, it mixes calls to it with String.Concat, which defeats the purpose because it allocates an intermediate string. Look at the following example taken from the StartDocument method (called by WriteStartDocument):

builder1.Append(string.Concat(" encoding=", this.quoteChar));

This is functionally equivalent to:

builder1.Append(" encoding=").Append(this.quoteChar);

So, why are the strings concatenated? Even temporary arrays of strings are built only to be concatenated and passed to the Append method later. Do these guys know something about string handling we don't, or is this just a bit more inefficient code?

Posted by Daniel Cazzulino | 2 comment(s)

Mmm... the following code causes a System.NullReferenceException:

using (FileStream fs = new FileStream(@"e:\xmltextwriter.xml", FileMode.Create))
{
  XmlTextWriter tw = new XmlTextWriter(fs, System.Text.Encoding.UTF8);
  tw.WriteDocType("html", 
      "-//W3C//DTD XHTML 1.0 Transitional//EN", 
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd", 
      null);
  tw.WriteStartDocument();
  tw.Flush();
}

The exception is thrown at the WriteStartDocument() call. Of course, the document being written is not valid according to the XML specification's rules for the prolog (the XML declaration must come before the document type declaration), but we should get a meaningful exception, right?
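
For comparison, here is a sketch of the same calls in the order the XML prolog expects, declaration first and DOCTYPE second. It writes to a StringWriter instead of a file so that it is self-contained; the XhtmlProlog class and the empty html root element are assumptions added for illustration.

```csharp
using System.IO;
using System.Xml;

public static class XhtmlProlog
{
    // Writes an XML declaration, then the XHTML DOCTYPE, then an empty root.
    public static string Write()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter tw = new XmlTextWriter(sw);

        tw.WriteStartDocument();
        tw.WriteDocType("html",
            "-//W3C//DTD XHTML 1.0 Transitional//EN",
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
            null);
        tw.WriteStartElement("html", "http://www.w3.org/1999/xhtml");
        tw.WriteEndElement();
        tw.WriteEndDocument();
        tw.Flush();

        return sw.ToString();
    }
}
```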
Posted by Daniel Cazzulino | 1 comment(s)

I came across the need to ensure that, upon machine startup, certain programs in the Startup group get executed even if I'm not around to log on with my account. This is typically the case if you run a web server on your machine and want it to connect to the Internet automatically and update a dynamic DNS service with the new IP. You could use SrvAny to make those apps look like services that run without the need to log on. But most applications store settings in user-specific folders, and will not work without a logged-on session.
Microsoft offers you a "solution": enable automatic logon. Oops. It works in Windows 2003 too. But now the problem is that the machine will remain logged on until your password-protected screen saver comes up to lock the workstation. I'm using the following script, which gives the startup programs enough time to do their work and then automatically locks the session:

<package>
   <job id="lock">
      <script language="JScript">
         var WshShell = WScript.CreateObject("WScript.Shell");
         // 5 minutes are enough for me ;)
         WScript.Sleep(300000);
         WshShell.Exec("rundll32 user32.dll,LockWorkStation");
      </script>
   </job>
</package>

This script is a .wsf file executed by Windows Script Host. We want to ensure this script is ALWAYS run. Holding Shift during Windows startup skips the Startup group, so to avoid that we can have the script run through the registry instead:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]
"LockStation"="C:\\LockStation.wsf"
Posted by Daniel Cazzulino | 2 comment(s)

It's a common coding convention to put a space inside the parentheses surrounding method parameters. However, it can be quite annoying to keep remembering it. Examples of this convention are:

Console.WriteLine( "\nFinished" );
int var = Int32.Parse( Console.ReadLine() );
DoSomething( Int32.Parse( Console.ReadLine() ), 
	DoSomethingElse( var ) );

As you can see, it gets more and more annoying when you have nested method calls, as closing each one requires a space inside the parentheses too. You can easily apply this formatting to a whole solution by running this simple macro:

Sub FormatParenthesis()
    Dim result As vsFindResult

    DTE.ExecuteCommand("Edit.Replace")
    DTE.Windows.Item(Constants.vsWindowKindFindReplace).Activate()
    DTE.Find.FindWhat = "{[^ (]}\)"
    DTE.Find.ReplaceWith = "\1 )"
    DTE.Find.Target = vsFindTarget.vsFindTargetSolution
    DTE.Find.MatchCase = False
    DTE.Find.MatchWholeWord = False
    DTE.Find.MatchInHiddenText = True
    DTE.Find.PatternSyntax = vsFindPatternSyntax.vsFindPatternSyntaxRegExpr
    DTE.Find.ResultsLocation = vsFindResultsLocation.vsFindResultsNone
    DTE.Find.Action = vsFindAction.vsFindActionReplaceAll

    result = DTE.Find.Execute()
    Do While result = vsFindResult.vsFindResultReplaced
        result = DTE.Find.Execute()
    Loop

    DTE.Find.FindWhat = "\({[^ )]}"
    DTE.Find.ReplaceWith = "( \1"
    result = DTE.Find.Execute()

    Do While result <> vsFindResult.vsFindResultNotFound
        result = DTE.Find.Execute()
    Loop
    DTE.Windows.Item(Constants.vsWindowKindFindReplace).Close()

End Sub
Posted by Daniel Cazzulino | 4 comment(s)