February 2004 - Posts

Note: this entry has moved.

Yesterday I received the following important notice:
OASIS Emergency Management Technical Committee have approved a Committee Draft specification for the "Common Alerting Protocol Version 1.0"
I guess sooner or later we'll have another long-awaited spec:
OASIS Bathroom Contention Technical Committee have approved a Committee Draft specification for the "Common Avid Bathreader Synchronization Protocol Version 1.0"
:S
Posted by Daniel Cazzulino

Note: this entry has moved.

Most of the APIs in .NET have a layered design, most notably the IO classes. The System.Xml namespace builds on this layered design but falls short, IMO. Basically, you have the following layers in a typical XML parsing activity:

  1. Input: a System.IO.Stream implementation, such as a FileStream, BufferedStream, NetworkStream and so on, or directly a string passed to the next layer.
  2. Basic reader: most probably a System.IO.StreamReader, or a StringReader if the previous layer is skipped.
  3. XmlReader: the actual parser implementation. In .NET, the XML parser is the XmlTextReader class.
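As a minimal sketch of how these three layers stack up in code (the sample document and the use of a MemoryStream in place of a file or network stream are just assumptions for illustration):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class LayersDemo
{
    static void Main()
    {
        // Sample document, made up for this sketch.
        byte[] bytes = Encoding.UTF8.GetBytes("<root><item>1</item><item>2</item></root>");

        // Layer 1: the input stream (a MemoryStream stands in for a file or socket).
        using (Stream input = new MemoryStream(bytes))
        // Layer 2: the basic reader, turning bytes into characters.
        using (TextReader basic = new StreamReader(input, Encoding.UTF8))
        {
            // Layer 3: the XML parser, sitting on top of the character reader.
            XmlTextReader xml = new XmlTextReader(basic);
            while (xml.Read())
            {
                if (xml.NodeType == XmlNodeType.Element)
                    Console.WriteLine(xml.Name);
            }
        }
    }
}
```

Note that nothing in this stack lets you plug in a scanner of your own between layers 2 and 3.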

Maybe it's just me, but isn't there a layer missing there? The "Lexical Analyzer" or "Scanner"? Well, it turns out that it's missing to the public, but the XmlTextReader of course uses one, its XmlScanner. Wouldn't it be cool if this layer was exposed explicitly, so that you could tell the parser which scanner to use? Imagine a scanner that could present as XML tokens some binary stuff coming from the basic reader layer... I know all the discussions about binary XML, I'm just thinking about the clever solution for SVG, SVGZ or "zipped SVG". I don't have to tell you how well the zip algorithm works in general, but with highly redundant data such as XML (i.e. all the repeated tag names) the size reduction is really awesome.

Back to the topic, however, the XmlTextReader violates this separation with its internal XmlScanner class. Namely, the scanner BUFFERS its reads, instead of delegating this responsibility to the appropriate layer, which already implements such functionality in the BufferedStream class. One consequence of this violation is that the stream position is no longer relevant, as you will never know how far the internal scanner has gone. Have you ever dreamed of a "ResetableReader"? You can kiss that dream goodbye for now.
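A quick way to see the effect for yourself is to compare where the parser says it is with where the underlying stream says it is. This is only a sketch with a made-up document; the exact position printed depends on the parser's internal buffer size, so no particular number is claimed here:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class PositionDemo
{
    static void Main()
    {
        // Sample document, made up for this sketch.
        string doc = "<root><first/><second/><third/></root>";
        MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(doc));

        // Hand the stream straight to the parser, so any read-ahead is the parser's own.
        XmlTextReader xml = new XmlTextReader(input);

        xml.Read(); // <root>
        xml.Read(); // <first/>

        // The reader has only reported two nodes, but the stream may already
        // have been consumed well past that point.
        Console.WriteLine("Reader is on: " + xml.Name);
        Console.WriteLine("Stream position: " + input.Position + " of " + input.Length);
    }
}
```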

If the scanner didn't violate the separation, we could implement such a reader as follows:

  • Read until some arbitrary point.
  • Store current stream position.
  • Create a new reader to read starting from current position (one that stops reading when it finds elements "outside" its scope), and use it internally instead of advancing the "real" one.
  • Upon a call to a Reset() method, discard the "inner" reader and reposition the stream.

So, we could confidently hand such a reader to some arbitrary component to do whatever it has to do with the data, without risking our own positioning in the reader. This is typical in XML processing pipelines. You don't want the previous pipeline to mess with the "real" reader and break processing in later ones. Similarly, if you configure components to handle processing of certain elements (for example, with the handler registration mechanism allowed by Xml Streaming Events), you don't want one handler to screw the reader and forbid other handlers from doing their work. You could have the following syntactic sugar also:

    ResetableReader rr; // Initialize somehow
    // Do some reading
    // We're about to hand the reader to some other component
    using (rr.CreateResetPoint())
    {
        Process(rr);
    }
    // Now we're exactly where we left before entering the "using"
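To make the "using" trick concrete, here is a minimal sketch of how such a reset point could work. It is NOT an XML reader: ResetableCharReader is a hypothetical stand-in that reads characters from a string, where the position is always known, and the ResetPoint helper is equally made up. The only point is that CreateResetPoint() can return an IDisposable that restores the saved position when the "using" block ends:

```csharp
using System;

// Hypothetical stand-in for a reader whose position can be saved and restored.
class ResetableCharReader
{
    private readonly string text;
    private int position;

    public ResetableCharReader(string text) { this.text = text; }

    // Returns the next character, or -1 at the end of the input.
    public int Read()
    {
        return position < text.Length ? text[position++] : -1;
    }

    // Remembers the current position; disposing the result restores it.
    public IDisposable CreateResetPoint()
    {
        return new ResetPoint(this, position);
    }

    private sealed class ResetPoint : IDisposable
    {
        private readonly ResetableCharReader owner;
        private readonly int saved;

        public ResetPoint(ResetableCharReader owner, int saved)
        {
            this.owner = owner;
            this.saved = saved;
        }

        // Leaving the "using" block puts the reader back where it was.
        public void Dispose() { owner.position = saved; }
    }
}

class ResetPointDemo
{
    static void Main()
    {
        ResetableCharReader rr = new ResetableCharReader("abcdef");
        rr.Read(); // consume 'a'

        using (rr.CreateResetPoint())
        {
            rr.Read(); // some other component consumes 'b'
            rr.Read(); // ... and 'c'
        }

        // Back at the saved position: the next character is 'b' again.
        Console.WriteLine((char)rr.Read());
    }
}
```

With a real XML reader, the saved state would have to be the stream position plus the parser state, which is exactly what the buffering scanner makes unavailable.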

But as the scanner is buffering (something that should be left to the lower layer, as stated), the only way to get "what's left" in the stream without losing what has already been buffered is to use the XmlTextReader.GetRemainder() method. Guess what, after calling that method, you have effectively screwed your "main" reader. And as the XmlTextReader doesn't support ICloneable either, you can't even store/clone/keep its internal state before screwing it. I heard someone suggesting that one *hack* would be to store the element qname and depth, construct a new reader and read again until it's met again. This is clearly an unacceptable hack: we would be parsing the same thing multiple times, wasting processing time by reading useless nodes, etc.
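A small sketch of that behavior, assuming a made-up sample document. As far as I can tell from the documentation, GetRemainder() returns a TextReader over the buffered-but-unparsed XML and also sets EOF to true on the original reader, so treat the comments below as expectations rather than guarantees:

```csharp
using System;
using System.IO;
using System.Xml;

class RemainderDemo
{
    static void Main()
    {
        // Sample document, made up for this sketch.
        string doc = "<root><first/><second/><third/></root>";
        XmlTextReader xml = new XmlTextReader(new StringReader(doc));

        xml.Read(); // <root>
        xml.Read(); // <first/>

        // Ask the parser for whatever it has buffered but not yet parsed.
        TextReader rest = xml.GetRemainder();
        Console.WriteLine("Remainder: " + rest.ReadToEnd());

        // The "main" reader is expected to be finished from here on,
        // even though there was still XML left to parse.
        Console.WriteLine("EOF: " + xml.EOF);
        Console.WriteLine("Read(): " + xml.Read());
    }
}
```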

What's the moral of the story? Cleaner separation allows for novel uses not foreseen originally. Violations lead in the best case to ugly hacks, in the worst case (as in the XmlTextReader) to plain impossibility. Let's keep dreaming about the ResetableReader (or thinking about alternative XML parsers for .NET...).

Posted by Daniel Cazzulino | 7 comment(s)

Note: this entry has moved.

Longhorn doesn't work in the current VMware Workstation product (4.0). However, you can use the latest beta (4.5 RC1), which adds (experimental) support for Longhorn under a Windows 2003 Server VM. I've also downloaded it to fix a corruption I got on my VM disks after running Partition Magic, and it worked flawlessly.
Download it from this location using the following registration information:
UserName: workstation
Password: experimental
(information taken from here - not that I know Japanese, but I figured it out ;))

Your current VMware licence key will work. Enjoy!

Posted by Daniel Cazzulino | 9 comment(s)

Note: this entry has moved.

All my programming life was tied to VB. I started with VB3, and finally became a master in VB6, where I was able to do ANYTHING the language would let me. Of course, there were MANY things that I couldn't do, and OOP and design patterns were soooo cool that I really needed to get my hands dirty by doing real programming based on them, not just "bathreading". So I made inroads in Delphi and Java for some time.
Then came .NET, and MS gave me a new toy to spend my days (and many nights too) with. VB.NET and C# both provide extensive support for OO programming. Even though I still code and write books (see Amazon) in both languages, I prefer C#, because I find it cleaner and less convoluted. I believe that over time, the mix of old VB keywords/syntax and new .NET constructs such as generics is turning VB.NET into one of the ugliest languages EVER.

For example, I see (from the excellent article on MSDN) the VB format to construct generic types. It simply sucks:

Dim stack As Stack(Of Integer)

I assume if the constructor has parameters, those will go after the type specifier?! Compare that with the elegance of C# 2.0:

Stack<int> stack = new Stack<int>();

Constructor parameters go where you expect them to go, the type specifier is separated from the constructor call. It is simply perfect. For the VB version, I'd like it to be:

Dim stack As Stack<int>
stack = new Stack<int>();

'Or
Dim stack as New Stack<int>()
'maybe
Dim stack as New Stack[int]()

Maybe the VB.NET team should find an Anders Hejlsberg for their design process...

Update: from the discussion with one of the VB language designers, where he praises YAVBK (Yet Another VB Keyword), I can only say "WTF?!". They're adding an IsNot operator?!?! From the example justifying it:

(instead of this):
If Not x Is Nothing Then Console.WriteLine("Has a value.")
(you will be able to write this):
If x IsNot Nothing Then Console.WriteLine("Has a value.")

I wonder why on earth VBers write code like that?! Look at the following equivalent (more readable) code:

If x <> Nothing Then Console.WriteLine("Has a value.")
'Or
If x = Nothing Then Console.WriteLine("Doesn't have a value")

It's FAR more readable and understandable than using that awful Is/IsNot test. It boils down to whether you want to teach VBers how to write good/maintainable/readable code or just give them new keywords to keep doing otherwise, but with less code.
Posted by Daniel Cazzulino | 48 comment(s)

Note: this entry has moved.

I can't really believe this is true. Was it REALLY a joke in the beginning?
Posted by Daniel Cazzulino | 1 comment(s)

Note: this entry has moved.

In a previous post I showed and discussed the similarities between the W3C XML Schema type system and the CLR one. Dare commented on it by mentioning a number of already known (at least by me) issues with WXS->CLR mappings, especially the fact that the latter supports only a subset of the former.
Given the overwhelming response in favor of similarities against differences (13 to 0 so far), I can only say that Dare is probably ignoring that most developers are .NET DEVELOPERS, NOT XML theorists and WXS fans. Therefore, most of them completely ignore or plainly don't care about the intricacies of WXS he's talking about. My question was about the features developers really use from WXS, and the answers I got speak for themselves.

So, there's no tautological question as he argues. I can rephrase my question as follows: “If you ignore the parts that are irrelevant/impractical (such as no support from XmlSerializer)/overly-complex-to-be-of-any-use/only-for-WXS-fans/Ph.D-only-material, do the CLR and XSD type system fit well together?”. If I ask the people to vote again, I'm willing to bet whatever I have that I will get the same answer.

That's why my weblog is titled "IXml* - Welcome to the real world". Not only because I'm a big fan of The Matrix, but because I care about what happens in the daily work with XML.
Posted by Daniel Cazzulino | 15 comment(s)

Note: this entry has moved.

I've had some discussions with co-workers and colleagues about the WXS (W3C XML Schema) type system and its relation with the CLR one. We all agree that many concepts in WXS don't map to anything existing in OO languages, such as derivation by restriction, content-ordering (i.e. sequence vs choice), etc. However, in the light of the tools the .NET Framework makes available to map XML to objects, we usually have to analyze WXS (used to define the structure of that very XML instance to be mapped) and its relation with our classes.

When you use the XmlSerializer to get a CLR object filled with the data in the XML, you're actually mapping it to the CLR type system. Moreover, when you use the xsd.exe /classes tool, you're effectively translating WXS types to CLR ones. You get classes with System.String members corresponding to xs:string, and the like. Dare explains this in his article on MSDN. The .NET Framework documentation about the XmlSerializer class explicitly states:

To transfer data between objects and XML requires a mapping from the programming language constructs to XML Schema and vice versa. The XmlSerializer, and related tools like Xsd.exe, provide the bridge between these two technologies at both design time and run time.

Even XmlValidatingReader.ReadTypedValue performs this mapping transparently for simple types, which is thoroughly documented in the product documentation under the title Data Type Support between XML Schema (XSD) Types and .NET Framework Types. At the PDC, new and even more comprehensive mapping tools/approaches were shown.

But let's go beyond the simpleType (almost) natural mapping between WXS and the CLR. We can have an abstract complexType in WXS named Person, and derive by extension Employee and Customer ones. Our root element, which will be a list of the contacts we know about, can be a choice of any of them, like so:

<?xml version="1.0" encoding="utf-8" ?>
<xs:schema id="XSDTypeSystem"
    targetNamespace="http://aspnet2.com/xsdvsclr"
    elementFormDefault="qualified"
    xmlns="http://aspnet2.com/xsdvsclr"
    xmlns:mstns="http://aspnet2.com/xsdvsclr"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The base abstract type -->
  <xs:complexType name="Person" abstract="true">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string" />
      <xs:element name="LastName" type="xs:string" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Employee">
    <xs:complexContent>
      <xs:extension base="Person">
        <xs:sequence>
          <xs:element name="EmployeeID" type="xs:string" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="Customer">
    <xs:complexContent>
      <xs:extension base="Person">
        <xs:sequence>
          <xs:element name="CustomerID" type="xs:string" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="Contacts">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Customer" type="Customer" maxOccurs="unbounded" minOccurs="0" />
        <xs:element name="Employee" type="Employee" maxOccurs="unbounded" minOccurs="0" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

Now, what do you think such types would look like in the .NET world? Well, I don't think it takes an expert in WXS to realize that these types map nicely to an abstract Person class, and Employee and Customer derived types. We can confirm that by running xsd.exe /classes with this schema, and we will get the following .NET classes:

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://aspnet2.com/xsdvsclr")]
[System.Xml.Serialization.XmlRootAttribute(Namespace="http://aspnet2.com/xsdvsclr", IsNullable=false)]
public class Contacts {

    /// <remarks/>
    [System.Xml.Serialization.XmlElementAttribute("Employee", typeof(Employee))]
    [System.Xml.Serialization.XmlElementAttribute("Customer", typeof(Customer))]
    public Person[] Items;
}

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://aspnet2.com/xsdvsclr")]
[System.Xml.Serialization.XmlIncludeAttribute(typeof(Customer))]
[System.Xml.Serialization.XmlIncludeAttribute(typeof(Employee))]
public abstract class Person {

    /// <remarks/>
    public string FirstName;

    /// <remarks/>
    public string LastName;
}

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://aspnet2.com/xsdvsclr")]
public class Employee : Person {

    /// <remarks/>
    public string EmployeeID;
}

/// <remarks/>
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://aspnet2.com/xsdvsclr")]
public class Customer : Person {

    /// <remarks/>
    public string CustomerID;
}

Note that the XSD tool was even smart enough to realize that, as both expected elements in the WXS choice for the Contacts element inherit from the same Person type, they can actually be part of the same array type, which is defined as Person[] Items. That looks like a pretty nice fit.
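To see the mapping at work at run time, here is a minimal sketch that feeds an instance document to the XmlSerializer. The classes are a trimmed copy of the generated ones (short attribute names, no doc comments), and the instance document and its values are made up for illustration. It contains only Employee elements so that it stays valid against the choice in the schema, although the XmlSerializer itself does not validate:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlType(Namespace = "http://aspnet2.com/xsdvsclr")]
[XmlRoot(Namespace = "http://aspnet2.com/xsdvsclr", IsNullable = false)]
public class Contacts
{
    [XmlElement("Employee", typeof(Employee))]
    [XmlElement("Customer", typeof(Customer))]
    public Person[] Items;
}

[XmlType(Namespace = "http://aspnet2.com/xsdvsclr")]
[XmlInclude(typeof(Customer))]
[XmlInclude(typeof(Employee))]
public abstract class Person
{
    public string FirstName;
    public string LastName;
}

[XmlType(Namespace = "http://aspnet2.com/xsdvsclr")]
public class Employee : Person
{
    public string EmployeeID;
}

[XmlType(Namespace = "http://aspnet2.com/xsdvsclr")]
public class Customer : Person
{
    public string CustomerID;
}

class SerializerDemo
{
    static void Main()
    {
        // Made-up instance document for this sketch.
        string xml =
            "<Contacts xmlns='http://aspnet2.com/xsdvsclr'>" +
            "<Employee><FirstName>Ann</FirstName><LastName>Lee</LastName>" +
            "<EmployeeID>E1</EmployeeID></Employee>" +
            "<Employee><FirstName>Bob</FirstName><LastName>Ray</LastName>" +
            "<EmployeeID>E2</EmployeeID></Employee>" +
            "</Contacts>";

        XmlSerializer serializer = new XmlSerializer(typeof(Contacts));
        Contacts contacts = (Contacts)serializer.Deserialize(new StringReader(xml));

        // Each array item is a Person whose runtime type follows the element name.
        foreach (Person p in contacts.Items)
            Console.WriteLine(p.GetType().Name + ": " + p.FirstName + " " + p.LastName);
    }
}
```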

In this light, I'm conducting a survey about developers' views on the relation between the XSD type system and the .NET one. Ignoring some of the more advanced (I could add cumbersome and confusing) features of WXS, would you say that both type systems fit nicely with each other?

Valid votes (through comments, will be summed up in this post description) are: YES (they fit nicely) and NO (they don't).

The latter sort of implies that you think MS is pushing the similarities too far, and that it's not good. I look forward to your comments and votes!

Current vote count: YES=13, NO=0
Posted by Daniel Cazzulino | 19 comment(s)

Note: this entry has moved.

After learning that a Google search for 'miserable failure' returns Bush's bio (this was even an article in the BBC news), I tried my (invented) term 'bathreader' (an activity I believe should be legalized), and guess what: I'm the only one returned :D (together with VGA, who commented on my new alias :|).
So my new weblog subtitle is: Daniel Cazzulino (a.k.a. "kzu" and "avid bathreader")'s .NET and XML digress. Self google-bombed :o).
Posted by Daniel Cazzulino | 3 comment(s)

Note: this entry has moved.

More really unfortunate news coming from MS. Patents for accessing XML documents generated by Word?! Come on!!!!
Posted by Daniel Cazzulino | 7 comment(s)

Note: this entry has moved.

Finally, Hernan de Lahitte started blogging. He will touch on architecture and in-depth details of Shadowfax, as he is one of its main architects and a Senior Developer. For those that don't know what Shadowfax is, it's (IMO, exclusively) the Indigo for .NET v.1.x (v2 too?). He's a security paranoid guy, I must say... i.e. he has the firewall enabled inside the corp. LAN!! (no way to get any music shared from him :o))
Posted by Daniel Cazzulino | 2 comment(s)