August 2003 - Posts

AltSerialization: binary serialization, the smarter way

Note: this entry has moved.

You already know you can count on the BinaryFormatter to serialize/deserialize your objects in binary format. This beast will properly serialize an entire object graph for you without any effort on your part. But if you’re dealing with value types or simple reference types, you can do better by coding your own class with specific knowledge of how to serialize such types. It will run faster and produce smaller binary representations than BinaryFormatter. Let’s anticipate some results:

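| Type     | Size in bytes using BinaryFormatter | Size in bytes using custom class |
|----------|-------------------------------------|----------------------------------|
| DateTime | 59                                  | 9                                |
| TimeSpan | 60                                  | 9                                |
| Guid     | 110                                 | 17                               |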

As you can see, there is a lot of room for savings. Apparently the ASP.NET team thought the same thing and came up with a little class well hidden in the System.Web.Util namespace: the internal AltSerialization class. Ouch! Internal? Yes, *internal*.

What does AltSerialization do?

This really simple class will write and read values in binary format, with special knowledge on how to deal with the following 14 types: Boolean, Byte, Char, DateTime, Decimal, Double, Int16, Int32, Int64, Single, String, UInt16, UInt32, UInt64.

How does it do this?

When writing a value, it first examines the value’s type and then performs a serialization optimized for that type. Take a DateTime, for instance: at the heart of this value type is the ticks member variable, which holds the number of ticks (100-nanosecond intervals) that have elapsed since 12:00 AM, Jan. 1, 0001. Writing only the ticks value is enough to reconstruct the DateTime instance later on, and ticks is an Int64 (8 bytes in length), much less than the 59 bytes spat out by the BinaryFormatter! Of course, we still need some way to record that what we’re storing is a DateTime and not another type, so we can properly deserialize it later. For this reason, every value that AltSerialization knows how to serialize is prefixed with a type code (taken from the nested AltSerialization.TypeID enumeration). This is a byte enumeration, so it adds only one byte to every persisted value: the persisted DateTime ends up taking 9 bytes, exactly 50 bytes less than with the BinaryFormatter.
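To make the tag-plus-payload idea concrete, here is a minimal sketch. It is not the real AltSerialization code: the names TypeTag and TinySerializer, the tag values, and the choice to store a Guid as its 16 raw bytes are all assumptions of mine, picked so that the sizes match the table above.

```csharp
using System;
using System.IO;

// Hypothetical names throughout: TypeTag and TinySerializer are illustrative,
// not members of the real System.Web.Util.AltSerialization class.
public enum TypeTag : byte
{
    DateTime = 1,
    TimeSpan = 2,
    Guid = 3
}

public static class TinySerializer
{
    // Writes a one-byte type tag followed by the smallest payload
    // that is enough to rebuild the value.
    public static byte[] Write(object value)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            switch (value)
            {
                case DateTime dt:
                    writer.Write((byte)TypeTag.DateTime);
                    writer.Write(dt.Ticks);          // Int64, 8 bytes
                    break;
                case TimeSpan ts:
                    writer.Write((byte)TypeTag.TimeSpan);
                    writer.Write(ts.Ticks);          // Int64, 8 bytes
                    break;
                case Guid g:
                    writer.Write((byte)TypeTag.Guid);
                    writer.Write(g.ToByteArray());   // 16 raw bytes, no length prefix
                    break;
                default:
                    // The real class hands unknown types to a BinaryFormatter;
                    // this sketch simply refuses them.
                    throw new NotSupportedException(
                        value == null ? "null" : value.GetType().FullName);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the tag first, then the payload that belongs to that tag.
    public static object Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            var tag = (TypeTag)reader.ReadByte();
            switch (tag)
            {
                case TypeTag.DateTime: return new DateTime(reader.ReadInt64());
                case TypeTag.TimeSpan: return new TimeSpan(reader.ReadInt64());
                case TypeTag.Guid: return new Guid(reader.ReadBytes(16));
                default: throw new InvalidDataException("Unknown type tag: " + tag);
            }
        }
    }
}
```

Calling TinySerializer.Write on a DateTime yields a 9-byte array (1 tag byte plus 8 bytes of ticks), and passing that array to TinySerializer.Read gives back a DateTime with the same ticks.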

In case you’re wondering why the System.TypeCode enumeration wasn’t used (after all, AltSerialization.TypeID almost duplicates it), I believe the answer is space. While the first one uses an Int32 as its base type, the second one uses a byte.
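The base-type difference is easy to check for yourself. In this small sketch, CompactTypeId is a hypothetical byte-backed enum standing in for the internal AltSerialization.TypeID (its members are made up), and EnumInfo is just a helper name I picked.

```csharp
using System;

// Hypothetical stand-in for the internal AltSerialization.TypeID enum.
public enum CompactTypeId : byte
{
    Null = 0,
    String = 1,
    DateTime = 2
}

public static class EnumInfo
{
    // Returns the integral type an enum is stored as.
    public static Type UnderlyingOf<TEnum>() where TEnum : struct
    {
        return Enum.GetUnderlyingType(typeof(TEnum));
    }
}
```

EnumInfo.UnderlyingOf<TypeCode>() returns typeof(int), a 4-byte type, while EnumInfo.UnderlyingOf<CompactTypeId>() returns typeof(byte), so writing the tag as a byte costs 1 byte per value instead of 4.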

Lastly, if AltSerialization doesn’t know how to serialize a given type it will just pass it along to a BinaryFormatter.

Getting clever from v1.0 to v1.1

It’s nice to see that AltSerialization got a bit smarter in version 1.1 of ASP.NET, adding support for serialization of five additional types: Guid, IntPtr, SByte, TimeSpan, UIntPtr.

Suspicious array usage

While I was looking inside AltSerialization I found something interesting in its private static constructor. In there, a static Type array is initialized to hold the types the class knows how to custom serialize. These are the types listed in the AltSerialization.TypeID enum. The funny part is that the array is initialized with two extra slots and the first item is stored at index 1 (slot zero is not used at all). While I’ve no idea why slot zero was not used (I can only smell vb.net for this), I think I may have a clue about why there are two extra slots. My guess is that the developer first counted the types listed in the enum (where null and object are included) and then made an array large enough to hold all items; later on, he realized that there is no sense in checking whether a type is of type object, and not much sense in storing null as one of the supported types. My theory says that after realizing this, the developer just forgot to reduce the array’s size. What’s yours?

Ringing bells

I’ve checked a couple of released projects and they don’t seem to be using any AltSerialization-like approach. I don’t know why. Maybe I should start pinging them about this.

Posted by vga with 12 comment(s)

Learning to play "catch-up"


The control execution lifecycle has proven to be one of the most difficult things to grasp, leaving beginners totally disconcerted about how the webform model works. Just about every control and page developer who is messing with dynamically created controls has already played the “catch-up” game, knowingly or not. Playing it knowing its rules will always make you a winner; playing it without that knowledge will bring you nothing but trouble.

 

Here is some code that dynamically creates a textbox control and adds it to the form control:

 

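// Create the control and set its properties before adding it to the tree.
TextBox tb = new TextBox();
tb.Text = "blah";
tb.ID = "mytb";

// Look up the server-side form by its ID ("FormID" here) and add the textbox to it.
Control frm = FindControl("FormID");
frm.Controls.Add(tb);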

 

As you can see, there is nothing extraordinary about it; on the contrary, it’s pretty darn simple. The difficult part is answering the question: “where can I put that code?”

 

Reading the docs is just not enough

 

The docs show the Load phase coming after the LoadViewState phase, which would imply that viewstate data is always loaded before the Load event. If this is true, putting our previous snippet of code in Page_Load should cause our textbox to not load its viewstate properly, right? Wrong. The textbox, when added to a ControlCollection (in our example, the one belonging to our form control), will immediately play “catch-up” and perform every action that it has missed due to its late insertion in the control tree (Init and LoadViewState in our example), thus properly restoring its viewstate data.

 

If you’re into control development chances are that you already own Nikhil’s book. It’s the only source I’ve found that briefly touches this subject with the following paragraph on page 179:

 

“…what if a control is created in an event handler and dynamically added to the control tree? In that case, the control plays catch-up. As soon as it is added to the control tree, it starts to execute its phases until it reaches the current phase of the page…”

 

It is also where I stole the “catch-up” term for this post’s title. :)

 

Now, where is this “catch-up” game being played…?

 

Entering the “Catch-up” Stadium

 

The catch-up happens in the protected internal Control.AddedControl method which, after removing the control from any previous parent and taking care of its ID generation, examines the current status of the control owning the ControlCollection and, based on it, determines all the steps the added control has already missed, forcing it to “play” them at once. Following our previous example, this means that when we add tb to the ControlCollection of the form control (through the Controls property), Control.AddedControl will immediately call tb’s InitRecursive and LoadViewStateRecursive methods. But who is calling AddedControl in the first place? It’s the ControlCollection.Add method, which makes that call as its last step.

 

What exactly can be “caught up”?

 

The internal ControlState enumeration defines the five possible statuses a control may have during its lifetime:

 

| Status          | Value |
|-----------------|-------|
| Constructed     | 0     |
| Initialized     | 1     |
| ViewStateLoaded | 2     |
| Loaded          | 3     |
| PreRendered     | 4     |

 

So, depending on the current status of the parent control, Control.AddedControl will call the following internal methods on the control being added: Control.InitRecursive (if the parent is already initialized), Control.LoadViewStateRecursive (if the parent’s viewstate is already loaded), Control.LoadRecursive (if the parent’s loading phase has already been executed) and Control.PreRenderRecursiveInternal (if the parent’s pre-rendering phase has already been executed).

 

When is this game played?

 

This “catch-up” game is not always played. If the parent’s status is Constructed, meaning it has not even been initialized yet, Control.AddedControl will just return without playing any games, because there is nothing for the child control to catch up on.
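The rules described so far can be modeled with a toy class. This is only a sketch of the behavior as I understand it, not the System.Web source: ToyControl and all its members are hypothetical, phases are reduced to log entries, and the real AddedControl also handles re-parenting and ID generation, which are left out here.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the statuses listed in the table above; the real enum is internal to System.Web.
public enum ControlState { Constructed, Initialized, ViewStateLoaded, Loaded, PreRendered }

// Toy stand-in for System.Web.UI.Control; every member here is hypothetical.
public class ToyControl
{
    private readonly List<ToyControl> children = new List<ToyControl>();

    public ControlState State { get; private set; } = ControlState.Constructed;

    // Records, in order, the phases this control has executed.
    public List<string> Log { get; } = new List<string>();

    public void Add(ToyControl child)
    {
        children.Add(child);
        AddedControl(child); // the catch-up call is made as the last step of adding
    }

    // Runs on the child every phase the parent has already completed.
    private void AddedControl(ToyControl child)
    {
        if (State < ControlState.Initialized) return;     // parent only constructed: nothing to catch up
        child.InitRecursive();
        if (State < ControlState.ViewStateLoaded) return;
        child.LoadViewStateRecursive();
        if (State < ControlState.Loaded) return;
        child.LoadRecursive();
        if (State < ControlState.PreRendered) return;
        child.PreRenderRecursive();
    }

    public void InitRecursive() { Run("Init", ControlState.Initialized, c => c.InitRecursive()); }
    public void LoadViewStateRecursive() { Run("LoadViewState", ControlState.ViewStateLoaded, c => c.LoadViewStateRecursive()); }
    public void LoadRecursive() { Run("Load", ControlState.Loaded, c => c.LoadRecursive()); }
    public void PreRenderRecursive() { Run("PreRender", ControlState.PreRendered, c => c.PreRenderRecursive()); }

    // Executes one phase on this control, then on its children, then records the new status.
    private void Run(string phase, ControlState reached, Action<ToyControl> recurse)
    {
        Log.Add(phase);
        foreach (var child in children.ToArray()) recurse(child);
        State = reached;
    }
}
```

If a parent has gone through InitRecursive, LoadViewStateRecursive and LoadRecursive before a child is added, the child’s Log ends up holding Init, LoadViewState and Load, in that order. A child added to a parent that is still in the Constructed state gets an empty Log.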

 

A curiosity

 

If you were paying close attention to the method names listed above, you may have noticed that they all follow the naming pattern “phase in the control execution lifecycle” + the word “Recursive”, to clearly denote that they’re recursive methods. The only exception is PreRenderRecursiveInternal, which happens to include its access modifier, internal, as part of its name. But all of the above-mentioned methods are internal, so why did only this one get such a suffix?

Posted by vga with 44 comment(s)

I'm back


My last post was almost two months ago… this is not what I intended when I started blogging but I’ve been doing lots of things that are taking most of my time:

- Spending lots of time testing Whidbey alpha bits and reporting bugs (I believe I’m not allowed to say a word about this)
- Writing a super-ultra-cool utility that should help a lot in learning fx internals (basically scanning every piece of metadata, generating my custom db and presenting lots of useful views of it, all this including comparisons of fx v1.0, v1.1 and v1.2)
- Trying to make Lutz’s excellent Reflector support generics
- Writing three columns about… guess what… yes, ASP.NET
- Giving two talks on… guess what… yes, ASP.NET
- Doing a couple of reviews
- Hanging around the public MS newsgroups (where I just passed the 4,200 posts mark)

And of course I have my family and a girlfriend… so, definitely, time *is* money.

I’m currently planning a couple of topics for my next posts but if you would like me to touch any particular topics just let me know.

Posted by vga with 5 comment(s)