DataSet Serialization

Published 28 April 04 09:53 AM | despos

A large part of the .NET literature out there encourages you to use the DataSet object to move data across the tiers of a distributed application. The DataSet is serializable, easy to use, comes with an effective object model, can be saved to and restored from XML, can contain a schema, and is stateful (i.e., it preserves a history of changes). And so forth.

There's nothing wrong with this. But when you have a distributed app spanning more than one machine (or even just more than one AppDomain), .NET Remoting comes into play, like it or not, implicitly or explicitly.

There's nothing wrong with .NET Remoting either. However, when a DataSet object is remoted (or simply serialized to any stream), the .NET runtime serialization infrastructure gets into the game.

The DataSet object implements ISerializable, meaning that it provides its own serialization format and data layout. And it does that using plain XML. (Not even blanks and tabs are removed.) As a result, when you remote a DataSet you are actually passing a large chunk of XML laid out according to a verbose schema: the DiffGram. It doesn't matter whether you use a binary channel and formatter.

This point is well addressed in a KB article which, in turn, references an MSDN Magazine article of mine for further details. Both articles illustrate some workarounds, the best of which I believe is using a compact formatter that just zips everything being transferred.

The good news is that this seems to be fixed in 2.0. A new property, RemotingFormat, on DataSet and DataTable objects lets you interact with the implementation of ISerializable and choose a binary format to optimize performance in .NET Remoting scenarios.
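
As a rough sketch of how the property might be used, here is a small comparison of the two formats. It assumes .NET Framework 2.0 or later, where BinaryFormatter is available. The class name RemotingFormatSketch and the helpers BuildSample and SerializedSize are made up for illustration, as is the sample data.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class RemotingFormatSketch
{
    // Hypothetical helper: builds a small in-memory DataSet to serialize.
    public static DataSet BuildSample(int rows)
    {
        DataSet ds = new DataSet("Sample");
        DataTable table = ds.Tables.Add("Items");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        for (int i = 0; i < rows; i++)
        {
            table.Rows.Add(i, "Item " + i);
        }
        return ds;
    }

    // Hypothetical helper: serializes the DataSet through BinaryFormatter,
    // as remoting over a binary channel would, and returns the payload size.
    public static long SerializedSize(DataSet ds, SerializationFormat format)
    {
        // RemotingFormat tells the DataSet's ISerializable implementation
        // which layout to emit: Xml (the default) or Binary.
        ds.RemotingFormat = format;
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, ds);
            return stream.Length;
        }
    }

    public static void Main()
    {
        DataSet ds = BuildSample(1000);
        Console.WriteLine("Xml:    {0} bytes", SerializedSize(ds, SerializationFormat.Xml));
        Console.WriteLine("Binary: {0} bytes", SerializedSize(ds, SerializationFormat.Binary));
    }
}
```

The same BinaryFormatter is used in both calls, so any difference in size comes only from the layout the DataSet chooses for itself. Actual numbers depend on the data, so measure with your own tables before drawing conclusions.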


Note that the same bad thing occurs if you store DataSet objects in ASP.NET Session with the session state configured to work out-of-process.

Comments

# Milan Negovan said on April 28, 2004 02:49 PM:

Dino, I don't know if this is a good thing or not, but DataSet derives from the MarshalByValueComponent class.

I create lightweight datasets by deriving from DataSet and adding columns on the fly. If I pass a dataset between tiers and if one of them modifies anything in the dataset, e.g. sets errors on offending columns, these changes are not reflected when the dataset comes back. I find it pretty frustrating. I don't want to create a wrapper and have it derive from MarshalByRefObject just for that. What's a good way out?

# Sahil Malik said on May 3, 2004 03:32 PM:

Dino,

I have dealt with this stupid XML dataset serialization more than I've wanted to.

I am leading an application I inherited that uses TONNES of serialization of datasets in a remoting environment. I ended up using DataSetSurrogate mentioned above for a major part, for the performance benefit, until I realized that in a lot of the front-end UI the developer didn't call "AcceptChanges", which ended up voiding my changes, and that DataSetSurrogate was getting confused between a default value of 0D and "0", comparing decimals with strings, etc. It was too much pain to deal with.

Eventually, I ended up converting the problematic portion of the code to not use datasets as their default serialization, but custom objects - i.e. "Smart Data" (I am stealing this term from Rockford Lhotka).

That approach had one added benefit, that I could now put my validation rules in the smart data, and not worry about putting it in the UI or Data Layer, or the Business layer ..

Just my 2 cents :)

- Sahil Malik

# Fyodor Sheremetyev said on June 22, 2004 10:10 AM:

There are two bugs in DataSetSurrogate from KB829740.
1) DefaultValue in ConvertToDataColumn() is assigned before DataType. That's why DefaultValue is always converted to System.String.
2) DefaultValue in IsSchemaIdentical(DataColumn dc) is compared with operator ==. It should use Object.Equals instead.

# Ionut Nechita said on August 2, 2004 08:31 AM:

They've updated the code to include the fixes for the bugs shown by Fyodor, in the KB829740.
