Dumping Objects Using Expression Trees

No. I’m not proposing to get rid of objects.

A colleague of mine asked me if I knew a way to dump a list of objects of unknown type into a DataTable with better performance than the approach he was using.

The objects being dumped usually have over a dozen properties, but, for the sake of this post, let’s assume they look like this:

class SomeClass
{
    public int Property1 { get; set; }
    public long Property2 { get; set; }
    public string Property3 { get; set; }
    public object Property4 { get; set; }
    public DateTimeOffset Property5 { get; set; }
}

The code he was using was something like this:

var properties = objectType.GetProperties();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

For a list of one million objects, this takes a little over 6000 milliseconds on my machine.
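
For reference, a minimal Stopwatch-based harness along these lines can take that kind of measurement (the test data here is just a made-up example, and the usings are omitted like in the other snippets):

// Assumed test harness, not part of the original scenario: one million instances,
// timed with Stopwatch around the reflection loop shown above.
var objects = Enumerable.Range(0, 1000000)
                        .Select(i => (object)new SomeClass { Property1 = i, Property3 = "value" })
                        .ToList();

var objectType = typeof(SomeClass);
var properties = objectType.GetProperties();

var stopwatch = Stopwatch.StartNew();

foreach (object obj in objects)
{
    foreach (var property in properties)
    {
        property.GetValue(obj, null);
    }
}

stopwatch.Stop();
Console.WriteLine("{0} ms", stopwatch.ElapsedMilliseconds);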

I immediately thought: Expression Trees!

If the type of the objects were known at compile time, it would be something like this:

Expression<Func<SomeClass, int>> expression = o => o.Property1;
var compiled = expression.Compile();
var propertyValue = compiled.Invoke(obj);

But, at compile time, the type of the object and, consequently, the types of its properties are unknown. So, we’ll need, for each property, an expression tree like this:

Expression<Func<object, object>> expression = o => ((SomeClass)o).Property1;

The previous expression converts the parameter of type object to the actual type of the object and reads the property from the result of that conversion. The property value must then be converted back to object, because the type of the result has to match the return type of the expression.
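
Before generalizing to all properties, here is the same accessor for a single known property, built with the Expression factory methods instead of lambda syntax (just a sketch for Property1 of SomeClass):

// Sketch: build o => (object)((SomeClass)o).Property1 with the factory methods.
var parameter = Expression.Parameter(typeof(object), "o");

var getter = Expression.Lambda<Func<object, object>>(
    Expression.Convert(                                        // convert the property value to object
        Expression.Property(
            Expression.Convert(parameter, typeof(SomeClass)),  // cast the parameter to the actual type
            "Property1"),
        typeof(object)),
    parameter).Compile();

object value = getter(new SomeClass { Property1 = 42 });       // value is 42, boxed as object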

For the same type of objects, the collection of property accessors would be built this way:

var compiledExpressions = (from property in properties
                           let objectParameter = Expression.Parameter(typeof(object), "o")
                           select
                             Expression.Lambda<Func<object, object>>(
                                 Expression.Convert(
                                     Expression.Property(
                                         Expression.Convert(
                                             objectParameter,
                                             objectType
                                         ),
                                         property
                                     ),
                                     typeof(object)
                                 ),
                                 objectParameter
                             ).Compile()).ToArray();

It looks a bit overcomplicated, but reading all properties of all objects for the same object set with this code:

foreach (object obj in objects)
{
    foreach (var compiledExpression in compiledExpressions)
    {
        compiledExpression(obj);
    }
}

takes a little over 150 milliseconds on my machine.

That’s right. 2.5% of the previous time.
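
To tie this back to the original question, here is a rough sketch of how the compiled accessors could feed a DataTable. Building the columns from the property names is my assumption about the scenario, and Nullable<T> is unwrapped because DataColumn doesn’t accept nullable column types:

// Sketch only: dump the objects into a DataTable using the compiled accessors.
// The columns are created in the same order as the compiled expressions.
var table = new DataTable();

foreach (var property in properties)
{
    var columnType = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;

    table.Columns.Add(property.Name, columnType);
}

table.BeginLoadData();

foreach (object obj in objects)
{
    var row = table.NewRow();

    for (int c = 0; c < compiledExpressions.Length; c++)
    {
        row[c] = compiledExpressions[c](obj) ?? DBNull.Value;   // DataRow cells take DBNull, not null
    }

    table.Rows.Add(row);
}

table.EndLoadData();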

8 Comments

  • Hi,

    I am not sure how fast the DataSet is at loading from XML; but it could also be fun to try the XmlSerializer to first convert the object to XML then just load the XML into a DataSet.

    Why? Because the XmlSerializer uses reflection.emit to generate a custom serializer the first time it comes across a new type, and then caches the generated serializer (which is of course JIT compiled).

    So the first object takes a hit (when the serializer is generated)...but the other million can be serialized like lightning. Of course, I suspect the slow part would be loading the serialized XML into the dataset - but it could be worth a try even just for fun. It would only be a couple of lines of code:
    a) Create the serializer and serialize the object
    b) Load it into a dataset

    Dav

  • I suspect that:

    column[c] = expressions[c](obj);

    will be a lot faster than writing and then reading XML.

  • Hey Paulo,

    Yeah I am sure you are right. I actually started looking at the XmlDataDocument (which wraps the dataset) to see if you could directly serialize into the dataset - and noticed that Microsoft has now marked that class as deprecated (to be removed in a future version of the framework).

    The interesting thing for you would be the percentage of time spent actually inserting the data into the DataSet/DataTable versus the time spent reading the property values. In other words, reducing to 2.5% of the previous time is great - but if most of the time is actually loading the data into the DataSet, it might not make as much of a difference as you hoped.

    Anyway - a great post. Way back in 2002, I wrote a reasonably cool class (similar to the XmlSerializer) which would take an object hierarchy with associated metadata mapping the object graph to flat name/value pairs (for HTTP POST). Like the XmlSerializer, it would use reflection.emit the first time it came across a new type and cache the generated serializer. It was very cool at the time and amazingly fast ;-)

    Of course, lots has changed since then. .NET 2 introduced Lightweight CodeGen (along with the ability to actually garbage collect the generated assembly)... and now we have Expression Trees.

    I do love the beauty of using expression trees as shown in your demo, and how readable the code is. I love the fact that you are writing code in C# instead of needing to spit out IL via Reflection.Emit. If I looked at my 8 year old reflection.emit code I would need to slam my head against a wall a few times before I could understand the code I myself wrote ;-)

    Regards,

    David Taylor

  • You're right when you say that raw performance might be meaningless in a real-life situation. But then again, if you have a set of machines doing nothing but this 24/7, the gain might not be seconds or hours, but machines. As always, it depends.

    I played around with Reflection.Emit but never used it in a real-life situation. When you emit IL, you're essentially a present-day compiler writer.

    I think some languages on the .NET stack already emit expression trees as their output, and in the future all compilers will emit expressions. Just as IL-to-native compilation can be optimized and ported across platforms regardless of the language and/or code it originated from, the same applies to expression-tree-to-IL compilation. Nothing says that an expression tree has to be converted into IL - it isn't in LINQ to SQL/EF/SharePoint/AD.

    I wonder if in the future the runtime will run expression trees instead of IL. :)

    But going even further, I think Mono already has a working compiler-as-a-service. You don't even have to write expression trees. :)

  • Yeah I have played with Mono's compiler as a service. Very cool.

    I would love a scripting environment to be as nice as visual studio (with intellisense) that automatically adds a bunch of using statements (etc) letting you just write lines of script code in C# with full intellisense. What Mono has done is cool, but I would love intellisense for LINQ expressions, etc.

    It is a nice way of letting us developers write simple scripts without needing to become sys admins and learn powershell, etc.

    The Mono team obviously had the advantage that their compiler was *always* written in C# (ever since they started the project in 2001/2002), while Microsoft's C# compiler was written in C (C++?) and thus they really need to rewrite the compiler in C# before they can do an efficient 'compiler as a service' in .NET 5 (or is that .NET 6)?

    Microsoft faced the bootstrapping problem. MONO already had a C# compiler available they could use to compile their C# compiler ;-)

    Dave

  • I'm starting to use LINQPad (http://linqpad.net/) more as a testing and scripting environment because of that. The paid intellisense is even better than Visual Studio's in some use cases.

    PowerShell is aimed at the IT Pro. It's supposed to be "command-line-y". If it was developed now, I'm afraid they would probably have messed it up with Iron-Stuff.

    My biggest problem with PowerShell is that I tend to do .NET scripting instead of PowerShell scripting. :D

    I'd love to see a Visual Studio console application project with PowerShell-like command line argument handling.

    As far as I know, "compiler as a service" is for "a" (not necessarily "the") next version of .NET. :D

  • Hydrating Objects With Expression Trees - Part I
    After my post about dumping objects using expression trees, I’ve been asked if the same could be done for hydrating objects.

  • Hydrating Objects With Expression Trees - Part III
    To finalize this series on object hydration, I’ll show some performance comparisons between the different methods of hydrating objects.
    Code samples for this series of posts (and the one about object dumping with expression trees) can be found on my MSDN Code Gallery: Dump And Hydrate Objects With Expression Trees
