“Try to avoid foreach/for loops”–Over my Dead Body!

Monday, January 24, 2011

.NET C# IL

Before I get into this a little bit, know that my comments are a direct response to this post: http://www.codekicks.com/2011/01/try-to-avoid-foreachfor-loops.html. The reason I’m writing this post is because the author is making invalid comparisons through his use of each of the looping mechanisms in C# to QUICKLY come to the conclusion that do-while and while are faster than for/foreach. I’ve done my own tests modeled after the author’s tests, but with significantly different results. As a computer scientist, I want to make sure I actually analyze the data before coming to a conclusion about the runtime behavior.

What He Got Wrong

The first test he’s running is a foreach loop. This all fine and good… this is something we probably do on a daily basis.

What is the foreach loop actually doing? The compiler translates this into a local variable for the enumerator and a boolean. In IL, one of my runs produced a binary with the IEnumerator being named CS$5$0000. Then the compiler translates the body and the behavior of the foreach into a bunch of labels in IL. First MoveNext is called and the body of the forloop (now a label) is jumped to and executed. Then the MoveNext is called again and the loop starts over.

The next thing the author moves to is the for loop. Here’s his code:

What this is actually doing is COMPLETELY different from the semantics of the forloop. What he’s doing is just running a for loop on a separate integer. He’s not running through the iterator. This is the thing that invalidates his tests. He has WAY too many experimental variables, leading to invalid comparisons. The same fallacy occurs in his while test and the do-while tests:

He’s not actually iterating through the list of numbers. This is a huge issue. You can’t possibly make valid comparisons when you completely change your test scheme. This is just bad test design.

The Right Idea

The idea of testing the different looping mechanisms isn’t a bad one. However, his test design doesn’t really take a few things into account that is really significant to the outcome of the time.

First of all, GC is a huge issue. He’s running all 4 tests in succession on the same thread. The GC in .NET, unlike the looping, can kick in whenever it damn well pleases to cleanup garbage. When you’re dealing with 10 million items, there’s bound to be at least 1 GC run in there somewhere, which can significantly mess with the times of the stopwatch. The stopwatch is a really dumb object… it just keeps track of CPU clicks. It has NO idea what the GC is up to.

A better design would be to simply run 1 trial at a time. That way, the order of the tests doesn’t influence the GC.

Wrong Data Structure

The author probably doesn’t know how lists are implemented (as I suspect most .NET devs don’t). If you’ve ever taken a class in C++ or C, you know that you have to manually resize arrays yourself if you’re going to have expandable array (“list”) behavior. The author chose to use a list to add 10 million items in it. The list, when setup, doesn’t know how many items you’ll be adding to the data structure. It choses an arbitrary size, allocates an array of that size, and expands if you add more items than the size it picked. But when it expands, it has to do a complete copy of the array EVERY SINGLE TIME an item is added (see the iternals of Insert on list). Since you already know the size of the data, just use an array to begin with!

Let’s Test This For Real

I’m done ranting about a poorly design perf test. It’s time to run my own. The proof is in the pudding.

First I’m going to run identical tests in succession and let GC do whatever it wants. As you can see from my code, I’m using iterators EVERY TIME for consistency.

Surprising results: they’re all about the same.

All except for that do-while… anyone wanna tell me why? My guess is that GC took over at some point inside the stopwatch execution. Let’s run it again.

Hmmmmmmmmmm interesting! While and do-while are actually SIGNIFICANTLY SLOWER than for and foreach…. what an interesting turn of events. Let’s run them individually now.

All are around the same time, interestingly. Honestly, I think this is a testament to the great work the C# compiler guys do. Here’s the deal. At the end of the day, the body of any looping construct is a label generated by the compiler and then interpreted by the CLR. These guys shouldn’t have completely different running times.

Conclusion

This post started off as a complete rant about an incorrect test and ended with a proof that there’s a lot more going on in the test design than the original author was fooled into thinking. Let me just say that this post isn’t personal. I don’t even know the original author. I just happened on his post and about flew out of my chair. Just think about how this sounds: different looping constructs have different runtime behavior. This seems completely rediculous to me. If this were true, the C# compiler writers would certainly have converted all loops to the one that performed better… it wouldn’t be hard to do at all. However, this isn’t the case since all loops are turned into labels in IL.

So the moral of the story is that you can keep writing your for loops and foreach loops, there’s no performance gain either way.

UPDATE: The author changed the title from "Try to Avoid Foreach/For loops" to "An observation on .NET loops". I copied and pasted the original title for my blog post... so that's where the title comes from :)

Thanks for the post. I knew the other post wasn't right but didn't quite put my finger on why it wasn't right :)

Chris - Monday, January 24, 2011 8:41:41 PM

Hi,

Thanks for this. I have noticed the quality of posts on weblogs.asp.net has gone way down over the last 2-3 years after Microsoft actively recruited more beginner bloggers to the service and many of the posts are just not of the standard of those who were blogging between say 2002 and 2005.

I have been really frustrated by a number of posts (from two authors) over the last 4-6 weeks confusing passing items by reference and value and string immutability. The problem is their posts are written with 'authority' and must end up doing huge damage to the user community who are attempting to learn from such posts.

I tried to correct their posts (politely) in the comment section - but they still did not get it and claimed to have a complete understanding.

I will not link to those posts...Trust me it will have you even more frustrated, and you have probably wasted enough time. We both need to punch the wall a few times and then just get back to work :-(

David Taylor

David Taylor - Tuesday, January 25, 2011 12:21:33 AM

@David: Sigh... you are so right. Last week someone proposed to change the culture of the complete thread to perform a date conversion with a different culture. Comments explicitly describe the error in that - with an msdn reference - but still the author insists: "The msdn article you refer to talks about cultures, and I'm using cultures, so it's the same." He even got further and created a downloadable DLL for all of us to use so we could easely change the thread culture for a simple date conversion. It's insane!

Wesley - Tuesday, January 25, 2011 9:14:23 AM

At least that article isn't on here.

At some point, that empty for loop could be completely optimized away and take Zero ticks.

AndrewSeven - Wednesday, January 26, 2011 5:41:12 PM

@ Jesse

My personal approach is to assume that the language designers knew what they were doing, so everything is fair game. If I see lots of arguments against (or for) a specific use, then I investigate further, especially in official sources.

Ron - Wednesday, January 26, 2011 8:50:13 PM

Simple and to the point :)

pathro - Saturday, January 29, 2011 4:35:50 PM

What He Got Wrong

The Right Idea

Wrong Data Structure

Let’s Test This For Real

Conclusion

6 Comments