Performance: Linked Arrays's now and later

I love testing my code against the latest and greatest available in Whidbey. Turns out there are some collections that are insane when it comes to performance. One such collection is the generic Stack and of course the new Queue. Now both of these new collections are fast, and I mean really fast. Bare-bones do nothing code for the most part, just the minimum to make them work. At least that is what I produced in order to compare them. I figured I'd start with a LinkedArray implementation so that I didn't have to do a bunch of array copies as the stack grows.

public class LinkedArray<T> {
    public T[] Data;
    public LinkedArray<T> Next;

    public LinkedArray() : this( 10, null ) { }
    public LinkedArray( int initialSize ) : this(initialSize, null) { }

    public LinkedArray( int initialSize, LinkedArray<T> next ) {
        this.Data = new T[initialSize];
        this.Next = next;
    }
}

Again, the above is completely bare. We are using fields for extra speed, even though the JIT would most likely inline properties that return the vitals. Nice and generic so we'll get the same performance no matter what kinds of objects we are pushing and popping. To implement a stack, we'll actually use the linked arrays in reverse order. That way we can hold onto the bottom of the stack and the next pointer will actually point towards our previous a upper segment. In this way, we truly do grow downwards like a real stack, not that this matters very much. The only thing this really sucks at is ToArray, but being a personal collection, I'm not going to worry about that particular method. If I want to do a ToArray, I'd just return the collection in top down order in a larger array, rathe than reversing the entire thing.

public void Push( T value ) {
    if ( this.offset == 0 ) {
        this.growthSize <<= 1;
        this.stack = new LinkedArray<T>( growthSize, this.stack );
        this.offset = growthSize;
    }

    this.offset--;
    this.stack.Data[this.offset] = value;
}

We allocate at the top of the array so that we can do a delay allocation... Say we we just consumed the last element slot in our current array, but then we never add another element. By growing the array for data coming in, we ensure that we are going to use at least one element of the newly allocated segment. Pretty sweet. We do something similar for the Pop command, but what we want to prevent is the user popping the last element of a segment and then having us throw away the array. For that reason, during a Pop we position the offset right past the end of the used array. If we pop a second time, the array will be lost, but if we start to push elements again, we'll re-use that array and it won't be a loss.

public T Pop() {
    if ( this.offset == growthSize ) {
        this.stack = this.stack.Next;
        this.offset = 0;
        this.growthSize >>= 1;
    }

    return this.stack.Data[offset++];
}

That is only a partial implementation of a stack, but it is the most interesting part, and it took me a while to tune that code just right in order to outperform the actual Whidbey Stack implementation. I'm kind of assuming that they must be using linked arrays as well in order to perform at the same level as a linked array implementation. The most likely differences in the code would be in the ordering of elements in the array (I walk high->low they probably walk low->high) and the tuning metrics (I'm doing a *2 growth factor, they have better heuristics based on tuning by the performance team).

I did say Linked Array's now and later though didn't I? While I do battle with Whidbey, you can also make heavy use of linked array's today. Collections with random access potential don't tend to have the greatest use of linked arrays, but any time you can directly control which element the user has access to next, that is the algorithm that linked array's work well for. A basic stack implementation in V1.1 for intrinsic types is about 4 times faster than the equivalent collection classes and an object array based implemenation for storing any reference type objects is about 3 and a half times faster after you perform your casting on the way out. Hell, even boxing/unboxing integers into and out of the stack you wind up with a 3 times faster implementation. This just goes to show how much work has gone into these new generic collections, and for good reason if they are used throughout the rest of the BCL, ASP .NET, and Windows Forms.

Published Wednesday, September 29, 2004 11:45 PM by Justin Rogers

Comments

Thursday, September 30, 2004 5:57 PM by Scott Mitchell

# re: Performance: Linked Arrays's now and later

Justin, in your "Performance"-related entries, I couldn't find anywhere where you spell out your perf. testing methodologies you are using. (I know this entry doesn't have hard numbers, but earlier ones do). Did you post about this earlier (i.e., what a test case uses, how you do your timing, the testing computer's benchmarks, etc.) that I missed?

Thanks.
Thursday, September 30, 2004 6:51 PM by Justin Rogers

# re: Performance: Linked Arrays's now and later

Unfortunately, perf testing methodology on each of the entries I write would be quite long, so I try to focus on pushing or teaching a single item in each entry. In general my perf setup involves pre-allocating the testing structure so it doesn't affect the underlying algorithm times, testing all of the boundaries and extremes conditions outside of the initial performance structure to ensure consistency in the algorithms (sometimes this is absolutely exhaustive taking many hours), followed by any loop testings...

I've tested using high performance timers and the standard date-time timers... While the high perf timers work well they are nearly identical to the date-time timers once you've worked over a sufficient number of iterations.

My timings are not based on physical timing of the algorithm, but relative timings with respect to one another. To get these in a GC independent and accurate manner I run and re-run the tests for quite some time, averaging together results, and switching the order the algorithms are run in.

I often use a number of IL or source level changes that I've identified as affecting the JITer as well, and then make sure all of these different JIT time results are consistent.

I have posted on my benchmarking machine. It is a P4 2.8 ghz processor with a gig of RAM. This is definitely an optimized and fast machine, so I also test on a number of additional machines ranging from a PIII 500 with 128, PIII 800 with 256, P4 1.4 with 1 gig. A pretty good smattering of Intel machines I'd say.

I can more specifically detail these into a posting if people are more interested in how I test.
Friday, May 15, 2009 11:47 AM by nick_varcna

# re: Performance: Linked Arrays's now and later

Leave a Comment

(required) 
(required) 
(optional)
(required)