Performance inspiration found in List.Contains() method plus some additional functionality we can add.

While I was rewriting a more complete stack implementation I was using the actual generic stack as a guideline to make sure I implemented all of their methods and I came across their Contains implementation. There were some out of place operations in the code that would make it perform less than optimal, so I rewrote the code to get rid of the performance hit. That got me to thinking about the List implementation and whether or not the same performance problem was present there. Thankfully, the code there was written correctly. You can see what this would look like:

public class List<T> {

    private int _size;

    private T[] _items;

    public bool Contains( T item ) {

        if ( item == null ) {

            for ( int i = 0; i < this._size; i++ ) {

                if ( this._items[i] == null ) {

                    return true;

                }

            }

            return false;

        }

       

        IComparer<T> comparer = Comparer<T>.Default;

        for ( int i = 0; i < this._size; i++ ) {

            if ( comparer.Equals( item, this._items[i] ) ) {

                return true;

            }

        }

        return false;

    }

}

Notice that the null check loop is separate from the comparison loop. This drastically improves our performance when searching or a null item, and not only that, it takes an extra null check out of the comparer loop. For a comparison you'll have to take a look at the Stack<T>.Contains implementation:

public class Stack<T> {

    private int _size;

    private T[] _array;

 

    public bool Contains( T item ) {

        int offset = this._size;

        IComparer<T> comparer = Comparer<T>.Default;

        while ( offset-- > 0 ) {

            if ( item == null ) {

                if ( this._array[offset] == null ) {

                    return true;

                }

                continue;

            }

            if ( comparer.Equals( item, this._array[offset] ) ) {

                return true;

            }

        }

        return false;

    }

}

Every time through the loop we'll eat a null check now. This is much less performant than the List implementation. Even the List implementation has it's issues though. The Comparer<T> has a fairly deep code-path. For instance, it creates a GenericComparer based on the underlying types IComparable<T> interface. This in turn has very specific code for the Equals method. Namely, the GenericComparer does the damn null check again, followed by an Equals call... If you are using an Int32, this will get routed to Int32.Equals(Int32)... and finally gets turned into a basic equality (==) check.

That is one hell of a code-path, but it is 100% completely necessary. The unfortunality is that you can't apply the equality (==) operator when using generics meaning a Comparer<T> has to be used, else you need to rely on the boxing object.Equals method. In turn not every element is required to implement IComparable<T> either, so you might wind up with a GenericComparer (built based on the IComparable<T> interface) or an ObjectComparer. As you might guess if you wind up with an ObjectComparer, then you are just doing the boxing .Equals anyway.

The code-paths are built for a balance of functionality and performance, but you don't really get the best performance or best functionality. They weren't willing to sacrifice one for the other. There really is a second option. The possibility exists to add a new version of the Contains method that takes a custom comparison routine. They allow this for other methods, just not the Contains method. Even better, it would allow an Equality delegate... If you are looking for the maximum performance, then you want to reduce the code-path to the minimal number of instructions. If you are doing intetger equality you want to reduce the code-path to an equality (==) check, but that simply isn't possible, so you have to eat at least one level of indirection. An equality delegate would do just what we are looking for and anonymous methods would be the best way to use it.

List<int> foo = new List<int>(); foo.Add(10); foo.Add(20);
Console.WriteLine(foo.Contains(10, (Equality<int>) delegate(int x, int y) { return x == y; }));
Console.WriteLine(foo.Contains((TrickEquality<int>) delegate(int y) { return y == 10; }));

In the future we may see some architectural changes take place. Generic types are closed meaning they are built for each set of type parameters passed to them. It seems to me that various closure time conditions could be made to select between a set of different options based on the type specified in the type parameters. For instance, it can be determined at closure whether or not certain methods are available, and if they are, then select them. Coding a collection would be a matter of selecting the methods you'd want in order of some precedence, falling back to a method available across the board. Other options would be the ability to constrain based on operators (something that is heavily requested, but not available in Whidbey, maybe later) even though that would limit a collection from performing at the most generic levels.

Closure conditions would be frigin awesome. Today we implement these with run-time reflection based switches. That is how you get the option between GenericComparer and ObjectComparer. Making these even more efficient is difficult to do. A really good JIT could inline and compress the entire set of operations if they were simpler, but the verbosity of the null checks in the methods would mean a generic optimization would miss this and only a specific optimization would actually find it.

Published Thursday, October 07, 2004 3:41 AM by Justin Rogers

Comments

Thursday, October 07, 2004 10:50 AM by Sahil Malik

# re: Performance inspiration found in List.Contains() method plus some additional functionality we can add.

Awesome .. and since I was reading this blog thru an aggregator, I had no clue of your background. I might not be too off in assuming that you work for microsoft, probably writing parts of the .NET framework.

One thing that these RSS feeds should have is a little 50 word author profile that is easily visible with every post. Like in your case "Senior Software Design Engineer and Consultant" -- didn't show up in my aggregator (saucereader).

Anyway, I just wish there was this tool around to help me filter my content better.
Thursday, October 07, 2004 10:52 AM by Sahil Malik

# re: Performance inspiration found in List.Contains() method plus some additional functionality we can add.

Incidentally, I am so thrilled about generics; I posted a message on my blog last night about what generics imply - which is terribly understated I feel. People with a good C++ background would probably understand that generics let you do what you couldn't do without them ... unlike most other technologies that are simply a better way of doing something you could already do. (CTE's in Yukon for example)
Thursday, October 07, 2004 2:32 PM by Justin Rogers

# re: Performance inspiration found in List.Contains() method plus some additional functionality we can add.

I've often worked with MS on various projects. I always keep a resume link in the News section of my blog in the upper left, again, something that is missed if you read through an aggregator (quite a shame). I wish my author would aggregate with my title. I do my work through my own company, DigiTec Web Consultants, LLC. and I'm always looking for new and interesting contracts. Most people tend to not realize that.

Leave a Comment

(required) 
(required) 
(optional)
(required)