How is that RemoveAll method on the generic List actually implemented?

We get lots of removal methods for the List<T> in Whidbey. You can Remove(T), or the first instance of some object or value. You an RemoveAt(int) and take something out of a given offset. You can even RemoveRange(int, int) and take out a whole group of elements. This last method probably takes performance into consideration in an interesting way to copy the end of the array over top of the removal range and decrement the count accordingly.

RemoveAll(Predicate<T>) is completely new though, and only possible with Whidbey. These new predicates are definitely fun and I've gone over them a few times. With respect to RemoveAll, there are some fairly interesting performance options and we just don't know how complicated the internal algorithm really is. Let's take a look at a possible array and a sample predicate:

int[] myElements = new int[] { 1, 1, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2 };
private static bool EvenPredicate(int value) { return (value%2==0); }

As we walk through the above array, we'll return the following {F,F,T,F,F,T,T,T,F,F,T,F,T}. For all false entries, the list will remove that element and compress the list back down. By the time we run our predicate we should be left with only odd numbers, in our case 1's. Compressing as we remove each item could really damage our performance, so what would be ideal is storing a match and then waiting for the next match before we decide to compress the list down. In this manner we'll only move elements as needed. But is that what RemoveAll is doing?

You can draw up some basic tests to try and find out. What would a poor RemoveAll look like if we implemented our own generic collection? Well, it would probably do that copy every time through just like we talked about.

public void RemoveAll( Predicate<T> predicate ) {
    for ( int i = 0; i < count; ) {
        if ( predicate( data[i] ) ) {
            count--;
            Array.Copy( data, i + 1, data, i, count - i );
        } else {
            i++;
        }
    }
}

That copy each time will kill you. It is pretty obvious the BCL is doing the right thing because the version you see above is more than twice as slow as the actual RemoveAll implementation. I get all warm and fuzzy when the native methods work so well. In the next step, we'll have to try and simulate the actual RemoveAll method and see if we can get similar timings. I'm actually worried about this step because I don't think it will be possible even with the same algorithm. Why? Well, depending on where they implement their call pattern of the delegate, they may have some really great optimizations that they can make. If that is the case then I'll have to pick up some of the same tricks because ;-)

Published Friday, October 01, 2004 3:05 AM by Justin Rogers

Comments

Friday, October 01, 2004 7:41 AM by Ramon Smits

# re: How is that RemoveAll method on the generic List actually implemented?

Your version seems to use boundschecking.. And in your example it is very obvious that Array.Copy will generate a performance hit thus it would be very strange if the framework team would decide to use this implementation. As you say yourself "a poor RemoveAll" implementation..

A much better implementation is to use the internal array and shift elements. After that just create a new array with the size of the element count that matched the predicate.

So think about a fast easy to implement design and implement it and THEN analyse the performance. If that implementation is much slower, than you can say "I get all warm and fuzzy when the native methods work so well."

Now your remark is just information that just is not correct and of no interest for anybody.

Just my 2 cents..
Friday, October 01, 2004 10:01 AM by Dan Golick

# re: How is that RemoveAll method on the generic List actually implemented?

The generic version moves non-matching items one at a time so that it coallesces all remaining matching items at the end of the array and then clears them:

public int RemoveAll(Predicate<T> match)
{
if (match == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
int num1 = 0;
while ((num1 < this._size) && !match(this._items[num1]))
{
num1++;
}
if (num1 >= this._size)
{
return 0;
}
int num2 = num1+1;

while(num2 < this.size)
{
while ((num2 < this._size) && match(this._items[num2]))
{
num2++;
}
if (num2 < this._size)
{
this._items[num1++] = this._items[num2++];
}
}

Array.Clear(this._items, num1, this._size - num1);
int num3 = this._size - num1;
this._size = num1;
this._version++;
return num3;
}
Friday, October 01, 2004 5:57 PM by Justin Rogers

# re: How is that RemoveAll method on the generic List actually implemented?

Ramon: My information provides an upper bound, as mentioned a naive implementation, and as such is definitely useful in analyzing the algorithm at hand. It shows that in this case they didn't use the slow Array.Copy, however, they use it consistently throughout the rest of the Framework in situations where they have more performant options. It is clear to me, from studying their code, that they might make this mistake. We proved they didn't make the mistake and we are all the better for it.

Dan: Hey, you even fixed up the Reflector code a bit ;-) Not many opportunities for speed improvements there. Removing the Clear would be nice for sure. You can remove 5 or 6 ops from the code with just a basic comb-through. Thanks for your input.
Friday, October 01, 2004 8:23 PM by Dan Golick

# re: How is that RemoveAll method on the generic List actually implemented?

I had to fix the reflector code because I couldn't stand the gotos!

My experience is that the clear is pretty damn fast as it eventually becomes a fast
rep stosl
where the whole fill is done in 1 (many cycle) machine instruction.
Friday, October 01, 2004 8:28 PM by Justin Rogers

# re: How is that RemoveAll method on the generic List actually implemented?

Well, I'm certainly happy you fixed it up a bit. After comparing the exact RemoveAll algorithm, we really are able to remove some extra operations, even if we leave in the Array.Clear...

I agree that both memory copy and clearing operations tend to be extremely fast and generally we don't have to worry about them too much. My only concern is needed versus unneeded. They do similar *clearing* in other functions like Pop and Dequeue. This changes the paradigm of:

return data[current--];
to
int value = data[current];
data[current--] = T.default;
return value;

The need for this clearing is insidious, because you can't honestly do it in one place and not do it in all others. You can see that it has much more impact when working on items a single item at a time. Almost like running a C++ program in debug mode and having it clear all of your memory for you ;-)

Leave a Comment

(required) 
(required) 
(optional)
(required)