How is that RemoveAll method on the generic List actually implemented?
We get lots of removal methods for the List<T> in Whidbey. You can Remove(T), or the first instance of some object or value. You an RemoveAt(int) and take something out of a given offset. You can even RemoveRange(int, int) and take out a whole group of elements. This last method probably takes performance into consideration in an interesting way to copy the end of the array over top of the removal range and decrement the count accordingly.
RemoveAll(Predicate<T>) is completely new though, and only possible with Whidbey. These new predicates are definitely fun and I've gone over them a few times. With respect to RemoveAll, there are some fairly interesting performance options and we just don't know how complicated the internal algorithm really is. Let's take a look at a possible array and a sample predicate:
int[] myElements = new int[] { 1, 1, 2, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2 };
private static bool EvenPredicate(int value) { return (value%2==0); }
As we walk through the above array, we'll return the following {F,F,T,F,F,T,T,T,F,F,T,F,T}. For all false entries, the list will remove that element and compress the list back down. By the time we run our predicate we should be left with only odd numbers, in our case 1's. Compressing as we remove each item could really damage our performance, so what would be ideal is storing a match and then waiting for the next match before we decide to compress the list down. In this manner we'll only move elements as needed. But is that what RemoveAll is doing?
You can draw up some basic tests to try and find out. What would a poor RemoveAll look like if we implemented our own generic collection? Well, it would probably do that copy every time through just like we talked about.
public void RemoveAll( Predicate<T> predicate ) {
for ( int i = 0; i < count; ) {
if ( predicate( data[i] ) ) {
count--;
Array.Copy( data, i + 1, data, i, count - i );
} else {
i++;
}
}
}
That copy each time will kill you. It is pretty obvious the BCL is doing the right thing because the version you see above is more than twice as slow as the actual RemoveAll implementation. I get all warm and fuzzy when the native methods work so well. In the next step, we'll have to try and simulate the actual RemoveAll method and see if we can get similar timings. I'm actually worried about this step because I don't think it will be possible even with the same algorithm. Why? Well, depending on where they implement their call pattern of the delegate, they may have some really great optimizations that they can make. If that is the case then I'll have to pick up some of the same tricks because ;-)