Performance: Refactoring generic List.RemoveAll... Speed lost in implementation and Array.Clear...
Well, I got railed after my previous posting on the List.RemoveAll method How is that RemoveAll method on the generic List actually implemented?. Apparently I'm not allowed to make basic comments anymore, and I quote "Now your remark is just information that just is not correct and of no interest for anybody"... Ouch man, that really hurts. Actually I could care less, but I don't want to scare off my "advanced" readers...
After looking at RemoveAll, there are two things we can fix. Apparently there is a nested while loop structure. Dan pointed this out after fixing up some code from .NET Reflector. The code would look something like the following:
while ( toffset < count ) {
while ( toffset < count && match( data[toffset] ) ) {
toffset++;
}
if ( toffset < count ) {
data[hoffset++] = data[toffset++];
}
}
If we follow this closely, we can see that some extra checks are being made. The first time in the outer loop we compare toffset < count twice... Every time this loop continues we do the same thing. This results in some extra comparisons. The next problem is that upon completing the inner loop we fall out and again compare toffset < count an extra and unncessary time. I can't account for every processor, so maybe there are some benefits to this method in certain JIT scenarios that I'm missing? If we refactor that loop we can get some minor performance improvements.
while ( toffset < count ) {
if ( match( data[toffset] ) ) {
toffset++;
} else {
data[hoffset++] = data[toffset++];
}
}
The above loop removes the extra checks. This doesn't result in much of a performance improvement, but it is slight. We can jump to the bottom of the method and find another performance improvement. Right now there is a not-so-costly clear being performed. Defining not-so-costly is more difficult than you might think. We can actually get an extra 5-6% improvement in performance by removing it.
Array.Clear( data, hoffset, count - hoffset );
int length = count - hoffset;
count = hoffset;
version++;
return length;
That is what Reflector gives you for the end of the method. We create an extra stack local (length), and we perform the length subtraction twice... I honestly think doing it that way is faster than storing the length first and then reusing it (because of parallelism). I propose removing the Array.Clear and stack local.
count = hoffset;
version++;
return toffset - hoffset;
Little bit of food for though... The speed timings on RemoveAll vs RemoveAllFast are shown. I've made every attempt to isolate the actual method timings here, including pre-allocating the entire test structure, forcing a JIT of all of the methods outside of the loop timing, etc...
RemoveAllFast: 00:00:00.6309072, 0
RemoveAll: 00:00:00.7310512, 0