Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

Collections, are collection, are collections.  NOT!  Collections are arrays somewhere, at least in most cases.  More often than not I find myself using an ArrayList (in V1/1.1) and then converting the said list to a strongly typed array for quickly running over the values.  The conversion is fine for smaller arrays, but for larger arrays the conversion really sucks, and it takes up twice the memory.  After all, don't you already have a collection back there somewhere?

Well, for ArrayList it makes sense they made us do all the copying since the backing array was of type System.Object.  But now we have List<T>.  The array backing that is already strongly typed and ready to go.  Hell, shouldn't I be able to get a reference directly to the already built array for use?  Sure it may have a couple of extra elements that I don't need, but it is up to me to handle the esoterics of indexing.  What is currently available and what am I looking for?

There is already a CopyTo method that I could use.  However, this is exactly what it says, a copy.  There is also a ToArray, but that creates a copy of the array.  You could say I should just use the indexer on List<T> since it is now strongly typed, however, get_Item() has a bunch of crappy range checking logic that is going to slow me down just that little bit.  What I'm looking for is a Decompose method.  Reset the internal state of the collection to a new array, and return me the reference to the old array.  Of course your ReadOnlyList would throw an exception on this, as it should, but your normal List<T> should run just fine.  After all, the user could always just call Clear, so it isn't like the internal array you are using is sacrosant in any way to warrant the protection of it.

Any takers?  Anyone think that direct access to the underlying array without copying for things like Stack<T>, Queue<T>, and List<T> would help out any?

Published Tuesday, April 27, 2004 5:21 AM by Justin Rogers

Comments

Tuesday, April 27, 2004 10:13 AM by Daniel Jin

# re: Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

I've learned that optimization is actually performed to JIT away the bound checks. For example, in the following code

for( int i = 0; i < array.Length; i++ )
{
// do something with array[i]
}

JIT compiler does a pattern recognition on that and JIT away the bound check typically occurs on array[i]. I don't know how ArrayList is handled since there's a bound check on ArrayList indexer in addition to the underlying array. but I've learned now to trust that JITer will (or at least eventually will) do the right thing. as an outsider, you never know what JIT is designed to do, some manual optimization could actually be counter productive and hurt the performance.

about stack, queue, and list. theoratically, they should all be linked list based, not array based. I sure hope the queue is not implemented with an array, otherwise popping off the head is going to be really costly on every call.
Tuesday, April 27, 2004 10:17 AM by Raymond Chen

# re: Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

If access to the internal array were made public, then that would require List<T> (for example) to use an array forevermore. Do you really want to lock in the internals of List<T>? Suppose that somebody discovers that linked arrays are faster than a single array. Do you want to prevent them from using that new faster data structure?
Tuesday, April 27, 2004 2:41 PM by Drew Marsh

# re: Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

I agree with Raymond and would point out that if your Decompose method returned the actual array, then it would have empty slots. Remember the array is grown dynamically, but it is grown according to an algorithm that will (hopefully) reduce the number of times that it must be grown. If you've added 75 items to the List<T> and it decided that it had to grow a few times, the internal array might actually be 200 items. How would you know where to stop if you got your hands on that array? Length would be 200, so you can't just go by that. If you're storing a reference type, sure you could stop when you find a null, but what if you're storing a value type such as Int32? Would you stop when you find Int32.default? Well what if 0 (which is Int32's default) is a valid value??
Tuesday, April 27, 2004 5:43 PM by Justin Rogers

# re: Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

Drew: As mentioned in my post, the esoterics of length are up to you. The intelligent Decompose could have an out parameter that stores your length for you, or you could store the length before you do a Decompose (doing this is more dangerous since the length can change).

Raymond: Very interesting to point out the performance there. Of course a linked array would be exceptionally slower on indexing (one extra level of indirection), on CopyTo, marginally slower on enumeration. So yes, I'd love to lock them into using a single array. The only thing linked arrays does fix is the grow factor. So rather than guesstimate that they might make List<T> faster, go ahead and make a LinkedArrayList<T> that uses the alternate storage method and depending on the user's scenario, they'll make use of the appropriate class. Don't take away my options by limiting me.

Daniel Jin: I guess Joshua Allen was correct. The JIT optimization is for array's because Length can't change, not for collections where Count (not Length) can change behind the scenes). You can't just JIT optimize away all bounds checks my friend, else they wouldn't have put them there in the first place.

As for Stack, it is best implemented using an array, since it grows from index 0 up. As for Queue, it is also best implemented using an array with head and foot pointer indices into the array. They may find it more appropriate to use a linked list for this guy in the future. So maybe, just maybe this one could be taken off the list
Tuesday, April 27, 2004 7:14 PM by Daniel Jin

# re: Since I'm on collections, strongly typed arrays, versus List<T> for very large collections... I want a Decompose method...

> You can't just JIT optimize away all bounds checks my friend

you are correct. I meant to say *sometimes* bound check can be JITted away.

> As for Stack, it is best implemented using an array, since it grows from index 0 up ...

Raymond definitely made a more convincing argument on that.

Leave a Comment

(required) 
(required) 
(optional)
(required)