[.NET General] John Gough on "Vectors vs. Arrays"

Tuesday, August 31, 2004

Below is a very interesting post on "Vectors vs. Arrays" in which John Gough describes what goes on under the covers behind some seemingly minor differences in array notation. John's post is in the archives of the DOTNET-LANGUAGE-DEVS email list on DISCUSS.MICROSOFT.COM, but there it is available only to list members. I think it is of more general interest, so I am making it accessible here.

A lengthier explanation is available in John's book, Compiling for the .NET Common Language Runtime.

Among other things, John Gough is the creator of Gardens Point Component Pascal (gpcp), an Open Source compiler for an object-oriented dialect of Pascal that runs on the .NET Framework.

NOTE: The following contains quoted material.

Rod da Silva's Question:

I was under the impression that there is a very real difference between
the CLR types int[][] and int[,]. However, I am finding out that they
both appear to be nothing more than instances of the System.Array class.
That is, they both exhibit pass-by-reference semantics in that I can
pass either to a method and modify one of its elements, and the change
will persist when I return from the method. I was expecting int[][] to
have pass-by-value semantics.

Can someone please describe the difference between int[][] and int[,]?
Also, is there any way to make int[][] have pass-by-value (i.e.,
value type) semantics?

John Gough's Answer:

Good question.

You are correct, both int[][] and int[,] are reference types.
I spend some time in my "Compiling for the .NET Common
Language Runtime" (Prentice Hall 2002) explaining what a
compiler has to do to get value semantics for its target
language.

The difference between the two types can be understood as
follows. One dimensional arrays of any type are a primitive
for the CLR. Thus int[] is a <<reference>> to an array of
int. The type int[][] is a reference to an array of
references to int[] arrays. It is thus a "ragged array", and if you
want it to be a normal two-D array then in CIL the initializer
must explicitly create each component int[] array to be the
same length. Of course in some languages the compiler may
hide this away from the programmer. Note that it follows
that creating an array, say int[8][8], will require a total
of nine(!) objects to be allocated.
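
[Illustration added for this posting, not part of John's original message.] The object count above can be sketched in C#. The class and method names below (JaggedDemo, MakeJagged, ObjectCount) are hypothetical and exist only for this example; the point is that the outer array and every row are separate allocations.

```csharp
static class JaggedDemo
{
    // Builds a rows x cols jagged array with every row the same length:
    // one outer array of references, plus one int[] per row.
    public static int[][] MakeJagged(int rows, int cols)
    {
        int[][] a = new int[rows][];   // 1 object: the outer array (rows start out null)
        for (int i = 0; i < rows; i++)
        {
            a[i] = new int[cols];      // 1 more object for each row
        }
        return a;
    }

    // Counts the outer array plus each non-null row.
    // Assumes every row is a distinct object, as MakeJagged guarantees.
    public static int ObjectCount(int[][] a)
    {
        int count = 1;
        foreach (int[] row in a)
        {
            if (row != null) count++;
        }
        return count;
    }
}
```

Calling JaggedDemo.MakeJagged(8, 8) and passing the result to ObjectCount gives 9: one outer array and eight rows. Nothing forces the rows to share a length, which is why such an array can be ragged.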

The type int[,] is not a built-in type of the execution
engine, although the JIT does need to know about it.
Instead it is one of the possible forms of System.Array. In
brief, the memory allocated for such an array will be in one
glob, and requires just one object creation. The only
downside is that you cannot access the elements of such an
array using just the raw instruction set of the CLR. It is
necessary to call functions of System.Array and hope that
the JIT gets to be clever enough to inline the code.
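
[Illustration added for this posting, not part of John's original message.] By contrast, a rough C# sketch of the rectangular form follows. RectangularDemo and its methods are hypothetical names for this example. As far as I know, the C# compiler turns the a[i, j] syntax into calls on accessor methods of the array type, which is the "call functions" step described above; the source code itself never shows those calls.

```csharp
using System;

static class RectangularDemo
{
    // A single allocation holds all rows * cols elements.
    public static int[,] MakeRectangular(int rows, int cols)
    {
        return new int[rows, cols];
    }

    // Sums every element, using the per-dimension bounds
    // that System.Array reports through GetLength.
    public static int Sum(int[,] a)
    {
        int total = 0;
        for (int i = 0; i < a.GetLength(0); i++)
        {
            for (int j = 0; j < a.GetLength(1); j++)
            {
                total += a[i, j];
            }
        }
        return total;
    }
}
```

For an array made with MakeRectangular(8, 8), the Rank property is 2 and Length is 64, whereas an int[][] always reports a Rank of 1 because it is a one-dimensional array whose elements happen to be arrays.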

Finally, how to get value semantics. Reading my book may
help you write a compiler to do the trick, but if you are
stuck with a language that does not do it for you then you
need to write a method for each type, such as

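 int[][] CopyOf(int[][] x) {
   // allocate correctly sized collection of 1-D arrays
   int[][] copy = new int[x.Length][];
   // now copy the elements, then return
   for (int i = 0; i < x.Length; i++)
     copy[i] = (int[]) x[i].Clone();   // each row must be non-null
   return copy;
 }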

So that instead of saying
 SomeMethod(myArray);
you go
 SomeMethod(CopyOf(myArray));
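
[Illustration added for this posting, not part of John's original message.] One reason a hand-written copy method is needed for int[][] is that the built-in Array.Clone() makes a shallow copy: for a jagged array it duplicates only the outer array of references, so the rows stay shared. For int[,] the elements live in one object, so Clone() yields an independent copy. The sketch below uses a hypothetical CloneDemo class to show both cases.

```csharp
static class CloneDemo
{
    // Returns true if writing through a clone of a jagged array
    // is visible in the original, i.e. the rows are shared.
    public static bool JaggedCloneSharesRows()
    {
        int[][] original = { new[] { 1, 2 }, new[] { 3, 4 } };
        int[][] clone = (int[][])original.Clone();  // copies row references only
        clone[0][0] = 99;
        return original[0][0] == 99;
    }

    // Returns true if writing through a clone of a rectangular array
    // leaves the original untouched.
    public static bool RectangularCloneIsIndependent()
    {
        int[,] original = { { 1, 2 }, { 3, 4 } };
        int[,] clone = (int[,])original.Clone();    // copies all elements
        clone[0, 0] = 99;
        return original[0, 0] == 1;
    }
}
```

Both methods return true, so a copy helper for int[][] has to copy each row separately, while for int[,] a cast of Clone() is already enough to pass an independent copy to a method.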

Hope this helps.

John Gough
