Collections in .Net

Thursday, January 14, 2010

Undoubtedly anyone who has ever programmed with any computer programming language has one way or the other has used some data structure , class etc to collect similar kind of data in one place.For example in C/C++ in order to handle multiple data of similar kind we have used Arrays.

As programming languages evolved they also brought in various options of doing the same in more effective manner by offering developers with more number of data structures.In .Net framework these data structure are collectively known as Collection classes.

In this article my aim to to provide an overview of how collection classes are organized in .Net framework.I will not be providing any code snippet or example on how to use List<T> or Dictionary<TKey,TValue> etc because we have used and have also come across plenty of code wherein these constructs are used.What I will be trying is to provide on how these collection classes are organized and how they have evolved with one version of .Net framework to another.

C# 1 : ArrayList and HashTable

The two most prominent collection classes available during C# 1.0 era or say during .Net 1.1 Framework where the ArrayList and the HashTable classes.These classes where included in the namespace :

namespace System.Collections

I hope that people are comfortable with the programming concept of interfaces and how they are used in Object Oriented Programming.Reason for this is that these classes(ArrayList and HashTable) implemented different interfaces which in turn contained different functionalities but their actual implementation varied from one class to another.

ArrayList

Interface’s Implemented by ArrayList

public class ArrayList : IList, ICollection, IEnumerable, ICloneable

Following screenshot highlights above declaration :

Note carefully that in above diagram the “ArrayList” class implements the Interfaces and inherits the “Object” class.Among all these interfaces its the “IList” interface which adds the core functionality to the ArrayList class(please don’t underestimate other interfaces,its just that I can’t cover them in this article).Following is the internals of “IList” interface as defined in the System.Collection namespace.

IList Interface

public interface IList : ICollection, IEnumerable
{
    // Methods
    int Add(object value);
    void Clear();
    bool Contains(object value);
    int IndexOf(object value);
    void Insert(int index, object value);
    void Remove(object value);
    void RemoveAt(int index);

    // Properties
    bool IsFixedSize { get; }
    bool IsReadOnly { get; }
    object this[int index] { get; set; }
}

Class diagram for IList interface

So its through the implementation of “IList” interface that ArrayList class gets functionalities like Add , Remove , Contains , RemoveAt etc.

Interesting thing to note here is the declaration for IList members like Add() , Remove() , Contains(),Insert() and the indexer i.e.

object this[int index] { get; set; }

All these members either take a parameter of type “Object” or return a parameter of type “Object”.Now consider the following code snippet :

static void Main(string[] args)
{
    ArrayList list = new ArrayList();
    for (int i = 0; i < 10; i++)
          list.Add(i);
    foreach (Int32 item in list)
          Console.WriteLine(item);
    Console.ReadLine();
}

In the above program I am adding 10 items to the ArrayList and finally I am printing then on console.Now ArrayList expects a parameter of type “Object” or to be more specific it expects a reference type variable.But on the other hand what we are passing in above code is an instance of type Int32 which is actually a value type.In this case compiler has to perform some extra task which is commonly known as “Boxing”.In simple terms boxing occurs when a value type is converted into a reference type and its reverse is known as unboxing.Following is the IL(Intermediate Language) code generated by compiler for the above code snippet.I have removed most of the section for brevity purpose and only highlighted the portion where boxing and unboxing occurs.Unboxing occurs whenever we access an item in ArrayList in foreach loop.

.method private hidebysig static void  Main(string[] args) cil managed
{
    // .... more code .....
    IL_000d:  box        [mscorlib]System.Int32
    IL_0012:  callvirt   instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
    
    // .... more code .....
    IL_0032:  callvirt   instance object [mscorlib]System.Collections.IEnumerator::get_Current()
    IL_0037:  unbox.any  [mscorlib]System.Int32
    // .... more code 
  
} // end of method Program::Main

HashTable

HashTable provides the functionality of storing values in a collection wherein each value is having its corresponding key.In essence HashTable adds item to collection in key-Value pair format.In HashTable you cannot have two items in collection with same key.HashTable implements the following interfaces.

public class Hashtable : IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback, ICloneable

Following screenshot highlights above declaration :

What IList was for ArrayList , so is “IDictionary” for HashTable collection class.Following is the internals of IDictionary interface as defined in the System.Collection namespace.

IDictionary Interface

public interface IDictionary : ICollection, IEnumerable
{
    // Methods
    void Add(object key, object value);
    void Clear();
    bool Contains(object key);
    IDictionaryEnumerator GetEnumerator();
    void Remove(object key);

    // Properties
    bool IsFixedSize { get; }
    bool IsReadOnly { get; }
    object this[object key] { get; set; }
    ICollection Keys { get; }
    ICollection Values { get; }
}

Again , just like ArrayList class the IDictionary interface which is implemented by HashTable contains member functions , indexers etc which either accept or return variable of type “Object”.And thus HashTable just like ArrayList also suffers from the overhead of boxing/unboxing when used to store/retrieve value type variables.

The major problem with collection classes in .Net Framework 1.0/1.1 where : type safety and performance.Type safety because the collection classes expected “Object” type and thus any type can be passed as parameter and the code will compile but will throw a run time exception or if not exception then some weird result. Performance issue was because of the boxing and unboxing operation which compiler has to perform for value types.And the solution for these issues was provided in next version of .Net Framework 2.0 with the introduction of Generics.From then on the place held by ArrayList and HashTable was taken over by List<T> and Dictionary<TKey,TValue> respectively which we are going to see in next section.

C# 2 : Introduction of Generic Collection Classes ( List<T> and Dictionary<TKey,TValue>)

These generic collection classes , interfaces which they implemented and other helper classes where collectively put under entirely new namespace :

System.Collections.Generic;

Getting into the details of benefit of using generic collection classes is beyond the scope of this article.But very quickly I will list down the key benefits of generic collection classes :

Type Safety
Cleaner Code
Better Performance

List<T>

Interface implemented by List<T>

public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable

Screenshot for the above declaration :

In the List<T> declaration the “T” stands for “Type Parameter”.Before using the List<T> collection class , we need to specify the type parameter which can be an existing type(string , Int32 etc ) or our very own custom type(Person , Student etc ).But once the type is specified , then compiler will ensure that you are not adding any item which is not of the declared type.For e.g. if you declare List<Int32> , then we cannot add item of type “string” to collection.There by compiler ensures that there is no runtime exception and code is verified during compilation itself.This is what is known as “Type Safety” and by far is the biggest benefit added by generic collection classes to .Net Framework.

When working with value types , there is no overhead of boxing and unboxing value types while adding or retrieving them from collection.The compiler emits the proper IL code which highlights as if the collection is intended for working with value types.Consider the following example :-

class Program
    {
        static void Main(string[] args)
        {
            List<;Int32> list = new List<int>();
            list.Add(23);
            list.Add(34);
            list.Add(25);

            foreach (int num in list)
                Console.WriteLine(num);
            Console.ReadLine();
        }
    }

Following is the IL code generated by compiler.Again , I have removed most of the parts of the IL code for brevity purpose.

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       95 (0x5f)
  .maxstack  2
  .locals init ([0] class [mscorlib]System.Collections.Generic.List`1<int32> list,
           [1] int32 num,
           [2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> CS$5$0000,
           [3] bool CS$4$0001)
  IL_0000:  nop
  IL_0001:  newobj     instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
  IL_0006:  stloc.0
  IL_0007:  ldloc.0
  IL_0008:  ldc.i4.s   23
  IL_000a:  callvirt   instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
  .................................................
 ..................................
  .........................
} // end of method Program::Main

As you can see , list is in fact of type “Int32”.If the amount of data to be added/fetched from collection is high then using generic collection classes can be easily 10-20 times much faster than regular non generic collection classes.

Speaking about IL code , you can view the Il code for any compiler code by using the tool called ILDASM.In order to open ILDASM , follow the given path:start –> Visual Studio 2005/2008/2010 –> Visual Studio Tools –> Visual Studio Command Prompt . Once the command prompt opens up , type “ildasm” and the corresponding tool will open up.In order to view the IL code for any dll/exe , just drag and drop it over the ILDASM window.

Lets very quickly have a look at the internals of the IList<T> interface implemented by the List<T> class.

IList<T> Interface

public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
    // Methods
    int IndexOf(T item);
    void Insert(int index, T item);
    void RemoveAt(int index);

    // Properties
    T this[int index] { get; set; }
}

If you have noticed List<T> or IList<T> interface , then they implement both the generic and non generic versions of certain interfaces.For eg IList<T> inherit both IEnumerable<T> and IEnumerable interface.The reason for this is backward compatibility.And the consequence of this is that if you are creating a custom type that inherit from say IList<T> interface then you need to provide implementation for both the generic and non generic version of the IEnumerable interface.

But don’t worry , with generic collection classes in place the need to defining your own custom collection classes is very rare.But still if by any chance you need more options then try the Wintellects PowerCollection Library.

Dictionary<TKey , TValue>

There is not much left to say about Dictionary<TKey,TValue> collection class.Again here TKey and TValue are the type parameter for the key and value pair for Dictionary.And once the type for both key and value is specified , compiler will ensure that you are not adding either key or value whose type is mismatched with the previously defined types.Dictionary<TKey,TValue> share the same namespace with List<T> class.

Interface Implementation by Dictionary<TKey,TValue>

public class Dictionary<TKey, TValue> : IDictionary<TKey, TValue>, ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback

Screenshot for the above declaration

Summary

Through out the article rather than concentrating on how to use ArrayList , List<T> , HashTable etc but I was more concerned on what are the interfaces implemented by these collection classes.Their usage is very well documented and explained in numerous articles and blog but highlighting what interfaces they implement provide following two important benefits.

Understanding of collection classes in .Net framework and over all class hierarchy is enhanced.So next time if you are reading some standard text , and if it uses IList<T> , through out the context then you are sure that what ever is said is applicable to List<T> class and to any class that implements IList<T> interface.In fact books which I refer for .Net programming , their authors always speak in terms of interfaces and abstract classes.The fundamental principle behind this is “programming to supertype” or “programming to interface”.You can read more on this in the following article “Program to an interface, not an implementation”.
Method’s Parameter Type : When defining methods parameter type , its always better to specify the weakest type i.e. prefer interface over base classes.For example if you have a method that manipulates a collection then its better to specify the parameter type as IEnumerable<T> , rather than specifying concrete class like List<T>.Consider the following code:

// Desirable
private void Print(IEnumerable<string> items);

//Undesirable
private void Print(List<string> items);

Reason first method can be called by passing object that implements the IEnumerable<T> interface like Dictionary<TKey,TValue>,HashSet<T> , LinkedList<T> etc.Anyways if you are writing a method which expects a list then you should declare the parameter type as IList<T> rather than specifying List<T>.

Good Article and is very informative and clear.
Just have few questions:
1) Since collections are most usable for every develop so please mention if there are any more changes done in collections objects after framework 2.0.

2) Why both Generic and non-generic interfaces have been used while implementing the IList interface.(I think we could have used generic only)

Please answer these doubts whenever you get chance.
Thanks,
Anil

Anil - Monday, March 29, 2010 10:17:27 PM

nice article..very clear

srinivas - Friday, May 7, 2010 11:05:20 AM

very good article.I didn't any other site explains this good.

praveen - Thursday, February 17, 2011 5:22:02 AM

So If you want to examine an item in a generic list to determine whether it is valid or to compare it
to some other item, the compiler must have some guarantee that the operator or method it has to call will
be supported by any type argument that might be specified by client code. This guarantee is obtained by
applying one or more constraints to your generic class definition.

There are three types of constraints - Friday, March 2, 2012 9:40:07 AM

Very nice article ..Thanks alot

Divya - Monday, July 2, 2012 5:58:55 AM

a very informative article

priyanka - Tuesday, September 25, 2012 5:26:00 PM

Collections in .Net

Contact Me

6 Comments