Collections in .Net
Undoubtedly anyone who has ever programmed with any computer programming language has one way or the other has used some data structure , class etc to collect similar kind of data in one place.For example in C/C++ in order to handle multiple data of similar kind we have used Arrays.
As programming languages evolved they also brought in various options of doing the same in more effective manner by offering developers with more number of data structures.In .Net framework these data structure are collectively known as Collection classes.
In this article my aim to to provide an overview of how collection classes are organized in .Net framework.I will not be providing any code snippet or example on how to use List<T> or Dictionary<TKey,TValue> etc because we have used and have also come across plenty of code wherein these constructs are used.What I will be trying is to provide on how these collection classes are organized and how they have evolved with one version of .Net framework to another.
C# 1 : ArrayList and HashTable
The two most prominent collection classes available during C# 1.0 era or say during .Net 1.1 Framework where the ArrayList and the HashTable classes.These classes where included in the namespace :
namespace System.Collections
I hope that people are comfortable with the programming concept of interfaces and how they are used in Object Oriented Programming.Reason for this is that these classes(ArrayList and HashTable) implemented different interfaces which in turn contained different functionalities but their actual implementation varied from one class to another.
ArrayList
Interface’s Implemented by ArrayList
public class ArrayList : IList, ICollection, IEnumerable, ICloneable
Following screenshot highlights above declaration :
Note carefully that in above diagram the “ArrayList” class implements the Interfaces and inherits the “Object” class.Among all these interfaces its the “IList” interface which adds the core functionality to the ArrayList class(please don’t underestimate other interfaces,its just that I can’t cover them in this article).Following is the internals of “IList” interface as defined in the System.Collection namespace.
IList Interface
public interface IList : ICollection, IEnumerable
{
// Methods
int Add(object value);
void Clear();
bool Contains(object value);
int IndexOf(object value);
void Insert(int index, object value);
void Remove(object value);
void RemoveAt(int index);
// Properties
bool IsFixedSize { get; }
bool IsReadOnly { get; }
object this[int index] { get; set; }
}
Class diagram for IList interface
So its through the implementation of “IList” interface that ArrayList class gets functionalities like Add , Remove , Contains , RemoveAt etc.
Interesting thing to note here is the declaration for IList members like Add() , Remove() , Contains(),Insert() and the indexer i.e.
object this[int index] { get; set; }
All these members either take a parameter of type “Object” or return a parameter of type “Object”.Now consider the following code snippet :
static void Main(string[] args)
{
ArrayList list = new ArrayList();
for (int i = 0; i < 10; i++)
list.Add(i);
foreach (Int32 item in list)
Console.WriteLine(item);
Console.ReadLine();
}
In the above program I am adding 10 items to the ArrayList and finally I am printing then on console.Now ArrayList expects a parameter of type “Object” or to be more specific it expects a reference type variable.But on the other hand what we are passing in above code is an instance of type Int32 which is actually a value type.In this case compiler has to perform some extra task which is commonly known as “Boxing”.In simple terms boxing occurs when a value type is converted into a reference type and its reverse is known as unboxing.Following is the IL(Intermediate Language) code generated by compiler for the above code snippet.I have removed most of the section for brevity purpose and only highlighted the portion where boxing and unboxing occurs.Unboxing occurs whenever we access an item in ArrayList in foreach loop.
.method private hidebysig static void Main(string[] args) cil managed
{
// .... more code .....
IL_000d: box [mscorlib]System.Int32
IL_0012: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
// .... more code .....
IL_0032: callvirt instance object [mscorlib]System.Collections.IEnumerator::get_Current()
IL_0037: unbox.any [mscorlib]System.Int32
// .... more code
} // end of method Program::Main
HashTable
HashTable provides the functionality of storing values in a collection wherein each value is having its corresponding key.In essence HashTable adds item to collection in key-Value pair format.In HashTable you cannot have two items in collection with same key.HashTable implements the following interfaces.
public class Hashtable : IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback, ICloneable
Following screenshot highlights above declaration :
What IList was for ArrayList , so is “IDictionary” for HashTable collection class.Following is the internals of IDictionary interface as defined in the System.Collection namespace.
IDictionary Interface
public interface IDictionary : ICollection, IEnumerable
{
// Methods
void Add(object key, object value);
void Clear();
bool Contains(object key);
IDictionaryEnumerator GetEnumerator();
void Remove(object key);
// Properties
bool IsFixedSize { get; }
bool IsReadOnly { get; }
object this[object key] { get; set; }
ICollection Keys { get; }
ICollection Values { get; }
}
Again , just like ArrayList class the IDictionary interface which is implemented by HashTable contains member functions , indexers etc which either accept or return variable of type “Object”.And thus HashTable just like ArrayList also suffers from the overhead of boxing/unboxing when used to store/retrieve value type variables.
The major problem with collection classes in .Net Framework 1.0/1.1 where : type safety and performance.Type safety because the collection classes expected “Object” type and thus any type can be passed as parameter and the code will compile but will throw a run time exception or if not exception then some weird result. Performance issue was because of the boxing and unboxing operation which compiler has to perform for value types.And the solution for these issues was provided in next version of .Net Framework 2.0 with the introduction of Generics.From then on the place held by ArrayList and HashTable was taken over by List<T> and Dictionary<TKey,TValue> respectively which we are going to see in next section.
C# 2 : Introduction of Generic Collection Classes ( List<T> and Dictionary<TKey,TValue>)
These generic collection classes , interfaces which they implemented and other helper classes where collectively put under entirely new namespace :
System.Collections.Generic;
Getting into the details of benefit of using generic collection classes is beyond the scope of this article.But very quickly I will list down the key benefits of generic collection classes :
- Type Safety
- Cleaner Code
- Better Performance
List<T>
Interface implemented by List<T>
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
Screenshot for the above declaration :
In the List<T> declaration the “T” stands for “Type Parameter”.Before using the List<T> collection class , we need to specify the type parameter which can be an existing type(string , Int32 etc ) or our very own custom type(Person , Student etc ).But once the type is specified , then compiler will ensure that you are not adding any item which is not of the declared type.For e.g. if you declare List<Int32> , then we cannot add item of type “string” to collection.There by compiler ensures that there is no runtime exception and code is verified during compilation itself.This is what is known as “Type Safety” and by far is the biggest benefit added by generic collection classes to .Net Framework.
When working with value types , there is no overhead of boxing and unboxing value types while adding or retrieving them from collection.The compiler emits the proper IL code which highlights as if the collection is intended for working with value types.Consider the following example :-
class Program
{
static void Main(string[] args)
{
List<;Int32> list = new List<int>();
list.Add(23);
list.Add(34);
list.Add(25);
foreach (int num in list)
Console.WriteLine(num);
Console.ReadLine();
}
}
Following is the IL code generated by compiler.Again , I have removed most of the parts of the IL code for brevity purpose.
.method private hidebysig static void Main(string[] args) cil managed
{
.entrypoint
// Code size 95 (0x5f)
.maxstack 2
.locals init ([0] class [mscorlib]System.Collections.Generic.List`1<int32> list,
[1] int32 num,
[2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> CS$5$0000,
[3] bool CS$4$0001)
IL_0000: nop
IL_0001: newobj instance void class [mscorlib]System.Collections.Generic.List`1<int32>::.ctor()
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldc.i4.s 23
IL_000a: callvirt instance void class [mscorlib]System.Collections.Generic.List`1<int32>::Add(!0)
.................................................
..................................
.........................
} // end of method Program::Main
As you can see , list is in fact of type “Int32”.If the amount of data to be added/fetched from collection is high then using generic collection classes can be easily 10-20 times much faster than regular non generic collection classes.
Speaking about IL code , you can view the Il code for any compiler code by using the tool called ILDASM.In order to open ILDASM , follow the given path:start –> Visual Studio 2005/2008/2010 –> Visual Studio Tools –> Visual Studio Command Prompt . Once the command prompt opens up , type “ildasm” and the corresponding tool will open up.In order to view the IL code for any dll/exe , just drag and drop it over the ILDASM window.
Lets very quickly have a look at the internals of the IList<T> interface implemented by the List<T> class.
IList<T> Interface
public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
// Methods
int IndexOf(T item);
void Insert(int index, T item);
void RemoveAt(int index);
// Properties
T this[int index] { get; set; }
}
If you have noticed List<T> or IList<T> interface , then they implement both the generic and non generic versions of certain interfaces.For eg IList<T> inherit both IEnumerable<T> and IEnumerable interface.The reason for this is backward compatibility.And the consequence of this is that if you are creating a custom type that inherit from say IList<T> interface then you need to provide implementation for both the generic and non generic version of the IEnumerable interface.
But don’t worry , with generic collection classes in place the need to defining your own custom collection classes is very rare.But still if by any chance you need more options then try the Wintellects PowerCollection Library.
Dictionary<TKey , TValue>
There is not much left to say about Dictionary<TKey,TValue> collection class.Again here TKey and TValue are the type parameter for the key and value pair for Dictionary.And once the type for both key and value is specified , compiler will ensure that you are not adding either key or value whose type is mismatched with the previously defined types.Dictionary<TKey,TValue> share the same namespace with List<T> class.
Interface Implementation by Dictionary<TKey,TValue>
public class Dictionary<TKey, TValue> : IDictionary<TKey, TValue>, ICollection<KeyValuePair<TKey, TValue>>, IEnumerable<KeyValuePair<TKey, TValue>>, IDictionary, ICollection, IEnumerable, ISerializable, IDeserializationCallback
Screenshot for the above declaration
Summary
Through out the article rather than concentrating on how to use ArrayList , List<T> , HashTable etc but I was more concerned on what are the interfaces implemented by these collection classes.Their usage is very well documented and explained in numerous articles and blog but highlighting what interfaces they implement provide following two important benefits.
- Understanding of collection classes in .Net framework and over all class hierarchy is enhanced.So next time if you are reading some standard text , and if it uses IList<T> , through out the context then you are sure that what ever is said is applicable to List<T> class and to any class that implements IList<T> interface.In fact books which I refer for .Net programming , their authors always speak in terms of interfaces and abstract classes.The fundamental principle behind this is “programming to supertype” or “programming to interface”.You can read more on this in the following article “Program to an interface, not an implementation”.
- Method’s Parameter Type : When defining methods parameter type , its always better to specify the weakest type i.e. prefer interface over base classes.For example if you have a method that manipulates a collection then its better to specify the parameter type as IEnumerable<T> , rather than specifying concrete class like List<T>.Consider the following code:
// Desirable
private void Print(IEnumerable<string> items);
//Undesirable
private void Print(List<string> items);
Reason first method can be called by passing object that implements the IEnumerable<T> interface like Dictionary<TKey,TValue>,HashSet<T> , LinkedList<T> etc.Anyways if you are writing a method which expects a list then you should declare the parameter type as IList<T> rather than specifying List<T>.