Recursing into List Processing

Wednesday, July 16, 2008

Lately I've been getting back to basics with regards to recursion. This is a basic and essential skill, especially in the functional programming world. Today's dive will be into immutable lists and recursion. I'll do my best to provide the F# and C# equivalent to each call. This is in part of the back to basics and from here I'll move onto other subjects.

Let's catch up to where we are today:

Recursive List Processing

In the functional programming world, and especially in the functional programming world are singly linked immutable lists. Each of the items in the list is a cell with a reference to the next item in the list. For functional programming, unlike imperative programming, they are optimized for recursion. By that I mean that they have a head (the first item) and the tail (the rest of the list). F# has the use of this through the List<'a> class.

Let's walk through a simple example of how you might sum up all items in a list using recursion as well as one for calcuating the length of the list. In F# this is pretty simple, using pattern matching.

F#

#light

module ListExtensions =
  let rec sum = function
  | [] -> 0
  | h::t -> h + sum t

  let rec length = function
    | [] -> 0
    | h::t -> 1 + length t
 
[1..20] |> ListExtensions.sum |> print_any
[1..10] |> ListExtensions.length |> print_any

This function above had a simple pattern matching over the list which said for an empty list, the result would be zero, else it would be the sum of the head and the calculation of the rest of the list and eventually winds its way down to the end of the list. Likewise inthe length, it adds 1 to the calculation of the length function until it winds down to nothing left in the list. In the base .NET libraries, we don't have a list like this, but it's actually not that hard to create one. Let's walk through a simple immutable linked list much like the List<'a> in F#.

Let's define in C# what something like that might look like. I'll start with something the way Wes Dyer had it but with a few changes:

public interface IImmutableList<T> : IEnumerable<T>
{
    T Head { get; }
    IImmutableList<T> Tail { get; }
    bool IsEmpty { get; }
    bool IsCons { get; }
}

As you can see, we have a head, a tail and a way to determine if this list is empty or if it is a cons. All very important pieces to the puzzle. This also inherits the IEnumerable<T> interface as well so that we can iterate this should we need to. Now, let's go into the implementation details of the immutable list.

public class ImmutableList<T> : IImmutableList<T>
{
    public class EmptyList : ImmutableList<T> { }

    public class ConsList : ImmutableList<T>
    {
        internal ConsList(T head, IEnumerator<T> enumerator)
        {
            Head = head;
            this.enumerator = enumerator;
        }
    }

    private static readonly IImmutableList<T> empty = new EmptyList();

    IImmutableList<T> tail;
    IEnumerator<T> enumerator;

    public T Head { get; private set; }

    public static IImmutableList<T> Cons(T head, IImmutableList<T> tail)
    {
        return new ConsList(head, tail.GetEnumerator());
    }

    public static IImmutableList<T> Cons(IEnumerator<T> enumerator)
    {
        return enumerator.MoveNext() ? new ConsList(enumerator.Current, enumerator) : empty;
    }

    public IImmutableList<T> Tail
    {
        get
        {
            if (enumerator != null)
            {
                tail = Cons(enumerator);
                enumerator = null;
            }
            return tail;
        }
    }

    public bool IsCons { get { return this is ConsList; } }

    public bool IsEmpty { get { return this is EmptyList; } }

    public static IImmutableList<T> Empty { get { return empty; } }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable<T>) this).GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (IImmutableList<T> current = this; !current.IsEmpty; current = current.Tail)
            yield return current.Head;
    }
}

Now what I have done here is implemented two internal lists to represent a "cons" and one to represent an empty list. This helps when determining whether the list is empty or not without having to check references and whether something is null. Inside my ConsList<T>, I'm able to create a new instance of the ImmutableList<T> with the head value and the rest of the list.

The head of the list is just simply that, the first item. When the tail property is called, I create a new IImmutableList<T> with my enumerator by calling the Cons method. I also have properties which define and empty list, whether it is empty or whether it is a "cons". Like I said, there's nothing hard about this.

Then I'm able to define a Sum and Length methods much as above with something like this:

public static int Sum(this IImmutableList<int> list)
{
    if (list.IsEmpty)
        return 0;

    return list.Head + Sum(list.Tail);
}

public static int Length<T>(this IImmutableList<T> list)
{
    if (list.IsEmpty)
        return 0;

    return 1 + Length(list.Tail);
}

And then invoking this is pretty straight forward through the main method:

static void Main(string[] args)
{
    var enumerator = Enumerable.Range(1, 10).GetEnumerator();
    var list = ImmutableList<int>.Cons(enumerator);
    var sum = list.Sum();
    var length = list.Length();
    Console.WriteLine("List sum : {0}", sum);
    Console.WriteLine("List length :{0}", length);
}

But what about tail recursion when doing list processing? Let's walk through one more example of tail recursion with lists. In this example, I'm going to go ahead and return the last item in the list. It's a pretty simple and straight forward function which determines whether the list only has a head, or if it also has a tail, then recurse on the function again.

F#

#light

module ListExtensions =
let rec last = function
    | [] -> invalid_arg "last"
    | [h] -> h
    | h::t -> last t

[1;5;3;2;6;] |> ListExtensions.last |> print_any

Now the C# version should look just as familiar to you. Basically instead of the pattern matching, which I'd love to have in C#, I'm using if statements to determine whether I'm returning the head, or the evaluation of the Last function again until I wind down to the head.

C#

public static T Last<T>(this ImmutableList<T> items)
{
    if(items.IsEmpty)
        throw new ArgumentNullException("items");

    var tail = items.Tail;
    return tail.IsEmpty ? items.Head : tail.Last();
}

var e1 = new List<int> {1,5,3,2,6}.GetEnumerator();
var items = ImmutableList<int>.Cons(e1);
var last = items.Last();
Console.WriteLine("Last item: {0}", last);

In the past I covered why tail recursion is important in my previous post. Unfortunately, the C# code won't do much for you uniless the compiler is optimized for tail calls, or you are using the x64 version of Windows. I have some of these samples in my Functional C# samples on MSDN Code Gallery.

Wrapping it Up

As you can see, recursion is a pretty interesting topic. One of the best ways to avoid mutable state in your programs is through recursion. I've shown some simple examples of where recursion can help to solve some of those issues. This will wrap up a lot of the discussions about recursion and then I'm moving into such topics as memoization, continuations and so on. The code samples will be available on the Functional C# Samples during this back to basics trip.

No Comments