Recursing into Recursion - Memoization

Lately, I've been heads down on a lot of concurrency items which will hopefully come out soon.  In the mean time, I want to get back to the basics one last time with recursion.  As I posted earlier, I've been talking about recursion in the past couple of posts, and this one will be one last on the topic.  I'll do my best to post the samples in both C# and F#.  As always, you can find my C# samples on MSDN Code Gallery at the Functional C# Samples.

Let's catch up to where we are today:

Memoization

When doing heavy computations through recursion, memoization becomes a pretty important topic.  The idea behind memoization is to speed up your programs by avoiding repetitive calculations for previously processed function inputs.  Let's go ahead and try a simple example using the ever so trite Fibonacci sequence as a quick example.

F#
let rec fibonacci n =
  if n <= 2 then 1 
  else fibonacci(n-1) + fibonacci(n-2)

C#
static int Fibonacci(int n)
{
    return n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1);
}

But, if you call these above functions time and time again with nearly the same input, they have to go through the same computation again and again, which isn't necessary.  So, instead, let's try through the technique of memoization to speed up this computation.

F#
let fibonacci = 
  let t = new Dictionary<_,_>()
  let rec fibCached n = 
    if t.ContainsKey(n) then t.[n]
    elif n <= 2 then 1
    else
      let res = fibCached(n - 2) + fibCached(n - 1)
      t.Add(n, res)
      res
  fun n -> fibCached n

C#
public static Func<int, int> Fibonacci()
{
    var t = new Dictionary<int, int>();
    Func<int, int> fibCached = null;
    fibCached = n =>
    {
        if (t.ContainsKey(n)) return t[n];
        if (n <= 2) return 1;
        var result = fibCached(n - 2) + fibCached(n - 1);
        t.Add(n, result);
        return result;
    };

    return n => fibCached(n);
}

As you can see from the above code, we're using a Dictionary<TKey, TValue> as our storage for already computed values.  Then we go ahead and compute the values if we have not seen them before.  We'll see a pretty impressive speed boost when calling with repetitive values.  But, that's not a generic solution, and I'd rather have the ability to take a generic function and pass any function to this and compute.  So, let's go ahead and make this generic.

F#
let memoize (f : 'a -> 'b) =
  let t = new System.Collections.Generic.Dictionary<'a, 'b>()
  fun n ->
    if t.ContainsKey(n) then t.[n]
    else
      let res = f n
      t.Add(n, res)
      res
  
let rec fibonacci= 
  memoize(fun n -> if n <= 2 then 1 else fibonacci(n - 1) + fibonacci(n - 2))

C#
public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> func)
{
    var t = new Dictionary<T, TResult>();
    return n =>
    {
        if (t.ContainsKey(n)) return t[n];
        
        var result = func(n);
        t.Add(n, result);
        return result;
    };
}

static int Fibonacci(int n)
{
    return n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1);
}

Func<int, int> fibonacci = Fibonacci; 
var fibonacciMemoized = fibonacci.Memoize();
var fibFast1 = fibonacciMemoized(30);

As you can see, I was able to create a generic memoize function that takes a function as its input, checks whether the item is in the cache and if not, adds it to the cache.  Using the C# version, we created an extension method for our given Func<TArg, TResult>.  This allows us to memoize the function correctly.  Alternatively, I could have written the memoized function this way:

C#
static Func<int, int> Fibonacci()
{
    return Extensions.Memoize((int n) => n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1));
}

var fibonacci = Fibonacci;
var fibFast1 = fibonacci(30);

To me, this reads a bit more of functional programming and the true power of it.  I basically took the above methods and combined them so that my Fibonacci function now returns the function instead taking an integer and returning an integer.

Beware of the Bad Memoization

In order to get memoization to work properly, you must remember not to include the value to the function when declaring the function.  This will cause a fresh memoization table to be created each and every time.  Such functions such as these should be avoided as below:

F#
let rec fibonacciIncorrect n =
  memoize (fun n -> if n <= 2 then 1 else fibonacciIncorrect(n - 2) + fibonacciIncorrect(n - 1)) n

C#
static int FibonacciIncorrect(int n)
{
    return Extensions.Memoize((int x) => x <= 2 ? 1 : FibonacciIncorrect(n - 2) + FibonacciIncorrect(n - 1))(n);
}

Generic Memoization Service

Due to the subtlety of problems mentioned above, it may be better to return an object instead of a function.  These can be pretty simple objects with no more functionality than to return values and discard the given collection.  Let's go ahead and do this sample in F# first.  What I'm going to do is define the holder type for the calculated value:

type ITable<'a, 'b> =
  inherit System.IDisposable
  abstract Item : 'a -> 'b with get

This defines for me an interface with a Discard method and an Item method which accepts 'a and returns 'b via a get accessor.  Pretty simple holder for my data.  Now, let's define the memoize function which does the work:

let memoize f =
  let outerTable = new Dictionary<_,_>()
  { new ITable<_,_> with
      member t.Item
        with get(n)
          if outerTable.ContainsKey(n) then outerTable.[n]
          else
            let res = f n
            outerTable.Add(n, res)
            res

      member t.Dispose() =
        outerTable.Clear()
  }

What you can see from the above is that I'm creating an inner class of type ITable which I defined above.  Then I implement the given methods and properties I defined on the interface for Item and Discard().  During the Item property, I'm checking whether I have the given item.  If I do, I return it from the collection, else I execute the given function and store the result.  During my discard, I give myself the ability to reset the cached values.  Now, let's implement the Fibonacci sequence function once again with our new function that we defined above.

let rec fibonacci =
  memoize 
    (fun n ->
      if n <= 2 then 1 else fibonacci.[n - 1] + fibonacci.[n - 2])

Now with this function, I can execute the given methods directly on the class that was returned from the memoize function.  As you can see, I am getting values from the fibonacci object and calling the function once again on itself.  I can then calculate the given numbers while using this code below:

let result = fibonacci.[10]
printfn "%i" result

That's pretty slick, isn't it?  Now, let's try in C#.  It's not going to be as elegant nor to the point.  But, let's define our Table class first.

public class Table<T, U> : IDisposable
{
    private readonly Dictionary<T, U> dictionary;
    private readonly Func<T, U> func;

    public Table(Dictionary<T, U> dictionary, Func<T, U> func)
    {
        this.dictionary = dictionary;
        this.func = func;
    }

    public U this[T n]
    {
        get
        {
            if (dictionary.ContainsKey(n))
                return dictionary[n];

            var result = func(n);
            dictionary.Add(n, result);
            return result;
        }
    } 

    public void Dispose()
    {
        dictionary.Clear();
    }
}

Now that this is defined, then the rest of our methods should come nicely as noted here.

static Table<T, U> MemoizeTable<T, U>(Func<T, U> func)
{
    return new Table<T, U>(new Dictionary<T, U>(), func);
}

public static Table<int, int> Fibonacci
{
    get
    {
        return Memoize((int n) => n <= 2 ? 1 : Fibonacci[n - 2] + Fibonacci[n - 1]);
    }
}
var fibonacci = Fibonacci;
var fibFast1 = fibonacci[30];

Not as elegant in the F# version because C# won't let me create an anonymous inner class as F# does.  If you look around the F# codebase, you'll find them using this plenty to define such things that return IEnumerable<T> for Map and so on.  Not the prettiest solution in C#, as you can well imagine.

Some Notes on Memoization

Don't forget that proper memoization requires the functions that you are writing to be pure.  What that means is that given the same input, the same output will always be calculated and returned.  Also note that none of the code I wrote was particularly thread safe, so I would definitely recommend if you were going to worry about concurrency here with our shared mutable state, that you would use the appropriate locks and so on.

Wrapping It Up

Now I was able to show you how to speed up some of those recursive algorithms using memoization.  This allows for a bit better efficiency when doing large and pure calculations.  Next time, I'm going to start going back to my concurrency topics as I have a lot yet untouched.  With my recent presentations on the matter, there is plenty to talk about.

kick it on DotNetKicks.com /script>/script>/script>/script>/script>/script>/script>
Published Friday, August 1, 2008 3:39 PM by podwysocki

Comments

# re: Recursing into Recursion - Memoization

Monday, February 15, 2010 3:15 AM by Mr Panda.

Your C# examples don't work - they recurse into the non-memoized version of Fibonacci and so don't cache the return values.

# re: Recursing into Recursion - Memoization

Friday, July 16, 2010 5:54 PM by Steven

They do work.

But they give this warning:

Warning 4 This and other recursive references to the object(s) being defined will be checked for initialization-soundness at runtime through the use of a delayed reference. This is because you are defining one or more recursive objects, rather than recursive functions. This warning may be suppressed by using '#nowarn "40"' or '--nowarn:40'.