Functional .NET - Laziness Becomes You

In the previous post, I talked about some of the basic ideas you can learn from Functional Programming and apply to your code right now.  The first topic that was tackled was extensibility through the use of closures.  Today, I’ll cover laziness in your APIs, how it can help, and what pitfalls might arise.

Laziness Explained

One of the hallmark features of the Haskell programming language is lazy evaluation.  This gives us the ability to delay the computation of a given function, data structure, and so on until it is absolutely needed, very much in the agile spirit of the last responsible moment.  Why is it useful?  By delaying a computation, we can avoid doing work that may never be required, and we can create infinite data structures as well as control structures such as if/then/else.  Let's take two Haskell examples of infinite structures: first, an infinite list starting at 1, and then, yes, I know, the trite Fibonacci sequence.  Opening up GHCi, we can type along:

Prelude> let l = [1..]  -- Infinite list from 1 to beyond
Prelude> take 3 l
[1,2,3]
Prelude> let fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Prelude> :t fibs -- All Fibonacci numbers
fibs :: [Integer]
Prelude> take 5 fibs
[0,1,1,2,3]

I find that when explaining these topics, Haskell works best as a language.  The Fibonacci sequence is defined as a cons of 0, then a cons of 1, onto the zipWith of addition over the Fibonacci sequence and its own tail.  In other words, after the two initial values, zipWith (+) pairs the sequence with its tail, so each element is the sum of the two elements before it.  The important thing to note is that even though the list goes on forever, evaluating this didn't cause my system to hang, because only the elements we actually demand are ever computed.

Just as well, we could write pretty much the same thing in F# with a few changes.  F# sequences are evaluated on demand, so an infinite sequence works fine; but since F# is an eagerly evaluated language by default, a self-referential definition like fibs needs explicit laziness, which we'll express with the PowerPack's LazyList and a thunk.  We'll get into what a thunk is a little later below:

// Infinite sequence
> let l = Seq.init_infinite id;;
val l : seq<int>
> Seq.take 3 l;;
val it : seq<int> = seq [0; 1; 2]

// Fibonacci with map2
> #r "FSharp.PowerPack.dll";;
> let rec fibs : LazyList<int> =
-   LazyList.consf 0 
-     (fun () -> (LazyList.cons 1 
-     (LazyList.map2 (+) fibs (LazyList.tl fibs))));;
val fibs : LazyList<int>
> Seq.take 3 fibs;;
val it : seq<int> = seq [0; 1; 1]

Another use is implementing control constructs, with if/then/else as a simple example.  Imagine, if you will, that you were implementing this as a function instead of as a language feature.  The last thing you'd want to do is eagerly evaluate the else branch before it's needed, as that could cause unwanted side effects or even crash.
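To make that concrete, here is a minimal C# sketch of what such a function might look like, using Func&lt;T&gt; thunks for the branches.  The name If and its signature are my own invention for illustration:

```csharp
// A hypothetical If "function" where both branches are thunks.
// Only the branch that is selected is ever evaluated.
static T If<T>(bool condition, Func<T> thenBranch, Func<T> elseBranch)
{
    return condition ? thenBranch() : elseBranch();
}

// The division below never runs when x is 0,
// because the else thunk is only forced when condition is false.
int x = 0;
var result = If(x == 0,
    () => 0,
    () => 100 / x);
```

Had the branches been plain values instead of thunks, both would have been evaluated before the call, and the division by zero would have been attempted regardless of the condition.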

There are some really good articles on the Haskell Wiki on how laziness works and the performance metrics that you can check out as well.  As much as I love talking about Haskell, let’s move onto .NET and talk a little about where it can fit there as well.

How Lazy Is .NET Anyways?

So, the question becomes, just how lazy can .NET be?  Well, there are several pieces to cover, including iterators, delayed evaluation through functions and lazy values, of which the latter will be covered in a later post.

Iterators

With the advent of iterators in C# 2.0, we gained the ability to yield values on an as-needed basis through compiler-generated state machines.  For example, we could rewrite the fibs sequence using a C# 2.0 iterator such as this:

static IEnumerable<ulong> Fibonacci
{
    get
    {
        ulong i = 1;
        ulong j = 1;
        while (true)
        {
            yield return i;
            var temp = i;
            i = j;
            j = j + temp;
        }
    }
}

With this, we’re able to calculate each number as we want it and nothing beforehand.  If you step through the code, you’ll see how it operates.  Moving on, we can also use this to express API calls that are, right now, thoroughly eager.  Take, for example, the Directory.GetFiles method, which takes a path and some options and returns a string array of the file names.  The problem is that you could be traversing a directory of thousands of images or documents; is there any need to evaluate all of them when you may only want the top 5?

In a previous post, “Functional .NET – Fighting Friction in the BCL”, I did exactly that: I implemented Directory.GetFiles in a lazily evaluated manner.  The underlying Kernel32 implementation in Windows uses an iterator pattern of FindFirstFile and FindNextFile, of which we can easily take advantage.  Let’s once again look at the example:

public static IEnumerable<string> GetFiles(
    string directory,
    SearchOption option)
{
    var findData = new NativeMethods.WIN32_FIND_DATA();
    using (var findHandle = NativeMethods.FindFirstFile(
        directory + @"\*", findData))
    {
        if (!findHandle.IsInvalid)
        {
            do
            {
                if ((findData.dwFileAttributes & FileAttributes.Directory) != 0)
                {
                    if (findData.cFileName != "." &&
                        findData.cFileName != "..")
                    {
                        if (option == SearchOption.AllDirectories)
                        {
                            var subdirectory = Path.Combine(
                                directory, findData.cFileName);
                            foreach (var file in GetFiles(subdirectory, option))
                                yield return file;
                        }
                    }
                }
                else
                {
                    var filePath = Path.Combine(directory, findData.cFileName);
                    yield return filePath;
                }
            }
            while (NativeMethods.FindNextFile(findHandle, findData));
        }
    }
}

Now, through the power of LINQ, we are able to pick and choose what type of file we want and how many we want at the same time.  Such examples might be:

var fiveFiles = GetFiles(@"C:\Work", SearchOption.TopDirectoryOnly).Take(5);

var results =
    from file in GetFiles(@"C:\Tools", SearchOption.AllDirectories)
    where (File.GetAttributes(file) & FileAttributes.Hidden) != 0
    select new FileInfo(file);

Well, that’s all very good and interesting.  But what about another idea of delaying evaluation with functions?

Thunking About It

Another way of delaying the execution of a given statement is by wrapping it in a function.  Such a function is called a thunk: a function that takes no arguments and delays the computation of a value.  Calling the thunk forces it, producing the actual value.  The reason for doing so is, once again, to avoid computing anything until it is absolutely needed.  Scenarios for this include the initialization of a container or some other potentially expensive operation.

static Func<double> Average(IEnumerable<int> items)
{
    return () => items.Average();
}

This in itself is interesting: we get a function back, which we are then free to evaluate only when it’s absolutely needed.
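As a quick sketch of what that buys us, note that nothing is computed when the thunk is created, so it sees the state of the world at the time it is forced:

```csharp
var numbers = new List<int> { 10, 20 };
var average = Average(numbers);   // no computation happens here

numbers.Add(30);                  // mutate the list after creating the thunk

// Forcing the thunk computes the average now, over all three items.
Console.WriteLine(average());     // prints 20
```

This is also a first hint at the side-effect pitfalls discussed below: the result depends on when you force the thunk, not when you create it.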

Is anybody using this in practice though?  The answer is yes!

The Common Service Locator library on CodePlex has a very good example of using this in practice.  For example, when we want to register the StructureMap container, here is the code used:

var container = new Container(x =>
    x.ForRequestedType<ICustomerRepository>()
     .TheDefaultIsConcreteType<CustomerRepository>());

ServiceLocator.SetLocatorProvider(
    () => new StructureMapServiceLocator(container));

If we look at the actual implementation in the source code, we find that we set our thunk in SetLocatorProvider and only force the evaluation when the Current property is called, as follows (the comments are my own):

public static class ServiceLocator
{
    // The thunk
    // Should be just Func<IServiceLocator> instead
    private static ServiceLocatorProvider currentProvider;

    // Force the eval
    public static IServiceLocator Current
    {
        get { return currentProvider(); }
    }

    public static void SetLocatorProvider(ServiceLocatorProvider newProvider)
    {
        currentProvider = newProvider;
    }
}

Pretty straightforward, and yet a powerful concept. 

Side Effects and Laziness

Laziness in your APIs can be a good thing.  But with this good thing come pitfalls that can trip up beginners.  Back in September of last year, I noted some of these in my post “Side Effects and Functional Programming”, where I covered several cases, including:

  • Order of Side Effects with Laziness
  • Exception Management
  • Resource Management
  • Closure scope

These are important to keep in mind as we mix paradigms, because laziness and side effects don’t mix well.  Indeed, many say that Haskell has been harder for them to pick up in part due to its lazy evaluation. 
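To illustrate just one of these, exception management, here is a small C# sketch.  Because an iterator's body doesn't run until the sequence is enumerated, an argument check inside it fires far from the call site.  (GetValues is a made-up method for illustration.)

```csharp
static IEnumerable<int> GetValues(int[] source)
{
    // This check does NOT run when GetValues is called...
    if (source == null)
        throw new ArgumentNullException("source");

    foreach (var value in source)
        yield return value;
}

var values = GetValues(null);     // no exception here

foreach (var value in values)     // ArgumentNullException is thrown here,
{                                 // at the first MoveNext
}
```

The usual remedy is to split the method in two: an eager wrapper that validates its arguments immediately, and a private iterator that does the lazy work.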

Conclusion

With just these two examples, iterators and thunks, we have some ideas on how to apply laziness to our codebases, with careful consideration of side effects, of course.  There are many other cases to cover in this area, but this is a good starting point.  Among the areas yet to explore in improving your code with functional constructs is the use of expressions and functions to rid yourselves of the evil “magic strings”.
