Fun with Folds

Saturday, February 14, 2009

C# F# Functional Programming Haskell User Groups

As I’ve announced earlier, and if you follow me on Twitter, I’ve been doing a bit of Haskell lately through the Real World Haskell book club that I started. Most recently, through our learnings, we have been covering the basic fundamentals of functional programming. The most fundamental piece to this is to understand how to apply the basic three higher-order functions, map, filter and fold. This time, we’re covering the fold function in a neat challenge to convert functions which use explicit recursion to use folds instead. So, what are folds anyways?

Understanding Folds

What is a fold anyhow? Simply put, a fold is a higher-order function that knows how to reduce a given data structure (typically a sequence of elements) into a single return value. We could describe it as doing something to each value in the list, updating an accumulator as we go, and returning the value of the accumulator when we’re finished. There are two sorts of folds, a left fold and a right fold. The difference is the way the data is "folded", as we will discuss here shortly.

The Left Fold

The left fold is used for folding from the left, otherwise known as the start of the list. The basic definition of the left fold is the following in Haskell:

-- Left Fold     
foldl :: (a -> b -> a) -> a -> [b] –> a

The left fold function takes a step function, an initial value for the accumulator, and the list itself. The function calls the step function on the current accumulator and current element in the list, and then passes the new accumulator value to itself recursively to build up the accumulator until the list is exhausted.

To understand how this works, let’s walk through a simple example of the trite factorial, walking through each step of the calculation. In this example, we’ll walk through each step of calculating the factorial of 3 in F#.

List.fold_left (*) 1 [1..3]  
  = List.fold_left (*) (1 * 1)             (2::3::[])  
  = List.fold_left (*) ((1 * 1) * 2)       (3::[])  
  = List.fold_left (*) (((1 * 1) * 2) * 3) ([])  
  =                    (((1 * 1) * 2) * 3)     
  = 6

Now that we have a basic handle on how this is done, let’s look at the other fold function and what it can do for us.

The Right Fold

The other type of fold is the right fold, which folds from the right, or end of a given list. The basic definition of a right fold is the following in Haskell:

-- Right fold     
foldr :: (a -> b -> b) -> b -> [a] –> b

The signature looks very similar to the left fold, so in order to see the difference between the two, let’s walk through a right fold of our factorial once again.

List.fold_right (*) [1..3] 1      
  = 1 *           List.fold_right (*) (2::3::[]) 1     
  = 1 * (2 *      List.fold_right (*) (3::[]) 1)      
  = 1 * (2 * (3 * List.fold_right (*) ([]) 1))     
  = 1 * (2 * (3 * 1))     
  = 6

The difference between the two should be a little bit more apparent now. With a left fold, the empty list element is on the left, and the parentheses group to the left. With our right fold, the 1 value is on the right, and the parentheses group to the right.

But, how useful is it? Well, we could write any number of functions such as filter and map in terms of a right fold such as the following:

let filter' p xs =  
  let step x ys = if p x then x::ys else ys     
  List.fold_right step xs []     
    
let map' f xs =  
  let step x ys = f x :: ys     
  List.fold_right step xs []

And that brings us to our challenge today, of rewriting other functions using folds instead of explicit recursion. But, first, let’s look at one fold function you may already be familiar with.

The LINQ Aggregate

Sure enough, the LINQ Aggregate function is an implementation of a left fold. Let’s look at the signature of one of the Aggregate overloads to get an idea of how it maps to our above functions:

public static TAccumulate Aggregate<TSource, TAccumulate>(     
  this IEnumerable<TSource> source,  
  TAccumulate seed,  
  Func<TAccumulate, TSource, TAccumulate> func); 

As you can see, we have our source, the seed and a function which passes the the accumulator and the current element to calculate the new accumulator value. We could now use LINQ to implement our factorial function once again such as the following in C#:

// Left fold     
Enumerable.Range(1, 3).Aggregate(1, (a, n) => a * n));  
    
// Right fold by way of a left fold     
Enumerable.Range(1, 3).Aggregate<int,Func<int,int>>     
  (x => x, (f, n) => x => f(x * n))(1)); 

LINQ was built on the standard three higher-order functions of map (Select), filter (Where) and Fold (Aggregate). Understanding the fold is probably the more difficult to understand of them, but one of the most powerful. Bart De Smet covers more of the subject in detail in his post Folding left, right and the LINQ aggregation operator.

But, now it’s time we stop postponing and get to the challenge!

The Challenge

Question 10 in Chapter 4 of the Real World Haskell book gives a particularly interesting challenge. How many of the following standard Prelude functions can be rewritten as folds:

any
cycle
words
unlines

Let’s take each one of these and determine whether you can take the standard definition and rewrite them using folds. The first function is the any function which is defined here:

-- Applies a predicate to each item in the list to see if  
-- any satisfies     
any :: (a -> Bool) -> [a] -> Bool     
any _ []     = False     
any p (x:xs) = p x || any p xs

Or as written in F#, might look something like this:

let rec any (p:'a -> bool) : 'a list -> bool = function     
  | []    -> false     
  | x::xs -> p x || any p xs;;

The logic for this function is rather straight forward:

If the incoming list is empty, then return false
Else, apply a logical-or to the predicate of the head and the recursive call to the any function to determine the result

The behavior of the function should operate as the following:

ghci> any even [1,3,5,7,8,9]    
True

The question is, can we rewrite this function as a fold, and if so how? The answer is actually quite simple. By applying a right fold, we are able to step through each value and pass a Boolean accumulator which allows us to determine if any matched our predicate function. We won’t bother applying the function to our item in the list if we should already be true. Knowing this, could now rewrite the any such as the following:

any' :: (a -> Bool) -> [a] -> Bool     
any' p = foldr step False     
               where step x acc = acc || p x

Just as easily, we could write this in F# as well using the List.fold_right, such as this:

let any' (p:'a -> bool) (xs:'a list) : bool =     
  let step x acc = acc || p x     
  List.fold_right step xs false

We can now verify the behavior of our any’ function such as the following:

> let even x = x % 2 = 0     
  any' even [1;3;5;7;8;9];;     
val it : bool = true

Our next challenge is the cycle function. This function is defined as the following:

-- Takes a finite list and turns it into an infinite circular one     
cycle :: [a] -> [a]     
cycle [] = error "Prelude.cycle: empty list"     
cycle xs = xs' where xs' = xs ++ xs'

The behavior of the function should operate as the following:

ghci> let c = take 10 . cycle    
ghci> c [1..3]     
[1,2,3,1,2,3,1,2,3,1]     

Since Haskell is a lazily evaluated language, such constructs as these work nicely to create an infinite list. And since F# is not, let’s skip this example and focus on Haskell only. Trying to replicate the code here will only cause nice StackOverflowExceptions to occur and we don’t want that. So, how would we solve this problem using a fold? Consider the following logic:

Apply a right fold with a concat operator to bind collections
Repeat input using the repeat function instead of recursion

Our solution might look something like this:

cycle' :: [a] -> [a]      
cycle' xs = foldr (++) [] (repeat xs)

We can verify the behavior of our code now such as the following:

ghci> let c = take 10 . cycle'    
ghci> c [1..3]     
[1,2,3,1,2,3,1,2,3,1]

The next challenger is the words function. This function is defined as the following:

-- Breaks a string into a list based upon whitespace     
words :: String -> [String]     
words s = case dropWhile isSpace s of     
            "" -> []     
            s' -> w : words s''     
                  where (w, s'') =     
                         break isSpace s'

The behavior of the function should operate as the following:

ghci> words "foo bar"    
["foo","bar"]

For those who are coming from the F# world, you may realize that Haskell treats strings as character lists, as opposed to an IEnumerable<char>. Looking at the above code sample, how might we be able to use a fold instead of recursion here? Consider the following logic:

Accumulate a list of strings as we build our result
If we encounter a space, we drop it, and then cons an empty to the accumulator
Else, we cons the character to our list.

Let’s see how that looks in code:

words' :: String -> [String]     
words' [] = [[]]     
words' s = foldr step [[]] s     
              where step c acc@(a:as')     
                      | isSpace c = [] : acc     
                      | otherwise = (c:a):as'

Of course, we could write this using F# as well, although it’s not quite as elegant, due to the fact that strings are not considered character lists, but instead sequences. Let’s go over briefly what the code might look like:

module Char =     
  let is_space = System.Char.IsWhiteSpace     
    
module String =     
  let of_list (xs:char list) : string =     
    new string(Array.of_list xs)     
    
let words' : string -> string list =     
  let words'' : char list -> char list list = function     
    | [] -> [[]]     
    | s  -> let step c acc =     
              let (a::as') = acc     
              if Char.is_space c then [] :: acc     
              else (c::a)::as'     
            List.fold_right step s [[]]  
  Seq.to_list >> words'' >> List.map String.of_list

We can verify the behavior of the function in F# such as the following:

> words' "foo bar";;     
val it : string list = ["foo"; "bar"]

The final challenge is the unlines function. It is defined as the following:

-- Joins strings together with a line terminator in between     
unlines :: [String] -> String     
unlines [] = []     
unlines (l:ls) = l ++ '\n' : unlines ls

What this function does is to take each line and concat the a cons of an ‘\n’ character plus the recursive call to unlines. Knowing what we know from the above examples, this should also be easy. How might we do it? Let’s take the following logic:

Use a fold right to enumerate the lines
Create a step function which first concats the \r\n (Windows style) and then concats the rest of the list
Use an empty list accumulator

unlines' :: [String] -> String     
unlines' = foldr step []     
             where step = ((++) . (++ "\r\n")) 

Using some of these techniques, we could also apply this to F# as well. Taking the List module from above, we could write the code as the following:

let flip f x y = f y x     
let unlines' (xs:string list) : string =     
  let step = ((@) << (flip (@) (Seq.to_list "\r\n")))     
  List.fold_right step (List.map Seq.to_list xs) [] |> String.of_list

We can now evaluate the results of our function such as the following which should follow our expected behavior.

> unlines' ["foo";"bar"];;     
val it : string = "foo\r\nbar\r\n"

So the answer to the question is, yes, all of the above functions can be expressed in terms of a fold function. And I think we learned a bit along the way as well.

Conclusion

In functional programming, the use of folds are extremely common. With a little practice, as I’ve shown above, we will have an easier time understanding how to use folds instead of explicit recursion. Indeed, many of the problems for which we use explicit recursion to solve could easily be solved with folds.

Why use folds over explicit recursion? Because using folds gives us well defined and explicit behavior, whereas explicit recursion may take many forms and harder to get just right. By applying the basic three higher-order functions, we only need to learn this behavior once and then anyone who looks at our code will quickly understand our intent. Because of this, you’ll be able to write much more concise code.

Understanding Folds

The Left Fold

The Right Fold

The LINQ Aggregate

The Challenge

Conclusion

1 Comment