The “Anti-For” Campaign

Friday, June 26, 2009

Recently, there has been an effort launched called the “Anti-If Campaign” in which they deride the use of if statements and instead, focus on Object Oriented Principles in order to create more flexible designs. Now certainly, I have a sympathetic ear to this cause as I’ve seen code that literally walks off the side of the screen due to nesting of if statements. Pattern matching to me, especially at the top level of the function is actually quite beautiful in a way, such as the implementations in Haskell:

-- Haskell
lucas :: Int -> Integer
lucas 0 = 2
lucas 1 = 1
lucas n = lucas (n - 2) + lucas (n - 1)

And in Erlang, this also holds true:

% Erlang
-module(lucascalc).
-export([lucas/1]).

lucas(0) -> 2;
lucas(1) -> 1;
lucas(N) -> lucas (N - 2) + lucas (N - 1).

Simple, easy to understand and best of all, no if statements. But, instead of focus on this debate, I’d like to propose another which strikes closer to this functional programmers heart, the “Anti-For Campaign”. This is simply to say that we should create and use composable functions instead of explicit for loops. This is actually an old post I had written months ago and until now had been unfinished, but now with some inspiration, it’ll finally be done.

What and Why?

Before you throw all sorts of questions asking what and why, let me instead ask a question. When you’re writing a loop, ask yourself the question, “What am I accomplishing in this loop?” Chances are, it might be one of the following:

Query (Map, Filter, etc)
Aggregation (Sum, Count, etc)
Perform some side effect

If you’re doing more than one of those in a single loop, then well, you’re probably doing too much. In fact, Martin Fowler’s Refactoring site has a refactoring called “Split Loop” which would solve that issue. It is better for future refactorings and readability if we keep those loops pointed to do one thing, and one thing only. Better yet, we could rid ourselves of that loop altogether, and that’s what we’ll focus on here.

Looking at first two bullet points, you’ll notice most of LINQ is indeed built around those two to be able to query data as well as aggregate. The final bullet point, we perform some sort of side effect, perhaps writing to a file, printing to the console, or even perhaps sending messages.

So, what’s my problem with them? My problem is that it focuses more on the How instead of the What. Let’s look at a quick example down below of what I mean. First, we’ll attempt to find all prime numbers under 100 using C# as an example language. First in the How:

static bool IsPrime(int i) {
    var lim = Math.Sqrt(i);
    
    Func<int, bool> check = null;
    check = j =>
        j > lim || (i % j != 0 && check(j + 1);
        
    return check(2);
}

static void Main(string[] args) {
    var numbers = Enumerable.Range(1, 100);
    var output = new List<int>();

    foreach(var number in numbers)
        if(IsPrime(number)) output.Add(number);
}

The problem is that if we then want to compose another operation, well, it’s really hard to do inside of these for loops. We’re too focused on the how at this point. Instead, getting to know generics and lazy evaluation, in .NET 2.0 and beyond, we were able to write generic functions to take advantage of some functional constructs. This will give us a more declarative style that we can now focus on the what.

static IEnumerable<T> Filter<T>(
    this IEnumerable<T> items,
    Func<T, bool> predicate) {
    
    foreach(var item in items)
        if(predicate(item))
            yield return item;
}

static void Main(string[] args) {
    var numbers = Enumerable.Range(1, 100);
    
    var primes = items.Filter(x => IsPrime(x));
}

Of course we realize that LINQ already has such constructs built in, so we could rewrite the entire code above in just one statement.

var primes = Enumerable.Range(1, 100)
    .Where(x => IsPrime(x));

And what’s better is that it is composable that we could do other operations such as aggregations (sum, count, etc) without much additional code:

var primesCount = Enumerable.Range(1, 100)
    .Where(x => IsPrime(x))
    .Count();

And there you have it. We not only have a query, but also an aggregation. Try doing that with those non-composable loops! Justin Etheredge has a nice writeup as well recently on the subject.

Coping Strategies

Functional languages tend to deemphasize the use of such constructs. In fact, Haskell has neither a for loop nor a while loop, and languages such as F# and OCaml have limited support for such constructs in terms of no break and continue. We tend to look at those two in particular with suspicion due to the fact that it cannot return a value and instead it mutates state in some fashion. With that in mind, how do we cope with the fact that those aren’t available to us? Above I showed a basic concept of a filter instead of an explicit loop, but what about some other considerations?

Some things we might want to consider with some links to some of my previous posts on the subject:

Explicit Recursion
Transforming each item (Map/Select & SelectMany in LINQ)
Selecting items (Filter/Where in LINQ)
Aggregating Data (Folds/Aggregate in LINQ)

In fact, many times that people could consider using explicit recursion could instead use a fold to aggregate the data which then cuts out the issue of tail call optimization. By truly understanding the goals of LINQ as well as the concepts of functional programming, we can realize that most of the looping that we do can indeed be replaced by the above, outside of side effects of course.

Now What About Those Side Effects?

Is there a place where we draw the line and say that explicit loops are ok? Eric Lippert was recently asked about the reasoning of the lack of the ForEach extension method on IEnumerable<T>. His response was that he was philosophically opposed to such an endeavor as the whole approach is to cause a side effect. As IEnumerable<T> collections are immutable, he doesn’t believe it makes as much sense because you wouldn’t be side effecting the collection itself. Not only that, but introducing closures can complicate object lifetimes and all sorts of potential reference issues.

What about me? I understand his concerns, and in C#, I can certainly see where he is coming from. However, in F# we have such constructs readily available to us in the iter and iteri functions as shown below in F# Interactive:

> let flip f y x = f x y
- [1..10]
-   |> List.map((*) 2)
-   |> List.filter(flip (%) 3 >> (=) 0)
-   |> List.iter(printfn "%d");;
6
12
18
> [1..10]
-   |> List.map((*) 2)
-   |> List.filter(flip (%) 3 >> (=) 0)
-   |> List.iteri(printfn "%d\t%d");;
0       6
1       12
2       18

Outside of logging, writing to the console and such are rare in functional programming, so once again, I can certainly understand the concern.

Conclusion

I hope by going through some basic scenarios that we will indeed question our code the next time we see that we are writing that explicit loop with a for and a while. With a tool chest filled with such functions as transforming every item, to filtering content, to aggregating data and so on, we can realize that we can create composable solutions instead of creating mutable collections or mutable variables and aggregating to them which are not as much. So, come and join me in the “Anti-For Campaign”.

Another good thing about the "splitloop" pattern, is that because of cache effects, there's a good chance that the split loops could overall perform better than the combined ones. Also, each split loop can be done in parallel.

thelazydogsback - Friday, June 26, 2009 4:59:56 PM

|> List.filter(flip (%) 3 >> (=) 0)

*eyes explode*

|> List.filter (fun n -> n % 3 = 0)

Much better! Write code for readability, not to show off how "clever" you ;)

Juliet - Friday, June 26, 2009 6:09:16 PM

@Juliet,

You're right, I was being a little cutesy, and I shouldn't have. Yours is much more readable.

Matt

Matthew Podwysocki - Friday, June 26, 2009 6:25:05 PM

@thelazydog

You are correct that besides readability, there are performance gains to be had. And yes, using such things as the task parallel library to do each at once. But, yes, I agree the caching is important as well.

Matt

Matthew Podwysocki - Friday, June 26, 2009 6:27:18 PM

@kentaromiura,

Yep, LINQ really was a game changing event for those in the .NET land to prove that FP concepts can be powerful in our day to day coding. Good stuff there.

Matt

Matthew Podwysocki - Friday, June 26, 2009 6:28:33 PM

"The problem is that if we then want to compose another operation, well, it’s really hard to do inside of these for loops. We’re too focused on the how at this point. "
Isn't that what the split loop pattern says? -> you can't do more than 1 operation in 1 loop.

ALthough I find it a bit silly to fragment code all over the place to get a series of linq operator calls. What have you gained exactly? Don't forget that the code you put in the extension method is not at the spot of the loop. A person reading your extension method chain doesn't really see what's going on. I'm not saying that logic should be pulled out of classes, on the contrary, what I'm saying is that splitting up routines into tiny snippets might look great at first, but if you have 10,000 loops to split up in a big code base, you end up with a gazillion snippets which are not really re-used simply because you don't know you had them in the first place.

And, if someone is so fond of functional languages, use a functional language, not C#. If you want to use a paradigm in full, use the language which provides that for you, like Haskell.

A last note on performance. It's hard to fine tune performance with linq extension method calls. The reason is that you often will enumerate a lot of times over the same source while in a loop you don't. This might not be serious at first, but it will be if the code is on a critical performance path. (one of the most dangerous in this is the 'Count()' extension method. I think it's obvious why that is ;))

Frans Bouma - Saturday, June 27, 2009 9:29:35 AM

check = j =>
j > lim || (i % j != 0 && check(j + 1);

It would be more helpful if this code could compile :P

Marcel Popescu - Saturday, June 27, 2009 12:51:05 PM

@Frans

Like I said above, you're only really abstracting against three total types of loops, the aggregating, query and side effecting loops. Nothing more than that, so I don't get where you get the impression that there will be 10,000 loops all over the place.

The LINQ operators are fairly well known. I just put in that Filter as an example, and since I didn't introduce any on my own, I once again, don't see what you're getting at. The LINQ operators are fairly common to functional languages and they are pretty well documented such as Select/SelectMany, Where, Aggregate/Sum/Count, Zips and so on.

What makes it hard exactly to read what is going on in a method chain? It should be rather straight forward to say:

collection.Where(x => x % 2 == 0).Select(x => x * 3 + 2);

What's hard about that exactly to understand?

As per optimization, that's where PLINQ can play a role in terms of performance gains (if the collection is large enough of course to warrant such behavior). Why would you be enumerating over it twice versus once with a loop? Wouldn't that entail breaking the split loop pattern?

I agree that discipline is required in terms of not calling Count() and other items to force evaluation before it is necessary.

But, overall, I don't agree that we can't take advantage of this in C#. I think it would be a shame not to apply functional lessons such as this when LINQ has been made available to us.

Matt

podwysocki - Saturday, June 27, 2009 7:00:57 PM

@Marcel,

Woops, should be:

check = j =>

j > lim || (i % j != 0 && check(j + 1));

podwysocki - Saturday, June 27, 2009 7:01:17 PM

the split loop refactoring doesn't always play well with laziness, though - it's very easy to get a space leak by traversing a lazy list twice.

Mark Wotton - Sunday, October 18, 2009 10:30:54 PM

What and Why?

Coping Strategies

Now What About Those Side Effects?

Conclusion

10 Comments