April 2008 - Posts

The square of the sum vs. the sum of the squares

The sixth Project Euler problem poses an exercise that, to me, offers no major hurdles:

What is the difference between the sum of the squares and the square of the sums [of a sequence of natural numbers]?

The functional C# solution is fairly easy to write and read:

    1     public int SquareSumsLessSumSquares(IEnumerable<int> sequence)

    2     {

    3         var sum = sequence.Sum();

    4         var sumSquares = sequence.Select(i => i * i).Sum();

    5         return sum * sum - sumSquares;

    6     }

We get any sequence of integers (list, array, etc.) and we find the sum of its elements at line 3. The Select() in line 4 generates a new sequence with the square of each item from the original sequence and then we add them up. From there, getting the answer to Problem 6 is easy. Actually, the specific value that the problem asks for is to find out this difference for the first 100 natural numbers, so the corresponding call is:

    var result006 = SquareSumsLessSumSquares(Enumerable.Range(1, 100));

Now, given that the sequence we are focused on is 1, 2, 3, ..., 100, we can leverage a couple of math formulas:

 

y

 

Isn't the Wikipedia something? With this information in our hands we can re-write our solution this way:

    1     public int SquareSumsLessSumSquaresFirstNumbers(int n)

    2     {

    3         var sum = n * (n + 1) / 2;

    4         var sumSquares = n * (n + 1) * (2 * n + 1) / 6;

    5         return sum * sum - sumSquares;

    6     }

Clearly this latter way of solving the problem uses far less memory and CPU time. My first solution follows what is called a "brute force" approach, contemption intended. It works, for sure, but there are more elegant and efficient ways of solving the problem. The moral of this story is that a little research can take us a long way towards fresh solutions, probably better than our first approach. How frequently have you seen brute force solutions in production environments?

Recursive lambdas and sequence aggregations

The fifth problem at Project Euler proposes this nostalgic primary school exercise: find the smallest quantity that is evenly divisible by some numbers, the least common multiple of 1, 2, 3, ..., 20 to be precise. To begin with, let's remember the old arithmetic formula:

 

Where gcd is the greatest common divisor of a and b. It's also useful to remember that the lcm is associative, that is lcm(1,2,3) is the same as lcm(lcm(1,2), 3), in other words, we can calculate the lcm of 1 and 2, then the lcm of this result and 3, then the lcm of this result and 4, and so on. In functional C#, these ideas can be expressed like so:

    1     Func<int, int, int> gcd = null;

    2     gcd = (x, y) => x % y == 0 ? y : gcd(y, x % y);

    3 

    4     return Enumerable.Range(1, 20).Aggregate(1, (x,y) => x * (y / gcd(x, y)));

Line 2 finds the gcd of x and y like this: if the remainder of dividing x by y is zero, then y is the gcd, if not, we must find the gcd of y and such remainder. Note that gcd is defined recursively. The only interesting point here is that in order to define a recursive lambda expression, it is mandatory to first declare the expression variable (which we do in line 1), and only after this the compiler allows us to define a lambda expression in terms of itself. Odd, granted, but easy to get used to.

Line 4 takes the sequence 1, 2, ..., 20 and accumulates the lcm as the sequence is traversed, that is: it starts calculating lcm(1, 1) (let's call this accum1), then it calculates lcm(accum1, 2) (let's call this accum2), then it calculates lcm(accum2, 3), ..., and finishes with lcm(accum19, 20) which is precisely lcm(1, 2, ..., 20). The Aggregate() function is frequently used for chores like adding each term of a sequence, or multiplying them all, in our case we are using Aggregate() to get the succesive lcm's, which answers the question of Project Euler in two and a half lines of code, cool, eh?

Nested sequences and palindrome numbers

Problem 4 of Project Euler poses and impractical albeit intriguing problem: given all three digit numbers (100, 101, 102, ..., 998, 999), find the largest product of 2 of those numbers, where the product is a palindrome. For example, 580.085 is the product of two three-digit numbers (995 x 583) but it isn't the largest of such products. One possible functional C# solution is:

    1     Func<int,bool> IsPalindrome = n => { var nchar = n.ToString().ToCharArray(); return nchar.SequenceEqual(nchar.Reverse()); };

    2 

    3     return Enumerable.Range(100, 900)

    4         .SelectMany(i => Enumerable.Range(i, 900 - (i - 100)).Select(j => i * j))

    5         .Where (p => IsPalindrome(p))

    6         .Max();

To begin with, let's focus on the interesting part: at line 3, Enumerable.Range() generates the sequence 100, 101, ..., 998, 999; SelectMany() at line 4 does several things, first for every number i from the previous line it generates the sequence i, i + 1, i + 2, ..., 998, 999, and then the inner Select() generates the products {100 x 100, 100 x 101, ..., 100 x 999}, {101 x 101, 101 x 102, ..., 101 x 999 }, ... As you can see, there are 999 sequences generated (are all the same size?) finally SelectMany() kindly "flattens" those 999 sequences in one large sequence containing each and every product; line 5 then is easy, the Where() just keeps those products that are palindromes; the Max() in line 6 isn't worth any explanation.

Back to line 1, I took the lazy path to check whether a number is a palindrome: I just converted it to a string and then to a char array, finally I checked whether that array is equal, item by item, to its reversed version (and all that work just to leverage IEnumerable.Reverse() ;-)

Finding the largest prime factor of a number

This is another classic problem at Project Euler that can be solved with the old trick of top down programming, like so:

     PrimeFactors(number).DefaultIfEmpty(number).Max();

It's a nice solution, supposing PrimeFactors() actually returns all prime factors of any given number. But before getting there, it's a good idea to discuss what DefaultIfEmpty() is doing there. The thing is that somebody could make this innocent call: PrimeFactors(1), which of course returns an empty collection, which in turn makes PrimeFactors(1).Max() throw an exception (trying to get the maximum out of nothing, eh?) DefaultIfEmpty(x) to the rescue: when DefaultIfEmpty(x) is applied to a "reasonable" collection it returns it untouched, but when it's applied to an empty collection it returns a new collection with just un item: x. So, in our case, PrimeFactors(1).DefaultIfEmpty(1).Max() is just a looooong alias for 1. DefaultIfEmpty() is very useful for those special cases with empty collections.

With that behind, let's see how we get all the prime factors of a number:

    1     IEnumerable<ulong> PrimeFactors(ulong number)

    2     {

    3         Func<ulong, ulong> FindDivisor = n => n == 1UL ? 1UL : CandidateFactors(n).First(c => n % c == 0);

    4 

    5         var factors = new HashSet<ulong>();

    6         while (number > 1UL)

    7         {

    8             ulong divisor = FindDivisor(number);

    9             factors.Add(divisor);

   10             number /= divisor;

   11         }

   12 

   13         return factors;

   14     }

At line 3 we define a function (we are using lambdas just for fun) that finds a divisor for a number, for example FindDivisor(45) returns 3 and FindDivisor(49) returns 7. The workhorse is the while loop starting on line 6, it starts with a number, let's say 45, find the divisor 3, saves it in factors and goes on with 15 (45 / 3), now it finds a divisor for 15, in this case 3 again, saves it in factors and goes on with the 5 (15 / 3), and so on and so on until all is left is a 1. Meanwhile factors has been accumulating 3, 5, etc.: the prime factors of the original number. An interesting detail is that factors is a set, so it doesn't have any duplicates.

Finally let's check the CandidateFactors() definition:

    1     IEnumerable<ulong> CandidateFactors(ulong n)

    2     {

    3         yield return 2UL;

    4         for (ulong candidate = 3UL; candidate <= n; candidate += 2UL)

    5             yield return candidate;

    6     }

As you can see, this function simply produces the sequence 2, 3, 5, 7, 9, ..., n which are the possible divisors of n. You can think of a more clever implementation, one that produces a less indiscriminate list of candidates, but let's leave that for another day.

And in this way, this mixture of LINQ (DefaultIsEmpty(), Max(), First()), lambdas (FindDivisor), iterators (yield) and good old imperative programming, leads us to a compact and effective solution to an old problem. Do you happen to have a more elegant solution?

Project Euler and infinite sequences in C#

The second problem at Project Euler proposes:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

One way of solving this problem is by defining an endless sequence in C#. Something along these lines:

    IEnumerable<int> FibonacciSequence()

    {

        int n1 = 0;

        int n2 = 1;

 

        while (true)

        {

            int n3 = n1 + n2;

            yield return n3;

            n1 = n2;

            n2 = n3;

        }

    }

The FibonacciSequence() function returns the next value in the sequence everytime its execution hits the yield statement, and conveniently enough, n3 gets the values 1, 2, 3, 5, 8, ... in each next round of the loop. It's notable that the loop never finishes by itself, this responsability actually belongs to the code invoking FibonacciSequence(). So in fact we are seeing a way of representing an infinite set (all Fibonacci numbers) in C#. In our case we need all elements no bigger than 4'000.000, so we can write something like this:

FibonacciSequence().TakeWhile(n => n <= 4000000).Where(n => n % 2 == 0).Sum()

The TakeWhile() method stops asking for more terms as soon as an item bigger than four millions is produced, afterwards the Where() method leaves us with a sequence containing only the even terms, and these terms are finally added up by Sum().

Project Euler and functional C#

I think I already talked (a long time ago) about Project Euler: a set of problems, mainly math oriented, keen to be solved with a computer program. To be sure, it's not particularly deep math, but the problems posed go fairly quickly from really easy, to hmm, to challenging. In this sense, they are good programming calisthenics, and may also be a good way of learning a new language. For example, Chris Smith has solved some Project Euler problems using F#, I find the idea intriguing, so I'll try to do the same thing, only that I'll be using C#, but not your plain old C# mind you, as I'll do my best to use in as much as possible the functional facilities of C# 3.0. Let's see for example Problem 1:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

Ok, ok, the problem is silly trivial, but wait until you get to Problem 10 ;-). One C# 3.0 functional solution is:

    Enumerable.Range(1, 999).Where(n => n % 3 == 0 || n % 5 == 0).Sum()

Range() generates the sequence from 1 to 999, and then Where() filters it leaving just the numbers multiples of 3 or 5, finally Sum() adds up these numbers. Easy, isn't it? And without loops or ifs.

Obviously, not all problems lend so naturally to be solved in a functional way, and we'll find out step by step how useful is the functional paradigm in Project Euler.

More Posts