F# 1.9.4.19 runs out of the box with Mono in Linux

Don Syme just announced a minor update to the F# environment, minor may be but of great interest to a certain community: it so happens that at some point F# stopped working properly in Linux, a workaround was published (and it actually works, but you've got to follow the instructions carefully). Well, not anymore, 1.9.4.19 works out of the box with Mono in Linux, you just have to download it, unzip it, and then happily type "mono fsi.exe":

FSharpMonoLinux

So now I've got one less pretext for not writing that book "Learning to program the functional way in an open source environment using a cool Microsoft technology" that will make me famous...

Cloning objects in .NET

In an interesting project where I'm giving a hand, we need to clone objects of a number of different types, perhaps surprisingly the CLR doesn't offer a general cloning method, of course you could use MemberwiseClone() but this is a protected method, so it can be invoked only from inside the class of the object being cloned, which makes it difficult to use it in a general method, besides, MemberWiseClone() does just a shallow copy and what we really need is a deep copy.

There is a good reason for not having such a general method: object cloning is one of those problems which have a simple solution for simple scenarios but that resist a satisfactory solution for all the scenarios, for example the objects may have references to other objects and even to themselves be it directly or after a long chain, for example a customer has invoices that have payments that refer to the customer, a general cloning algorithm for a web several times more complex than that is anything but trivial. But the need of moving objects (and their web) is inescapable in distributed environments, because you have to move the invoices and the customers from the business layer to the presentation layer, so there are indeed mechanisms, albeit with some limitations, for serializing and deserializing object graphs, with the help of these mechanisms we can try and build a general object cloner:

    1 

    2     public static T BinaryClone<T>(this T originalObject)

    3     {

    4         using (var stream = new System.IO.MemoryStream())

    5         {

    6             var binaryFormatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();

    7             binaryFormatter.Serialize(stream, originalObject);

    8             stream.Position = 0;

    9             return (T)binaryFormatter.Deserialize(stream);

   10         }

   11     }

   12 

We just convert, using as intermediary a memory stream, our object to a bit string and then we synthesize a *new object* from that bit string. The use of BinaryClone() flows nicely:

    1 

    2     var clone = person.BinaryClone();

    3 

Easy to write, but we have to be careful of the performance and the corner cases. I run a few *very informal* performance tests with this object:

    1 

    2     var person = new Person

    3     {

    4         Id = 101,

    5         Name = "Sánchez, Sebastián",

    6         Salary = 590.20m,

    7         BirthDate = new DateTime(2000, 6, 15),

    8         Address = new Address { Number = "N24-78", Street = "Pasaje Córdova" }

    9     };

   10 

And I found that doing a hundred thousand clones takes some nine thousand milliseconds, that is each cloning takes less than one ten-thousandth of a second, which is adequate to our needs. BinaryFormatter has been with us since .NET Framework 1.0 so I'm not like saying anything even remotely new or unknown, but tomorrow (or the day after, or the day after...) I'll talk about a small refinement to BinaryClone().

Posted by Edgar Sánchez with 4 comment(s)
Filed under: ,

A cool way to find out whether a number is palindromic

In this blog entry I proposed a solution to Problem 4 at Project Euler, a crucial element of the problem is to find out whether a number is a palindrome (909 is, 809 isn't), a bit out of laziness and a bit in order to reuse existing methods, I decided to verify the palindrome by converting the number into a char array and then comparing this array with its mirror version, it works but it's not really that mathematical... Dustin Campbell proposed a solution kind of similar to mine (alas, more elegant and, above that, in F#) and using the same trick of converting the number to chars, as he didn't like this part of the solution, in this new blog entry he proposes the detection of a palindrome by mirroring the number one digit at a time. A translation of his F# code to C# 3.0 could be:

    1     Func<int, int, int> TailReverseNumber = null;

    2     TailReverseNumber = (n, res) => n == 0 ? res : TailReverseNumber(n / 10, 10 * res + n % 10);

    3 

    4     Func<int, bool> IsPalindrome = n => n == TailReverseNumber(n, 0);

TailReverseNumber takes a number n and "mirrors" it, one digit at a time, for example: TailReverseNumber(237, 0) -> TailReverseNumber(23, 7) -> TailReverseNumber(2, 73) -> TailReverseNumber(0, 732), the big trick is that res works as an accumulator that is multiplied by 10, therefore moving its value to the left, and putting in the "hole" that appears at the right the least significant digit of n. As it names implies, TailReverseNumber() uses tail recursion, so that no extra call stack space is used to save any intermediate results, which makes the process pretty efficient. Actually, in my PC it's four times faster than the initial solution. More efficient and more elegant, a smart guy Dustin.

New version of F# just released

In its way from research language to commercial language, Don Syme just announced that, silently, on May the 1st version 1.9.4.15 of F# was released.

 F# 1.9.4.15

This new version incorporates a number of specific enhancements (F# is now basically in stabilization mode, so we really shouldn't expect significative changes to the language). By the way, the example in the picture shows my idea for solving Problema 2 of Project  Euler: find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed four million. For an explanation of how it works, I suggest you this Dustin Campbell blog, which proposes a solution similar to mine with a couple of elegant touches and, above all, with a detailed and engaging explanation.

Which is the ten thousand first prime?

Prime numbers have a good deal of practical applications (for example in cryptography) but let's face it, even if they would have none, they would still be the favorite toy of mathematicians. In Problem 7 of Project Euler, we are asked to find the 10001st element of the famous list, my approach was this:

  1. Define the *infinite* sequence of the prime numbers
  2. From this sequence, throw away the first 10000 items and then take the first of the remaining

Creating an infinite sequence in C# is easy (since version 2) thanks to IEnumerables and, above all, the yield statement:

    1 IEnumerable<int> Primes()

    2 {

    3     yield return 2;

    4 

    5     var primesSoFar = new List<int>();

    6     primesSoFar.Add(2);

    7 

    8     Func<int, bool> IsPrime = n => primesSoFar.TakeWhile(p => p <= (int)Math.Sqrt(n)).FirstOrDefault(p => n % p == 0) == 0;

    9 

   10     for (int i = 3; true; i += 2)

   11     {

   12         if (IsPrime(i))

   13         {

   14             yield return i;

   15             primesSoFar.Add(i);

   16         }

   17     }

   18 }

The yield at line 3 returns the first item of the sequence: the always excepcional 2 (the only even prime). Then at line 5 we create a list where we will be saving the generated primes as we progress in the sequence (this way we will gain speed at the cost of memory). The IsPrime(n) function defined at line 9, proposes a method -pretty crude actually- of verifying whether a number is prime: we take all primes generated so far which are lower or equal than the square root of n, and we look for the first among them that divides n evenly, if such a divisor exists then n is not a prime, that is: if none of the primes already generated is an exact divisor of n then FirstOrDefault() returns a 0, signaling the fact that n is indeed a prime. Finally at line 10 starts the loop that, every time the Prime() invoker asks for an item, it progresses thru 3, 5, 7, 9, 11, 13, ... stopping and returning, thru yield, when it finds a new prime.

With this sequence in our hands, step 2 of my plan is utterly simple:

return Primes().Skip(nth - 1).First();

We take the Primes() sequence, ignore the first nth - 1 (in our case nth = 100001) and then take the first of the remaining. This little code returns the answer in far less than a second.

The square of the sum vs. the sum of the squares

The sixth Project Euler problem poses an exercise that, to me, offers no major hurdles:

What is the difference between the sum of the squares and the square of the sums [of a sequence of natural numbers]?

The functional C# solution is fairly easy to write and read:

    1     public int SquareSumsLessSumSquares(IEnumerable<int> sequence)

    2     {

    3         var sum = sequence.Sum();

    4         var sumSquares = sequence.Select(i => i * i).Sum();

    5         return sum * sum - sumSquares;

    6     }

We get any sequence of integers (list, array, etc.) and we find the sum of its elements at line 3. The Select() in line 4 generates a new sequence with the square of each item from the original sequence and then we add them up. From there, getting the answer to Problem 6 is easy. Actually, the specific value that the problem asks for is to find out this difference for the first 100 natural numbers, so the corresponding call is:

    var result006 = SquareSumsLessSumSquares(Enumerable.Range(1, 100));

Now, given that the sequence we are focused on is 1, 2, 3, ..., 100, we can leverage a couple of math formulas:

 

y

 

Isn't the Wikipedia something? With this information in our hands we can re-write our solution this way:

    1     public int SquareSumsLessSumSquaresFirstNumbers(int n)

    2     {

    3         var sum = n * (n + 1) / 2;

    4         var sumSquares = n * (n + 1) * (2 * n + 1) / 6;

    5         return sum * sum - sumSquares;

    6     }

Clearly this latter way of solving the problem uses far less memory and CPU time. My first solution follows what is called a "brute force" approach, contemption intended. It works, for sure, but there are more elegant and efficient ways of solving the problem. The moral of this story is that a little research can take us a long way towards fresh solutions, probably better than our first approach. How frequently have you seen brute force solutions in production environments?

Recursive lambdas and sequence aggregations

The fifth problem at Project Euler proposes this nostalgic primary school exercise: find the smallest quantity that is evenly divisible by some numbers, the least common multiple of 1, 2, 3, ..., 20 to be precise. To begin with, let's remember the old arithmetic formula:

 

Where gcd is the greatest common divisor of a and b. It's also useful to remember that the lcm is associative, that is lcm(1,2,3) is the same as lcm(lcm(1,2), 3), in other words, we can calculate the lcm of 1 and 2, then the lcm of this result and 3, then the lcm of this result and 4, and so on. In functional C#, these ideas can be expressed like so:

    1     Func<int, int, int> gcd = null;

    2     gcd = (x, y) => x % y == 0 ? y : gcd(y, x % y);

    3 

    4     return Enumerable.Range(1, 20).Aggregate(1, (x,y) => x * (y / gcd(x, y)));

Line 2 finds the gcd of x and y like this: if the remainder of dividing x by y is zero, then y is the gcd, if not, we must find the gcd of y and such remainder. Note that gcd is defined recursively. The only interesting point here is that in order to define a recursive lambda expression, it is mandatory to first declare the expression variable (which we do in line 1), and only after this the compiler allows us to define a lambda expression in terms of itself. Odd, granted, but easy to get used to.

Line 4 takes the sequence 1, 2, ..., 20 and accumulates the lcm as the sequence is traversed, that is: it starts calculating lcm(1, 1) (let's call this accum1), then it calculates lcm(accum1, 2) (let's call this accum2), then it calculates lcm(accum2, 3), ..., and finishes with lcm(accum19, 20) which is precisely lcm(1, 2, ..., 20). The Aggregate() function is frequently used for chores like adding each term of a sequence, or multiplying them all, in our case we are using Aggregate() to get the succesive lcm's, which answers the question of Project Euler in two and a half lines of code, cool, eh?

Nested sequences and palindrome numbers

Problem 4 of Project Euler poses and impractical albeit intriguing problem: given all three digit numbers (100, 101, 102, ..., 998, 999), find the largest product of 2 of those numbers, where the product is a palindrome. For example, 580.085 is the product of two three-digit numbers (995 x 583) but it isn't the largest of such products. One possible functional C# solution is:

    1     Func<int,bool> IsPalindrome = n => { var nchar = n.ToString().ToCharArray(); return nchar.SequenceEqual(nchar.Reverse()); };

    2 

    3     return Enumerable.Range(100, 900)

    4         .SelectMany(i => Enumerable.Range(i, 900 - (i - 100)).Select(j => i * j))

    5         .Where (p => IsPalindrome(p))

    6         .Max();

To begin with, let's focus on the interesting part: at line 3, Enumerable.Range() generates the sequence 100, 101, ..., 998, 999; SelectMany() at line 4 does several things, first for every number i from the previous line it generates the sequence i, i + 1, i + 2, ..., 998, 999, and then the inner Select() generates the products {100 x 100, 100 x 101, ..., 100 x 999}, {101 x 101, 101 x 102, ..., 101 x 999 }, ... As you can see, there are 999 sequences generated (are all the same size?) finally SelectMany() kindly "flattens" those 999 sequences in one large sequence containing each and every product; line 5 then is easy, the Where() just keeps those products that are palindromes; the Max() in line 6 isn't worth any explanation.

Back to line 1, I took the lazy path to check whether a number is a palindrome: I just converted it to a string and then to a char array, finally I checked whether that array is equal, item by item, to its reversed version (and all that work just to leverage IEnumerable.Reverse() ;-)

Finding the largest prime factor of a number

This is another classic problem at Project Euler that can be solved with the old trick of top down programming, like so:

     PrimeFactors(number).DefaultIfEmpty(number).Max();

It's a nice solution, supposing PrimeFactors() actually returns all prime factors of any given number. But before getting there, it's a good idea to discuss what DefaultIfEmpty() is doing there. The thing is that somebody could make this innocent call: PrimeFactors(1), which of course returns an empty collection, which in turn makes PrimeFactors(1).Max() throw an exception (trying to get the maximum out of nothing, eh?) DefaultIfEmpty(x) to the rescue: when DefaultIfEmpty(x) is applied to a "reasonable" collection it returns it untouched, but when it's applied to an empty collection it returns a new collection with just un item: x. So, in our case, PrimeFactors(1).DefaultIfEmpty(1).Max() is just a looooong alias for 1. DefaultIfEmpty() is very useful for those special cases with empty collections.

With that behind, let's see how we get all the prime factors of a number:

    1     IEnumerable<ulong> PrimeFactors(ulong number)

    2     {

    3         Func<ulong, ulong> FindDivisor = n => n == 1UL ? 1UL : CandidateFactors(n).First(c => n % c == 0);

    4 

    5         var factors = new HashSet<ulong>();

    6         while (number > 1UL)

    7         {

    8             ulong divisor = FindDivisor(number);

    9             factors.Add(divisor);

   10             number /= divisor;

   11         }

   12 

   13         return factors;

   14     }

At line 3 we define a function (we are using lambdas just for fun) that finds a divisor for a number, for example FindDivisor(45) returns 3 and FindDivisor(49) returns 7. The workhorse is the while loop starting on line 6, it starts with a number, let's say 45, find the divisor 3, saves it in factors and goes on with 15 (45 / 3), now it finds a divisor for 15, in this case 3 again, saves it in factors and goes on with the 5 (15 / 3), and so on and so on until all is left is a 1. Meanwhile factors has been accumulating 3, 5, etc.: the prime factors of the original number. An interesting detail is that factors is a set, so it doesn't have any duplicates.

Finally let's check the CandidateFactors() definition:

    1     IEnumerable<ulong> CandidateFactors(ulong n)

    2     {

    3         yield return 2UL;

    4         for (ulong candidate = 3UL; candidate <= n; candidate += 2UL)

    5             yield return candidate;

    6     }

As you can see, this function simply produces the sequence 2, 3, 5, 7, 9, ..., n which are the possible divisors of n. You can think of a more clever implementation, one that produces a less indiscriminate list of candidates, but let's leave that for another day.

And in this way, this mixture of LINQ (DefaultIsEmpty(), Max(), First()), lambdas (FindDivisor), iterators (yield) and good old imperative programming, leads us to a compact and effective solution to an old problem. Do you happen to have a more elegant solution?

Project Euler and infinite sequences in C#

The second problem at Project Euler proposes:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

One way of solving this problem is by defining an endless sequence in C#. Something along these lines:

    IEnumerable<int> FibonacciSequence()

    {

        int n1 = 0;

        int n2 = 1;

 

        while (true)

        {

            int n3 = n1 + n2;

            yield return n3;

            n1 = n2;

            n2 = n3;

        }

    }

The FibonacciSequence() function returns the next value in the sequence everytime its execution hits the yield statement, and conveniently enough, n3 gets the values 1, 2, 3, 5, 8, ... in each next round of the loop. It's notable that the loop never finishes by itself, this responsability actually belongs to the code invoking FibonacciSequence(). So in fact we are seeing a way of representing an infinite set (all Fibonacci numbers) in C#. In our case we need all elements no bigger than 4'000.000, so we can write something like this:

FibonacciSequence().TakeWhile(n => n <= 4000000).Where(n => n % 2 == 0).Sum()

The TakeWhile() method stops asking for more terms as soon as an item bigger than four millions is produced, afterwards the Where() method leaves us with a sequence containing only the even terms, and these terms are finally added up by Sum().

More Posts « Previous page - Next page »