Distance between adjacent points in F#

Let’s say you are given a list of data points:

[7;5;12;8;5]

And you are asked to find the distance between every adjacent pair, that is:

[(7-5);(5-12);(12-8);(8-5)] = [2;-7;4;3]

It turns out that there is an elegant solution to this problem:

let rec pairs = function
  | h1 :: (h2 :: _ as tail) -> (h1, h2) :: pairs tail
  | _ -> []


let distances dataPoints =
  dataPoints |> pairs |> List.map (fun (a, b) -> a - b)

The magic happens in the pairs function: it takes [7;5;12;8;5] and turns it into [(7,5);(5,12);(12,8);(8,5)], that is, it creates a tuple with every member of the list and its right neighbor. The trick is that tail gets bound to (h2 :: _), so that the recursive call processes the list starting with h2. This is called a named subpattern, something I discovered in section 1.4.2.2 of F# for Scientists. You learn at least one nice thing every day!
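To see the named subpattern in action, here is a self-contained version of the two functions above, plus a variant built on List.pairwise, which (as far as I know, since F# 4.0) ships in FSharp.Core and produces the same neighbor pairs. The name distancesPairwise is just mine, for illustration:

```fsharp
// `tail` is a named subpattern: it is bound to the whole (h2 :: _) list,
// so the recursive call starts again at h2
let rec pairs = function
    | h1 :: (h2 :: _ as tail) -> (h1, h2) :: pairs tail
    | _ -> []

let distances dataPoints =
    dataPoints |> pairs |> List.map (fun (a, b) -> a - b)

// Same result using the library function List.pairwise (FSharp.Core 4.0 or later)
let distancesPairwise dataPoints =
    dataPoints |> List.pairwise |> List.map (fun (a, b) -> a - b)

printfn "%A" (pairs [7; 5; 12; 8; 5])      // [(7, 5); (5, 12); (12, 8); (8, 5)]
printfn "%A" (distances [7; 5; 12; 8; 5])  // [2; -7; 4; 3]
```

A list with fewer than two elements falls through to the catch-all case and yields no pairs at all.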

Posted by Edgar Sánchez with 6 comment(s)

Entity Framework, LINQ to SQL and Oracle

Amid the debate between LINQ to SQL and Entity Framework about which one is better and which one has more of a future (two things that don’t necessarily go together), there is one thing they have in common: Oracle is in “no comment” mode about both of them. It’s as if Oracle expected that the lack of an “official” provider for Entity Framework, let alone LINQ to SQL, would somehow move people to develop in Java instead of the .NET Framework. IMHO, Visual Studio 2008 is so productive that people may first consider moving from Oracle to SQL Server before moving from VS 2008 to JDeveloper.

 

Luckily, the .NET Framework has a big ecosystem of developers and ISVs: enter Devart, a software house in Russia or Ukraine (I’m not sure which). They have been offering an Entity Framework provider for Oracle for a while now, and I have had the chance to use it with Oracle 10g with good success. The good news is that a few days ago they released new versions of all their providers (changing their names while at it), including dotConnect for Oracle 5.00. Even more intriguing, this new version includes a LINQ to SQL provider for Oracle, something supposedly so complex to build that it was expected to take a long time to appear. To be fair, I haven’t used this last provider yet, but the very fact that it’s available is exciting. Now Oracle-friendly Visual Studio 2008 developers (no, that’s not an oxymoron at all) have two good paths to follow. Let the debate begin!

Free F# libraries (well, almost)

In one of the very last PDC2008 sessions, Luca Bolognese gave an encore presentation of F#. Instead of trying to tell you what it was all about, I invite you to watch the video (Luca is engaging and funny, and the session is so packed with information that the hour will pass in no time). What I want to talk about here is a couple of very interesting libraries, both written in F#, that Luca used in his demos:

F# for Numerics offers a bunch of numerical analysis functions: matrix operations, integration and differentiation, statistical methods, maximization and minimization, Fourier transforms, you know, the stuff we all love about maths :P

F# for Visualization allows us to visualize functions in 2D as well as 3D, including animations and PNG export. Believe me, the graphs really look good.

If you are hesitant, Jon Harrop, the man behind the libraries, is offering free licenses of both, albeit with the usual banners and watermarks reminding you that you should really buy the real thing. Not that they are expensive, either: you can get both for around US$ 100.

Personally, I find the very fact that these libraries exist really encouraging: a great sign that a language or technology is ready for the market is that third-party libraries start to appear (as has happened in recent months with WPF, for example, but that’s a matter for another blog entry…)

While we are on the subject, for those of you who are really intrigued by scientific applications, I strongly suggest taking a look at F# for Scientists, the book where Jon explains how to use F# in this field.

Posted by Edgar Sánchez with 1 comment(s)

Point distance, imperative vs. functional style

Let’s consider a simple, almost silly, algebra problem: given a specific point and a set of other points, find the point in the set that is closest to the given point. One C# solution is:

    static class PointMath
    {
        static double Distance(Point p1, Point p2)
        {
            double xDist = p1.X - p2.X;
            double yDist = p1.Y - p2.Y;
            return Math.Sqrt(xDist * xDist + yDist * yDist);
        }

        public static Point ClosestPoint(Point p, IList<Point> points)
        {
            double shortestDistance = Double.PositiveInfinity;
            Point closestPoint = null;

            foreach (var point in points)
            {
                double distance = Distance(p, point);
                if (distance < shortestDistance)
                {
                    shortestDistance = distance;
                    closestPoint = point;
                }
            }

            return closestPoint;
        }
    }

The Distance() function finds the distance between two points with good ol’ Pythagoras. The ClosestPoint() function does the classic loop: traverse the list of points and calculate every distance; whenever you find a smaller one, keep it along with the current point, and at the end return the last point you kept. Easy, but with declarations, curly braces and whatnot, the solution takes 27 lines (OK, 14 lines if we ignore the blank lines and the curly-braces-only lines). And we didn’t even show the Point class definition… Can we do any better?

What about this F# solution:

    1 let distance (x1, y1) (x2, y2) : double =
    2   let xDistance = x1 - x2
    3   let yDistance = y1 - y2
    4   sqrt (xDistance * xDistance + yDistance * yDistance)
    5 
    6 let closestPoint toPoint fromPoints = List.min_by (distance toPoint) fromPoints

The distance function is almost a clone of its C# cousin; the sexy one is the closestPoint function: just one line! Let’s try to dissect it a little bit:

  1. The List.min_by function expects two parameters: the last one is the list from which the minimum will be picked, and the first one is a function that maps each item to the value that will be compared (here, the item’s distance to toPoint)
  2. distance expects two parameters, but we are providing just one (toPoint). What are we accomplishing with this? Well, to begin with, the type of the distance function is (my apologies for the relaxed syntax) Point -> Point -> double, i.e. distance takes two points and returns a double. If we fix one of those parameters (for example, by writing distance (0.5, 0.5)), what we are actually doing is defining a new function of type Point -> double that only knows how to calculate distances to the point (0.5, 0.5). In our case, in line 6 we have created a function that knows how to calculate the distances to toPoint
  3. So List.min_by finds the point in fromPoints closest to toPoint by calculating every distance to toPoint and keeping the fromPoints member with the shortest distance

I can hear some of you complaining that closestPoint may have taken just one line of code, but it took seven lines of explanation. That is mainly because we are not used to the language; once you are familiar with F#, the meaning of line 6 comes quite naturally. Is there any small imperative problem that you would like to see solved in a functional style?
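For anyone trying this with a current F# compiler: List.min_by was later renamed List.minBy. Here is a minimal sketch of the same two functions with the modern name, assuming points are plain (x, y) float tuples (no Point type is defined here) and making the partial application from step 2 explicit; distanceToCenter and the sample points are illustrative only:

```fsharp
// Points are plain (x, y) float tuples
let distance (x1: float, y1: float) (x2: float, y2: float) : float =
    let xDistance = x1 - x2
    let yDistance = y1 - y2
    sqrt (xDistance * xDistance + yDistance * yDistance)

// Partial application: fixing the first point yields a (float * float) -> float function
let distanceToCenter = distance (0.5, 0.5)

// List.minBy keeps the element whose projected value (its distance) is smallest
let closestPoint toPoint fromPoints = List.minBy (distance toPoint) fromPoints

let points = [ (0.0, 0.0); (1.0, 1.0); (0.4, 0.7); (3.0, -2.0) ]
printfn "%A" (closestPoint (0.5, 0.5) points)   // (0.4, 0.7)
```

When two points tie, List.minBy keeps the first one it met, just like the strict `<` comparison in the C# loop.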

Posted by Edgar Sánchez with 14 comment(s)

The first Visual F# CTP is here!

You go on vacation for one short week and a lot happens... for example, Don Syme & co. have released the first F# CTP, well on the way (hopefully before the end of this year) to putting F# on the same level as C#, C++ or VB.NET. As far as I know, this will be a historic event: for the first time, a mainstream platform (commercial or otherwise) wholly adopts a functional language. Allow me to seize the occasion to reiterate that there are several reasons why the functional programming paradigm should be considered interesting and important; IMHO the most relevant are:

  1. The use of higher-order functions and lazy evaluation allows us to reach higher levels of abstraction and modularization, which eases programming the ever more complex problems that we face.
  2. The extensive use of immutable values and structures greatly eases the parallelization of code execution, which is especially relevant in today's multi-core CPU world.
  3. Functional thinking adapts particularly well to solving math problems (be they symbolic or numeric). Well, this last point may not have as broad a reach as the previous ones, but it is especially fascinating and useful for mathematicians and scientists (who in more than a few cases have had to stick to FORTRAN or use very specialized products like Matlab or Mathematica).

For reasons like these, functional programming has been steadily invading the programming landscape, for example:

  1. C# 3.0 has lambdas and higher-order functions (or a subset of them, at least, sort of...) and lazy evaluation (LINQ anyone?)
  2. In the Java world, people are starting to consider Scala, a functional language for the JVM
  3. There's renewed interest in specialty languages like Erlang (used at Ericsson to forge extremely scalable and reliable systems)
  4. F# will be a first class citizen in the .NET Framework world

An example of the F# September CTP running on my Visual Studio 2008 SP1:

[Screenshot: the F# September CTP running in Visual Studio 2008 SP1]

If anybody is wondering what this code does: fibonacciSequence generates the Fibonacci numbers up to a given maximum, and the second function adds the even terms of the sequence up to a given limit. It's a succinct solution for Problem 2 @ Project Euler (and it would be interesting for you to try and solve it in your favorite language). Is it all Quechua to you? Well, that's just a matter of getting to know the language :-). IMHO the best way to learn F# is with the Expert F# book, co-authored by the language's father himself. Microsoft has also put the language's official site online, the F# Developer Center, where you will find several other resources.
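The screenshot's code isn't reproduced as text here, so the following is only a sketch of what such a solution might look like, not the code in the picture. It reuses the name fibonacciSequence from the description; sumOfEvenTerms is a hypothetical name of mine. Seq.unfold builds the lazy Fibonacci sequence from a (current, next) state pair:

```fsharp
// Lazily generates 1, 2, 3, 5, 8, ... while the terms do not exceed maxValue
let fibonacciSequence maxValue =
    (1, 2)
    |> Seq.unfold (fun (a, b) -> if a > maxValue then None else Some(a, (b, a + b)))

// Adds the even-valued terms that do not exceed limit (hypothetical name)
let sumOfEvenTerms limit =
    fibonacciSequence limit
    |> Seq.filter (fun n -> n % 2 = 0)
    |> Seq.sum

printfn "%d" (sumOfEvenTerms 100)   // 2 + 8 + 34 = 44
```

Because the sequence is lazy, no term beyond the limit is ever computed.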

By the way, it seems like the official name of the release will be Visual F# 1.0 (which actually corresponds to Version 2, counting from its inception at Microsoft Research). Finally, Visual F# requires only .NET 2.0, and an intriguing consequence of this is that you will be able to run F# code on Linux thanks to the Mono project; that is, you will have an open source functional programming platform, courtesy of Microsoft (and Novell).

F# 1.9.4.19 runs out of the box with Mono in Linux

Don Syme just announced a minor update to the F# environment; minor it may be, but it is of great interest to a certain community. It so happens that at some point F# stopped working properly on Linux, and a workaround was published (it actually works, but you've got to follow the instructions carefully). Well, not anymore: 1.9.4.19 works out of the box with Mono on Linux, you just have to download it, unzip it, and then happily type "mono fsi.exe":

[Screenshot: F# 1.9.4.19 running under Mono on Linux]

So now I've got one less pretext for not writing that book "Learning to program the functional way in an open source environment using a cool Microsoft technology" that will make me famous...

Cloning objects in .NET

In an interesting project where I'm lending a hand, we need to clone objects of a number of different types. Perhaps surprisingly, the CLR doesn't offer a general cloning method. Of course you could use MemberwiseClone(), but it is a protected method, so it can be invoked only from inside the class of the object being cloned, which makes it difficult to use in a general method. Besides, MemberwiseClone() does just a shallow copy, and what we really need is a deep copy.

There is a good reason for not having such a general method: object cloning is one of those problems that have a simple solution for simple scenarios but resist a satisfactory solution for all scenarios. For example, objects may have references to other objects and even to themselves, either directly or through a long chain: a customer has invoices that have payments that refer back to the customer. A general cloning algorithm for a web several times more complex than that is anything but trivial. But the need to move objects (and their webs) is inescapable in distributed environments, because you have to move the invoices and the customers from the business layer to the presentation layer, so there are indeed mechanisms, albeit with some limitations, for serializing and deserializing object graphs. With the help of these mechanisms we can try and build a general object cloner:

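    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    // Hypothetical container: extension methods must live in a static, non-generic class.
    // Every type in the cloned object graph must be marked [Serializable].
    public static class CloneExtensions
    {
        public static T BinaryClone<T>(this T originalObject)
        {
            using (var stream = new MemoryStream())
            {
                // BinaryFormatter targets .NET Framework; it is obsolete since .NET 5 (SYSLIB0011)
                var binaryFormatter = new BinaryFormatter();
                binaryFormatter.Serialize(stream, originalObject);
                stream.Position = 0;   // rewind before reading the bytes back
                return (T)binaryFormatter.Deserialize(stream);
            }
        }
    }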

Using a memory stream as an intermediary, we just convert our object into a sequence of bytes and then synthesize a *new object* from those bytes. Calling BinaryClone() reads nicely:

    var clone = person.BinaryClone();

Easy to write, but we have to be careful about performance and corner cases. I ran a few *very informal* performance tests with this object:

    var person = new Person
    {
        Id = 101,
        Name = "Sánchez, Sebastián",
        Salary = 590.20m,
        BirthDate = new DateTime(2000, 6, 15),
        Address = new Address { Number = "N24-78", Street = "Pasaje Córdova" }
    };

I found that doing a hundred thousand clones takes some nine thousand milliseconds, that is, each clone takes less than one ten-thousandth of a second (about 90 microseconds), which is adequate for our needs. BinaryFormatter has been with us since .NET Framework 1.0, so I'm not saying anything even remotely new or unknown, but tomorrow (or the day after, or the day after...) I'll talk about a small refinement to BinaryClone().
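Coming back to the customer, invoice and payment cycles mentioned above: a naive recursive copy would follow such a cycle forever, and the usual cure is to remember which objects have already been copied. The following F# sketch only illustrates that bookkeeping; it is not how BinaryFormatter works internally. Node and deepClone are hypothetical names, and each original object is mapped to its copy by reference, so reaching it a second time reuses the copy instead of recursing again:

```fsharp
open System.Collections.Generic

// Hypothetical mutable object that may reference other nodes, including itself
type Node(name: string) =
    member val Name = name with get, set
    member val Links = ResizeArray<Node>()

// Deep copy that maps each original node to its copy, compared by reference,
// so a cycle such as customer -> invoice -> customer terminates
let deepClone (root: Node) =
    let copies = Dictionary<Node, Node>(HashIdentity.Reference)
    let rec copy (original: Node) =
        match copies.TryGetValue original with
        | true, existing -> existing
        | _ ->
            let clone = Node(original.Name)
            copies.[original] <- clone   // register before following the links
            for link in original.Links do
                clone.Links.Add(copy link)
            clone
    copy root

let customer = Node("customer")
let invoice = Node("invoice")
customer.Links.Add invoice
invoice.Links.Add customer   // the invoice points back to its customer

let clone = deepClone customer
printfn "%b" (obj.ReferenceEquals(clone.Links.[0].Links.[0], clone))   // true
```

The recursion is not a tail call, so a very long chain of references could exhaust the stack; that is one of the corner cases a real cloner has to care about.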

Posted by Edgar Sánchez with 2 comment(s)

A cool way to find out whether a number is palindromic

In this blog entry I proposed a solution to Problem 4 at Project Euler. A crucial element of the problem is finding out whether a number is a palindrome (909 is, 809 isn't). Partly out of laziness and partly to reuse existing methods, I decided to verify the palindrome by converting the number into a char array and then comparing this array with its mirror version. It works, but it's not really that mathematical... Dustin Campbell proposed a solution quite similar to mine (alas, more elegant and, on top of that, in F#) that used the same trick of converting the number to chars. As he didn't like this part of the solution, in this new blog entry he proposes detecting a palindrome by mirroring the number one digit at a time. A translation of his F# code to C# 3.0 could be:

    Func<int, int, int> TailReverseNumber = null;
    TailReverseNumber = (n, res) => n == 0 ? res : TailReverseNumber(n / 10, 10 * res + n % 10);

    Func<int, bool> IsPalindrome = n => n == TailReverseNumber(n, 0);

TailReverseNumber takes a number n and "mirrors" it one digit at a time, for example: TailReverseNumber(237, 0) -> TailReverseNumber(23, 7) -> TailReverseNumber(2, 73) -> TailReverseNumber(0, 732). The big trick is that res works as an accumulator that is multiplied by 10, thereby shifting its digits to the left, and the least significant digit of n is put in the "hole" that appears at the right. As its name implies, TailReverseNumber() uses tail recursion; in F#, where Dustin's original lives, the compiler turns such a tail call into a loop, so no extra call stack space is used to save intermediate results, which makes the process pretty efficient (the C# compiler gives no such guarantee, although the JIT may optimize some tail calls). Actually, on my PC it's four times faster than the initial solution. More efficient and more elegant: Dustin is a smart guy.
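Since the idea comes from F#, here is how it might look back in that language. This is my own minimal sketch, not Dustin's original code: tailReverseNumber and isPalindrome simply mirror the C# names above, and largestPalindromeProduct is a hypothetical name for a brute-force use in Problem 4 (the largest palindrome that is a product of two 3-digit numbers):

```fsharp
// Mirrors n one digit at a time: res accumulates the digits in reverse order.
// The recursive call is in tail position, so the F# compiler emits a loop.
let rec tailReverseNumber n res =
    if n = 0 then res
    else tailReverseNumber (n / 10) (10 * res + n % 10)

let isPalindrome n = (n = tailReverseNumber n 0)

// Brute force over all products a * b with 100 <= a <= b <= 999
let largestPalindromeProduct =
    seq {
        for a in 100 .. 999 do
            for b in a .. 999 do
                let p = a * b
                if isPalindrome p then yield p
    }
    |> Seq.max

printfn "%d" (tailReverseNumber 237 0)                    // 732
printfn "%b %b" (isPalindrome 909) (isPalindrome 809)     // true false
```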

New version of F# just released

On its way from research language to commercial language, F# has quietly reached a new version: Don Syme just announced that version 1.9.4.15 was released on May 1st.

[Screenshot: F# 1.9.4.15]

This new version incorporates a number of specific enhancements (F# is now basically in stabilization mode, so we really shouldn't expect significant changes to the language). By the way, the example in the picture shows my idea for solving Problem 2 of Project Euler: find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed four million. For an explanation of how it works, I suggest this blog entry by Dustin Campbell, which proposes a solution similar to mine with a couple of elegant touches and, above all, a detailed and engaging explanation.

Which is the ten thousand and first prime?

Prime numbers have a good deal of practical applications (for example in cryptography), but let's face it, even if they had none, they would still be the favorite toy of mathematicians. In Problem 7 of Project Euler, we are asked to find the 10001st element of this famous list. My approach was this:

  1. Define the *infinite* sequence of the prime numbers
  2. From this sequence, throw away the first 10000 items and then take the first of the remaining

Creating an infinite sequence in C# is easy (since version 2) thanks to IEnumerable<T> and, above all, the yield statement:

    1 IEnumerable<int> Primes()
    2 {
    3     yield return 2;
    4 
    5     var primesSoFar = new List<int>();
    6     primesSoFar.Add(2);
    7 
    8     Func<int, bool> IsPrime = n => primesSoFar.TakeWhile(p => p <= (int)Math.Sqrt(n)).FirstOrDefault(p => n % p == 0) == 0;
    9 
   10     for (int i = 3; true; i += 2)
   11     {
   12         if (IsPrime(i))
   13         {
   14             yield return i;
   15             primesSoFar.Add(i);
   16         }
   17     }
   18 }

The yield at line 3 returns the first item of the sequence: the always exceptional 2 (the only even prime). Then at line 5 we create a list where we will save the generated primes as we progress through the sequence (this way we gain speed at the cost of memory). The IsPrime(n) function defined at line 8 implements a pretty crude method of verifying whether a number is prime: we take all the primes generated so far that are less than or equal to the square root of n and look for the first among them that divides n evenly. If such a divisor exists, n is not a prime; that is, if none of the primes already generated is an exact divisor of n, then FirstOrDefault() returns 0, signaling that n is indeed a prime. Finally, line 10 starts the loop that, every time the caller of Primes() asks for an item, advances through 3, 5, 7, 9, 11, 13, ..., stopping and returning the value through yield when it finds a new prime.

With this sequence in our hands, step 2 of my plan is utterly simple:

return Primes().Skip(nth - 1).First();

We take the Primes() sequence, skip the first nth - 1 items (in our case nth = 10001) and then take the first of the remaining ones. This little piece of code returns the answer in far less than a second.
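For comparison, the same two-step plan can be written in F#, where a sequence expression plays the role of C#'s yield. This is a minimal sketch of my own (primes and nthPrime are my names), keeping the same crude trial division by the primes found so far. Making primes a function means each enumeration gets its own list of found primes:

```fsharp
// Infinite, lazily evaluated sequence of primes using trial division
// by the odd primes found so far (2 never divides the odd candidates)
let primes () =
    let found = ResizeArray<int>()
    let isPrime n =
        let limit = int (sqrt (float n))
        found
        |> Seq.takeWhile (fun p -> p <= limit)
        |> Seq.forall (fun p -> n % p <> 0)
    seq {
        yield 2
        for candidate in Seq.initInfinite (fun k -> 3 + 2 * k) do
            if isPrime candidate then
                found.Add candidate
                yield candidate
    }

// Step 2: skip the first nth - 1 primes and take the next one
let nthPrime nth = primes () |> Seq.skip (nth - 1) |> Seq.head

printfn "%A" (primes () |> Seq.take 6 |> List.ofSeq)   // [2; 3; 5; 7; 11; 13]
```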
