Thinking Concurrently in .NET

Tuesday, May 13, 2008

.NET C# F# Frameworks

In recent posts, I've been harping on immutability and side-effect-free functions. There is a general theme emerging from this, and there are real reasons why I keep pointing it out. One of the things I'm interested in is concurrent programming on the .NET platform for messaging applications. As more and more cores and processors become available to us, we need to be cognizant of that as we design and write our applications. Most programs we write today are pretty linear in nature, except for, say, forms applications that use background worker threads to keep the user interface from freezing. For the most part, though, we're not taking full advantage of the CPU and its cycles. We need not only a better way to handle concurrency, but a better way to describe it as well. This is where Pi-calculus comes into the picture... But before we go down that beaten path, let's look at a few options. Note that these aren't all of them, just a select few I chose to analyze.

Erlang in .NET?

For many people, Erlang is one of the more interesting languages to come out of the concurrent programming field. The language received little attention until recently, when processor clock speeds stopped scaling and we moved into multi-core/multi-processor environments instead. What's interesting about Erlang is that it's a functional language, much like F#, Haskell, OCaml, etc. What makes it intriguing as well is that, unlike the others, it's dynamically typed rather than statically typed. Erlang was designed to support distributed, fault-tolerant, non-stop real-time applications. Written at Ericsson in the 1980s, it has been a mainstay of telephone switches ever since. If you're interested in hearing more about it, check out Joe Armstrong's appearance on Software Engineering Radio Episode 89, "Joe Armstrong on Erlang". If you want to dig deeper into Erlang, check out the book "Programming Erlang: Software for a Concurrent World", also by Joe Armstrong and available on Amazon.

How does that lead us to .NET? Well, it's interesting that someone thought of trying to port the language to .NET in a project called Erlang.NET. The project didn't get too far as far as I can tell, for obvious impedance mismatch reasons. First off, there is a bit of a disconnect between .NET processes and Erlang processes and how the project's author wants to tackle them. Erlang processes are cheap to create and tear down, whereas .NET ones tend to be a bit heavy. Garbage collection also works differently: Erlang collects per process, whereas the CLR takes a generational approach. Another thing is that Erlang is a dynamic language running on its own VM, so it would probably sit on top of the DLR in the .NET space. I'm not saying it's an impossible task, but it's improbable the way he stated it.

Instead, maybe the approach to take with an Erlang-like implementation is to create separate AppDomains, since they are relatively cheap to create. This allows process isolation and messaging constructs to fit rather nicely. We get rid of the impedance mismatch by mapping an Erlang process to an AppDomain. You can then tear down the AppDomain when you are finished, or restart it in a recovery scenario. These are some ideas if you truly want to dig further into the subject. I'll probably cover this in another post later.
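To make the "isolated state plus messages" idea concrete, here is a minimal in-process sketch using F#'s MailboxProcessor. To be clear, this is a swapped-in technique and not the AppDomain mapping described above: it gives no memory isolation and no tear-down/restart recovery. The Message type and the startCounter function are names made up for illustration.

```fsharp
// Hypothetical message protocol for a tiny Erlang-style "process".
type Message =
    | Add of int
    | GetTotal of AsyncReplyChannel<int>

// Starts an agent that owns its running total; callers can only
// reach that state by posting messages to the agent's mailbox.
let startCounter () =
    MailboxProcessor<Message>.Start(fun inbox ->
        let rec loop total =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Add n -> return! loop (total + n)
                | GetTotal reply ->
                    reply.Reply total
                    return! loop total
            }
        loop 0)

let counter = startCounter ()

// Fire-and-forget sends; the mailbox processes them in arrival order.
[1 .. 10] |> List.iter (fun n -> counter.Post(Add n))

// Request/reply: blocks the caller until the agent answers.
let total = counter.PostAndReply(fun channel -> GetTotal channel)
```

Because the agent handles one message at a time, no locks are needed around the total, which is the property that makes the Erlang model attractive in the first place.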

So, where does that leave us with Erlang itself? Well, we have the option of integrating Erlang and .NET through OTP.NET. The idea originally came from an article on TheServerSide called "Integrating Java and Erlang". This allows Erlang to do the computation on the server in the manner that best fits the Erlang style. I find it a pretty interesting article, and maybe when I have a spare second I'll check it out a bit more. But in terms of a full port to .NET? Well, I think .NET languages have some lessons to learn from Erlang, as it tackled concurrent programming first, whereas most imperative languages bolted it on after the fact.

MPI.NET

The Message Passing Interface (MPI) approach has been an interesting way of solving massive concurrency for applications. It involves using a standard protocol for passing messages from node to node across a compute cluster. In the Windows world, we have Windows Compute Cluster Server (WCCS, or CCS for short) to handle this need. CCS is available in two separate SKUs, CCS 2003 and CCS 2008 for Server 2008. The Server 2008 CCS is available as a CTP on the Microsoft Connect website. See here for more information. You mainly find High Performance Computing with MPI in the automotive, financial, scientific and academic communities, where they have racks upon racks of machines.

Behind the scenes, Microsoft's MPI is based on the MPICH2 implementation of the MPI specification. It was made available to C programmers and is fairly low level. Unfortunately, that leaves most C++ and .NET programmers out in the cold when it comes to taking advantage of it. Sure, C++ could call the C API directly, but the Boost.MPI library was created to support MPI in a way that C++ could really take advantage of.

A similar approach was then taken for the .NET platform with MPI.NET. Indiana University produced a .NET version that looks very similar to the Boost MPI approach, but with .NET classes. This allows us to program in any .NET language against Windows CCS and take advantage of the massive scalability and scheduling services offered in the SKU. At the end of the day, it's just a thin wrapper over P/Invoking msmpi.dll, with generics thrown in as well. Still, it's a nice implementation.

And since it was written for .NET, I can, for example, write a simple hello world application in F# to take advantage of MPI. The value is that most of the algorithms and heavy lifting you would be doing there would probably be functional anyway, so I can use F# to specify more succinctly what actions and what data I need. Here is a simple example:

#light

#r "D:\Program Files\MPI.NET\Lib\MPI.dll"

open MPI

let main (args:string[]) =
  // Environment initializes MPI and finalizes it when disposed
  using(new Environment(ref args))( fun e ->
    let commRank = Communicator.world.Rank
    let commSize = Communicator.world.Size
    match commRank with
    | 0 ->
      // Rank 0 receives one message from every other node
      let intValue = ref 0
      [1 .. (commSize - 1)] |> List.iter (fun i ->
        Communicator.world.Receive(Communicator.anySource, Communicator.anyTag, intValue)
        System.Console.WriteLine("Hello from node {0} out of {1}", !intValue, commSize))
    // Every other node sends its rank to rank 0 with tag 0
    | _ -> Communicator.world.Send(commRank,0, 0)
  )

main(System.Environment.GetCommandLineArgs())

I'll go into more detail in the future as to what this means and why, but this should whet your appetite: what you can do with it is pretty powerful.

F# to the Rescue with Workflows?

Another topic for discussion is asynchronous workflows, another area in which F# excels as a language. Async<'a> values are really a way of writing continuation-passing code in a readable, sequential style. I'll be covering this more in a subsequent post shortly, but in the meantime, there is good information from Don Syme here and Robert Pickering here.
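To make the continuation-passing point concrete, here is a small sketch showing a workflow next to roughly what it becomes once the syntax is removed. It assumes a current F# core library (Async.RunSynchronously rather than the CTP-era Async.Run), and addAsync and addAsyncExplicit are made-up names. The real translation also wraps the body in a Delay call, which is left out here for brevity.

```fsharp
// A workflow written with the async { } syntax.
let addAsync (a: Async<int>) (b: Async<int>) : Async<int> =
    async {
        let! x = a
        let! y = b
        return x + y
    }

// Roughly the same workflow written by hand: each let! becomes a call
// to Bind, and the function passed to Bind is the continuation, that
// is, "the rest of the workflow" to run once the value is available.
let addAsyncExplicit (a: Async<int>) (b: Async<int>) : Async<int> =
    async.Bind(a, fun x ->
        async.Bind(b, fun y ->
            async.Return(x + y)))

let viaSyntax =
    addAsync (async.Return 1) (async.Return 2) |> Async.RunSynchronously

let viaBind =
    addAsyncExplicit (async.Return 1) (async.Return 2) |> Async.RunSynchronously
```

Both versions produce the same result; the workflow syntax simply saves you from nesting the continuations yourself.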

Below is a quick example of an asynchronous workflow that fetches the HTML from each of the given web sites. I can then run the fetches in parallel and get the results rather easily. The example retrieves the HTML by calling the Async methods. Note that these methods don't exist on the .NET classes themselves; F#, through its magic, creates them for you (here via the CommonExtensions module opened in the code).

#light

open System.IO
open System.Net
open Microsoft.FSharp.Control.CommonExtensions

// Builds a workflow that downloads the HTML at the given URL.
let fetchAsync (url:string) =
  async { let request = WebRequest.Create(url)
          // let! waits for the result without blocking a thread
          let! response = request.GetResponseAsync()
          let stream = response.GetResponseStream()
          let reader = new StreamReader(stream)
          let! html = reader.ReadToEndAsync()
          return html
        }

let urls = ["http://codebetter.com/"; "http://microsoft.com"]
// Run one fetch per URL in parallel and wait for all of the results.
let htmls = Async.Run(Async.Parallel [for url in urls -> fetchAsync url])
print_any htmls

So, as you can see, it's a pretty powerful mechanism: I can retrieve data asynchronously and run each of the retrievals in parallel with parameterized data.
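If you want to try the parallel composition without touching the network, the following sketch simulates the slow call with Async.Sleep. It assumes a current F# core library, where Async.RunSynchronously takes the place of the CTP-era Async.Run, and measureAsync is a hypothetical stand-in for a real I/O operation.

```fsharp
// Hypothetical stand-in for a slow I/O call: waits briefly,
// then returns the length of its input.
let measureAsync (text: string) : Async<int> =
    async {
        do! Async.Sleep 50
        return text.Length
    }

let inputs = ["alpha"; "be"; "gam"]

// Async.Parallel starts all of the workflows together and returns
// an array whose results are in the same order as the inputs.
let lengths =
    inputs
    |> List.map measureAsync
    |> Async.Parallel
    |> Async.RunSynchronously
```

The shape is the same as the fetch example: build one workflow per input, combine them, and run the combination.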

Parallel Extensions for .NET

Another approach I've been looking at is the Parallel Extensions for .NET. The currently available version is the December CTP, available here. You can read more about it in two MSDN Magazine articles:

What I find interesting is Parallel LINQ, or PLINQ for short. The Task Parallel Library doesn't interest me as much. LINQ in general is interesting to a functional programmer in that its queries are lazily evaluated. The actual execution of your LINQ query is deferred until the result is enumerated, that is, until GetEnumerator() is called and the first element is requested. That's definitely taking some lessons from the functional world, and it's pretty powerful. Adding the ability to parallelize your heavy algorithms on top of that is a pretty powerful concept. I hope this moves forward.
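A small sketch can show both points: that building a query does no work until it is enumerated, and that the same query shape can be parallelized with AsParallel. It assumes the PLINQ API as it exists in System.Linq today, which may differ from the December CTP; the counter and the square function are made up for illustration.

```fsharp
open System.Linq
open System.Threading

// Counts how many elements have actually been evaluated.
let evaluated = ref 0

let square (x: int) =
    // Interlocked keeps the count correct when PLINQ calls us from several threads.
    Interlocked.Increment(evaluated) |> ignore
    x * x

// Building the query does no work yet: execution is deferred.
let query = Enumerable.Range(1, 10).Select(fun x -> square x)
let countBeforeEnumeration = evaluated.Value

// Enumerating the query (here through Sum) is what forces evaluation.
let sequentialSum = query.Sum()

// The same query shape, parallelized by inserting AsParallel.
let parallelSum =
    Enumerable.Range(1, 10).AsParallel().Select(fun x -> square x).Sum()
```

Note that square has a side effect only so the deferral can be observed; queries you intend to parallelize are best kept side-effect free.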

Conclusion

As you can see, I've briefly introduced each of these areas, which I hope to dive into a bit more in the coming weeks and months. I've only scratched the surface of each; each tackles concurrency problems in a slightly different way and has its own uses. But I hope I've whetted your appetite to look at some of these solutions today.

The Task Parallel Library is actually quite useful. As of now you don't have to write all the threading code yourself. This simplifies testing and is less error-prone.

flalar - Tuesday, May 13, 2008 7:04:41 AM

Take a look at the Concurrency and Coordination Runtime (CCR) in the Microsoft Robotics Developer Studio. It was written with concurrency in mind and it looks very promising.

Martijn Plijnaer - Tuesday, May 13, 2008 3:39:52 PM

@Martijn

I have looked at that, but the CCR has an interesting license to it, and doesn't make as much sense outside of Robotics Studio. I believe most people aren't using CCR due to some limitations and are recommending other solutions.

Matthew Podwysocki - Tuesday, May 13, 2008 6:42:24 PM

@flalar

I'm sure it is useful, but I haven't had a use case for it yet. Instead, I'm focusing on message passing and such, which MPI, F# and others are better at.

Matthew Podwysocki - Tuesday, May 13, 2008 6:43:34 PM

I think prohibitive licensing costs will also hold us back. Our 128-seat DCE license for MATLAB was not cheap. Star-P is even more expensive. What I can't deny or ignore is that more and more people are turning to MATLAB as their computational platform, and it is fast since it uses optimized libraries in the back end. As part of a programming assignment I had the students code up a dgemm example in C or Fortran and use either Intel's MKL or AMD's ACML for the BLAS library. For grins and giggles I also had them link against NAG, IMSL, and whatever other BLAS library they could download off the web. Programming was hard work and they burned up an entire lab session plus more to get it done. It took 5 minutes in a MATLAB session, and the timing differences between C code with MKL and MATLAB were nil. We used the same input data for all cases.

RHeo - Saturday, May 12, 2012 2:36:15 AM
