May 2008 - Posts

The May meeting of DC ALT.NET has been scheduled for May 22nd from 7-9PM.  Check out our mailing list and site for more information as it becomes available.  If you're in the Washington DC area, come check us out.  This month, we're having Craig Andera, of FlexWiki fame, speak about lessons learned from Lisp and how learning it can make you a better programmer.  That's one of the true strengths of DC ALT.NET, and of the ALT.NET movement as a whole: we look beyond the .NET community to find better ways to solve problems and apply the lessons learned in each community, and Lisp is one of those communities.  Dave Laribee, Jeremy Miller and Chad Myers spoke about this on the first episode of the ALT.NET Podcast with Mike Moore.  If you haven't listened to it yet, I highly recommend that you do.

Applying Lessons Learned from Lisp

There has been a lot of talk and some hype (deserved and undeserved) around functional programming lately, partly due to the search for better ways to express parallel applications and multi-core scenarios.  Some might find it interesting that functional programming has its roots in the 1950s, well before object-oriented programming, yet it has been relegated mostly to the research community.

Back in 1958, John McCarthy at MIT designed Lisp, and it remained a mainstay of the Artificial Intelligence field for a long time after that.  Since then, quite a few Lisp dialects have popped up, partly because many universities and labs did not share their work before everyone was connected to ARPANET.  Two that have really emerged are Common Lisp, an attempt to standardize the Lisp variants into one, and Scheme.  Lisp is a dynamically but strongly typed language: types are checked at run time, so calling an undefined function or applying a function to the wrong kind of value signals an error instead of silently doing the wrong thing.  By its nature, it is a functional language with such elements as lists, lambdas and so on.  One of the interesting additions to Lisp is the Common Lisp Object System (CLOS), which adds OOP functionality to the Common Lisp language.  It's a bit different from what we think of as OOP in C++, C#, Java and other OO languages.

In the .NET world, we have IronLisp and IronScheme.  IronLisp has been deprecated in favor of IronScheme going forward.  That's the beauty of .NET: the fact that these dynamic languages can be built on top of the DLR with relative ease truly speaks to how flexible the type system and CLR are.  Making both OOP and FP first-class citizens within the .NET space is pretty interesting as well.

Back to Lisp: if you want to hear more, check out Dick Gabriel's appearance on Software Engineering Radio Episode 84 on Common Lisp.  Dick has been a noted authority in the Lisp space for some time and was the organizer of OOPSLA 2007.  It's one of their better episodes, so I'd encourage you to listen to it.  I know I did, but then again, I have a pretty long commute, so I have time to listen to these things.

Who We Are

So, as I said, I run the DC ALT.NET group, which meets monthly to discuss ways of bettering ourselves.  We don't do what most other user groups in the area do; it's more of an intimate environment for learning and discussion.  Typically the first hour is for the topic of discussion, this month being Lisp, and the second hour is Open Spaces, which encourages everyone to speak and bring a topic they are passionate about.  As always, we're looking for sponsors to help us out along the way.  Since we're in the Washington DC area, and traffic can be bad, we tend to move from month to month to accommodate everyone.  That may change in the future as we grow, but for now, it works nicely.  So, if you're in the DC area, come check us out.  And hopefully I'll get Dave Laribee to stop by before too long as well...

Where I'll Be

In addition to the meeting next week, I will be speaking at the Philly ALT.NET group meeting on May 21st on F# and an introduction to functional programming.  This should be a great session and I hope there will be a good crowd for it.  Also, this weekend is the NoVA Code Camp, where I have two sessions, "Introduction to F# and Functional Programming" and "Improve Your C# with Functional Programming Ideas".  I look forward to seeing everyone at those events!

In my previous post, I looked at some of the options we have for concurrent programming in .NET applications.  One of the more interesting, yet specialized, options is the Message Passing Interface (MPI).  Microsoft entered the high performance computing space with the Windows Server 2003 Compute Cluster Server SKU, which allowed developers to run their algorithms using MPI on a massively parallel scale.  Now, with the Windows Server 2008 HPC SKU, it is somewhat improved, with WCF support for scheduling and such.  If you're not part of the beta and are interested, I'd urge you to go through Microsoft Connect.

When Is It Appropriate?

When I'm talking about MPI, I'm talking in the context of High Performance Computing.  This means running the application under a job scheduler on a compute cluster that can have tens or hundreds of nodes.  Note that I'm not talking about grid computing such as Folding@Home, which distributes work over the internet.  Instead, you'll find plenty of need for this in the financial sector, the insurance sector for fraud detection and data analysis, the manufacturing sector for testing and calculating limits, thresholds and whatnot, and even in rendering computer animation for film.  There are plenty of other scenarios out there, but it's not for your everyday business application.

I think the real value .NET brings is the ability to read from databases and communicate with other servers through WCF or some other communication protocol, instead of being stuck in the C or Fortran world to which the HPC market has been relegated.  Developers can cut down on the code necessary for a lot of these applications by using the built-in functionality we get with the BCL.

MPI in .NET

The problem has been that running these massively parallel algorithms left us limited to Fortran and C.  That was OK for most things you would want to do, but cobbling together function libraries wasn't my ideal.  With .NET, we can instead use a lot of the things we take for granted, such as strong types and object-oriented and functional programming constructs.

The Boost libraries were made available for MPI in C++ very recently by Indiana University.  You can read more about it here.  This allows the MPI programmer to take advantage of many C++ constructs that aren't available in plain C, such as OOP.  Instead of dealing with functions and structs, there is a full object model for dealing with messaging.

At the same time as the Boost C++ Libraries for MPI were coming out, a .NET implementation based on the C++ design was made available through MPI.NET.  It's basically a thin veneer over msmpi.dll, which is Microsoft's implementation of MPI based on MPICH2.  For a list of all supported operation types, check the API documentation here for the raw MSMPI implementation.  It will give you a better sense of the full capabilities than the .NET implementation can.

You can think of it this way: several nodes run an instance of your program at once.  So, if you have 16 nodes assigned through your scheduled job, it will spin up 16 instances of the same application.  When you do this on a test machine, you'll notice 16 instances of it in your Task Manager.  Kind of cool, actually.  Unfortunately, MPI.NET is missing some of the neat features of MPI, such as "Ready Sends" and "Buffered Sends", but it does include nice things such as the Graph and Cartesian communicators, which are essential in MPI.

You'll need the Windows Server 2003/2008 HPC SDK in order to run these examples, so download it now, and then install MPI.NET to follow along.

Messaging Patterns

With this, we have a few messaging patterns available to us.  MPI.NET gives us several, and we'll look at each one and how best to use it.  I'll include samples in F#, as it's pretty easy to do and I'm trying to make the case that F# is a better language than C# for expressing the messaging we're doing.  But for these simple examples, it's not hard to switch back and forth.

To execute these, just type the following:

mpiexec -n <Number of Nodes You Want> <Your program exe>

Broadcast

A broadcast is a process in which a single process (such as a head node) sends the same data to all nodes in the cluster.  We want to be as efficient as possible when sending out this data for all to use, without having to loop through individual sends and receives.  This is good when a particular root node has a value that the rest of the cluster needs before continuing.  Below is a quick example in which the head node sets the value to 42 and the rest receive it.

#light

#R "D:\Program Files\MPI.NET\Lib\MPI.dll"

open System
open MPI

let main(args:string[]) =
  using(new Environment(ref args))(fun _->
    let commRank = Communicator.world.Rank

    let intValue = ref 0
    if commRank = 0 then
      intValue := 42
     
    Communicator.world.Broadcast(intValue, 0)
    Console.WriteLine("Broadcasted {0} to all nodes", !intValue)
  )
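// After "open MPI", Environment refers to MPI.Environment, so qualify System.Environment here.
main(System.Environment.GetCommandLineArgs())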
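A common way for MPI implementations to avoid looping through one send per node is to forward the value along a tree, so the broadcast finishes in about log2(size) rounds.  The sketch below is plain F# with no MPI.NET dependency.  `broadcastSchedule` is a hypothetical helper that only computes a binomial-tree schedule; it is not a claim about how MS-MPI implements Broadcast internally.

```fsharp
// Hypothetical helper: a binomial-tree broadcast schedule rooted at rank 0.
// Each inner list is one round of (sender, receiver) pairs that can run in parallel.
let broadcastSchedule (size: int) : (int * int) list list =
    let rec rounds step acc =
        if step >= size then List.rev acc
        else
            // Ranks 0 .. step-1 already hold the value; each forwards it
            // to the rank `step` positions above it, if that rank exists.
            let sends =
                [ for src in 0 .. step - 1 do
                    let dst = src + step
                    if dst < size then yield (src, dst) ]
            rounds (step * 2) (sends :: acc)
    rounds 1 []

let schedule = broadcastSchedule 8
schedule |> List.iteri (fun i sends -> printfn "Round %d: %A" (i + 1) sends)
```

For 8 ranks this gives three rounds: 0 sends to 1; then 0 to 2 and 1 to 3; then 0 to 4, 1 to 5, 2 to 6 and 3 to 7.  Every non-root rank receives the value exactly once, and the number of ranks holding it doubles each round.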

Blocking Send and Receive

In this scenario, we're going to use the blocking send and receive pattern.  The program will not continue until it gets the message it's looking for.  This is good for times when you need a particular value from the head node, or any other particular node, before proceeding to your next function.

#light

#R "D:\Program Files\MPI.NET\Lib\MPI.dll"

open System
open MPI

let main (args:string[]) =
  using(new Environment(ref args))( fun _ ->
    let commRank = Communicator.world.Rank
    let commSize = Communicator.world.Size
    let intValue = ref 0
    match commRank with
    | 0 ->
      [1 .. (commSize - 1)] |> List.iter (fun i ->
        Communicator.world.Receive(Communicator.anySource, Communicator.anyTag, intValue)
        Console.WriteLine("Result: {0}", !intValue))
    | _ ->
      intValue := 4 * commRank
      Communicator.world.Send(!intValue,0, 0)
  )

main(System.Environment.GetCommandLineArgs())

What I'm doing here is letting the head node, rank 0, do all the receiving work.  Note that I don't care where the message came from, nor what the tag was, although I can specify a particular source node and data tag if I wish.  If it's a worker process, I calculate the value and send it back to the head node, rank 0.  The head node waits until it has received a value from any node and then prints it.  The Send and Receive methods I'm using are generic.  Behind the scenes, in order to send, MPI.NET serializes your object (at least when it can't be mapped directly to an MPI datatype) into an unmanaged memory buffer and throws it on the wire.  This is one of the fun issues when dealing with marshaling to unmanaged C code.
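To get a feel for the blocking semantics without a cluster, here's a single-machine analogy in plain F# rather than MPI.NET.  It rests on some assumptions: each worker "rank" is a Task, the head node's inbox is a `BlockingCollection`, and `Take()` plays the role of `Receive(anySource, anyTag, ...)` by blocking until some worker has sent a value.  `simulateBlockingReceive` is a hypothetical name.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading.Tasks

// Single-machine analogy, not MPI.NET: workers 1 .. size-1 each "send" 4 * rank
// to the head node's inbox, and the head node blocks on Take() for each message.
let simulateBlockingReceive (size: int) : int list =
    use inbox = new BlockingCollection<int * int>()   // (source rank, value)
    let workers =
        [| for rank in 1 .. size - 1 -> Task.Run(Action(fun () -> inbox.Add((rank, 4 * rank)))) |]
    let results =
        [ for _ in 1 .. size - 1 do
            // Blocks until *any* worker has sent, like Receive(anySource, anyTag, ...).
            let (source, value) = inbox.Take()
            printfn "Result: %d (from rank %d)" value source
            yield value ]
    Task.WaitAll(workers)
    results

let results = simulateBlockingReceive 4
```

As with `anySource`, the arrival order depends on which worker finishes first, so only the set of received values is predictable, not their order.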

Nonblocking Send and Receive

In this scenario, we are not going to block on sending as we did before.  We want the ability to continue doing other things while the value is being sent, even though the receiver needs that value before continuing.  Eventually we can force completion of the send through the request object that ImmediateSend returns, and then, at a certain point, we can set up a barrier so that nobody can continue until everyone has reached that point in the program.  The sample below has each worker send its multiplied value without blocking and carry on.  The head node receives the values as they arrive, and then everyone waits at the barrier until the job is done.

let main (args:string[]) =
  using(new Environment(ref args))( fun _ ->
    let commRank = Communicator.world.Rank
    let commSize = Communicator.world.Size
   
    let intValue = ref 0
    if commRank = 0 then
      [1 .. (commSize - 1)] |> List.iter (fun _ ->
        Communicator.world.Receive(Communicator.anySource, Communicator.anyTag, intValue)
        Console.WriteLine("Result: {0}", !intValue))
    else
      intValue := 4 * commRank
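      // ImmediateSend returns a Request right away instead of blocking
      let request = Communicator.world.ImmediateSend(!intValue, 0, 0)
      request.Wait() |> ignore   // block until this send has completed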
     
    Communicator.world.Barrier()
  )
 
main(System.Environment.GetCommandLineArgs())
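Here's a similar single-machine analogy for the nonblocking version, again in plain F# rather than MPI.NET.  Under these assumptions, each worker rank runs on its own thread and starts its "send" as a Task, standing in for `ImmediateSend`.  It then does some other work, waits on that Task (like calling `Wait()` on what `ImmediateSend` returns), and meets the head node at a `System.Threading.Barrier`, standing in for `Communicator.world.Barrier()`.  `simulateNonblocking` is a hypothetical name.

```fsharp
open System
open System.Collections.Concurrent
open System.Threading
open System.Threading.Tasks

// Single-machine analogy, not MPI.NET.
let simulateNonblocking (size: int) : int[] * int[] =
    let inbox = ConcurrentQueue<int>()   // stands in for the head node's receives
    use barrier = new Barrier(size)      // rank 0 plus size - 1 workers
    let worker (rank: int) =
        let body () =
            // Start the "send" without blocking, like ImmediateSend.
            let pending = Task.Run(Action(fun () -> inbox.Enqueue(4 * rank)))
            // Overlap some other work while the send is in flight.
            let otherWork = rank * rank
            // Like Wait() on the ImmediateSend result: make sure the send completed.
            pending.Wait()
            // Like Communicator.world.Barrier(): wait for every participant.
            barrier.SignalAndWait()
            otherWork
        // LongRunning gives each worker its own thread, so blocking at the barrier
        // doesn't tie up the thread-pool threads that the pending sends need.
        Task.Factory.StartNew(Func<int>(body), TaskCreationOptions.LongRunning)
    let workers = [| for rank in 1 .. size - 1 -> worker rank |]
    barrier.SignalAndWait()              // the head node reaches the barrier too
    let overlapped = workers |> Array.map (fun t -> t.Result)
    (inbox.ToArray() |> Array.sort, overlapped)

let received, overlapped = simulateNonblocking 4
printfn "Received %A; overlapped work %A" received overlapped
```

The barrier is sized to include the head node, so nobody gets past it until every worker's send has completed.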

Gather and Scatter

The gather operation takes a value from each process and sends them to the root process as an array for evaluation.  This is a pretty simple operation for taking all values from all nodes and combining them on the head node.  What I'm doing is a simple calculation that gathers the value commRank * 3 from every node and sends the values to the head node for evaluation.

let main (args:string[]) =
  using(new Environment(ref args))( fun e ->
    let commRank = Communicator.world.Rank
    let intValue = commRank * 3
   
    match commRank with
    | 0 ->
      let ranks = Communicator.world.Gather(intValue, 0)   // root is rank 0
      ranks |> Array.iter(fun i -> System.Console.WriteLine(" {0}", i))
    | _ -> Communicator.world.Gather(intValue, 0) |> ignore
  )
 
main(System.Environment.GetCommandLineArgs())

Conversely, scatter does the opposite: it takes an array on the root process, splits it apart, and sends one element to each process.  In this exercise, I create a mutable array that only the head node fills in.  From there, I scatter it across the rest of the nodes to pick up and do with whatever they please.

let main (args:string[]) =
  using(new Environment(ref args))( fun e ->
    let commSize = Communicator.world.Size
    let commRank = Communicator.world.Rank
    let mutable table = Array.create commSize 0
   
    match commRank with
    | 0 ->
      table <- Array.init commSize (fun i -> i * 3)
      Communicator.world.Scatter(table, 0) |> ignore
    | _ ->
      let scatterValue = Communicator.world.Scatter(table, 0)
      Console.WriteLine("Scattered {0}", scatterValue)
  )
 
main(System.Environment.GetCommandLineArgs())

There is an AllGather method as well which performs a similar operation to Gather, but the results are available to all processes instead of the root process. 
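One way to keep these collectives straight is to model, sequentially, what each rank ends up holding afterwards.  The helpers below (`gather`, `allGather`, `scatter`) are hypothetical plain-F# functions, not MPI.NET calls.  `contributions.[r]` stands for the value rank r passes in, and the inputs mirror the `commRank * 3` and `i * 3` values from the examples above.

```fsharp
// Sequential models of what each rank holds after a collective (not MPI.NET).
// Index r of each result is what rank r ends up with.

// Gather: only the root receives the full array; other ranks get nothing.
let gather (root: int) (contributions: 'T[]) : 'T[] option[] =
    contributions
    |> Array.mapi (fun rank _ -> if rank = root then Some (Array.copy contributions) else None)

// AllGather: every rank receives its own copy of the full array.
let allGather (contributions: 'T[]) : 'T[][] =
    contributions |> Array.map (fun _ -> Array.copy contributions)

// Scatter: rank r receives element r of the root's table.
let scatter (table: 'T[]) (rank: int) : 'T = table.[rank]

let size = 4
let values = Array.init size (fun rank -> rank * 3)
let gathered = gather 0 values
let allGathered = allGather values
let scattered = Array.init size (fun rank -> scatter values rank)
printfn "Root after Gather: %A" gathered.[0]
```

Gather and scatter are inverses here: gathering `rank * 3` puts `[|0; 3; 6; 9|]` on the root, and scattering that table hands element r back to rank r.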

Reduce

Another collective operation similar to scatter and gather is reduce.  It combines the values from all processes by applying an operation to them, whether that's add, multiply, maximum, minimum and so on.  The result is only available at the root process, though, so I have to ignore the result in the rest of the processes.  The following example shows a simple sum of the ranks of all processes.

let main (args:string[]) =
  using(new Environment(ref args))( fun _ ->
    let commRank = Communicator.world.Rank
    let commSize = Communicator.world.Size
   
    match commRank with
    | 0 ->
      let sum = Communicator.world.Reduce(Communicator.world.Rank, Operation<int>.Add, 0)
      Console.WriteLine("Sum of all ranks is {0}", sum)
    | _ ->
      Communicator.world.Reduce(Communicator.world.Rank, Operation<int>.Add, 0) |> ignore
  )
 
main(System.Environment.GetCommandLineArgs())

There is another variation called AllReduce, which performs the same operation as Reduce but makes the result available to all processes instead of just the root.  There are more operations and more communicators, such as Graph and Cartesian, but this is enough to give you an idea of what you can do here.
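Reduce and AllReduce don't have to combine values in one long chain.  Implementations commonly combine them pairwise in a tree, which takes about log2(n) rounds.  The hypothetical `treeReduce` below is plain F# that models that idea.  Because the grouping differs from a left-to-right fold, it only gives the same answer when the operation is associative, which holds for integer add, multiply, max and min.

```fsharp
// Hypothetical model of a tree-shaped reduction (not MPI.NET's implementation).
// Each round combines neighbours pairwise, so n values need about log2(n) rounds.
let rec treeReduce (op: 'T -> 'T -> 'T) (values: 'T list) : 'T =
    match values with
    | [] -> invalidArg "values" "cannot reduce an empty list"
    | [ single ] -> single
    | _ ->
        values
        |> List.chunkBySize 2
        |> List.map (fun chunk ->
            match chunk with
            | [ a; b ] -> op a b
            | [ a ] -> a                     // odd one out passes through
            | _ -> failwith "unreachable: chunks hold one or two values")
        |> treeReduce op

// Each "rank" contributes its rank number, as in the Reduce example above.
let ranks = [ 0 .. 7 ]
let sum = treeReduce (+) ranks
let largest = treeReduce max ranks
printfn "Sum of all ranks is %d; largest rank is %d" sum largest
```

With 8 ranks the sum is 0 + 1 + ... + 7 = 28, reached in three rounds of pairwise additions.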

LINQ for MPI.NET

During my search for MPI.NET solutions, I came across a rather interesting one called LINQ for MPI.NET.  I don't know many of the details, since the author hasn't published the complete design.  But it has entered a private beta, if you wish to contact them for more information.

The basic idea is to provide some scope models, such as the current scope, the world scope, the root and so on.  It also looks like they are providing some sort of multi-threading capabilities.  Looks interesting, and I'm interested in finding out more.

Pure MPI.NET?

Another implementation of MPI in .NET has surfaced through PureMPI.NET.  It also implements the MPI specification, but it is built on WCF instead of msmpi.dll.  As a result, it does not rely on the Microsoft Compute Cluster service for scheduling and uses WCF for communication instead.  There is a CodeProject article which explains it a bit more here.

More Resources

So, you want to know more, huh?  Well, most of the interesting information is out there in C, so if you can read and translate it to the other APIs, you should be fine.  However, there are some good books on the subject which not only provide some decent samples, but also some guidance on how to make the most of the MPI implementation.  Below are some of the basic ones which will help on learning not only the APIs, but the patterns behind their usage.


Wrapping It Up

I hope you found some of this useful for learning how MPI can help with massively parallel applications.  The patterns covered here, as well as the technologies behind them, are pretty powerful for thinking about how to make your programs a bit less linear in nature.  There is more to this series on concurrency in .NET, so I hope you stay tuned.

In recent posts, you've seen that I've been harping on immutability and side-effect-free functions.  There is a general theme emerging from this and some real reasons why I'm pointing it out.  One of the things I'm interested in is concurrent programming on the .NET platform for messaging applications.  As more and more cores and processors become available to us, we need to be cognizant of this fact as we design and write our applications.  Most programs we write today are pretty linear in nature, except for, say, forms applications which use background worker threads to avoid freezing the user interface.  But for the most part, we're not taking full advantage of the CPU and its cycles.  We need not only a better way to handle concurrency, but a better way to describe it as well.  This is where Pi-calculus comes into the picture...  But before we go down that path, let's look at a few options.  These aren't all of them, just a select few I chose to analyze.

Erlang in .NET?

For many people, Erlang is one of the more interesting languages to come out of the concurrent programming field.  It received little attention until recently, when processor clock speeds stopped scaling and we moved into multi-core/multi-processor environments.  What's interesting about Erlang is that it's a functional language, much like F#, Haskell, OCaml, etc.  But what makes it intriguing as well is that it's dynamically typed rather than statically typed like the others.  Erlang was designed to support distributed, fault-tolerant, non-stop real-time applications.  Developed at Ericsson in the 1980s, it has been a mainstay of telephone switches ever since.  If you're interested in listening to more about it, check out Joe Armstrong's appearance on Software Engineering Radio Episode 89 "Joe Armstrong on Erlang".  If you want to dig deeper into Erlang, check out the book "Programming Erlang: Software for a Concurrent World", also by Joe Armstrong and available on Amazon.

How does that lead us to .NET?  Well, interestingly, someone tried porting the language to .NET in a project called Erlang.NET.  As far as I can tell, this project didn't get very far, for some obvious impedance mismatch reasons.  First off, there is a bit of a disconnect between .NET processes and Erlang processes, and in how the author wants to tackle them.  Erlang processes are cheap to create and tear down, whereas .NET ones tend to be heavy.  Garbage collection also works differently: Erlang collects each process's heap independently, while the CLR runs a generational collector over a shared heap.  And finally, Erlang is a dynamic language running on its own VM, so a port would probably sit on top of the DLR in the .NET space.  I'm not saying it's an impossible task, but it's improbable the way the author stated it.

Instead, maybe the approach to take with an Erlang-like implementation is to create separate AppDomains, since they are relatively cheap to create compared to OS processes.  This allows for process isolation, and messaging constructs fit rather nicely.  We get rid of the impedance mismatch by mapping an Erlang process to an AppDomain.  Then you can tear down the AppDomain when you are finished, or restart it in a recovery scenario.  These are some of the ideas if you truly want to dig further into the subject.  I'll probably cover this in another post later.

So, where does that leave us with Erlang itself?  Well, we have the option of integrating Erlang and .NET together through OTP.NET.  The idea came from a TheServerSide article called "Integrating Java and Erlang".  This lets Erlang do the computation on the server in the manner that best fits the Erlang style.  I find it a pretty interesting article, and maybe when I have a spare second, I'll check it out a bit more.  But, in terms of a full port to .NET?  Well, I think .NET languages have some lessons to learn from Erlang, as it tackled concurrent programming as its first concern, instead of bolting it on after the fact as most imperative languages have.

MPI.NET

The Message Passing Interface (MPI) approach has been an interesting way of solving massive concurrency for applications.  It involves using a standard protocol for passing messages from node to node across a compute cluster.  In the Windows world, we have Windows Compute Cluster Server (WCCS) to handle this need.  CCS is now available in two separate SKUs, CCS 2003 and CCS 2008 for Server 2008.  The Server 2008 CCS is available as a CTP on the Microsoft Connect website.  See here for more information.  You mainly find High Performance Computing with MPI in the automotive, financial, scientific and academic communities, where they have racks upon racks of machines.

Behind the scenes, Microsoft based its implementation on MPICH2, an implementation of the MPI specification.  This was made available to C programmers and is fairly low level.  Unfortunately, that leaves most C++ and .NET programmers out in the cold when it comes to taking advantage of it.  Sure, C++ could call the C API directly, but instead, the Boost libraries were created to support MPI in a way that C++ could really take advantage of.

Following that approach, a similar library was created for the .NET platform: MPI.NET.  Indiana University produced a .NET version which looks very similar to the Boost MPI approach but with .NET classes.  This allows us to program in any .NET language against Windows CCS and take advantage of the massive scalability and scheduling services offered in the SKU.  At the end of the day, it's a thin wrapper over P/Invoke calls into msmpi.dll, with generics thrown in as well.  Still, it's a nice implementation.

And since it was written for .NET, I can, for example, write a simple hello world application in F# that takes advantage of MPI.  The value is that most of the algorithms and heavy lifting you would be doing there would probably be functional anyway.  So I can use F# to specify more succinctly what types of actions and what data I need.  Here is a simple example:

#light

#R "D:\Program Files\MPI.NET\Lib\MPI.dll"

open MPI

let main (args:string[]) =
  using(new Environment(ref args))( fun e ->
    let commRank = Communicator.world.Rank
    let commSize = Communicator.world.Size
    match commRank with
    | 0 ->
      let intValue = ref 0
      [1 .. (commSize - 1)] |> List.iter (fun i ->
        Communicator.world.Receive(Communicator.anySource, Communicator.anyTag, intValue)
        System.Console.WriteLine("Hello from node {0} out of {1}", !intValue, commSize))
    | _ -> Communicator.world.Send(commRank,0, 0)
  )

main(System.Environment.GetCommandLineArgs())

I'll go into more detail in the future about what this means and why, but this should whet your appetite for how powerful this can be.

F# to the Rescue with Workflows?

Another topic for discussion is asynchronous workflows, an area in which F# excels as a language.  Async<'a> values are really continuation-passing computations; the async { } syntax lets you write them without spelling out the continuations yourself.  I'll cover this more in a subsequent post shortly, but in the meantime, there is good information from Don Syme here and Robert Pickering here.
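To make the continuation-passing point concrete, here's a small sketch in plain F#.  The hypothetical `addCps` and `squareCps` take an explicit continuation `k` instead of returning a value.  The async version computes the same thing, with each `let!` standing in for a continuation the builder wires up for you.  It uses `Async.RunSynchronously`, the name current F# releases use for running a workflow and waiting for its result.

```fsharp
// Explicit continuation-passing style: each function hands its result to `k`.
let addCps (a: int) (b: int) (k: int -> 'R) : 'R = k (a + b)
let squareCps (x: int) (k: int -> 'R) : 'R = k (x * x)

// (3 + 4)^2, with the "rest of the computation" passed along explicitly.
let resultCps = addCps 3 4 (fun sum -> squareCps sum id)

// The same computation as an async workflow; each let! hides a continuation.
let resultAsync =
    async {
        let! sum = async { return 3 + 4 }
        let! squared = async { return sum * sum }
        return squared
    }
    |> Async.RunSynchronously

printfn "CPS: %d, async: %d" resultCps resultAsync
```

Both print 49; the difference is only who writes the continuations.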

Below is a quick example of an asynchronous workflow which fetches the HTML from each of the given web sites.  I can then run each in parallel and get the results rather easily.  Note that GetResponseAsync and ReadToEndAsync aren't methods defined on the WebRequest and StreamReader classes themselves; they're extension members that F# supplies through the Microsoft.FSharp.Control.CommonExtensions namespace opened at the top.

#light

open System.IO
open System.Net
open Microsoft.FSharp.Control.CommonExtensions

let fetchAsync (url:string) =
  async { let request = WebRequest.Create(url)
          let! response = request.GetResponseAsync()
          let stream = response.GetResponseStream()
          let reader = new StreamReader(stream)
          let! html = reader.ReadToEndAsync()
          return html
        }

let urls = ["http://codebetter.com/"; "http://microsoft.com"]
let htmls = Async.Run(Async.Parallel [for url in urls -> fetchAsync url])
print_any htmls

So, as you can see, it's a pretty powerful mechanism for retrieving data asynchronously and then I can run each of these in parallel with parameterized data.
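If you want to try `Async.Parallel` without hitting the network, the sketch below swaps the web requests for simulated work.  `fakeFetchAsync` is a hypothetical stand-in for `fetchAsync`, and `Async.Sleep` pretends to wait on I/O.  `Async.RunSynchronously` is the current name for running the combined workflow.  `Async.Parallel` returns the results in the same order as its inputs.

```fsharp
// Hypothetical stand-in for fetchAsync: simulated I/O, so it runs offline.
let fakeFetchAsync (url: string) =
    async {
        do! Async.Sleep 100                // pretend to wait on the network
        return sprintf "<html>%s</html>" url
    }

let urls = [ "http://codebetter.com/"; "http://microsoft.com" ]
let htmls =
    urls
    |> List.map fakeFetchAsync
    |> Async.Parallel                      // Async<string[]>, one slot per input
    |> Async.RunSynchronously
htmls |> Array.iter (printfn "%s")
```

Because the sleeps overlap, the whole batch takes roughly as long as one simulated request rather than the sum of both.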

Parallel Extensions for .NET

Another approach I've been looking at is the Parallel Extensions for .NET.  The currently available version is the December CTP, which is available here.  You can read more about it in two MSDN Magazine articles:

What I find interesting is Parallel LINQ, or PLINQ for short.  The Task Parallel Library doesn't interest me as much.  LINQ in general is interesting to a functional programmer because its queries use deferred (lazy) execution: building a query doesn't run anything, and the work only happens when you start enumerating the results.  That's definitely taking some lessons from the functional world, and it's pretty powerful.  Add on top of that the ability to parallelize your heavy algorithms, and you have a pretty powerful concept.  I hope this definitely moves forward.
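Deferred execution is easy to observe from F#.  In the sketch below, a counter (a hypothetical instrumentation detail, not part of LINQ) records how often the `Select` projection runs.  Defining the query runs nothing, and the projection only executes once `Sum()` enumerates the results.  PLINQ's `AsParallel()` keeps the same deferred model while spreading the enumeration across cores.

```fsharp
open System.Linq

// Counts how many times the projection below actually runs.
let evaluations = ref 0

// Building the query only composes it; nothing is evaluated yet.
let query =
    Enumerable.Range(1, 5).Select(fun x ->
        evaluations.Value <- evaluations.Value + 1
        x * x)

let before = evaluations.Value     // still 0: the query has only been defined
let total = query.Sum()            // enumerating forces evaluation: 1 + 4 + 9 + 16 + 25
let after = evaluations.Value      // one projection call per element
printfn "before=%d total=%d after=%d" before total after
```

Enumerating the same query again would run the projection another five times, which is worth remembering when a query has side effects.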

Conclusion

As you can see, I briefly introduced each of these areas, which I hope to dive into a bit more in the coming weeks and months.  I've only scratched the surface of each; they tackle the concurrency problem in slightly different ways, and each has its own use.  But I hope I've whetted your appetite to look at some of these solutions today.

I decided to stay on the Design by Contract side for just a little bit.  Recently, Raymond Chen posted "If you pass invalid parameters, then all bets are off" in which he goes into parameter validation and basic defensive programming.  Many of the conversations had on the blog take me back to my C++ and early Java days of checking for null pointers, buffer lengths, etc.  This brings me back to some recent conversations I've had about how to make it explicit about what I expect.  Typical defensive behavior looks something like this:

public static void Foreach<T>(this IEnumerable<T> items, Action<T> action)
{
    if (items == null)
        throw new ArgumentNullException("items");
    if (action == null)
        throw new ArgumentNullException("action");

    foreach (var item in items)
        action(item);
}

After all, how many times have you had no idea what the preconditions for a given method are, because of non-intuitive method naming?  It gets worse when there's little documentation, XML comments or otherwise.  At that point, it's time to break out .NET Reflector and dig deep.  Believe me, I've done it quite a bit lately.

The Erlang Way

The Erlang crowd takes an interesting approach to the issue that I've really been intrigued by.  Joe Armstrong calls this approach "Let it crash" in which you only code to the sunny day scenario, and if the call to it does not conform to the spec, just let it crash.  You can read more about that on the Erlang mailing list here.

Some paragraphs stuck out in my mind.

Check inputs where they are "untrusted"
    - at a human interface
    - a foreign language program

What this basically says is that the only time you should do such checks is at the boundaries where input may be untrusted, guarding against things such as out-of-range values, unexpected nulls and such.  He goes on to say about letting it crash:

specifications always  say what to  do if everything works  - but never what  to do if the  input conditions are not met - the usual answer is something sensible - but what you're the programmer - In C etc. you  have to write *something* if you detect an error -  in Erlang it's  easy - don't  even bother to write  code that checks for errors - "just let it crash".

So, what Joe advocates is not checking at all, and if they don't conform to the spec, just let it crash, no need for null checks, etc.  But, how would you recover from such a thing?  Joe goes on to say:

Then  write a  *independent* process  that observes  the  crashes (a linked process) -  the independent process should try  to correct the error, if it can't correct  the error it should crash (same principle) - each monitor  should try a  simpler error recovery strategy  - until finally the  error is  fixed (this is  the principle behind  the error recovery tree behaviour).

It's an interesting approach, and it proves to be a valuable one for parallel processing systems.  As I dig further into functional programming languages, I'm finding such constructs useful.
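Here's a toy rendering of "let it crash" in plain F#, under loud assumptions.  Real Erlang supervision works with linked processes, whereas this sketch just uses exceptions and a list of fallbacks.  The workers (`parseExact`, `parseDigits`, `useDefault`, all hypothetical) code only the sunny-day path and throw on bad input.  The hypothetical `supervise` function observes each crash and tries the next, simpler recovery strategy, crashing itself only when none is left.

```fsharp
open System

// Toy "let it crash" supervisor (not Erlang/OTP, not a .NET library API).
let supervise (strategies: (string -> int) list) (input: string) : int =
    let rec attempt remaining =
        match remaining with
        | [] -> failwithf "every strategy crashed on %A" input   // escalate
        | strategy :: simpler ->
            try
                strategy input
            with ex ->
                printfn "crash (%s); trying a simpler strategy" ex.Message
                attempt simpler
    attempt strategies

// Sunny-day worker: no validation, just let it crash on bad input.
let parseExact (s: string) = int s
// Simpler recovery: keep only the digits.
let parseDigits (s: string) = s |> String.filter Char.IsDigit |> int
// Last resort: fall back to a default.
let useDefault (_: string) = 0

let strategies = [ parseExact; parseDigits; useDefault ]
let results = [ "42"; "12abc"; "abc" ] |> List.map (supervise strategies)
```

"42" succeeds on the first try, "12abc" crashes once and recovers as 12, and "abc" falls all the way through to the default of 0.  None of the workers contains a single validation check.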

Design by Contract Again and DDD

Defensive programming and Design by Contract are closely related, but they differ in an important way.  With defensive programming, the callee is responsible for determining whether the parameters are valid and, if not, throws an exception or otherwise handles it.  With DbC, the contract, with help from the language, makes the caller's obligations explicit, so the caller better understands what it must guarantee and how to cope with a failure if it can.

Bertrand Meyer wrote a bit about this in the Eiffel documentation here.  But let's go back to basics.  DbC asserts that the contracts (what we expect, what we guarantee, what we maintain) are such a crucial piece of the software that they're part of the design process.  That means we should write these contract assertions FIRST.

What do these contract assertions contain?  They normally contain the following:
  • Acceptable/Unacceptable input values and the related meaning
  • Return values and their meaning
  • Exception conditions and why
  • Preconditions (may be weakened by subclasses)
  • Postconditions (may be strengthened by subclasses)
  • Invariants (may be strengthened by subclasses)
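F# has no built-in requires/ensures/invariant clauses, so the sketch below only approximates a contract with runtime checks.  The `Account` type and its rules are hypothetical, chosen to show one of each kind of assertion from the list above.  Unlike Spec#, nothing here is verified statically; a violation is only caught when the code runs.

```fsharp
// Hypothetical example type. Contract, written down first:
//   invariant      : Balance >= 0 after every public operation
//   precondition   : 0 < amount <= Balance           (the caller's obligation)
//   postcondition  : Balance decreases by exactly amount (the callee's guarantee)
type Account(initial: decimal) =
    do if initial < 0m then invalidArg "initial" "must be non-negative"
    let mutable balance = initial

    member private this.CheckInvariant() =
        if balance < 0m then failwith "invariant violated: balance < 0"

    member this.Balance = balance

    member this.Withdraw(amount: decimal) =
        // Preconditions: reject calls that break the caller's side of the contract.
        if amount <= 0m then invalidArg "amount" "must be positive"
        if amount > balance then invalidArg "amount" "must not exceed the balance"
        let old = balance
        balance <- balance - amount
        // Postcondition and invariant: check the callee kept its side.
        if balance <> old - amount then failwith "postcondition violated"
        this.CheckInvariant()

let account = Account(100m)
account.Withdraw(30m)
printfn "Balance after withdrawal: %M" account.Balance
```

A call like `account.Withdraw(500m)` fails the precondition with an ArgumentException before any state changes, which is exactly the kind of behavior a BDD spec can then pin down.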

So, in effect, I'm still doing TDD/BDD, but an important part of this is identifying my preconditions, postconditions and invariants.  These ideas mesh pretty well with my understanding of BDD and we should be testing those behaviors in our specs.  Some people saw in my previous posts that they were afraid I was de-emphasizing TDD/BDD and that couldn't be further from the truth.  I'm just using another tool in the toolkit to express my intent for my classes, methods, etc.  I'll explain further in a bit down below.

My heavy use of Domain Driven Design patterns helps as well.  I mentioned those previously when I talked about side effects being code smells.  I combine intention-revealing interfaces, which express to the caller what I intend to do, with assertions not only in the code but also in the documentation.  This usually includes using the <exception> XML tag in my code comments.  Something like this is usually pretty effective:

/// <exception cref="T:System.ArgumentNullException"><paramref name="action"/> is null.</exception>

If you haven't read Eric Evans' book, Domain-Driven Design, I suggest you take my advice and Peter's and do so.

Making It Explicit

Once again, using Spec# to enforce these as part of the method signature makes sense to me. It puts the burden back on the client to conform to the contract, or else the call cannot proceed. Having static checking to enforce that is pretty powerful as well.

But what are we testing here? Remember that DbC and Spec# can ensure that your preconditions, postconditions and invariants hold. They cannot determine whether your code is correct and conforms to the specs. That's why I think BDD complements my use of Spec# so well.

DbC and Spec# can also enforce things that are harder to get at with BDD, such as invariants. BDD does great things by emphasizing behaviors, and I'm really on board with that. What I mean by harder is that your invariants may involve only private member variables that you are not going to expose to the outside world. If you don't expose them, it's harder for your specs to verify such behavior. DbC and Spec# can fill that role. Let's look at the example of an ArrayList written in Spec#.

public class ArrayList
{
    invariant 0 <= _size && _size <= _items.Length;
    invariant forall { int i in (_size : _items.Length); _items[i] == null };  // all unused slots are null

    [NotDelayed]
    public ArrayList (int capacity)
      requires 0 <= capacity otherwise ArgumentOutOfRangeException;
      ensures _size/*Count*/ == 0;
      ensures _items.Length/*Capacity*/ == capacity;
    {
      _items = new object[capacity];
      base();
    }

    public virtual void Clear ()
      ensures Count == 0;
    {
      expose (this) {
        Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references.
        assume forall{int i in (0: _size); _items[i] == null};  // postcondition of Array.Clear
        _size = 0;
      }
    }

    // Rest of code omitted (including the _items and _size fields and the Count property)
}

In the constructor, I allocate the inner array with the requested capacity. The contract also ensures that this sets only the capacity, while the count stays at zero. When I call the Clear method, I need to make sure the object stays peer consistent: every slot beyond the current size must be null, and the size must be reset. The expose block opens up the object so that its invariant may be temporarily broken while we modify its fields. By the end of the expose block the invariant must hold again, and the verifier checks that it does. The assume statement tells the verifier what Array.Clear guarantees, since that call carries no contract of its own.

How would we test some of these scenarios in BDD? They are not exposed to the outside world, so it's pretty difficult, and I'd be left with black box artifacts that are harder to prove. If I exposed them instead, I would break encapsulation, which I don't necessarily want to do. Spec# gives me another option: enforcing this through the DbC constructs the language provides.
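For comparison, here is a rough runtime analogue of that ArrayList in Python. It assumes we settle for checking the contract while the code runs, instead of proving it statically the way Boogie does. ValueError stands in for ArgumentOutOfRangeException. The add method, with its doubling growth, is a hypothetical addition so there is something to clear; neither comes from the Spec# sample.

```python
class ArrayList:
    """A runtime-checked sketch of the Spec# ArrayList contracts."""

    def __init__(self, capacity: int) -> None:
        # requires 0 <= capacity otherwise ArgumentOutOfRangeException
        if capacity < 0:
            raise ValueError("capacity must be non-negative")
        self._items = [None] * capacity
        self._size = 0
        # ensures Count == 0; ensures Capacity == capacity
        assert self.count == 0 and self.capacity == capacity
        self._check_invariant()

    @property
    def count(self) -> int:
        return self._size

    @property
    def capacity(self) -> int:
        return len(self._items)

    def _check_invariant(self) -> None:
        # invariant 0 <= _size && _size <= _items.Length
        assert 0 <= self._size <= len(self._items), "size out of range"
        # invariant: all unused slots are None
        assert all(slot is None for slot in self._items[self._size:]), "stale slot"

    def add(self, item: object) -> None:
        # Hypothetical helper: double the backing list when it is full.
        if self._size == len(self._items):
            self._items.extend([None] * max(1, len(self._items)))
        self._items[self._size] = item
        self._size += 1
        self._check_invariant()

    def clear(self) -> None:
        # Nothing checks the invariant mid-method; as with Spec#'s expose block,
        # it only has to hold again once we're done.
        for i in range(self._size):
            self._items[i] = None
        self._size = 0
        assert self.count == 0  # ensures Count == 0
        self._check_invariant()
```

The specs can exercise add and clear through the public surface, while the private invariant is checked on every call.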

The Dangers of Checked Exceptions

But this comes at a cost, of course. I recently spoke with a colleague about Spec#, and Java's checked exceptions instantly came to mind. Earlier in my career, I was a Java guy who had to deal with people who put large try/catch blocks around methods with checked exceptions. They were guilty of catching and swallowing them, or catching them and rethrowing RuntimeExceptions. Worse yet, I saw checked exceptions as a way of breaking encapsulation by throwing exceptions that I didn't think the outside world needed to know about. I was kind of glad that this feature wasn't brought to C#, because I saw rampant abuse for little benefit. What people forgot in the early days of Java is that exceptions are meant to be exceptional and not control flow.

Where I see Spec# being different is that we have a static verification tool, Boogie, to check statically whether those contract conditions are satisfied. The green squigglies give warnings about possible null values, out-of-range arguments, etc. This gives me further insight into what I can control and what I cannot. ReSharper has some of those nice features as well, but I've found Boogie to be a bit more helpful for more advanced static verification.

Conclusion

Explicit DbC constructs give us a pretty powerful tool for expressing our domain and the behaviors of our components. Unfortunately, C# has no really valuable implementation that enforces DbC constructs on both the caller and the callee. Hence Spec# is an important project to come out of Microsoft Research.

Scott Hanselman just posted his interview with the Spec# team on his blog, so if you haven't heard it yet, go ahead and download it now. It's a great show. If you find Spec# useful, it's important that you press Microsoft to give it to us as a full feature.

In one of my previous posts about Command-Query Separation (CQS) and side effecting functions being code smells, someone again pointed out immutable builders to me. For the most part, this has been one area of CQS that I've been willing to bend. I've been following Martin Fowler's advice on method chaining, and it has worked quite well. But revisiting an item like this never hurts. Immutability is something you'll see me harping on time and time again, now and in the future. The standard rule I usually follow is immutable and side effect free when you can, mutable state where you must. I prefer the opt-in mutability of functional languages such as F#, which I'll cover in the near future, to the opt-out mutability of imperative/OO languages such as C#.

Typical Builders

The standard builder, exposed through a fluent interface, is pretty prevalent in the applications we see today. Take, for example, how most Inversion of Control (IoC) containers register types:

UnityContainer container = new UnityContainer();
container
    .RegisterType<ILogger, DebugLogger>("logger.Debug")
    .RegisterType<ICustomerRepository, CustomerRepository>();

Let's take a naive medical claims processing system and build up an aggregate root: a claim. A claim contains such things as the claim information, the lines, the provider, the recipient and so on. This is a brief sample and not meant to be the real thing, just a quick example. After all, I'm missing things such as eligibility.

    public class Claim
    {
        public string ClaimId { get; set; }  
        public DateTime ClaimDate { get; set; }
        public List<ClaimLine> ClaimLines { get; set; }
        public Recipient ClaimRecipient { get; set; }
        public Provider ClaimProvider { get; set; }
    }

    public class ClaimLine
    {
        public int ClaimLineId { get; set; }
        public string ClaimCode { get; set; }
        public double Quantity { get; set; }
    }

    public class Recipient
    {
        public string RecipientId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class Provider
    {
        public string ProviderId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

Our standard builders use method chaining, as shown below. As you'll note, we return the current instance each and every time.

public class ClaimBuilder
{
    private string claimId;
    private DateTime claimDate;
    private readonly List<ClaimLine> claimLines = new List<ClaimLine>();
    private Provider claimProvider;
    private Recipient claimRecipient;

    public ClaimBuilder() {}

    public ClaimBuilder WithClaimId(string claimId)
    {
        this.claimId = claimId;
        return this;
    }

    public ClaimBuilder WithClaimDate(DateTime claimDate)
    {
        this.claimDate = claimDate;
        return this;
    }

    public ClaimBuilder WithClaimLine(ClaimLine claimLine)
    {
        claimLines.Add(claimLine);
        return this;
    }

    public ClaimBuilder WithProvider(Provider claimProvider)
    {
        this.claimProvider = claimProvider;
        return this;
    }

    public ClaimBuilder WithRecipient(Recipient claimRecipient)
    {
        this.claimRecipient = claimRecipient;
        return this;
    }

    public Claim Build()
    {
        return new Claim
        {
            ClaimId = claimId,
            ClaimDate = claimDate,
            // Copy the list so later WithClaimLine calls don't alter an already-built claim
            ClaimLines = new List<ClaimLine>(claimLines),
            ClaimProvider = claimProvider,
            ClaimRecipient = claimRecipient
        };
    }

    // Lets a builder be assigned directly to a Claim variable
    public static implicit operator Claim(ClaimBuilder builder)
    {
        return builder.Build();
    }
}

What we have above is a violation of CQS, because we're mutating the current instance as well as returning a value. Remember that CQS states:
  • Commands - Methods that perform an action or change the state of the system should not return a value.
  • Queries - Return a result and do not change the state of the system (aka side effect free)
Each With method both mutates state and returns a value. For the most part, that hasn't been a problem. But what about sharing these builders? The last thing we want is to have our shared builders mutated by others while we're trying to build up our aggregate roots.
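To make the sharing problem concrete, here is a stripped-down Python sketch of the chaining builder. It simplifies a lot: claim lines are plain strings, and the provider, recipient and date are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    claim_id: Optional[str]
    claim_lines: List[str]


class MutableClaimBuilder:
    """Mirrors the chaining ClaimBuilder: each with_* mutates self and returns it."""

    def __init__(self) -> None:
        self._claim_id: Optional[str] = None
        self._claim_lines: List[str] = []

    def with_claim_id(self, claim_id: str) -> "MutableClaimBuilder":
        self._claim_id = claim_id
        return self  # mutates AND returns a value, which is what breaks CQS

    def with_claim_line(self, line: str) -> "MutableClaimBuilder":
        self._claim_lines.append(line)
        return self

    def build(self) -> Claim:
        # Copy the list so a built claim doesn't alias the builder's state.
        return Claim(self._claim_id, list(self._claim_lines))


shared = MutableClaimBuilder().with_claim_id("C-1")

# Two tests reuse the same shared builder...
first = shared.with_claim_line("office visit").build()
second = shared.with_claim_line("x-ray").build()

# ...and the second claim silently picks up the first test's line.
print(second.claim_lines)  # ['office visit', 'x-ray']
```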

Immutable Builders or ObjectMother or Cloning?

When we're looking to reuse our builders, the last thing we want is to allow mutation of their state. If I'm working with the same provider and somehow change his eligibility, that change would be reflected for everyone using the same built-up instance. That would be bad. We have a few options here. One is to follow an ObjectMother approach: build up shared builders and request a new one each time. Another is to make sure we don't return this each and every time we add something to our builder. Or perhaps we can take a builder at a given state and just clone it. Let's look at each.

public static class RecipientObjectMother
{
    public static RecipientBuilder RecipientWithLimitedEligibility()
    {
        RecipientBuilder builder = new RecipientBuilder()
            .WithRecipientId("xx-xxxx-xxx")
            .WithFirstName("Robert")
            .WithLastName("Smith");
        // More chained calls here for setting up eligibility

        return builder;
    }
}

This allows me to share my state through pre-built builders. Each call hands back a brand-new builder, so no caller can disturb another caller's builder. When I've finalized one, I just call the Build method or assign it to the appropriate type. Or I could just make the builders immutable instead and not have to worry about such things. Let's modify the earlier ClaimBuilder example to take a look at that.
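In Python, an ObjectMother is often just a module-level factory function. The sketch below assumes a RecipientBuilder with the same chaining style as ClaimBuilder. The post doesn't show that class, so its body here is hypothetical.

```python
from typing import Optional


class RecipientBuilder:
    """Hypothetical mutable, chaining builder for a recipient."""

    def __init__(self) -> None:
        self.recipient_id: Optional[str] = None
        self.first_name: Optional[str] = None
        self.last_name: Optional[str] = None

    def with_recipient_id(self, recipient_id: str) -> "RecipientBuilder":
        self.recipient_id = recipient_id
        return self

    def with_first_name(self, name: str) -> "RecipientBuilder":
        self.first_name = name
        return self

    def with_last_name(self, name: str) -> "RecipientBuilder":
        self.last_name = name
        return self


def recipient_with_limited_eligibility() -> RecipientBuilder:
    # ObjectMother: every call builds a brand-new builder, so callers never share state.
    return (RecipientBuilder()
            .with_recipient_id("xx-xxxx-xxx")
            .with_first_name("Robert")
            .with_last_name("Smith"))
```

The builder itself is still mutable. The safety comes entirely from never handing the same instance to two callers.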

public class ClaimBuilder
{
    private string claimId;
    private DateTime claimDate;
    private readonly List<ClaimLine> claimLines = new List<ClaimLine>();
    private Provider claimProvider;
    private Recipient claimRecipient;

    public ClaimBuilder() {}

    public ClaimBuilder(ClaimBuilder builder)
    {
        claimId = builder.claimId;
        claimDate = builder.claimDate;
        claimLines.AddRange(builder.claimLines);
        claimProvider = builder.claimProvider;
        claimRecipient = builder.claimRecipient;
    }

    public ClaimBuilder WithClaimId(string claimId)
    {
        ClaimBuilder builder = new ClaimBuilder(this) {claimId = claimId};
        return builder;
    }

    public ClaimBuilder WithClaimDate(DateTime claimDate)
    {
        ClaimBuilder builder = new ClaimBuilder(this) { claimDate = claimDate };
        return builder;
    }

    public ClaimBuilder WithClaimLine(ClaimLine claimLine)
    {
        ClaimBuilder builder = new ClaimBuilder(this);
        builder.claimLines.Add(claimLine);
        return builder;
    }

    public ClaimBuilder WithProvider(Provider claimProvider)
    {
        ClaimBuilder builder = new ClaimBuilder(this) { claimProvider = claimProvider };
        return builder;
    }

    public ClaimBuilder WithRecipient(Recipient claimRecipient)
    {
        ClaimBuilder builder = new ClaimBuilder(this) { claimRecipient = claimRecipient };
        return builder;
    }

    // More code here for building
}

So, what we've had to do is provide a copy constructor to initialize the new object in the right state. And here I thought I had left those behind in my C++ days. In each With method, I create a new ClaimBuilder, passing in the current instance to copy over its state, and then apply the change to the copy. The original stays untouched, which makes the class suitable for sharing. Side effect free programming is the way to go if you can. Of course, this creates a few extra objects on the heap as you're initializing your aggregate root, but for testing purposes I haven't cared much.
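Python's standard library makes this pattern fairly cheap. A frozen dataclass plus dataclasses.replace plays the role of the copy constructor. This is a sketch under the same simplifications as before: string claim lines, no provider or recipient. The lines are stored in a tuple rather than a list, so the copies can safely share it.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Claim:
    claim_id: Optional[str]
    claim_date: Optional[date]
    claim_lines: Tuple[str, ...]


@dataclass(frozen=True)
class ClaimBuilder:
    """Immutable builder: each with_* returns a modified copy instead of mutating self."""

    claim_id: Optional[str] = None
    claim_date: Optional[date] = None
    claim_lines: Tuple[str, ...] = ()  # immutable, so copies can share it

    def with_claim_id(self, claim_id: str) -> "ClaimBuilder":
        return replace(self, claim_id=claim_id)

    def with_claim_date(self, claim_date: date) -> "ClaimBuilder":
        return replace(self, claim_date=claim_date)

    def with_claim_line(self, line: str) -> "ClaimBuilder":
        return replace(self, claim_lines=self.claim_lines + (line,))

    def build(self) -> Claim:
        return Claim(self.claim_id, self.claim_date, self.claim_lines)
```

Assigning to a field of a frozen dataclass raises dataclasses.FrozenInstanceError. Python enforces the immutability at runtime, so it doesn't depend on convention alone.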

Of course, I could throw Spec# into the picture once again to enforce immutability on these builders. Being able to mark methods as [Pure] makes the intent of the method apparent to both the caller and the callee. Another option would be NDepend, which Patrick Smacchia has written about.

The other way is to provide a Clone method that copies the current object so that you can freely modify the new copy. This is a pretty easy approach as well.

public ClaimBuilder(ClaimBuilder builder)
{
    claimId = builder.claimId;
    claimDate = builder.claimDate;
    claimLines.AddRange(builder.claimLines);
    claimProvider = builder.claimProvider;
    claimRecipient = builder.claimRecipient;
}

public ClaimBuilder Clone()
{
    return new ClaimBuilder(this);
}
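A Python version of the Clone approach might look like the following sketch. copy.copy makes a shallow copy, so the method then gives the clone its own list, mirroring claimLines.AddRange in the C# copy constructor. Any other mutable members, such as a provider or recipient, would still be shared, just as they are in the C# version.

```python
import copy
from typing import List, Optional


class ClaimBuilder:
    """Mutable, chaining builder with an explicit clone() (a sketch)."""

    def __init__(self) -> None:
        self.claim_id: Optional[str] = None
        self.claim_lines: List[str] = []

    def with_claim_id(self, claim_id: str) -> "ClaimBuilder":
        self.claim_id = claim_id
        return self

    def with_claim_line(self, line: str) -> "ClaimBuilder":
        self.claim_lines.append(line)
        return self

    def clone(self) -> "ClaimBuilder":
        # Shallow-copy the builder, then give the copy its own list of lines.
        other = copy.copy(self)
        other.claim_lines = list(self.claim_lines)
        return other
```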

Conclusion

Obeying CQS is always an admirable thing to do, especially when managing side effects. It isn't always required, as with builders. But if you plan on sharing these builders, it might be a good idea to think hard about the side effects you are creating. As we move more towards multi-threaded, multi-machine processing, we need to be more aware of our side effects. But at the end of the day, I'm not entirely convinced that this violates the true intent of CQS, since we're not really querying, so I'm not sure how much this is buying me. What are your thoughts?

I'm taking a break from the Design by Contract stuff for just a bit while I step back into the F# and functional programming world. If you followed me on my old blog, you'll know I'm pretty passionate about functional programming and looking for new ways to solve problems and express data.

Where We Are

Before we begin today, let's catch up to where we are today:
Today's topic covers more imperative code, this time dealing with control flow. But first, the requisite side material.

A Survey of .NET Languages And Paradigms

Joel Pobar just contributed an article to the latest MSDN Magazine (May 2008) called "Alphabet Soup: A Survey of .NET Languages And Paradigms". The article introduces not only the different languages supported in the .NET space, but also the paradigms they operate in. For example, you have C#, VB.NET, C++, F# and others in the static languages space, and IronRuby and IronPython, among others, in the dynamic space. But what's more interesting is the way each one tackles a particular problem. The article covers a little bit about functional programming and its uses, as well as dynamic languages. Of course, it also mentions that C# and VB.NET are slowly adopting more functional programming aspects over time. One thing I've lamented is that VB.NET and C# are too similar for my tastes, so I'm hoping for more true differentiation come the next spin. VB would be really interesting as a more dynamic language, rather than one that many people look down their noses at. Ok, enough sidetracking; let's get back to the subject at hand.

Control Flow

Since F# is a general purpose language in the .NET space, it supports imperative ways of approaching problems, and this of course includes control flow. Some functional languages are lazily evaluated, so you don't rely on the order in which things run. F# is different: it evaluates eagerly and in the order you write things. It also gives us a very succinct way of expressing conditionals with if, elif and else. These are expressions rather than statements, so each branch produces a value. Below is a quick example:

#light

let IsInBounds (x:int) (y:int) =
  if x < 0 then false
  elif x > 50 then false
  elif y < 0 then false
  elif y > 50 then false
  else true

What I was able to do is check the bounds of the given integer inputs. Pretty simple example. Unlike many imperative languages, when you return a value from the if, every subsequent elif or else must also return a value of the same type. This makes for balanced equations. Also, if you return a value from an if, then you are forced to have an else which returns a value.

Although F# uses type inference to determine what my IsInBounds function returns, I cannot return one type in the if and a different type in an elif or else. F# will complain violently, as it should, because that's really not a good design for a function. The code below will fail to compile:

#light

let IsInBounds (x:int) (y:int) =
  if x < 0 then "Foo"
  elif x > 50 then false
  elif y < 0 then false
  elif y > 50 then false
  else true

As I said before, the equations must be balanced. But of course, if your if expression returns unit (the void type, for those imperative folks), then you aren't forced to have an else branch. Pretty self explanatory.

Let's move on to for loops. The standard for loop starts at a particular index value, checks for the termination condition and then increments or decrements the index. F# supports this in a pretty standard way, with the index incremented by 1 on each iteration. Note, though, that the body of the for loop is expected to have the unit type (void once again), so if you return a value, F# won't like it. Below is a simple for loop to iterate through all lowercase letters.

#light

let chars = [|'a'..'z'|]

let PrintChars (c:array<char>) =
  for index = 0 to c.Length - 1 do
    print_any c.[index]
   
PrintChars chars

If I tried to return c from the body of the for loop, F# would issue a warning but still allow it. It's just a friendly reminder that F# will discard that value. I could also write the for loop to count down using downto, so let's reverse our letters this time.

#light

let chars = [|'a'..'z'|]

let PrintChars (c:array<char>) =
  for index = c.Length - 1 downto 0 do
    print_any c.[index]
   
PrintChars chars

F# also supports the while construct. It works exactly like its counterparts in imperative languages, with the same caveat: the body of the while loop should not return a value, because it is of the unit type.

#light

let chars = ref ['a'..'z']

while (List.nonempty !chars) do
  print_any (List.hd !chars)
  chars := List.tl !chars

This time we print the first char and then remove it from the list. Note that we're using the ref keyword and reference cells, as we talked about before. Lastly, let's cover the foreach construct, which F# spells for ... in. It's much like what we have in most other languages; just the wording is a bit different. As always, the loop body has the unit type, so returning a value produces a warning.

#light

let nums = [0..99]

for n in nums do
  print_any n

Wrapping It Up

That was a quick walkthrough of some of the imperative control flow constructs F# allows. As you can see, it's not a huge leap from one language to the next. I have a couple of upcoming talks on F#, so if you're in the Northern VA area on May 17th, come check them out at the NoVA Code Camp.


After talking with Greg Young for a little while this morning, I realized I missed a few points about side effecting functions being code smells. In the previous post, I talked about side effect free functions and Design by Contract (DbC) features in regards to Domain Driven Design. Of course, I had to throw in the requisite Spec# plug as well for how it handles DbC features in C#.

Intention Revealing Interfaces

Let's step back a little bit from the discussion we had earlier and talk about good design for a second. How many times have you seen a method and had no idea what it did? Or found that it called 15 other things you didn't expect? At that point, most people would pull out .NET Reflector (a Godsend BTW) and dig through the code to see the internals. One of the violators, when I first started learning it, was the ASP.NET Page lifecycle. Init versus Load versus PreLoad didn't make it very clear what happened where, and most people have learned to hate it.

In the Domain Driven Design world, we have the Intention Revealing Interface. This means that we should name our classes, methods, properties, events, etc. to describe their effect and purpose, using the ubiquitous language of the domain. Other team members can then infer what a method is doing without digging in with tools such as Reflector. In our public interfaces, abstract classes and so on, we need to specify the rules and the relationships. To me, this comes back again to DbC, which lets us specify not only the name in the ubiquitous language, but the behaviors as well.

Command-Query Separation (CQS)

Dr. Bertrand Meyer, the man behind Eiffel and the author of Object-Oriented Software Construction, introduced a concept called Command-Query Separation. It states that we should break our functionality into two categories:

  • Commands - Methods that perform an action or change the state of the system should not return a value.
  • Queries - Return a result and do not change the state of the system (aka side effect free)

Of course this isn't a 100% rule, but it's still a good one to follow. Let's look at a simplified code example of a good command. What we're doing here is side effecting the number of items in the cart.

public class ShoppingCart
{
    public void AddItemToCart(Item item)
    {
        // Add item to cart
    }
}

If we used Spec#, we could check our invariants and also ensure that the number of items in our cart has increased by 1.

public class ShoppingCart
{
    public void AddItemToCart(Item item)
        ensures ItemsInCart == old(ItemsInCart) + 1;
    {
        // Add item to cart
    }
}
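Python has no equivalent of Spec#'s ensures or old(). The same postcondition can still be approximated at runtime by remembering the value before the command runs. This sketch assumes that items are plain strings and that ItemsInCart is simply the number of items held. Both are hypothetical details the post leaves out.

```python
from typing import List


class ShoppingCart:
    def __init__(self) -> None:
        self._items: List[str] = []

    # Query: returns a result and changes nothing.
    @property
    def items_in_cart(self) -> int:
        return len(self._items)

    # Command: changes state and returns nothing (None).
    def add_item_to_cart(self, item: str) -> None:
        old_count = self.items_in_cart  # stands in for Spec#'s old(ItemsInCart)
        self._items.append(item)
        # ensures ItemsInCart == old(ItemsInCart) + 1
        assert self.items_in_cart == old_count + 1
```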

So, once again, it's very intention revealing at this point that I'm going to side effect the system and add more items to the cart. Like I said before, it's a simplified example, but it's a very powerful concept. Now let's talk about queries. Here is a simple method on a cost calculation service that takes in a customer and an item and calculates the cost.

public class CostCalculatorService
{
    public double CalculateCost(Customer c, Item i)
    {
        double cost = 0.0d;
       
        // Calculate cost
       
        return cost;
    }
}

What I'm not going to be doing in this example is modifying the customer, nor the item.  Therefore, if I'm using Spec#, then I could mark this method as being [Pure].  And that's a good thing.
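Python has no [Pure] attribute either, but frozen dataclasses at least make accidental mutation of the inputs fail loudly. In this sketch the discount_rate and price fields and the pricing rule are hypothetical, since the post elides the actual calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    discount_rate: float  # hypothetical field, e.g. 0.10 for 10% off


@dataclass(frozen=True)
class Item:
    price: float  # hypothetical field


def calculate_cost(customer: Customer, item: Item) -> float:
    """Query: computes a value from its inputs and modifies neither of them.

    The pricing rule is made up for illustration; the post elides the real one.
    """
    return round(item.price * (1.0 - customer.discount_rate), 2)
```

Because the function only reads its arguments, calling it twice with the same inputs gives the same answer. That repeatability is what marking a method [Pure] promises in Spec#.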

The one exception I would make is for fluent builders. Martin Fowler lays out an excellent case for them. With a fluent builder, not only are we side effecting the system, but we're also returning a value (the builder itself). So the rule is not a hard and fast one, but it's always good to observe. Let's take a look at a builder which violates this rule.

public class CustomerBuilder
{
    private string firstName;

    public static CustomerBuilder New { get { return new CustomerBuilder(); } }
   
    public CustomerBuilder WithFirstName(string firstName)
    {
        this.firstName = firstName;
        return this;
    }

    // More code goes here
}

So, these are not hard and fast rules, and they always come with an "It Depends". But as a rule of thumb, you can't go wrong with CQS.

Wrapping It Up

These rules are quite simple ways of revealing the true intent of your application while using the domain's ubiquitous language. As with anything in our field, it always comes with a big fat "It Depends", but applying the rules as much as you can is definitely to your advantage. These considerations are simple and often overlooked when we design our applications, yet they are the fundamentals.
