Providing Safe Alternatives

Wednesday, June 24, 2009

When I was reading through Real World Haskell, I was struck several times by the mention of providing safe function alternatives. The idea is to provide a function that in all cases returns a value as well as the one which is meant to accept valid input and throw exceptions should that contract be violated. There is a real performance consideration to be taken into account as a function which repeatedly throws exceptions as logic will tend to overwhelm a system and slow it down significantly. Imagine if you will an application which reads a large directory to check each file for an X509 certificate, whether it has one or not, and it throws an exception if one is not present. The problem of course is there is no way to determine whether a file was signed at the time using the .NET class without resorting to P/Invoke (my favorite).

Recently on Twitter, there was talk finally of the inclusion of Guid.TryParse in .NET 4.0, yet somehow, Enum.TryParse has not been addressed yet (gee, we’re consistent here). The idea is to provide a way of determining whether the string is a Guid format without throwing an exception, and instead returning a flag indicating success and an out parameter with the value which is set to the value if there is success, else the default value. Before we dig any further, let’s look at some terminology.

Partial Versus Total Functions

To get some terminology straight, let’s talk about partial functions versus total functions. Partial functions are those functions which only return values for a defined subset of valid inputs, as throwing an exception is not considered a return value. On the other hand, functions that return valid results over the entire input are considered to be total functions. Let’s look at a quick example of what that means in F# interactive, the first being a partial function and the latter being a total function.

> open System.Linq;;
> ([] : int list).First();;
System.InvalidOperationException: Sequence contains no elements
   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source)
   at <StartupCode$FSI_0003>.$FSI_0003.main@()
stopped due to error
> ([] : int list).FirstOrDefault();;
val it : int = 0

We see by the above that the First() function is a partial function because it only has defined behavior for non-empty collections, whereas FirstOrDefault() has defined behavior for empty lists and returns the default value for the given generic type. When we’re writing our code, we better know which one we’re dealing with as default values can cause unintended consequences. Without proper tests around these calls, this can be hard to manage.

Mitigating these partial functions with calls to such things as the Count() extension method is not necessarily the answer either. When dealing with non-lazy lists such as an array or a List<T>, then such things as the number of items in our collection is well known and part of the state. In the case of the lazy sequence, this is not the case, so any call to Count() would cause an evaluation of all items in the list, which could lead to unintended consequences. So, if we’re chaining together 4 partial functions together, we could be doing something wrong. All in all, total functions should be preferred usage, but of course within reason.

Can We Do Better Than Try?

Going back to one of the original points, can we do better than the standard Try (Parse/GetValue/etc) pattern that is pervasive throughout .NET code? After all, it tries to do two things at once, and the success flag can be missed if one so chose. F# decided to take a slightly different angle to this problem by returning the success flag and the value as a pair tuple. Such an example would be like this memoize function below:

open System.Collections.Generic

let memoize f = 
  let t = new Dictionary<_,_>()
  fun n ->
    match t.TryGetValue(n) with
    | (true , value) -> value
    | (false, _    ) -> 
        let res = f n
        t.Add(n, res)
        res

What we’re doing above is calling the TryGetValue method which takes a key and returns a success flag as well as the out parameter for the value (set if there is success). F# automatically creates a pair tuple for us which we can then pattern match against to take appropriate action. Unfortunately, for this go around at least, C# does not treat tuples as first class data types, so a solution like this exclusive to F# at this time. Overall, a much cleaner design, but maybe there’s another option?

Maybe there’s an Option…

Instead of dealing with the clumsiness of the above code, I would much rather solve this in a clean way using the F# Option type. This allows us to definitively specify whether we have a value or not. Oh great you say, another version of null. Not quite, as sometimes null can be a value… More to the point, we can have a universal solution to having a value without having to resort to the Nullable<T> or reference type null-ness nonsense. Simply put, the Option type is no more than the following:

type Option<'a> =
  | Some of 'a
  | None

You’ll notice two constructors for the Option type, the Some which takes a value, and the None which takes no parameters. Now, using this to extend the Dictionary class much like the F# Map class, we can add a function which returns an Option type whether we have a value or not.

type System.Collections.Generic.Dictionary<'k,'v> with
  member this.Lookup(k:'k) =
    match this.TryGetValue(k) with
    | (true , value) -> Some value
    | (false, _    ) -> None

Then we can rewrite our memoize function to utilize this in a cleaner fashion using the Option type for pattern matching purposes to show our true intentions of whether we have a value or not.

let memoize f = 
  let t = new Dictionary<_,_>()
  fun n ->
    match t.Lookup(n) with
    | Some v -> v
    | None   -> 
        let res = f n
        t.Add(n, res)
        res

Much cleaner and shows better intent on our part.

Conclusion

How far do we take this? After all, making full functions everywhere sometimes isn’t feasible as we want to constrain our system. Adding both partial and full implementations might just clutter up our entire API. With this comes balance, but there are certainly places where this comes into play such as parsing values, querying values from collections and so on. Could Code Contracts in .NET 4.0 help us here in terms of statically verifying that we’re not failing on a chained partial function? We’re not quite finished here with this discussion as I’ve yet to cover the many values of null.

Agreed, for sure, very well put. Though you'd probably use .Any() instead of .Count() in your example where you use .First()

Jarrett - Monday, July 6, 2009 8:50:45 PM

@Jarrett,

Right, but people tend to short cut things and immediately want the first value without going through the extra step to see if there is anything there.

Matt

podwysocki - Friday, July 10, 2009 5:21:47 AM

More posts of this qautily. Not the usual c***, please

Jetsin - Wednesday, January 18, 2012 11:48:53 AM

Partial Versus Total Functions

Can We Do Better Than Try?

Maybe there’s an Option…

Conclusion

3 Comments