How Would the CLR Be Different?

Tuesday, January 13, 2009

UPDATED: Added improved generics with higher-kinded polymorphism

There was a good discussion on Twitter a couple of nights ago that arose from some issues with expressions that might return a value or might not (void), and how you handle them. From that discussion, Ted Neward posed an interesting question: “Knowing what we know now, how would you change the CLR?” Note that this isn’t necessarily a language discussion; it’s about how the underlying framework actually works. It’s a good question that I’ll only lightly dive into, but what I really want to know is: where are the pain points?

If I Had Only Known…

A few things came to mind immediately on how I should answer this. I’ve been bitten by a few items that I’ve seen as limitations imposed on me. I’ve thought a bit about these after my time in Haskell, F# and other languages, and came up with a nice list. Some thoughts from Michael Feathers on his ideal language also solidified my thinking. Let’s go through just a few of them.

Void not treated as a generic argument type
Non-null references
Make immutability easier
Sheer complexity of Code Access Security
Pluggable JIT
Improved generics with higher-kinded polymorphism

What do I mean by each of these? First is the infamous System.Void not treated properly as a type. I’ve covered this in the past in my functional C# posts here. As noted, the ECMA Standard 335, Partition II, Section 9.4 "Instantiating generic types" states:

The following kinds of type cannot be used as arguments in instantiations (of generic types or methods):

Byref types (e.g., System.Collections.Generic.List`1<string&> is invalid)

Value types that contain fields that can point into the CIL evaluation stack (e.g., List<System.RuntimeArgumentHandle>)

void (e.g., List<System.Void> is invalid)

This means that I cannot fully generalize functions and instead have to differentiate between the Func<TResult> and Action delegates. F# gets around this issue by exposing another kind of void, Unit, otherwise known as the empty tuple, so that you can handle those differences. Ultimately, it’s up to the compiler to decide what the return type should be, whether it gets compiled to void or Unit. I think this behavior should have been allowed in the BCL, leaving it up to each language implementation to allow or disallow it.
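To make the difference concrete, here is a minimal Haskell sketch of what treating unit as an ordinary type buys you. The names twice, answer and ignore are made up for illustration and do not come from any library. Because () is just another type, one generic combinator covers both the value-returning case (the Func<TResult> side) and the effect-only case (the Action side):

```haskell
-- Run an action twice and collect both results.
-- The result type r is fully generic, so it may also be the unit type ().
twice :: IO r -> IO (r, r)
twice act = do
  first  <- act
  second <- act
  pure (first, second)

-- A value-returning action, playing the role of Func<int> (hypothetical name).
answer :: IO Int
answer = pure 21

-- An effect-only action, playing the role of Action (hypothetical name).
-- It is not a special case: it simply returns the unit value ().
ignore :: IO ()
ignore = pure ()
```

Here twice answer yields (21, 21) and twice ignore yields ((), ()), with no second overload of twice needed.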

The second item is the non-null references. One QCon London 2009 presentation caught my eye recently on this very topic, by Tony Hoare, entitled "Null References: The Billion Dollar Mistake". The session is described as the following:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

I think the abstract alone describes the problem quite well. Indeed, Spec# introduced features that allow for non-null references, and it is a great piece of technology. There is also a switch that makes non-null the default, with an opt-out for those variables that should allow null references. But there are some issues, of course. Let’s look at a quick example of an ArrayList constructor that takes an existing non-null ICollection interface.

public ArrayList (ICollection! c)
  modifies c.*;
  ensures _size/*Count*/ == c.Count;
{
  _items = new object[c.Count];
  base();
  InsertRangeWorker(0, c);
}

This looks rather straightforward in terms of the bang notation for specifying non-null behavior. Unfortunately, when compiled down to IL, it is handled in a rather ugly way through the use of a modopt, such as the following:

public ArrayList(ICollection modopt(NonNullType) c) { ...

My CodeBetter colleague, Greg Young, has noted his objections to the modopt in the past, such as here. So, there are issues in the CLR which prevent us from having this rich behavior at this time.
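For contrast, here is a small sketch of how the same guarantee looks in a language whose type system has no null references at all. It is written in Haskell, and the function names countItems and countOptional are hypothetical. A plain argument is always present, and possible absence has to be spelled out in the type:

```haskell
-- A plain list argument can never be "null": every caller must pass a list,
-- so no runtime check or contract is needed here.
countItems :: [a] -> Int
countItems = length

-- When absence is genuinely allowed, the type says so with Maybe, and the
-- function has to handle the Nothing case explicitly.
countOptional :: Maybe [a] -> Int
countOptional Nothing   = 0
countOptional (Just xs) = countItems xs
```

This is roughly the inverse of the Spec# default: non-null is the norm, and nullability is the opt-in.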

Moving on to the third item brings us to making immutability easier. This way, we can specify that certain classes, fields, parameters and so on cannot change once assigned. The JIT can then take advantage of this metadata to optimize further. The information is there, but it is not used in the way I think it should be.
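For a feel of what immutable-by-default looks like in practice, here is a small Haskell sketch. The Point type and the moveRight function are hypothetical names used only for this example, and the sketch illustrates the programming model rather than anything the JIT does:

```haskell
-- A record whose fields cannot be reassigned after construction.
data Point = Point { px :: Int, py :: Int }
  deriving (Eq, Show)

-- "Changing" a field builds a new Point; the original value is untouched.
moveRight :: Int -> Point -> Point
moveRight dx p = p { px = px p + dx }
```

Because moveRight 3 (Point 1 2) returns a fresh Point 4 2 and leaves its argument as it was, a compiler or runtime is free to share or cache such values without worrying about later mutation.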

The fourth item is the sheer complexity of Code Access Security (CAS). Does anyone really understand it, let alone use it? Anyone? * crickets * The ideas seem noble, but I cannot honestly say I’ve seen this used in practice.

The fifth item on the list is a more pluggable JIT, which would open up a pipeline for us to do further refining. For example, on constrained systems, we may want to optimize the IL further.

Another item, which Lennart touched upon in his comments below and which I raised in my last post on monadic substitution, is higher-kinded polymorphism in CLR generics. Type classes in Haskell, for example, don’t need to take a type variable of kind *; they can take one of any kind. An example is the Haskell Monad class, whose parameter m is a type constructor of kind * -> *:

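-- Monad as defined in the Haskell 98 Prelude
-- (recent GHC releases move fail into a separate MonadFail class)
class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  return :: a -> m a
  fail   :: String -> m a

instance Monad Maybe where
  (Just x) >>= k = k x
  Nothing  >>= _ = Nothing
  (Just _) >>  k = k
  Nothing  >>  _ = Nothing
  return         = Just
  fail _         = Nothing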

In the previous post, I wanted to accomplish something like this, which would allow me to build a generic monad builder and then extend the option type to be a part of it:

// Hypothetical: 'M<'a> applies a type parameter to a type argument,
// which F# does not allow, so this does not compile.
type MonadBuilder<'M> =
  abstract member Bind : 'M<'a> * ('a -> 'M<'b>) -> 'M<'b>
  abstract member Return : 'a -> 'M<'a>
  abstract member Delay : (unit -> 'a) -> 'a

let m =
  { new MonadBuilder<option> with
      member this.Bind(x:'a option, k:'a -> 'b option) : 'b option =
        match x with
        | Some v -> k v
        | None   -> None
      member this.Return(x) = Some x
      member this.Delay(f) = f()
  }

let res = m { return 42 }

Unfortunately, something like this is impossible given the current state of our generics implementation. That’s not to say that we can’t do type classes, because we can in a very limited way, and I’ll cover that in another post on type classes for QuickCheck. Hopefully higher-kinded polymorphism is on the table for a future version of F#. Even if F# fixes this issue, it will still be impossible at the CLR level without some sort of hackery.
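For comparison, this is roughly what the abstraction looks like when it is available. The Haskell sketch below defines a helper, pairUp (a name made up for this example), once for every monad m. It relies only on the standard Prelude Monad class:

```haskell
-- Combine two monadic values into a pair.
-- m is a type constructor (Maybe, [], IO, ...), not a concrete type,
-- which is exactly what CLR generics cannot abstract over.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb =
  ma >>= \a ->
  mb >>= \b ->
  return (a, b)
```

The same definition works unchanged for different monads: pairUp (Just 1) (Just 'x') gives Just (1,'x'), while pairUp [1,2] "ab" gives [(1,'a'),(1,'b'),(2,'a'),(2,'b')].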

But Is That All?

There are other issues, such as generic constraints, but I haven’t fully thought through what they should look like yet. So I’ll open it up to you, keeping in mind that we’re talking about the CLR and not the BCL or any language implementation. Knowing then what you know now, how would the CLR be different?

I'd also like that all methods are forced to return a value and that way getting rid of that System.Void type. It'd be as easy as replacing every void with unit (which of course isn't at all).
But I would go a step further: also having only one single parameter (a tuple/product type). Unfortunately that would destroy the whole concept of a stack-based IL-code. But maybe it would not. You could implicitly put the values on the evaluation stack into a tuple when you call a method. And on the call site (or at any position between*), you can have access to that tuple. ldarg0, etc. can be implemented as accessing the first element in the tuple, ... (maybe you could introduce a method ldarg which gives you the whole tuple).

*) That's actually why I want to have this stuff. You'd be able to implement type-safe proxies and not such ugly stuff like the RealProxy that we have now.

Thomas Danecker - Tuesday, January 13, 2009 9:17:15 PM

What do you think about type classes? Is it possible to support them in CLR someday or at least in F#?

tomasK - Tuesday, January 13, 2009 10:30:28 PM

@Thomas,

Agreed that I'd rather have unit than void. Ultimately, then it would be up to the language implementer to decide whether to handle the unit value or just throw it away. This gets rid of the hackery that is involved with F#.

Interesting notes around type safe proxies. Noted and I'll have to check that out.

Matt

Matthew Podwysocki - Tuesday, January 13, 2009 11:28:46 PM

@tomasK

Yes, I updated the post to mention them in the context of higher-kinded polymorphism.

Matt

Matthew Podwysocki - Tuesday, January 13, 2009 11:29:21 PM

I wonder if Microsoft will ever make a "new" CLR, with as much of a fresh start as the original CLR had over all the other prior technologies.

Of course it would be insanely costly and migration doubly so, but it seems somewhat inevitable.

rei - Wednesday, July 1, 2009 7:54:54 AM

