How would the CLR Be Different?

UPDATED: Added improved generics with higher-kinded polymorphism

There was a good discussion on Twitter a couple of nights ago that arose from some issues with an expression that might return a value, or might not (void), and how you handle them.  From those issues, an interesting question was posed by Ted Neward: “Knowing what we know now, how would you change the CLR?”  Note that this isn’t necessarily a language discussion, but one about how the underlying framework actually works.  It’s a good question that I’ll only lightly dive into, but what I really want to know is, where are the pain points?

 

If I Had Only Known…

There were a few things that came to mind immediately on how I should answer this.  I’ve been bitten by a few items that I see as limitations imposed on me.  I’ve thought a bit about these after my time in Haskell, F# and other languages to come up with a nice list.  Some thoughts from Michael Feathers on his ideal language also solidified my thinking.  Let’s go through just a few of them.

  • Void not treated as a generic argument type
  • Non-null references
  • Make immutability easier
  • Sheer complexity of Code Access Security
  • Pluggable JIT
  • Improved generics with higher kinded polymorphism

What do I mean by each of these?  First is the infamous System.Void not treated properly as a type.  I’ve covered this in the past in my functional C# posts here.  As noted, the ECMA Standard 335, Partition II, Section 9.4 "Instantiating generic types" states:

The following kinds of type cannot be used as arguments in instantiations (of generic types or methods):

  • Byref types (e.g., System.Generic.Collection.List`1<string&> is invalid)
  • Value types that contain fields that can point into the CIL evaluation stack (e.g., List<System.RuntimeArgumentHandle>)
  • void (e.g., List<System.Void> is invalid)

This means that I cannot fully generalize functions and instead have to differentiate between the Func<TResult> and Action delegates.  F# gets around this issue by exposing another kind of void, Unit, otherwise known as the empty tuple, so that you can handle those differences.  Then, ultimately, it’s up to the compiler to decide what the return should be, whether it gets compiled to void or Unit.  I think this behavior should have been allowed in the BCL, leaving it up to each language implementation to allow or disallow it.
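For comparison, here is a minimal sketch in Haskell, where the unit type () is an ordinary type and can instantiate any type variable. The names Resource and withResource are hypothetical, made up for this illustration, and the "dispose" step is only simulated by writing to a log. The point is simply that one generic signature covers both the case that returns a value and the case that returns nothing.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- Hypothetical stand-in for a disposable resource: it only carries a log.
newtype Resource = Resource (IORef [String])

-- One generic helper. The result type r may be any type, including ().
withResource :: Resource -> (Resource -> IO r) -> IO r
withResource res@(Resource logRef) body = do
  result <- body res
  -- Simulated dispose, run after the body completes
  -- (this sketch does no exception handling).
  modifyIORef logRef (++ ["disposed"])
  return result

main :: IO ()
main = do
  logRef <- newIORef []
  let res = Resource logRef
  -- r = Int: the body produces a value (the Func-like case).
  n <- withResource res (\_ -> return (42 :: Int))
  -- r = (): the body produces nothing useful (the Action-like case).
  withResource res (\(Resource l) -> modifyIORef l (++ ["used"]))
  print n
  readIORef logRef >>= print
```

On the CLR, the equivalent helper has to be written twice, once against Func and once against Action, because void cannot be supplied as the type argument.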

The second item is the non-null references.  One QCon London 2009 presentation caught my eye recently on this very topic, by Tony Hoare, entitled "Null References: The Billion Dollar Mistake".  The session is described as the following:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

I think the abstract alone describes the problem quite well.  Indeed, Spec# is a great piece of technology that introduced features to allow for non-null references.  There is also a switch that makes non-null the default, with an opt-out for those variables that should allow null references.  But there are some issues, of course.  Let’s define a quick example of an ArrayList constructor that takes an existing non-null ICollection.

public ArrayList (ICollection! c) 
  modifies c.*;
  ensures _size/*Count*/ == c.Count;
  {
    _items = new object;
    base();
    InsertRangeWorker(0, c);
  }
 

This looks rather straightforward in terms of the bang notation to specify the non-null behavior.  Unfortunately, when compiled down to IL, it is handled in a rather ugly way through the use of a modopt, such as the following:

public ArrayList(ICollection modopt(NonNullType) c) { …
 

My CodeBetter colleague, Greg Young, has noted his objections to the modopt in the past, such as here.  So, there are issues in the CLR which prevent us from having this rich behavior at this time.
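For contrast, here is a minimal Haskell sketch of the kind of static enforcement that non-null references aim for. Haskell has no null reference, so a value that may be absent has to say so in its type with Maybe. The names ages, ageOf, nextYear and describe are hypothetical and exist only for this illustration.

```haskell
-- Hypothetical data for illustration.
ages :: [(String, Int)]
ages = [("ann", 31), ("bob", 27)]

-- The Maybe in the result type documents that the value may be absent.
ageOf :: String -> Maybe Int
ageOf name = lookup name ages

-- A plain Int can never be "null", so no check is needed here.
nextYear :: Int -> Int
nextYear age = age + 1

-- The compiler will not let a Maybe Int be passed where an Int is
-- expected, so the absent case has to be handled explicitly.
describe :: String -> String
describe name = case ageOf name of
  Just age -> name ++ " will be " ++ show (nextYear age)
  Nothing  -> name ++ " is unknown"

main :: IO ()
main = mapM_ (putStrLn . describe) ["ann", "carol"]
```

Writing nextYear (ageOf "ann") is rejected at compile time, which is the guarantee the Spec# bang notation tries to provide on top of the CLR.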

Moving on to the third item brings us to making immutability easier.  This way, we can specify that certain classes, fields, parameters and so on cannot change once assigned.  The JIT could then use this metadata to optimize further.  The information is there, but it is not used in the way I think it should be.

The fourth item is the sheer complexity of Code Access Security (CAS).  Does anyone really understand it, let alone use it?  Anyone?  * crickets *  The ideas seem noble, but I cannot honestly say I’ve seen this used in practice.

The fifth item on the list is dealing with a more pluggable JIT, so that it opens a pipeline for us to do further refining.  For example, on constrained systems, we want to further optimize the IL.

Another item, which Lennart touched upon in his comment below and which I covered in my last post on monadic substitution, is higher-kinded polymorphism in the CLR generics.  Type classes in Haskell, for example, don’t need to take a type variable of kind *, but can take one of any kind.  An example is the Haskell Monad class, whose parameter m has kind * -> *, along with its Maybe instance:

class Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  return :: a -> m a

instance  Monad Maybe  where
    (Just x) >>= k      = k x
    Nothing  >>= _      = Nothing

    (Just _) >>  k      = k
    Nothing  >>  _      = Nothing

    return              = Just
    fail _              = Nothing

In the previous post, I wanted to accomplish something like this, which would allow me to build a generic monad builder and then extend the option type to be a part of it:

type MonadBuilder<'M> =
  abstract member Bind : 'M<'a> * ('a -> 'M<'b>) -> 'M<'b>
  abstract member Return : 'a -> 'M<'a>
  abstract member Delay : (unit -> 'a) -> 'a

let m = 
  { new MonadBuilder<option> with
      member x.Bind(x:'a option, k:'a -> 'b option) : 'b option =
        match x, k with
        | Some x, k -> k x
        | None  , _ -> None
      member x.Return(x) = Some x
      member x.Delay(f)   = f()
  }

let res = m { return! Some 42 }

Unfortunately, something such as this is impossible given the state of our generics implementation.  That’s not to say that we can’t do type classes, because we can in a very limited way and I’ll cover that in another post in regards to type classes for QuickCheck.  Hopefully that’s on the table for a future version of F#.  Even if F# fixes this issue, it still will be impossible at the CLR level without some sort of hackery.
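To make the limitation concrete, here is a minimal Haskell sketch of code that is generic in the monad itself, which is exactly what a higher-kinded type variable buys you. The function name pairUp is hypothetical. Note that m is applied to type arguments (m a, m b), so it has kind * -> *, and the same definition works unchanged for Maybe and for lists.

```haskell
-- Generic in the monad m: m is a type constructor, not a plain type.
pairUp :: Monad m => m a -> m b -> m (a, b)
pairUp ma mb = do
  a <- ma
  b <- mb
  return (a, b)

main :: IO ()
main = do
  -- m = Maybe: both values present, then one missing.
  print (pairUp (Just (1 :: Int)) (Just 'x'))
  print (pairUp (Just (1 :: Int)) (Nothing :: Maybe Char))
  -- m = []: every combination of the two lists.
  print (pairUp [1, 2 :: Int] "ab")
```

In CLR generics a type parameter can stand for List<int> but not for List itself, so a single pairUp that abstracts over the container cannot be given a type.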

But Is That All?

There are other issues such as generic constraints and such, but my thoughts aren’t fully thought out as far as what they should be right now.  So, I’ll open it up to you, keeping in mind we’re talking about the CLR and not the BCL nor any language implementation.  Knowing then what you know now, how would the CLR be different?

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @David

    Ok, let’s rework those into C# parlance then, shall we?

    TResult Using<T, TResult>(T obj, Func<T, TResult> func) where T : IDisposable
    TResult Lock<T, TResult>(T obj, Func<TResult> func) where T : class

    This allows me to generically create a using statement no matter whether I actually return a value or not. The problem with C# and other languages is that in order to not return a value, you have to use the Action delegate instead of those type signatures from above. That’s suboptimal.

    And yes, that’s what I’m saying is that if C# wants to not allow that, fine, but I think you’ll find many C# people now think it wasn’t the best idea to not allow void.

    Why the F# way as the default and not the C# way? Because it’s a lot easier to go from supporting it in the CLR, and then one language implementation not allowing it (ala C#), and then another language to decide to support it (ala F#). Instead, you have it where the CLR more closely aligned with C#, where F# has to go well out of its way to fake out the underlying system. You’ll find that as we get more functional constructs inside C# that you’ll run into this pain sooner rather than later when trying to create generic functions just as I did above.

    Matt

  • http://www.commongenius.com David Nelson

    Excuse my ignorance, but I am not familiar enough with F# to know what those signatures mean, and I also could not make them compile in my F# scratch pad.

    I didn’t say that you said that every method has to return something. Please don’t try to read more into my comment than you apparently did :) What I said was “you don’t want it to be possible for a method to not have a return type”, which you seemingly confirmed by stating that if C# wanted a void return, then “underneath the covers it would compile it down to a null return”, i.e. it would have to emit the method as having a return type and then force it to return null. While this is not completely unreasonable, I guess my question is why bother? Some languages want it one way, some the other. No matter which way the CLR goes, some languages are going to have to create a workaround. So why do you think the CLR should go the F# way instead of the C# way?

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @David

    That’s not quite right in terms of different patterns for Action/Func. Yes, they often differ as with an iterator versus a map, etc.

    But what about the following functions here as defined in the base F# libraries?
    ('a -> ('a -> 'b) -> 'b) when 'a :> System.IDisposable
    ('a -> (unit -> 'b) -> 'b) when 'a : not struct

    Now either of those may return a value, or they may not. That’s up to me on when/how I use them.

    Now when I was talking about the Void issue, I said that was a language choice if C# wanted voids, I wouldn’t stop them, and underneath the covers it would compile it down to a null return or something like that. Nowhere did I ever say that every method had to return something, so please don’t try to read more into it than you apparently did.

    Matt

  • http://www.commongenius.com David Nelson

    @Matt,

    It depends on what you mean by differentiate. In every situation in which I have needed to use Func or Action, one or the other has been appropriate, but never both at the same time. It therefore seems appropriate to me that they remain separate delegate types.

    As far as returning Void, it sounds like what you are really saying is that you don’t want it to be possible for a method to not have a return type. If that’s the case, why not simply use that methodology in your own code? There is nothing that requires you to ever use the void keyword in C#.

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Josh

    That’s more of a BCL issue as David stated. What we’re looking at here are issues from the CLR directly. Interesting that you say that LINQ doesn’t work on strings, because in F#, I can do the following and it works fine:

    open System.Linq

    "foo".Select(fun x -> x) // Returns seq ['f'; 'o'; 'o']

    Matt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @David

    Regarding the Void not treated as a type could be as simple as this:
    return ()

    Which is treating the void as an empty tuple (unit). And then again, it’d be up to the languages themselves on how they translate their own code down to the IL, but to say that you can’t use it that way is the problem.

    You’ve never run into this issue? Have you had to differentiate between Func and Action delegates? If so, then you’ve run into this issue.

    I agree that sandboxing is important, but I don’t think it’s intuitive enough for people to use without going through many configuration screens. Tools such as ClickOnce help, but in the early days, there weren’t those helpers, which turned people off. Since that time, they never really looked back.

    Matt

  • http://www.commongenius.com David Nelson

    @Josh

    The fact that String implements IEnumerable has nothing to do with the CLR; String and IEnumerable are in the BCL (although the CLR does have some special knowledge of String).

    I have never found myself in a situation where I had to use an if statement like your example; on the other hand, being able to use LINQ methods on strings does occasionally come in handy (despite the fact that they don’t show in Intellisense).

  • http://www.josheinstein.com Josh Einstein

    Biggest “ah crap” moment of the CLR… realizing the mess caused by System.String implementing IEnumerable. Now everything that wants to work on singletons and sequences has to do stuff like:

    if ( blah is IEnumerable && !(blah is string) ) { … }

    Even Visual Studio doesn’t show the extension methods for IEnumerable on a string type.

  • http://www.commongenius.com David Nelson

    Non-null references are definitely at the top of my list, and I agree with the need for stronger immutability. I have never run into the need for a pluggable JIT or higher kinded polymorphism, so I can’t comment there. As for Void as a generic type argument, I am on the fence. I have never had a need for it personally, though I can see where it might make sense in some cases. But every language I have ever programmed in has differentiated between those methods that return a value and those that don’t (VB by Function/Sub, the C family by “return value” versus “return”, etc). So there is some inertia to overcome there.

    Regarding Code Access Security, I don’t see it as being particularly complex. I have not made extensive use of it, but when I have used it, it has seemed pretty straightforward to me. I think the main reason it is not used is the same reason why major websites get hacked by SQL injection and XSS: developers STILL don’t think about security in their code. Sandboxing using CAS is actually something that I think could revolutionize the web if it were given more attention. The ability to run fully functional applications over the web in a secure environment is extremely powerful.

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @JaredPar,

    I like the idea around the Option/Maybe for those which may allow for no value. And I agree, I’d rather have static enforcement, because Option<'a> /= 'a and should be treated as such.

    Matt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Justin

    I understand that sandboxing is necessary, but I believe the underlying system is quite cryptic and not really well understood, hence not widely used.

    Matt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Lennart,

    I was just having this conversation with some and let me update this to reflect that. It’s a good point to drive home that I can’t specialize my monads because of this and swap one for another as I noted in my last post.

    Matt

  • http://blogs.msdn.com/jaredpar JaredPar

    I love all of your suggestions but I think my first choice would be for non-null references. I find it’s much easier to reason about a code base that does not allow null references by default.

    Even though we don’t have a great way of forcing it (yet) except via code review, we took the following policy on our managed code base.

    1) You cannot return null, period.
    2) If there exists a property which given certain conditions may or may not exist, use an Option class to maintain the reference.

    IMHO, this really increases the maintainability of our code. If something can be null it’s explicitly documented in the code with an Option<> class. If not you can safely assume it will always have a value.

    I would much rather have explicit enforcement of these details but for now it’s a step forward

  • Lennart Augustsson

    The generics in the CLR lack higher kinded type variables. This means that abstractions such as monads can’t be done on the CLR level. You can implement specific monads, but not have code that is generic in the monad, because m::*->*.

  • http://justrudd.org/ Justin Rudd

    Nothing new to add to your list, but I would take out your bullet point about CAS. I did use it quite extensively back in 2002/3 before click once was readily available for .NET 1.0/1.1. And now with click once, CAS is used quite extensively. Sandboxes are good. And CAS is definitely better than the Java VM alternative.

    I think one reason it is not easier or better documented is because most people don’t use it. Click Once takes care of most of the details for you.