Beware mixing List.Sort() and Thread.Abort()

After more than a decade of full-time development with .NET, I still discover surprising unfixed bugs in the heart of the framework. Today, while I was smoke-testing new stuff in the NDepend UI, I stumbled on an InvalidOperationException thrown by a call to List<T>.Sort() ?!

After 15 minutes of searching the cause of the bug, I found this unfixed bug on Microsoft Connect: List.Sort() throws InvalidOperationException (instead of re-throw ThreadAbortException) while thread is being aborted

In other words, if your application is using Thread.Abort() somehow, and that the abortable code is calling List<T>.Sort(), you are exposed to the risk of getting an InvalidOperationException instead of a ThreadAbortException.

I already hear that it is a good practice to avoid calling Thread.Abort(), but the fact is that if one wants a responsive UI, one needs some sort of abortable/timed-out background work, and at a point Thread.Abort() will have to be called for that. As long as calling Thread.Abort() is done in a safe way (see this Joe Duffy detailled post) we didn’t meet any problems so far. On millions of production runs logged, we cannot see any problem except … that List<T>.Sort() throws an InvalidOperationException with the frequency of around one on every 50K production runs. And today, this bug just popped to me in a reproductible way thanks to some refactoring I was doing.

To workaround this problem once for all, I enclosed all calls to any override of List<T>.Sort() in some extension methods in a dedicated class. These BugFreeSort() methods catch the InvalidOperationException. This is fine because anyway the thread is in a ThreadState.AbortRequested and the CLR will throw the ThreadAbortException at the end of the catch block. Also, attached to this class through an attribute, I declare a CQLinq code rule in the source code to make sure that all calls to List<T>.Sort() are made through one of these BugFreeSort() methods. Here is the code:

Here is the code rule declared in the code. Hopefully any .NET developer acquainted with the LINQ syntax, should be able to read this piece to code:

And here is a screenshot of the NDepend methods violating the code rule before refactoring:

We have almost 200 code rules declared in source code like this newly added one. They protect us from stumbling again and again in tricky identified scenarios. In other words with these rules we abide by the tenet: Fool me once, don’t fool me twice.

In a team context, doing so is especially useful since such rule is warning a developer when he’s doing something wrong. Such warning is actually a just-in-time automatic piece of education for newcomers developers, about how things must be done. Communication in the team is enhanced, the overall friction is decreased, and I found this pretty cool :)

This entry was posted in .NET Framework, .NET Fx, C#, Code Query, Code Rule, CQLinq. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Chris S

    A more accurate title might be “Beware Thread.Abort()”

  • PatrickSmacchia

    Because we are still running on .NET 3.5, and will still do at least until .NET Fx 5 (and VS post VS2012) will be released! There are tricks to get the TaskParallelLibrary working on .NET v3.5, but it is unsupported, so we still prefer to support our own TPL by now. http://stackoverflow.com/questions/2987439/can-i-use-the-task-parallel-library-in-a-net-3-5-project

  • Jarle

    Stupid question:

    Why are you making your own implementation of Task.Wait()?

    Task task = Task.Factory.StartNew(() => DoSomething());
    task.Wait(100);
    if (task.IsCompleted) { … }

  • Patrick Smacchia

    So I didn’t express myself properly. We are not aborting threads, in the sense we don’t kill them. But we are calling Thread.Abort() and then catch ThreadAbortException and use the method ResetAbort(). For example we got inspiration from this WaitFor code: https://github.com/Lokad/lokad-shared-libraries/blob/master/Source/Lokad.Shared/Threading/WaitFor.cs As said, we have millions of production runs logged, that worked on top of such code, and except this List.Sort() bug, we’ve never observed another issue.

  • http://barrkel.blogspot.com/ barrkel

    That’s not a good enough excuse for using Thread.Abort.

  • airhed13

    I reject the premise: “but the fact is that if one wants a responsive UI, one needs some sort of abortable/timed-out background work, and at a point Thread.Abort() will have to be called for that.”
    Cooperative cancellation is tried and true and doesn’t result in undefined behavior throughout your program. Thread.Abort() is a bug if reliability is a requirement.
    Joe Duffy’s article seems to say precisely the opposite of what you imply:
    “… the .NET Framework cannot be trusted when any of the aforementioned exceptions are thrown.” where one of the aforementioned exceptions is ThreadAbort. Joe Duffy is essentially stating that rudely aborting a thread destroys trust in the underlying framework. Just don’t do it.