Dissecting a non-deterministic Windows Forms v2 bug

 

We are glad to have just released NDepend v2.4, with the thoroughly revamped UI that I talked
about a few weeks ago in this blog entry.
For those of you who found the NDepend tool for .NET developers too hard to get started with, we hope that our
work on usability will help.

 

 

Bug description

 

As always, while doing our final manual tests before releasing, we found a weird bug. We had added docking panels
à la Visual Studio to the VisualNDepend UI. After
playing around by moving, collapsing and auto-hiding all the docking panels, the
following exception suddenly popped up while hovering with the mouse over one of our
DataGridView controls:

 

************** Exception Text **************
System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'FloatForm'.
   at System.Windows.Forms.Control.CreateHandle()
   at System.Windows.Forms.Form.CreateHandle()
   at System.Windows.Forms.Control.get_Handle()
   at System.Windows.Forms.ToolTip.get_CreateParams()
   at System.Windows.Forms.ToolTip.CreateHandle()
   at System.Windows.Forms.ToolTip.Hide(IWin32Window win)
   at System.Windows.Forms.ToolStrip.UpdateToolTip(ToolStripItem item)
   at System.Windows.Forms.ToolStripItem.OnMouseHover(EventArgs e)
   at System.Windows.Forms.ToolStripItem.FireEventInteractive(EventArgs e, ToolStripItemEventType met)
   at System.Windows.Forms.ToolStripItem.FireEvent(EventArgs e, ToolStripItemEventType met)
   at System.Windows.Forms.MouseHoverTimer.OnTick(Object sender, EventArgs e)
   at System.Windows.Forms.Timer.OnTick(EventArgs e)
   at System.Windows.Forms.Timer.TimerNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

 

 

When you see such a
stack trace with none of your own methods in it, you immediately realize that your
evening at work will be longer than expected (and it was indeed the case!). We first suspected
the DXperience framework
from DevExpress, on which we rely for docking panels, and asked about it on their forum.
Fortunately, they knew about this problem and immediately answered that it is a Windows
Forms bug.

 

 

Reproducing the bug

 

DevExpress support kindly provided a small C# project that reproduces the problem
(downloadable from here).
To reproduce the bug with this project:


  1. Start the application.
  2. Press the “Click” button (do not hover over the toolstripbutton).
  3. Now hover over the toolstripbutton to display the tooltip.
  4. Close the “test form”.
  5. Hover over the toolstripbutton again => ObjectDisposedException

 

The bug comes from the
fact that the docking panel implementation changes the parent window of the underlying
ToolTip control assigned to the DataGridView. If the previous parent window object has been
disposed by the time you hover over the DataGridView again, the ToolTip still tries to
access that disposed window and you get the exception.

 

 

An idea for the fix

 

Fortunately, I found a workaround here
on the ActiproSoftware
forum. Like DevExpress, ActiproSoftware is a Windows Forms control vendor and,
unsurprisingly, they faced the problem too. The idea is to obtain the
private underlying tooltip object with reflection, and then call the method RemoveAll() on
it when the parent window is changing. This way you force the re-initialization of
the link to the parent window. The code looks like this:

 

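// Requires: using System.Reflection; using System.Windows.Forms;
// Fetch the non-public ToolTip property of the ToolStrip, then reset the tooltip.
ToolTip t = (ToolTip)toolStrip1.GetType().GetProperty(
   "ToolTip", BindingFlags.Instance | BindingFlags.NonPublic
).GetValue(toolStrip1, null);

t.RemoveAll();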
  
    

A fix not that easy to implement

 

This code works well
when the problem comes from a ToolStrip control but, of course, it doesn’t work on a
DataGridView. I wanted to use
Reflector to see where the underlying ToolTip of a DataGridView was hidden, but unfortunately I didn’t find it. Indeed, DataGridView is a monster class with more than 10,000 lines of code, 1053 methods, 322 fields and 13 nested classes. I then wrote the following CQL query with NDepend to make sure that the class DataGridView uses the class ToolTip, directly or indirectly.

 

SELECT TYPES WHERE IsUsing "System.Windows.Forms.ToolTip" AND NameIs "DataGridView"

 

The query told me that DataGridView uses a class that itself uses ToolTip. To find this intermediate class, I used the following CQL query, which asks which classes are directly used by DataGridView and directly use ToolTip:

 

SELECT TYPES WHERE
IsDirectlyUsedBy "System.Windows.Forms.DataGridView" AND
IsDirectlyUsing "System.Windows.Forms.ToolTip"

 

The 2 matching classes are the public class System.Windows.Forms.ContextMenuStrip and the internal nested class System.Windows.Forms.DataGridView+DataGridViewTool. It was then easy to find where the pesky ToolTip object was hidden, and we wrote the following code:

// Requires: using System.Reflection; using System.Windows.Forms;
// DataGridView keeps its tooltip wrapper in the private field "toolTipControl".
FieldInfo toolTipControlFieldInfo =
   typeof(DataGridView).GetField(
      "toolTipControl", BindingFlags.Instance | BindingFlags.NonPublic);

// The wrapper holds the actual ToolTip in its private field "toolTip".
FieldInfo toolTipFieldInfo =
   toolTipControlFieldInfo.FieldType.GetField(
      "toolTip", BindingFlags.Instance | BindingFlags.NonPublic);

object toolTipControlInstance =
   toolTipControlFieldInfo.GetValue(m_DataGridViewItems);

ToolTip toolTip =
   toolTipFieldInfo.GetValue(toolTipControlInstance) as ToolTip;

if (toolTip != null) {  // Can be null at init.
   toolTip.RemoveAll();
}

 

I know how ugly it is to rely on private implementation details, but here we have no choice.
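Since private field names can change between framework versions, a more defensive variant of this reflection code may be worth considering. The following is only a minimal sketch, not the code we shipped: the class PrivateFieldReader and its method TryRead are hypothetical names introduced for illustration. It follows a chain of non-public instance fields and returns null as soon as a field is missing, instead of throwing a NullReferenceException.

```csharp
using System;
using System.Reflection;

// Hypothetical helper (not part of NDepend or of the .NET Framework).
public static class PrivateFieldReader
{
    private const BindingFlags Flags =
        BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Follows a chain of non-public instance fields starting at root.
    // Returns null if root is null, if a field in the chain cannot be found,
    // if an intermediate value is null, or if the final value is not a T.
    public static T TryRead<T>(object root, params string[] fieldPath) where T : class
    {
        object current = root;
        foreach (string fieldName in fieldPath)
        {
            if (current == null) return null;
            FieldInfo field = FindField(current.GetType(), fieldName);
            if (field == null) return null;
            current = field.GetValue(current);
        }
        return current as T;
    }

    // GetField on a derived type does not return private fields declared on
    // its base classes, so each type in the inheritance chain is inspected.
    private static FieldInfo FindField(Type type, string fieldName)
    {
        for (Type t = type; t != null; t = t.BaseType)
        {
            FieldInfo field = t.GetField(fieldName, Flags);
            if (field != null) return field;
        }
        return null;
    }
}

// Possible usage with the field names found above:
//   ToolTip toolTip = PrivateFieldReader.TryRead<ToolTip>(
//       m_DataGridViewItems, "toolTipControl", "toolTip");
//   if (toolTip != null) { toolTip.RemoveAll(); }
```

With such a helper, a renamed or removed private field would silently disable the workaround rather than crash the application, which seems the safer failure mode for a hack of this kind.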

 

 

Checking that the bug is corrected by .NET3

 

We found out that the bug was impossible to reproduce on
our main development machines because it is in fact corrected by .NET3. I
explained in the post .NET
3.5 Core Stuff
that, even though Microsoft made the decision to avoid touching
the .NET Framework assemblies (such as System.Windows.Forms.dll), they took the
opportunity to correct some bugs.

 

We then used the build
comparison feature of NDepend to see whether one of the methods in the buggy stack trace
had been modified (interestingly enough, we figured out that 84 methods of System.Windows.Forms.dll
were changed, 202 were added and 37 were removed). Here is the CQL query that matches the
changed methods in System.Windows.Forms.ToolTip:

 

SELECT METHODS FROM TYPES "System.Windows.Forms.ToolTip" WHERE CodeWasChanged

 

The result is the
following…

 

Methods                                                   NbILInstructions
SetTool(IWin32Window,String,ToolTip+TipInfo+Type,Point)   232
CreateHandle()                                            201
SetToolTipInternal(Control,ToolTip+TipInfo)               147
WmPop()                                                   126
Hide(IWin32Window)                                        88
SetToolInfo(Control,String)                               59

 

 

…and indeed the method
ToolTip.Hide(), shown in the buggy trace, has been changed (from 76 to 88 IL instructions).
We then used Reflector to
see the code change, and indeed there is now a test…

 

 if (this.GetHandleCreated())

 

… to check the parent
window when hiding the tooltip.

 

 

Making sure that our users won’t be annoyed by the bug

 

With the hack described,
it seemed that everything worked fine. However, I was not confident, since the
bug is non-deterministic and might still be lurking around other controls. We
therefore decided not to pop up any exception whose stack trace contains the string
“System.Windows.Forms.ToolTip.CreateHandle()”.

 

I know how ugly
this last choice is, but in the real world you sometimes don’t have a choice.
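The stack trace test described above can be sketched as follows. This is a simplified illustration rather than our exact code: the class KnownToolTipBugFilter and its method Matches are hypothetical names, and the only thing taken from the real application is the frame string from the stack trace shown earlier.

```csharp
using System;

// Hypothetical filter (names introduced for illustration only).
public static class KnownToolTipBugFilter
{
    // Frame copied from the stack trace of the Windows Forms bug.
    public const string ToolTipFrame = "System.Windows.Forms.ToolTip.CreateHandle()";

    // True when the exception's stack trace contains the ToolTip frame.
    // An exception that was never thrown has a null StackTrace and does not match.
    public static bool Matches(Exception exception)
    {
        return exception != null && Matches(exception.StackTrace);
    }

    // Overload working on the raw stack trace text.
    public static bool Matches(string stackTrace)
    {
        return stackTrace != null
            && stackTrace.IndexOf(ToolTipFrame, StringComparison.Ordinal) >= 0;
    }
}
```

In a Windows Forms application, one place to call Matches(e.Exception) would be a handler attached to Application.ThreadException, returning early on a match instead of showing an error dialog. The check is deliberately narrow: any exception that does not go through ToolTip.CreateHandle() is still reported as usual.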

  • Johann Holzel

    You know, this kind of thing is exactly why I don’t use Microsoft technologies anymore.

    Back in ’97, I had two really nasty bugs in the same product, one on the Windows GUI, and one on the Linux GUI. The first one, we quickly traced down to MFC; the second, to Gtk.

    It only took a day or two to find exactly what was wrong with both libraries. In both cases, there was no easy way to “hook” the code to fix it (Gtk is a C API; MFC is all C++, but the relevant function in CView wasn’t virtual), and a workaround would have been a nasty mess in the MFC case, maybe not even possible in the Gtk case.

    So, what did I do? First I submitted a bug report, along with a patch, to Microsoft, and likewise to the Gimp/Gtk/Gnome/XCF team.

    The Microsoft bug was never closed; about two years later, VC6 came out with a new version of MFC without that bug, but even then our code had to be reworked to build on VC6 and use the new MFC42. In the meantime, we had to recode big chunks of CView and the subclasses we were using, wrap them up in an MFC extension DLL, add 40% to the total size of our install, and hold our release back for two weeks while legal verified that we were actually allowed to ship.

    My Gtk patch was accepted almost immediately. It was in the main 1.0 tree within a few days, and in the Redhat packages within a few weeks. By the time we shipped, most of our customers probably already had it. (Just in case, we put RPMs, raw binaries, the diff, and complete sources on our site, but nobody downloaded any of them.)

    Sure, the VC form designer was more polished than Glade, but in the end, the 3 minutes saved building dialog templates were not worth the huge costs of being tied to their libraries.

    Today, I still write software for Windows, and for .NET. And I use their compilers. But I don’t use WinForms unless I have to. With Gtk# or Qt#, I can fix problems myself, or get them fixed quickly, and my customers don’t have to install an entirely new .NET runtime; just one DLL.

  • http://www.NDepend.com Patrick Smacchia

    Dama, this is a tricky thing not very well documented.
    I’ll certainly write a blog post to clarify things but basically you need to handle the event

    System.Windows.Forms.Application.ThreadException += UnhandledExceptionOnUIThread;

    private void UnhandledExceptionOnUIThread(object sender,
                                              ThreadExceptionEventArgs e) {
       // The exception is reachable in e.Exception.
       // Optionally show your own error dialog here.
       if (mustAbort) {
          m_MainForm.Close();
       }
    }

    This will remove the Windows Forms default error dialog and give you a chance to swallow the exception and resume the program, or to abort …

    This is not obvious at all because the behavior is not the same when debugging your app.
    Hope this helps.

  • dama

    Gr8 post. How did you stop the exception when the stack trace contains the string
    “System.Windows.Forms.ToolTip.CreateHandle()”.

    thx

  • http://lextm.blogspot.com Li Yang

    Nice to see how to use NDepend and Reflector to locate a fix. Yes, wonderful demo for NDepend. This is a great post.