CodeBetter.Com

Patrick Smacchia [MVP C#]

March 2008 - Posts

  • Towards Bug-Free Code

     

    The only way to have bug-free code is to prove the code mathematically, and very few programs in the world are proved this way. Some industries can afford the price of a mathematical proof, especially when bugs could cost human lives, as with embedded software in planes, trains or cars. Most of us work on projects that cannot afford that cost. We then rely on some tricks to keep our bug rate as low as possible. I classify these tricks into 7 categories.
    • Contract
    • Automatic test
    • Empirical approach
    • Code review
    • Prioritizing bug fixes over new feature development
    • Programming style and code quality
    • Static Analysis Tools

     

     

    Contract

     

    The idea of contracts is to insert correctness information inside the code. When a contract is violated, it means that your code contains a bug somewhere. The .NET world is typically very poor in terms of contracts. Our only way to express a contract is to call the System.Diagnostics.Debug.Assert(…) method. This is unfortunate, not only because our code gets cluttered with numerous Debug.Assert(…) calls, but also because compilers cannot check such contracts at compile time. By contrast, Spec# non-nullable types are an excellent form of contract at the language level that helps a lot in avoiding pesky NullReferenceExceptions. And Spec# is far from being the only language that offers contract facilities: think about what Eiffel proposed more than twenty years ago in terms of contracts!

     

    As I believe in contracts, I personally use Debug.Assert(…) a lot. It represents around 15% of my code. I will likely blog more on this because, with all the buzz currently made around automatic tests, I feel that contracts deserve more attention.
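
    To make the idea concrete, here is a minimal sketch of assertion-based contracts, written in Python for brevity rather than in C#. The withdraw function and its parameters are hypothetical and only illustrate where preconditions and postconditions sit.

```python
def withdraw(balance_cents: int, amount_cents: int) -> int:
    """Return the new balance after a withdrawal (hypothetical example)."""
    # Preconditions: what the caller must guarantee.
    assert amount_cents > 0, "amount must be strictly positive"
    assert amount_cents <= balance_cents, "cannot withdraw more than the balance"

    new_balance = balance_cents - amount_cents

    # Postcondition: what this function guarantees in return.
    assert 0 <= new_balance < balance_cents, "balance must decrease and stay non-negative"
    return new_balance


print(withdraw(1000, 250))  # 750
```

    Python removes assert statements when run with the -O flag, much as Debug.Assert(…) calls are compiled out of builds where DEBUG is not defined, so in both cases the contract costs nothing in an optimized build.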

     

     

    Automatic test

     

    I won’t enumerate the numerous benefits of automatic tests and a high code coverage ratio here. The important thing to remember is that a solid battery of automatic tests is an excellent way to significantly decrease the bug rate on tested code, and also to avoid introducing new bugs when code gets refactored. Like many others, I think that the cost of writing automatic tests is well worth paying compared to the cost of maintaining code that is not automatically tested.
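
    As a small illustration of what such a battery looks like, here is a sketch using Python's standard unittest module. The clamp helper is hypothetical; the point is only that each expected behavior is pinned down by a test that keeps running after every refactoring.

```python
import unittest


def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high] (hypothetical helper)."""
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

    If clamp is later rewritten, calling run_tests() again tells you right away whether the observable behavior has changed.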

     

     

    Empirical approach

     

    What I call the empirical approach is a simple tenet that every seasoned programmer knows:

    • Most bugs in a new version come from modified code and brand new code.
    • Unchanged code that has worked well for a long time in production is unlikely to crash in the next release (this is what we call stable code). We don't say that stable code contains no bugs, but discovering a bug in stable code is rare.

    How do you know that some code has worked well for a long time in production? Simply by listening to users. If they haven’t reported problems on some features for a long time, then you can be confident that the underlying code is stable. Some might say that I am treating users as testers here, but don’t get me wrong: we don’t have a choice. If you have real users, they will complain when they find a bug and they will remain silent when they consider that the product is working fine. You are responsible for delivering correct code that satisfies users, but why wouldn’t you infer statistics from their feedback to assess where your stable and unstable code is?

     

    I wrote an article about how we (the NDepend team) use this simple but effective idea and our own dog-food to avoid regression bugs while adding new features. Basically, we focus our code reviews mainly on code that has been added or changed since the last release.
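
    The selection of code to review can be sketched as a comparison between two snapshots of a code base. This is not how NDepend does it internally; it is a minimal Python illustration that assumes each snapshot is a dictionary mapping a method name to some fingerprint of its body (for instance a hash), with hypothetical method names.

```python
def code_to_review(previous: dict, current: dict) -> dict:
    """Split the methods of the current release into added and changed ones.

    Both arguments map a method name to a fingerprint of its body;
    this layout is an assumption made for the sketch.
    """
    added = sorted(name for name in current if name not in previous)
    changed = sorted(
        name for name in current
        if name in previous and current[name] != previous[name]
    )
    return {"added": added, "changed": changed}


# Hypothetical snapshots of two releases.
release_1 = {"Parse": "a1", "Render": "b2", "Save": "c3"}
release_2 = {"Parse": "a1", "Render": "b9", "Save": "c3", "Export": "d4"}

review = code_to_review(release_1, release_2)
print(review)  # {'added': ['Export'], 'changed': ['Render']}
```

    Everything absent from the result (Parse and Save here) is unchanged since the last release and is therefore a candidate for stable code.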

     

    Code review

     

    Code reviews are good for enhancing quality and educating programmers, but I don’t believe in the efficiency of code reviews for anticipating bugs. Even though some portions of code can be fascinating to read, the bulk of a code base is tedious and will make you lose your focus in less than an hour. The problem with code review comes from the mass of code to read. This is why I advise focusing your review only on unstable code, i.e. added code and modified code. If you release new versions often, the mass of code to review before each release will quickly become an epsilon of the size of your entire code base. Doing so can also be seen as a way to capitalize on code reviews made during previous iterations.

     

    Prioritizing bug fixes over new feature development


    This popular methodology results directly from the concept of stable code. Prioritizing bug fixes over new feature development can be seen as a way to constantly strive to maximize the surface of stable code in our code base.

     

    Programming style and code quality

     

    The recent buzz around LINQ and F# comes from the fact that object-oriented style programming is more bug-prone than functional style programming. This results from the expressiveness of functional style: in other words, functional code is easier to read and understand. A major aspect of this expressiveness is, IMHO, the concept of immutability that I described in a previous blog post, Immutable types: understand their benefits and use them. It is hard to write, understand and maintain code that mutates state at runtime (this is why, for example, global variables are so harmful), and it becomes much harder in a concurrent environment.
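
    Here is a minimal sketch of an immutable type, written in Python with a frozen dataclass. The Money type and its fields are hypothetical; what matters is that an operation returns a new value instead of mutating an existing one.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Money:
    """An immutable amount of money (hypothetical example type)."""
    amount_cents: int
    currency: str

    def add(self, other: "Money") -> "Money":
        # No state is mutated: a new instance is returned instead.
        assert self.currency == other.currency, "currencies must match"
        return replace(self, amount_cents=self.amount_cents + other.amount_cents)


price = Money(500, "EUR")
total = price.add(Money(250, "EUR"))

try:
    price.amount_cents = 0  # any attempt to mutate is rejected
except FrozenInstanceError:
    print("price is immutable")

print(price.amount_cents, total.amount_cents)  # 500 750
```

    Because price can never change after construction, any code holding a reference to it, including code running on another thread, can rely on its value without defensive copies or locks.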

     

    Obviously, code quality also has a direct impact on code correctness. Anti-patterns such as methods with high LOC, high Cyclomatic Complexity or high Nesting Depth, entangled components, methods with multiple concerns, classes with multiple responsibilities… lead to code that is harder to debug and to test.

     

     

    Static Analysis Tools

     

    We all dream of a static analyzer that could automatically pinpoint bugs in our code at the push of an Analyze button. Some tools are already able to detect naïve mistakes, such as calling a method with a null reference as a parameter when the method never tests that parameter for nullity (i.e. interprocedural analysis).

     

    But most bugs are not that easy to pinpoint. By just analyzing the code, a tool cannot distinguish between a feature and a bug because it doesn’t know how the application should behave. Some heuristics exist, but still, to be efficient, a bug-finding static analyzer needs to be fed with more information than just the code. Typically, this extra information can be found in contracts and unit test code. As far as I know, in the .NET area there are 2 projects revolving around this idea, NStatic and Pex, and I am really looking forward to using them on my own code.

     

     

    Conclusion

     

    From what I understand of the agile trend, having a bug-free product to release shouldn’t be the goal. Anyway, every program that is not mathematically proved has bugs. The goal should then be to tend toward a bug-free product by applying as many correctness tricks as possible.

  • Number of Types in the .NET Framework (2)

     

     

    I am impressed by the buzz around my last post on Number of Types in the .NET Framework. Actually, it was just a quick post that I wrote after reading Brad Abrams' Number of Types in the .NET Framework post, to see what NDepend would report in terms of metrics on the .NET Fx v3.5. I didn’t expect that this would turn into a debate around the completeness and the download size of the .NET Fx v3.5, and even Java vs. .NET. As a heavy user of .NET since the very beginning back in 2001 and as a big fan of this platform, my opinion is biased, even though I think that some parts could be improved, such as:

     

    • Collections are duplicated (for legacy reasons)
    • The design of the IO API, which after 7 years I still don’t find intuitive

    But so far, these small issues don’t matter when I take into account the benefits of working with .NET:

    • A super innovative language team able to bring a ground-breaking enhancement every 2 years.
    • A powerful compiler, able to compile my 60K lines of C# code in less than 5 seconds.
    • Maybe the best debugger of all time.
    • An optimized CLR, close to the hardware, that lets me, for example, use pointers from managed code whenever I want.
    • An active community able to make diamonds such as Mono.Cecil.

    Instead of rambling on about this sensitive debate, I prefer to present more metrics relative to the .NET Fx v3.5.

     

     

    Size of methods

     

    The size of methods is computed in terms of number of IL instructions. As the .NET Fx v3.5 is compiled with optimization, you can guess the equivalent number of lines of C# or VB.NET code by dividing values by 5.

     

    SELECT METHODS WHERE NbILInstructions > 0 AND !IsClassConstructor ORDER BY NbILInstructions DESC

     

    #Methods: 336 908    Average: 24.66 IL Instructions    Std Dev: 58.05
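
    The rule of thumb above (divide by 5) can be applied directly to the reported average. The following Python sketch only encodes that arithmetic; the function and constant names are hypothetical.

```python
# Rule of thumb from the text: about 5 IL instructions per source line
# for code compiled with optimization.
IL_INSTRUCTIONS_PER_SOURCE_LINE = 5


def estimate_source_lines(nb_il_instructions: float) -> float:
    """Roughly convert a number of IL instructions into lines of source code."""
    return nb_il_instructions / IL_INSTRUCTIONS_PER_SOURCE_LINE


average_il = 24.66  # average reported above
print(f"{estimate_source_lines(average_il):.1f}")  # 4.9
```

    In other words, the average method in the framework corresponds to roughly five lines of C# or VB.NET.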

     

    SELECT TOP 5 METHODS WHERE NbILInstructions > 0 AND !IsClassConstructor  ORDER BY NbILInstructions DESC

     

    Full Name | # IL instructions
    System.Windows.Markup.KnownTypes.GetKnownTypeConverterIdForProperty(KnownElements,String) | 6228
    MS.Internal.Markup.KnownTypes.GetKnownTypeConverterIdForProperty(KnownElements,String) | 6228
    System.Security.Cryptography.RIPEMD160Managed.MDTransform(UInt32*,UInt32*,Byte*) | 5294
    MS.Internal.Markup.TypeIndexer.InitializeOneType(KnownElements) | 3766
    System.Web.Configuration.BrowserCapabilitiesFactory.PopulateBrowserElements(IDictionary) | 3596

     

    Class constructors are not taken into account because they bias the result: they contain all the code that initializes the values of static fields.

     

     

    Cyclomatic Complexity and nested depth of methods

     

    As we don’t have source code of the .NET Fx, we infer the Cyclomatic Complexity from the IL code as explained here. We generally obtain values slightly higher than the Cyclomatic Complexity obtained from source code.

     

    The second practical metric to measure complexity is the Nesting Depth, here also inferred from IL, whose definition is available here.

     

    SELECT METHODS WHERE NbILInstructions > 0 ORDER BY ILCyclomaticComplexity  DESC,ILNestingDepth DESC

     

    #Methods: 341 842    Average IL Cyclomatic Complexity: 1.68     Std Dev: 5.48

    Average IL Nesting Depth: 0.68     Std Dev:  1.80
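
    When reading these summaries, note that a standard deviation larger than the average hints at a long tail: most methods are trivial and a few are very complex. The Python sketch below shows the effect on made-up values; the data is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which formula NDepend applies.

```python
from statistics import mean, pstdev

# Hypothetical per-method complexity values: many trivial methods, one "monster".
complexities = [1, 1, 2, 4, 12]

average = mean(complexities)
std_dev = pstdev(complexities)  # population standard deviation (an assumption)

print(f"{average:.2f}")   # 4.00
print(f"{std_dev:.2f}")   # 4.15
print(std_dev > average)  # True: a single outlier dominates the spread
```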

     

    These numbers attest to a great overall quality, but it is still possible to find some monsters (I don’t know whether these monsters are generated or handcrafted methods):

     

    SELECT TOP 5 METHODS WHERE NbILInstructions > 0 ORDER BY ILCyclomaticComplexity DESC

     

    Full Name | # IL instructions | IL Cyclomatic Complexity (ILCC)
    System.Windows.Markup.KnownTypes.GetKnownTypeConverterIdForProperty(KnownElements,String) | 6228 | 872
    MS.Internal.Markup.KnownTypes.GetKnownTypeConverterIdForProperty(KnownElements,String) | 6228 | 872
    MS.Internal.Markup.TypeIndexer.InitializeOneType(KnownElements) | 3766 | 761
    System.Windows.Markup.TypeIndexer.InitializeOneType(KnownElements) | 3046 | 760
    System.Windows.Markup.KnownTypes.CreateKnownElement(KnownElements) | 1729 | 549

     

     

    SELECT TOP 5 METHODS WHERE NbILInstructions > 0 ORDER BY ILNestingDepth DESC

    Full Name | # IL instructions | IL Nesting Depth
    Microsoft.JScript.Convert.Coerce2WithNoTrunctation(Object,TypeCode) | 1539 | 268
    System.Windows.Markup.KnownTypes.GetKnownPropertyAttributeId(KnownElements,String) | 1979 | 190
    MS.Internal.Markup.KnownTypes.GetKnownPropertyAttributeId(KnownElements,String) | 1979 | 190
    System.Web.Configuration.BrowserCapabilitiesFactory.UpProcess(NameValueCollection,HttpBrowserCapabilities) | 1129 | 181
    MS.Internal.Markup.KnownTypes.GetKnownTypeConverterIdForProperty(KnownElements,String) | 6228 | 172

     

     

     

    Number of Parameters of public methods and Variables

     

    The number of parameters of a method represents an easy way to measure quality:

     

    SELECT METHODS WHERE IsPublic ORDER BY NbParameters DESC

     

    #Methods: 219 234    Average #Parameters: 0.98     Std Dev: 1.32

     

    Here too the overall quality is pretty good, but it is just as easy to find terrifying values:

    SELECT TOP 5 METHODS WHERE IsPublic ORDER BY NbParameters DESC

     

    Full Name

    # Parameters

    MS.Internal.PtsHost.UnsafeNativeMethods.PTS+FormatLine.

    BeginInvoke(IntPtr,IntPtr,IntPtr,Int32,Int32,IntPtr,UInt32,Int32,

    Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,IntPtr&,

    Int32&,IntPtr&,Int32&,PTS+FSFLRES&,Int32&,Int32&,Int32&,Int32&,

    Int32&,Int32&,AsyncCallback,Object)

    31

    MS.Internal.PtsHost.UnsafeNativeMethods.PTS+ReconstructLineVariant.

    BeginInvoke(IntPtr,IntPtr,IntPtr,Int32,Int32,IntPtr,Int32,UInt32,Int32,Int32

    ,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,IntPtr&,IntPtr&

    ,Int32&,PTS+FSFLRES&,Int32&,Int32&,Int32&,Int32&,Int32&,Int32&,

    AsyncCallback,Object)

    31

    MS.Internal.PtsHost.UnsafeNativeMethods.PTS+FormatLineForced.

    BeginInvoke(IntPtr,IntPtr,IntPtr,Int32,Int32,IntPtr,UInt32,Int32,Int32

    ,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,IntPtr&,Int32&

    ,IntPtr&,PTS+FSFLRES&,Int32&,Int32&,Int32&,Int32&,Int32&,

    AsyncCallback,Object)

    29

    MS.Internal.PtsHost.UnsafeNativeMethods.PTS+ReconstructLineVariant.

    Invoke(IntPtr,IntPtr,IntPtr,Int32,Int32,IntPtr,Int32,UInt32,Int32,Int32,

    Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,Int32,IntPtr&,IntPtr&,Int32&

    ,PTS+FSFLRES&,Int32&,Int32&,Int32&,Int32&,Int32&,Int32&)

    29

    MS.Internal.PtsHost.UnsafeNativeMethods.PTS+FormatLine.Invoke(IntPtr,

    IntPtr,IntPtr,Int32,Int32,IntPtr,UInt32,Int32,Int32,Int32,Int32,Int32,Int32

    ,Int32,Int32,Int32,Int32,Int32,IntPtr&,Int32&,IntPtr&,Int32&,

    PTS+FSFLRES&,Int32&,Int32&,Int32&,Int32&,Int32&,Int32&)

    29

     

    And the following CQL query returns 568 methods!

     

    SELECT METHODS WHERE IsPublic AND NbParameters > 8

     

    Similarly, you can also measure the quality from the number of internal variables used by a method and we obtain some similar results:

     

    SELECT METHODS WHERE IsPublic AND NbILInstructions > 0 ORDER BY NbVariables DESC

     

    #Methods: 184 149    Average #Variables: 0.86     Std Dev: 2.14

     

    SELECT TOP 5 METHODS WHERE IsPublic ORDER BY NbVariables DESC

     

    methods | # Variables
    java.awt.GridBagLayout.GetLayoutInfo(Container,Int32) | 105
    Microsoft.VisualBasic.CompilerServices.VBBinder.BindToMethod(BindingFlags,MethodBase[],Object[]&,ParameterModifier[],CultureInfo,String[],Object&) | 97
    javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(UIDefaults) | 64
    System.ServiceModel.Description.DispatcherBuilder.InitializeServiceHost(ServiceDescription,ServiceHostBase) | 62
    System.Windows.Forms.ControlPaint.DrawBorder(Graphics,Rectangle,Color,Int32,ButtonBorderStyle,Color,Int32,ButtonBorderStyle,Color,Int32,ButtonBorderStyle,Color,Int32,ButtonBorderStyle) | 61

     

     

     

    Efferent Coupling

     

    NDepend is far from being just a metrics tool: it can help you deal with dependencies, layering and componentization, compare 2 snapshots of your code base, check custom rules on your design and code, and more (see the list of features here). It also supports some exotic (but still useful) metrics like Efferent Coupling (Ce). The Efferent Coupling of a particular type is the number of types it directly depends on. As I explained in a previous post, the more types a given type is using, the more responsibilities it has.
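
    The definition of Efferent Coupling can be sketched in a few lines of Python. The dependency graph and type names below are hypothetical, and excluding a type's reference to itself is an assumption of this sketch rather than a documented NDepend rule.

```python
def efferent_coupling(dependencies: dict) -> dict:
    """Map each type to the number of distinct other types it directly uses."""
    return {
        type_name: len(set(used) - {type_name})  # a type does not count itself
        for type_name, used in dependencies.items()
    }


# Hypothetical dependency graph: type name -> types it directly uses.
dependencies = {
    "OrderService": {"Order", "Customer", "Logger", "String"},
    "Order": {"Customer", "String", "Order"},
    "Logger": {"String"},
}

ce = efferent_coupling(dependencies)
for type_name in sorted(ce, key=ce.get, reverse=True):
    print(type_name, ce[type_name])
# OrderService 4
# Order 2
# Logger 1
```

    Sorting by descending Ce, as the loop does, puts the types that are likely to carry the most responsibilities first.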

     

    SELECT TYPES WHERE NbILInstructions > 0 ORDER BY TypeCe DESC

     

    #Types: 28 738    Average # of types used: 22.88     Std Dev: 20.71

     

    Here I find the average a little high, but we have to take into account that primitive types such as int, string or bool are counted in this result.

     

    And which are the monster types?

     

    SELECT TOP 5 TYPES WHERE NbILInstructions > 0 ORDER BY TypeCe DESC

     

    Full Name | # IL instructions | Efferent coupling at type level (TypeCe)
    System.Windows.Markup.KnownTypes | 12593 | 582
    System.Windows.Forms.Control | 18246 | 332
    System.Windows.Forms.DataGridView | 67320 | 331
    MS.Internal.PtsHost.UnsafeNativeMethods.PTS | 249 | 264
    System.Windows.Forms.ListView | 10824 | 254

     

    NDepend also supports Ce on methods (and here we get the number of methods used):

     

    SELECT METHODS WHERE NbILInstructions > 0 ORDER BY