Screencast: Inspecting Code Quality and Code Complexity

Recently I had a chance to record a screencast on Inspecting Code Quality and Code Complexity with Filip Ekberg. In this half-hour video we inspect the NUnit v2.5.8 code base quality and complexity with NDepend v4.0:

  • We first use the popular Cyclomatic Complexity code metric…
  • …we then refine our findings by focusing on code quality regression since NUnit v2.5.3.
  • …some more code metrics are then involved, including the code coverage ratio by tests, the C.R.A.P metric and the code coupling ratio.
We plan to do more screencast on other programming related topics. Your feedbacks and suggestions are welcomed.
Enjoy!

Posted in CC, Code Diff, Code metrics, Code Rule, Cyclomatic complexity | 1 Comment

Beware mixing List.Sort() and Thread.Abort()

After more than a decade of full-time development with .NET, I still discover surprising unfixed bugs in the heart of the framework. Today, while I was smoke-testing new stuff in the NDepend UI, I stumbled on an InvalidOperationException thrown by a call to List<T>.Sort() ?!

After 15 minutes of searching the cause of the bug, I found this unfixed bug on Microsoft Connect: List.Sort() throws InvalidOperationException (instead of re-throw ThreadAbortException) while thread is being aborted

In other words, if your application is using Thread.Abort() somehow, and that the abortable code is calling List<T>.Sort(), you are exposed to the risk of getting an InvalidOperationException instead of a ThreadAbortException.

I already hear that it is a good practice to avoid calling Thread.Abort(), but the fact is that if one wants a responsive UI, one needs some sort of abortable/timed-out background work, and at a point Thread.Abort() will have to be called for that. As long as calling Thread.Abort() is done in a safe way (see this Joe Duffy detailled post) we didn’t meet any problems so far. On millions of production runs logged, we cannot see any problem except … that List<T>.Sort() throws an InvalidOperationException with the frequency of around one on every 50K production runs. And today, this bug just popped to me in a reproductible way thanks to some refactoring I was doing.

To workaround this problem once for all, I enclosed all calls to any override of List<T>.Sort() in some extension methods in a dedicated class. These BugFreeSort() methods catch the InvalidOperationException. This is fine because anyway the thread is in a ThreadState.AbortRequested and the CLR will throw the ThreadAbortException at the end of the catch block. Also, attached to this class through an attribute, I declare a CQLinq code rule in the source code to make sure that all calls to List<T>.Sort() are made through one of these BugFreeSort() methods. Here is the code:

Here is the code rule declared in the code. Hopefully any .NET developer acquainted with the LINQ syntax, should be able to read this piece to code:

And here is a screenshot of the NDepend methods violating the code rule before refactoring:

We have almost 200 code rules declared in source code like this newly added one. They protect us from stumbling again and again in tricky identified scenarios. In other words with these rules we abide by the tenet: Fool me once, don’t fool me twice.

In a team context, doing so is especially useful since such rule is warning a developer when he’s doing something wrong. Such warning is actually a just-in-time automatic piece of education for newcomers developers, about how things must be done. Communication in the team is enhanced, the overall friction is decreased, and I found this pretty cool :)

Posted in .NET Framework, .NET Fx, C#, Code Query, Code Rule, CQLinq | 6 Comments

The consequence of a.NET Framework API design flaw

Last week, a user reported a bug on NDepend (hopefully it happens in a very rare situation!). After mulling over a bit on the bug stack trace, it appears that the bug is the consequence of a call to ICollection<T>.Add() while the underlying collection is an array!

Here I felt bad, because really the bug is a consequence of a .NET Framework API design flaw. Or better said, to take our responsibility, the bug is the consequence of us, not providing a work around seriously enough on this  .NET Framework API design flaw. Why the hell some read-only collection implements the Add() method? It is even getting worse if considering the documentation of Array.Add() (as a IList<T> explicit implementation method) and ReadOnlyCollection.ICollection<T>.Add().

 This implementation always throws NotSupportedExceptionOUCH!

It appears that finally, a decade after .NET 1.0 has been release, 3 interfaces IReadOnlyCollection<T>IReadOnlyList<T>IReadOnlyDictionary<T> will be presented by the .NET v4.5. As commonly said, it is never too late to do things right, but in the case of an API with ascendant compatibility used by millions of developers world-wide, this proverb doesn’t fit that well.

The fact is that the brand new NDepend.API actually uses the interface ICollection<T> to return read-only lists. I wouldn’t like us or some users stumble on this API design flaw again! Hence a decision was taken to change the API policy, even though this will introduce breaking-change (the API is still young enough for that). First, I wrote the following CQLinq query to evaluate the problem:

Here is the result, 40 methods (or property getters) returns something that implement IEnumerable<T> and many of them returns a ICollection<T>.

So what the API should return instead of ICollection<T>? Knowing that NDepend is still compatible with VS2008 and hence is compiled with .NET 3.5, it cannot use the brand new .NET v4.5 IReadOnlyList<T> (but it will certainly use these in several years, when we will compile against .NET v4.5 and above).

  • Returning arrays instead of ICollection<T> could be an idea. The problem is that array is an implementation, and it is not read-only since the user can update each slot of the array.
  • I never really liked the class ReadOnlyCollection<T> that wraps a IList<T>. First it is a class and not an interface. Second it has the property getter Items that can returns the wrapped IList<T>, so it is not that read-only and not a good wrapper at all! Third it implements IList<T>, and all mutable members are not supported, so it can lead to the same kind of confusion we are now faced with.
  • Returning some IEnumerable<T> would be a better choice than the two above. It is not ideal through. While IEnumerable<T> conveys well the read-only idea, it has no Count nor an item access by index property. Hence it is not really suited to return a well-sized list. LINQ provides the adequate extension methods Count() and ElementAt(), but  IEnumerable<T> has a strong taste of sequence, while we’d really like to provide users with proper lists!

So we choose to provide our own interfaces IReadOnlyCollection<T> and IReadOnlyList<T>! Same as the ones released soon in .NET v4.5, but defined in another namespace NDepend.Helpers, this will discard most naming collisions for clients. Thanks to this choice, the API is as clean as possible for now, and the transition will be simplified once NDepend will be compiled for .NET v4.5 and above, sooner or later.

As more and more often, I found inspiration in a stackoverflow.com response here. Find below the whole interfaces declarations and implementations. This is not an ideal solution, since List<T> and Array of course doesn’t implement our interfaces, but I found it to be a good design compromise.

Notice the four factory extension methods, that make a clear distinction at creation time, between Wrapped and Cloned read-only collection. Notice as well that the Cloned factories, take any sequence as argument, while the Wrapped factories, need collections and lists as argument:

  • ToReadOnlyWrappedCollection(this ICollection<T>), 
  • ToReadOnlyClonedCollection(this IEnumerable<T>),
  • ToReadOnlyWrappedList(this IList<T>), 
  • ToReadOnlyClonedList(this IEnumerable<T>),

Also, we’ve added the IndexOf() extension method over our two interfaces, because we needed it, and because the .NET Framework doesn’t provide IndexOf(this IEnumerable<T>) because a sequence is not necessarily ordered (like hashset or dictionary).

And finally, I did a CQLinq rule to check that NDepend.API will never return anymore a ICollection<T>!

Posted in .NET Framework, API usage, breaking changes, Code Rule | 11 Comments

Include IL Offset into production exception StackTrace

We are currently doing an intensive pesky minor bugs session. Bugs are logged into our gmail account as a production bug tracking server. For each logged crash, we get a stack trace and plenty of information about the machine environment.

One recurrent problem with exception stack trace in .NET, is that, one needs to release PDB files with assemblies installed in production code. Without PDB files, you still gets the stack trace of methods called, but you loose the exact source file line from where the exception popups. The problem is that usually the PDBs files weight between one and two times the weight of the target assemblies. Hence, releasing PDBs is usually not an option because this consumes so much more memory (both install hard-drive memory, and process memory at run-time).

The exact source file line from where the exception popups is not just a convenient info, it is an essential info. The typical wrong situation is when a method throws a NullReferenceException, but contains several references that can potentially be null: in such case you are stuck because you are missing the information from where exactly the exception popups. Knowing the exact source file line from where the exception popups, it would become easy to identify the faulty null reference.

To fix this situation, Mike Stall, the MSFT debugging guru, wrote a blog post about Converting a managed PDB into a XML file. This blog post shows a trick about how to use some tooling to both, not release the PDB files, and be able to retrieve the exact source code line from a production exception stacktrace. The problem is that often real-world application assemblies are obfuscated. The correspondence between released obfuscated assemblies and their initial PDBs is then completely broken.

Yesterday, I just realized that somehow, it should be possible to retrieve the IL offset in the obfuscated method body, that provokes the production exception. After googling a bit, it seems that sometime stacktrace contains the IL offset, at the end of each frame, after a plus ‘+’ character. However I haven’t been able to determine what should be done to get this extra information.

Hopefully, as always, more googling lead to exactly what I was looking for: in the step 3 of this blog post Getting file and line numbers without deploying the PDB files Tim Stall (another Stall debug guru?) shows how to retrieve all stack trace and IL offset programatically. Using this info, I’ve been able to rebuild completely the stack trace, including IL Offset formatted the same way as Reflector does (hexadecimal with 4 digits prefixed with L_).

Now we get clean stack trace with IL offset. It is pretty straightforward  to infer a source code line from a IL instructions. You just have to know that if the Nth IL instructions  throw the exception, you’ll get the offset of the N-1th IL instruction logged.

Notice that we use Preemptive Dotfuscator for obfuscation, and this tool come with Lucidator to transform a stacktrace made of obfuscated methods, into a stack trace with original methods name.

Find below our source code + associated tests. This code implements another requirement: we remove all localization info from the stacktrace. This way we are able to build a hash code from a stacktrace. Such hash code is pretty useful to group similar crash logs, independently from the underlying Windows machines localization settings.

…and the test code that covers 100% StackTraceHelper:

Posted in Uncategorized | 3 Comments

An Original Algorithm to Find .NET Code Duplicate

The upcoming Visual Studio 2012 version comes with tooling to find code duplicate.

However, today, I’d like to talk about another code duplicate tool for .NET code. This tool comes as a NDepend Power-Tool. Power-Tools are a set of open-source tools based on NDepend.API. The source code of Power-Tools can be found in $NDependInstallPath$\ NDepend.PowerTools.SourceCode\ NDepend.PowerTools.sln.

The algorithm underlying this code duplicate Power-Tool is simple, yet powerful in practice. It consists in defining sets of methods that are using the same members, i.e calling the same methods, reading the same fields, writing the same fields. We call these sets, suspect-sets. Suspect-sets are sorted by the number of same members used.

Let’s run this Power-Tool on the NUnit v2.6.0 code base. The screenshot below shows the 3 steps of the algorithm executed, and the first suspect-set. This first suspect-set is, as often from my testing experience, the entry points of the application. Very often an application has several entry points that does almost the same initialization things. Sometime it is pure duplicate, sometime it is slightly changed. Most of the time this initialization code can be factorized in one or several parametrized method.

Notice the option o for Open callers methods decls. The declarations are opened in Visual Studio actually.

The second suspect-set is the following one…

bingo, we have an exact code duplicate!

The third suspect-set is the following one…

…by looking at the code below, this suspect-set can eventually be considered as a false positive. Personally, I still consider such code as code duplicated. These two methods implement a unique layout algorithm, that could be parametrized. So here I’d vote for a factorization.

I could continue on and on, but I invite you to download the 14-days full featured evaluation of NDepend to see what this algorithm will report on your own code.

Hopefully duplicate results found by this algorithm are pretty relevant. Its main advantage over others algorithms is that it is not disturbed by dirty copy-paste where the pasted code is then sightly modified. Another advantage is that this algorithm can be run on IL code only, source code is not required. Notice that what this current blog post doesn’t show, is that a suspect-set can have more than 2 suspect methods.

On the cons side, sometime a suspect-set can be humanly considered as a false positive. At least it is almost always possible to refactor found duplicates to factorize them into one or several parametrized methods (as explained for the 2 layout methods). The fact is that two or several methods don’t use the same set of members by coincidence.

Also this algorithm is very fast to run, from 10 to 100 times faster than the VS2012 one. Concretely it takes few seconds to be executed on a real-world large code base. This speed is the consequence of the fact that NDepend.API is optimized to browse dependencies very quickly.

Concerning the algorithm itself, it is open-source so if you are interested in browsing and tweaking it you can. This open-source + NDepend.API form makes it particularly suited to be integrated into a CI process.

The algorithm consists in 3 steps:

  1. Investigate each method (including third-party ones) to see if their callers could be considered as suspect (methods called very often like the ones from types in System.Collection.Generics or System.Xml are discarded to limit false positive).
  2. Merge suspect sets obtained from step 1).
  3. Sort suspect-sets from a weigh computed on number of members called.

Hopefully this algorithm can be optimized further, and your feedback is welcomed. Enjoy!

Posted in Uncategorized | 10 Comments