For the next NDepend version, amongst plenty of cool stuff, a new default code rule will be added. It is named Don’t assign a field from many methods. It falls into the category of Purity – Immutability – Side-Effects rules, one of my favorite sets of rules, since hard-to-find and hard-to-fix bugs often come from incorrect state management.
As its name suggests, this rule matches fields assigned from multiple methods. Bugs due to corrupted state are often the consequence of fields being assigned anarchically.
On the many code bases tested, I found a threshold of 4 writer methods to be the best balance between too many false positives and a decent number of suspicious matches to look at during code review. Of course, this threshold is easily customizable.
Several parameters can be taken into account:
Is the field static? In that case, having many methods assign it can be pretty serious.
Is the field type a value type or a reference type? In the case of a reference type, such a situation might reveal a lurking NullReferenceException scenario.
Is the field assigned from outside its class? Since another rule prevents fields from being visible outside their class, this case has been eliminated.
How many assigner methods of the parent class are visible outside the class? How many methods that directly or indirectly call an assigner method are visible outside the class?
Can a particular class instance be accessed from multiple threads? In that case, the situation sounds pretty alarming.
Not all cases are taken into account, since static analysis has limits, and for code-review warning purposes the rule algorithm must be kept easy to understand. Here is the rule code, with comments that contain insight into how such an issue can generally be fixed.
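Since the rule code isn’t reproduced here, here is a minimal sketch of its shape in CQLinq. Member names such as JustMyCode and AssignField() are taken from NDepend.API as I recall them, and the shipped rule is more elaborate; treat this as an approximation, not the actual default rule:

```csharp
warnif count > 0

from f in JustMyCode.Fields
where !f.IsEnumValue &&   // enum values and constants are not really state
      !f.IsLiteral &&
      !f.IsInitOnly       // readonly fields are assigned from constructors only

// Count the methods of the parent class that assign the field
// (another rule already prevents fields from being assigned from outside their class)
let assigners = f.ParentType.Methods.Where(m => m.AssignField(f))
where assigners.Count() > 4   // the threshold discussed above; easily customizable

select new { f, assigners }

// To fix a match: make the field readonly where possible, or funnel all
// writes through a single method so state transitions live in one place.
```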
I’d be curious to read your suggestions and comments on that.
And here is what it looks like when we run the rules on the NUnit code base:
explore two folders containing older and newer versions of assemblies to compare
create on-the-fly two NDepend projects and analyze them
compare the two older/newer code base snapshots, which leads to the creation of an ICompareContext object
use this compare-context object to explore code diff and changes
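The four steps above can be sketched as a small program against NDepend.API. Names such as CreateTemporaryProject() and CreateCompareContextWithOlder() follow the API of that era, and the two folder paths are hypothetical; treat the exact signatures as approximate:

```csharp
using System;
using System.IO;
using System.Linq;
using NDepend;
using NDepend.Analysis;
using NDepend.CodeModel;
using NDepend.Path;
using NDepend.Project;

class CompareTwoVersions {
   static void Main() {
      var projectManager = new NDependServicesProvider().ProjectManager;

      // 1) Explore the two folders containing the assemblies to compare
      var olderAssemblies = Directory.GetFiles(@"C:\NUnit\Older", "*.dll")
                                     .Select(s => s.ToAbsoluteFilePath()).ToArray();
      var newerAssemblies = Directory.GetFiles(@"C:\NUnit\Newer", "*.dll")
                                     .Select(s => s.ToAbsoluteFilePath()).ToArray();

      // 2) Create two on-the-fly NDepend projects and analyze them
      var olderResult = projectManager
         .CreateTemporaryProject(olderAssemblies, TemporaryProjectMode.Temporary)
         .RunAnalysis();
      var newerResult = projectManager
         .CreateTemporaryProject(newerAssemblies, TemporaryProjectMode.Temporary)
         .RunAnalysis();

      // 3) Compare the two snapshots --> an ICompareContext object
      ICompareContext compareContext =
         newerResult.CodeBase.CreateCompareContextWithOlder(olderResult.CodeBase);

      // 4) Use the compare-context to report assemblies added/removed/changed
      var newer = newerResult.CodeBase.Application.Assemblies;
      var older = olderResult.CodeBase.Application.Assemblies;
      Console.WriteLine("Added:   " + newer.Count(a => compareContext.WasAdded(a)));
      Console.WriteLine("Removed: " + older.Count(a => compareContext.WasRemoved(a)));
      Console.WriteLine("Changed: " + newer.Count(a => compareContext.CodeWasChanged(a)));
   }
}
```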
The power offered by a compare-context object is barely exposed here. We only report three lists of assemblies added/removed/changed, which can already be useful information. But much more can be done with a compare-context object in hand. I recently wrote about how Code Query LINQ can be used to write rules that detect code regression issues. That post explains how code diff can be used to detect regressions, including API breaking changes, loss of code quality, and recent code not well covered by tests. All the CQLinq queries/rules exposed are based on the NDepend.CodeModel API. Thus, they can all be integrated and combined into a program that starts like the one above and then uses a compare-context object to explore code diff.
A subtle point is that when one writes a CQLinq rule involving code diff, one doesn’t deal with a compare-context object directly. This is a trick we use to make CQLinq rules/queries as fluent, concise, and elegant as possible. Indeed, a CQLinq pre-compilation phase takes care of transforming calls to extension methods into calls involving the compare-context. Hence the following CQLinq rule, which reports methods that were complex and became even more complex…
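Such a rule looks along these lines in its fluent form; the threshold of 6 is illustrative, and the diff extension methods are those exposed by NDepend.API:

```csharp
warnif count > 0
from m in JustMyCode.Methods
where m.IsPresentInBothBuilds() &&
      m.CodeWasChanged() &&

      // Was already complex, and became even more complex
      m.OlderVersion().CyclomaticComplexity > 6 &&
      m.CyclomaticComplexity > m.OlderVersion().CyclomaticComplexity

select m
```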
…is actually transformed into this expression, where the usage of the compare-context is made obvious.
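A sketch of that transformed form (the real pre-compiler output may differ in detail), with the implicit compare-context made explicit:

```csharp
warnif count > 0
from m in JustMyCode.Methods
where compareContext.IsPresentInBothBuilds(m) &&
      compareContext.CodeWasChanged(m)

// The compare-context, implicit in the fluent form, is now an explicit object
let mOlder = compareContext.OlderVersion(m)
where mOlder.CyclomaticComplexity > 6 &&
      m.CyclomaticComplexity > mOlder.CyclomaticComplexity

select m
```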
Thus the logic of this query could be easily integrated into our program this way:
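A sketch of that integration, as plain C#/LINQ, given a compareContext and a newer analysis result obtained as described earlier (names follow NDepend.API; treat the exact member names as approximate):

```csharp
// Methods that were already complex and became even more complex
var becameMoreComplex =
   from m in newerResult.CodeBase.Application.Methods
   where compareContext.IsPresentInBothBuilds(m) &&
         compareContext.CodeWasChanged(m)
   let mOlder = compareContext.OlderVersion(m)
   where mOlder.CyclomaticComplexity > 6 &&
         m.CyclomaticComplexity > mOlder.CyclomaticComplexity
   select m;

foreach (var m in becameMoreComplex) {
   Console.WriteLine(m.FullName);
}
```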
In the real world, for companies that have build servers and care about the quality of the code they deliver, this ability to write custom programs that can be included in the build process chain, to focus on the quality of new or refactored code, can certainly be a nice option.
But one can also see value in writing a custom tooling program to be run on the developer desktop. Such a program could also use the special method TryCompareSourceWith(), which can open a source file diff tool at the right location in two versions of the same source file (the source file diff tool is the one configured in NDepend > Tools > Options > Source Files Compare Tool). For example, the open-source Power Tool Code Review Methods Changed uses this TryCompareSourceWith() method to show code diffs during a code review session. Certainly, a more sophisticated program could help review code diffs, for example with the ability to assign users to approve certain diffs, and to persist such approval or disapproval actions in a database.
A prominent characteristic of the software industry is that products are constantly evolving. All modern development methodologies advocate that a product should evolve through small iterations. Internally, development teams use Continuous Integration servers that shrink iteration length to a minimum.
In continuously evolving code bases, the chances of regressions are high. A regression occurs when a new bug is introduced inadvertently because of a change in the code. The weapon against regressions is automated and repeatable tests. This is why developers are advised to cover their code with unit tests: not only to make sure that the written code works as intended, but also to check as often as possible that, in the face of future evolutions, the code won’t be broken inadvertently.
While it is advised to write performance tests to avoid performance regressions, other non-functional requirements, such as code quality, are not checked continuously. As a direct consequence, code quality declines, nobody really cares, and this significantly affects code maintainability. There are tools to assess quality, and tools that report the evolution of overall code quality, but so far there is no tool that exposes code quality regressions in detail.
This is a problem I’ve been mulling over for the last few years. With the last release of NDepend, we’ve introduced a multi-purpose code ruling capability through Code Query LINQ (CQLinq). A CQLinq query is a C# code query that queries the NDepend.API code model.
A wide range of code aspects can be queried, including facilities to query code quality and the possibility to query the diff between two versions of a code base. A code quality regression rule typically relies on both the code quality and diff areas of the NDepend.API code model.
Dissecting a Code Quality Regression rule
For example, below is a default CQLinq code rule that matches methods that were both already complex and became even more complex, in the sense of the popular cyclomatic complexity code metric, exposed directly by the IMethod interface. If two snapshots of the code base are currently being compared by NDepend, the extension method OlderVersion() returns, for an IMethod object, the corresponding IMethod object in the older snapshot of the code base. Of course, for a method that has been added, OlderVersion() returns null; this is why the query first filters methods with IsPresentInBothBuilds(), and even with CodeWasChanged(), since the complexity of a method won’t change if the method remains untouched.
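A sketch of such a rule (the complexity threshold is illustrative, and the select clause showing both values is my own refinement, not necessarily the shipped default):

```csharp
warnif count > 0
from m in JustMyCode.Methods
where m.IsPresentInBothBuilds() &&
      m.CodeWasChanged() &&
      m.OlderVersion().CyclomaticComplexity > 6 &&
      m.CyclomaticComplexity > m.OlderVersion().CyclomaticComplexity

// Show both the older and newer complexity values in the result
select new { m,
             oldCC = m.OlderVersion().CyclomaticComplexity,
             newCC = m.CyclomaticComplexity }
```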
This query can be edited live in Visual Studio, and its result is immediately displayed (here we analyze two different versions of the NUnit code base).
For any method, a right-click option lets you compare the older and newer versions through a source diff tool (I especially like the integrated diff capabilities of VS 2012). Even better, to avoid irrelevant formatting and comment diffs and focus only on code changes, a right-click option lets you compare the older and newer versions decompiled through Red Gate’s .NET Reflector.
Several Code Quality Regression default Code Rules
A group of default code rules, named Code Quality Regression, is proposed. They all rely on the OlderVersion() diff-bridge extension method, and then focus on a code quality aspect, like the ratio of code covered by tests, various other code metrics like the number of lines of code, or the fact that an immutable type became mutable (which can typically break client code).
And since CQLinq is really just the C# LINQ syntax you already know, it is easy to define a particular code aspect that shouldn’t be broken and write a custom regression rule for it.
API Breaking Changes Code Rules
Amongst the various code aspects we developers don’t like to see changed is the visibility of publicly visible code elements. For dev shops responsible for publishing an API, this issue is known as API Breaking Changes, and there are some default CQLinq rules to detect it.
The API Breaking Changes rules’ source code abides by the same idea. Here we are looking for types in the older version of the code base that are publicly visible and that:
have been removed in the newer version of the code base (while their parent assembly is still present)
or are no longer publicly visible in the newer version.
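Sketched as a CQLinq rule, along these lines (codeBase.OlderVersion(), WasRemoved(), and NewerVersion() are NDepend.API diff facilities; the exact shape of the shipped rule may differ):

```csharp
warnif count > 0
from t in codeBase.OlderVersion().Application.Types
where t.IsPubliclyVisible &&
      (
        // ...removed in the newer version, while the parent assembly is still present...
        (t.WasRemoved() && !t.ParentAssembly.WasRemoved()) ||

        // ...or still present but no longer publicly visible
        (!t.WasRemoved() && !t.NewerVersion().IsPubliclyVisible)
      )
select t
```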
Some facilities to show the result to the user are proposed. For example, when chasing broken public interfaces (i.e., interfaces with methods added or removed), the query’s select clause can be refined to let the user list the culprit added/removed methods:
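For instance, a sketch of such a refined query (again, treat the exact member names as approximate):

```csharp
warnif count > 0
from i in codeBase.OlderVersion().Application.Types
where i.IsInterface && i.IsPubliclyVisible && i.IsPresentInBothBuilds()

// Culprit methods: removed from the older interface, or added to the newer one
let methodsRemoved = i.Methods.Where(m => m.WasRemoved())
let methodsAdded   = i.NewerVersion().Methods.Where(m => m.WasAdded())

where methodsRemoved.Any() || methodsAdded.Any()
select new { i, methodsRemoved, methodsAdded }
```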
Typically, code quality tools are daunting because, for any real-world legacy code base, they list thousands of issues. It is neither realistic nor productive to freeze development for weeks or months to fix all reported flaws.
The ability to focus only on new quality regressions that appeared recently, during the current iteration, has the advantage of restraining the number of issues to fix to an acceptable range. It also favors the relevancy of the issues to fix, since it is more productive to fix issues in recent code, before it is released to production, than to fix issues in stable code that has been untouched for a long time.
In my consulting career, no matter the kind of company I visited, from the tiny startup to the largest Fortune 500 corporation, they all have one thing in common: they are entangled in spaghetti. Spaghetti means poorly structured code. Spaghetti means high maintenance and evolution costs. Spaghetti means frustration, friction, and lack of motivation for everyone on the team. And often, spaghetti means project failure.
Mastering code structure means demystifying and getting rid of spaghetti code! With Filip Ekberg, we propose to cover code dependency management in depth through two 30-minute screencasts. The tool NDepend is used to shed light on the structure of real-world code bases.
The first screencast shows how to make the best of the dependency graph and dependency matrix tooling to explore the structure of a .NET application. This includes detecting at a glance good patterns, or smelly patterns, buried in the code. In this screencast, we also explain how to master the interaction between Code Query LINQ generation and the dependency graph and matrix. Doing so constitutes a very powerful way to generate all the custom graphs you really need to visualize, like custom call graphs and inheritance graphs.
The second screencast starts by focusing on dependency code rules. These are especially useful for writing rules related to object-oriented tenets, like a base class shouldn’t use its derivatives, or for listing instance methods that could be made static. We then continue with rules concerned with layered architecture, like the UI layer shouldn’t use the DAL layer, or the detection of component dependency cycles. We finally explore entangled layering in the real world, and explain how to get rid of spaghetti and achieve well-structured code bases.
These screencasts contain many original and useful hints that will improve the way you and your team manage your .NET code base dependencies.
These days we are restructuring the NDepend code base to make it better suited to welcoming future feature implementations. Below is the new architecture of the NDepend.UI assembly, made of around 50,000 lines of code, shown through a Dependency Structure Matrix:
As always, the dependency matrix naturally exhibits architectural patterns. Keep in mind that a blue cell means the namespace in the column is using the namespace in the row. The number on a blue cell represents the number of members used (see the matrix menu option Weight on Cells set to Direct: # members). This cell number is thus a measure of the coupling between the two namespaces.
Here are the architectural patterns emerging from this dependency matrix:
There is one Top namespace, which uses all the other namespaces (blue column #0)
There is one Base namespace, which is used by all the other namespaces (blue row #17)
The Panels namespaces don’t use each other (the big empty red square, rows/columns [1-12])
The Shared namespaces don’t use each other (the small empty red square, rows/columns [13-16])
The blue cells in the rectangle rows [13-16] x columns [1-12] represent the usage of shared features by the panels. For example, on row #15 we can see that the DependencyContextMenu feature is used by the Top, Graph, and Matrix implementations.
I find this NDepend.UI architecture pretty neat. Whether we add a new panel or a new shared feature in the future, it will be inserted naturally into this structure. And this architecture is simple enough to be validated at a glance. But at a glance is not enough. What about defining custom rules that take care of validating this architecture and inform us automatically of any violation?
Each of these architectural patterns could actually be validated through a Code Query LINQ rule. Let’s go through five steps to write the rule Panels shouldn’t use each other through a LINQ query.
5 steps to write the rule: Panels shouldn’t use each other
Step 1: First, let’s match all panel namespaces. Here we use the extension method WithNameWildcardMatch() to match the panel namespaces by name. We can see that some panels (Matrix, QueryEdit, Search…) are made of several sub-namespaces. The rule must be smart enough to both:
let sub-namespaces of the same panel use each other,
and forbid sub-namespaces of different panels from using each other:
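The step 1 query is as simple as this (the wildcard also matches the sub-namespaces of each panel):

```csharp
from n in Application.Namespaces.WithNameWildcardMatch("NDepend.UI.Panels.*")
select n
```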
Step 2: Now let’s use the method IUser.IsUsing() to list the pairs of [user namespace, used namespace]. Notice that, to improve the query’s performance:
we restrict the set of user namespaces to only the panel namespaces that actually use another panel namespace (with the extension method UsingAny()),
we restrict the set of used namespaces to only the panel namespaces that are actually used by another panel namespace (with the extension method UsedByAny()).
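A sketch of the step 2 query; note that CQLinq allows let definitions before the first from clause, which keeps the panels set in one place:

```csharp
// CQLinq lets us define the panels set once, up front
let panels = Application.Namespaces.WithNameWildcardMatch("NDepend.UI.Panels.*")

// UsingAny()/UsedByAny() narrow both sets before the pairwise IsUsing() pass
from nUser in panels.UsingAny(panels)
from nUsed in panels.UsedByAny(panels).Where(n => nUser.IsUsing(n))
select new { nUser, nUsed }
```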
And indeed the query runs fast: 5 milliseconds on a code base of more than 130,000 lines of code! There are 45 pairs of [user namespace, used namespace] listed:
Step 3: Now we need to check, for each pair of [user namespace, used namespace], whether both namespaces belong to the same panel. For that we have to work on the namespace name string. Let’s first get rid of the prefix “NDepend.UI.Panels.” and obtain, for each namespace, what we call its sub-name:
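Step 3 adds two let clauses to the query; for example, the sub-name of NDepend.UI.Panels.Matrix.Impl is Matrix.Impl (a sketch):

```csharp
let prefix = "NDepend.UI.Panels."
let panels = Application.Namespaces.WithNameWildcardMatch(prefix + "*")
from nUser in panels.UsingAny(panels)
from nUsed in panels.UsedByAny(panels).Where(n => nUser.IsUsing(n))

// Strip the common prefix to obtain each namespace's sub-name
let nUserSubName = nUser.Name.Substring(prefix.Length)
let nUsedSubName = nUsed.Name.Substring(prefix.Length)
select new { nUser, nUsed, nUserSubName, nUsedSubName }
```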
Step 4: Let’s get the panel name from the namespace sub-name. For that we use the ternary operator: if the namespace sub-name doesn’t contain any dot, it is already the panel name; otherwise, the panel name is the substring up to the first dot of the sub-name:
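The two extra let clauses for this step look like this (so the sub-name Matrix.Impl yields the panel name Matrix, and the sub-name Search is already a panel name):

```csharp
// Panel name = sub-name up to its first dot, or the whole sub-name if there is no dot
let nUserPanel = !nUserSubName.Contains(".") ? nUserSubName
                 : nUserSubName.Substring(0, nUserSubName.IndexOf('.'))
let nUsedPanel = !nUsedSubName.Contains(".") ? nUsedSubName
                 : nUsedSubName.Substring(0, nUsedSubName.IndexOf('.'))
```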
Step 5: Et voilà! We now just have to flag each pair of [user namespace, used namespace] whose two namespaces don’t belong to the same panel. This is as simple as adding the where clause: where nUserPanel != nUsedPanel.
We then add the prefix warnif count > 0 to transform our code query into a code rule. Plus, we set the name of the rule to Panels shouldn’t use each other. And the rule is ready to be saved, and to be run continuously in Visual Studio and/or on the build machine.
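Assembled from the five steps, the complete rule reads along these lines (a sketch; WithNameWildcardMatch(), UsingAny(), and UsedByAny() are the NDepend.API extension methods mentioned above):

```csharp
warnif count > 0

let prefix = "NDepend.UI.Panels."
let panels = Application.Namespaces.WithNameWildcardMatch(prefix + "*")

from nUser in panels.UsingAny(panels)
from nUsed in panels.UsedByAny(panels).Where(n => nUser.IsUsing(n))

// Extract the panel name from each namespace name
let nUserSubName = nUser.Name.Substring(prefix.Length)
let nUsedSubName = nUsed.Name.Substring(prefix.Length)
let nUserPanel = !nUserSubName.Contains(".") ? nUserSubName
                 : nUserSubName.Substring(0, nUserSubName.IndexOf('.'))
let nUsedPanel = !nUsedSubName.Contains(".") ? nUsedSubName
                 : nUsedSubName.Substring(0, nUsedSubName.IndexOf('.'))

// A match is a violation: two different panels coupled together
where nUserPanel != nUsedPanel
select new { nUser, nUsed }
```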
Note that this rule gets compiled and executed in only 7 milliseconds (see 7 ms on the screenshot). This means that a typical machine can run more than 140 such complex rules against a large code base in one second!
Also, when new panels are developed, this rule won’t have to be updated. It’ll just work!
Could it be easier or faster to validate the architecture of a big lump of code?
By the way, just a quick side remark: the fact that the panel implementations don’t use each other doesn’t mean that panels shouldn’t interact with each other. They actually do interact a lot. But they interact through a mediator implementation that lives in the Base namespace. This mediator then calls the Top implementation (through inversion of control), which takes care of dispatching any interaction to the right panel.