Normal
0
21
false
false
false
FR
X-NONE
X-NONE
During my interview on Code Metrics by
Scott Hanselman’s on Software Metrics, Scott had a particularly relevant
remark.
Basically, while I was
explaining that long and complex methods are killing quality and should be
split into smaller methods, Scott asked me:
looking
at this big too complicated method and I break it up into smaller methods, the complexity of the business problem is
still there, looking at my application I can say, this is no longer complex
from the method perspective, but the software itself, the way it is coupled
with other bits of code, may indicate other problem…
Software complexity is a
subjective measure relative to the human cognition capacity. Something is
complex when it requires effort to be understood by a human. The fact is that
software complexity is a 2 dimensional measure. To understand a piece of code
one must understand both:
- what this piece of
code is supposed to do at run-time, the behavior of the code, this is the business problem complexity
- how the actual implementation
does achieve the business problem, what was the developer mental state while she wrote the code, this
is the implementation complexity.
Business problem
complexity lies into the specification of the program and reducing it means
working on the behavior of the code itself. On the other hand, we are talking
of fabricated complexity when it
comes to the complexity of the implementation: it is fabricated in the sense
that it can be reduced without altering the behavior of the code. As an
illustration here is a super/giant/complex method found inside the .Net
Framework implementation System.Windows.Forms.DataGridView.GetClipboardContent():
GetClipboardContent() is made of around 300 Lines of
Code
and has a ILComplexity
equals to 192. GetClipboardContent() does
not have drastic performance requirement. As a consequence I don’t see any
justification for not refactoring this massive method into smaller ones and
maybe even a small classes hierarchy that could help in implementing the enormous
switch/cases. Doing so would certainly discard a lot of fabricated complexity.
Fighting Fabricated Complexity with Simple Code Metrics
The simplest way to limit
fabricated complexity is to abide by simple code metrics
thresholds. This is why one of the default CQL rule
proposed by NDepend is
the following one:
// <Name>Quick summary of methods to
refactor</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE
( NbLinesOfCode > 30 OR
// http://www.ndepend.com/Metrics.aspx#NbLinesOfCode
NbILInstructions > 200 OR
//
http://www.ndepend.com/Metrics.aspx#NbILInstructions
CyclomaticComplexity > 20 OR
// http://www.ndepend.com/Metrics.aspx#CC
ILCyclomaticComplexity > 50 OR
// http://www.ndepend.com/Metrics.aspx#ILCC
ILNestingDepth > 4 OR
//
http://www.ndepend.com/Metrics.aspx#ILNestingDepth
NbParameters > 5 OR
// http://www.ndepend.com/Metrics.aspx#NbParameters
NbVariables > 8 OR
// http://www.ndepend.com/Metrics.aspx#NbVariables
NbOverloads > 6 )
//
http://www.ndepend.com/Metrics.aspx#NbOverloads
Fighting Fabricated Complexity with
Abstractions
Another popular way to
limit fabricated complexity is to hide implementations behind some interfaces.
Even though interfaces are not contracts,
an interface itself often conveys enough information to make its purpose
understandable. For example the IDisposable pattern is a complex topic but still, the IDisposable interface
present only one method and we at least understand that it is an
indication that some unmanaged resources must be released somehow. When
a piece of code relies on an interface, at code review
time, the interface relieves the developer from the burden of mastering
every
details of the implementation(s) hidden by the interface. This is in
essence the Liskov Subsitution
Principle.
NDepend provides 2 ways to
asses if and where your program should use more abstractions. First NDepend
proposes the Robert C Martin metrics about Abstractness vs Instability.
The idea is that the more a code element
of a program is popular, the more it should be abstract. Or in other words,
avoid depending too much directly on implementations,
depend on abstractions instead. By popular code element I mean an assembly
(but the idea works also for namespaces and types) that is massively used by
other assemblies of the program. Abstractness of a group of types (like an
assembly) is the ratio NbAbstractTypes / NbTotalTypes. There are debates about
how these metrics should be computed and we plan in the future to make Robert C
Martin's metrics more flexible but still, you get the point, it is not a good
idea to have concrete types very popular in your code base. This provokes some
Zones of Pains in your program, where changing the implementations can
potentially affect a large portion of the program. And implementations are
known to evolve more often than abstractions.
The second way to assess
if more abstractions is needed is to rely on the Level metric.
I won’t detail here this metric and its usage because I have already done it in
the post Layering, the Level metric and the Discourse of Method.
The idea here is that using more interfaces decrease the overall Level value of
code elements (classes/namespaces). Thus, if many classes and namespaces have a
high Level value (> 12), it means that you have a long stack of concrete
layers sitting above each others. Introducing abstractions is then a good idea
to split such long stack and benefit from the interface simplification describe
above.
Fighting Fabricated Complexity with
Immutability
A common
source of fabricated complexity is mutable states. The human brain is not
properly wired to anticipate what is really happening at run-time in a program.
While reviewing a class, it is hard to imagine how many instances will
simultaneously exists at runtime and how the states of each these instances
will evolve over time. This is actually THE major source of problems when
dealing with a multi-threaded program. If a class is immutable (meaning if the
states of all its instances objects don’t change at runtime once the
constructor is called) its runtime behavior immediately becomes much easier to
grasp and as a bonus, one doesn’t need to synchronize access to immutable
objects. For more information, I wrote about the benefits of immutable types
and how NDepend can help in statically verifying immutability.
Fighting Fabricated Complexity with minimal
Coupling
When trying
to re-engineer/understand/refactor a large piece of code (I mean something made
of dozens of classes like a big namespaces or an assembly), the difficulty is
directly proportional to the coupling between considered code elements. Both
these following graphs are made of dependencies between 23 classes, one with 53
edges and the other one with 175 edges: which one would you prefer to deal
with?

While clear
componentization is certainly the best way to fight against entangled/spaghetti code,
keep in mind that using abstractions is also a good way to limit the
over coupling-overhead. Indeed, if an interface has N implementations, then relying
only on the interface is virtually like depending on the N underlying classes,
except that from the static dependency
point of view, you actually rely on only one type.
Could Fabricated Complexity be Measured?
There are plenty of other
sources of fabricated complexity and I estimate that the 4 quoted ones are
certainly the big 4 culprits. One could arguably appends also the ratio of code
coverage + the number of automatic tests in the list, since a clean tests suite
certainly forces the code to be better designed, simpler and more maintainable.
Interestingly enough, all
these potential sources of problem can be controlled through NDepend. Could we
find a formula that might process all these data to finally spit a number (a
score) to measure the Fabricated Complexity? This approach is implemented in
tools like Struture101
in the Java world that comes with a dedicated XS metric
to measure fabricated complexity. Another example is the maintainability index
range in VisualStudio
that spits a number in the range 0-100. This number is linearly computed from
potential source of fabricated complexity such as NbLinesOfCode, CyclomaticComplexity…
and being lower than 20 is bad.
Such metrics are
certainly interesting but I have a few caveats. The inventors of these formulas
claim that in a certain range of value the code is crappy. The fact is that
the value spitted doesn’t have any associated dimension. If a method has 120
lines of code and a Cyclomatic Complexity of 50, does it helps to add 120+50
and says that 170 something
(something here is undetermined) is a bad thing? From the 2 values (120 Loc and
50 CC), isn’t it already obvious that the method should be splitted? And what
about a method with 169 lines of code and no cyclomatic complexity (such as the
massive Windows Form InitializeComponent()
method)? I don’t think that such a method is as bad as the previous one and
still they both measure 170 something.
Let's take for example the more and more popular CRAP
metric
that helps to detect crap code. The idea is that crap code is complex methods and poorly covered by tests.The proposed formula is
CRAP(m) = comp(m)^2 * (1 – cov(m)/100)^3 + comp(m) where
comp(m) is the cyclomatic complexity of method m, and cov(m) is the test code
coverage provided by automated tests.
By mixing it up, it is hard
to understand the meaning of an estimation of the fabricated complexity value
and worst, it makes harder to predict which refactoring will affect
positively the value of the metric. This is the reason why so far in NDepend,
we favored a more linearly independent/vectorial way to assess the exact causes of
problems. To detect complex methods poorly covered with NDepend one just needs the following CQL rule and adjust thresholds at whim:
Normal
0
21
false
false
false
FR
X-NONE
X-NONE
MicrosoftInternetExplorer4
WARN IF Count > 0 IN SELECT METHODS WHERE CyclomaticComplexity > 10 AND PercentageCoverage < 90 ORDER BY NbLinesOfCode DESC
The CQL result panel provides all needed information to work with the crap methods detected and lets jump directly into VisualStudio to view and edit culprit methods:

Alternatively,
the CQL flexibility lets correlate these metrics with other kind of
relevant information, like for example detecting crap methods added recently to
the code base (or also crap methods where ChangesObjectState, ChangesTypeState, CodeWasChanged, is declared inside or outside certains assemblies/nalmespaces/classes, not IsGeneratedByCompiler, is used by a certain code element, is using certain code elements, BecameObsolete...etc):
Normal
0
21
false
false
false
FR
X-NONE
X-NONE
MicrosoftInternetExplorer4
WARN IF Count > 0 IN SELECT METHODS WHERE WasAdded AND CyclomaticComplexity > 10 AND PercentageCoverage < 90 ORDER BY NbLinesOfCode DESC
Also one can do experiments and compose metrics at whim:
Normal
0
21
false
false
false
FR
X-NONE
X-NONE
This situation might
evolve in the future. But so far I would prefer to avoid giving the
NDepend user the feeling that fighting fabricated complexity is only a matter of
reducing the value of a naked index (i.e an index without any dimension). Things must remain close to the code itself: if
I have a method with 15 parameters it seems much more meaningful than having
a class with a maintainability index equals to 18 or a CRAP value of 33.21.
Are you using these
formulas? What’s your opinion on this?
Posted
Sun, Jun 28 2009 12:12 PM
by
Patrick Smacchia