Code metrics on Coupling, Dead Code, Design flaws and Re-engineering

There is a whole range of interesting code metrics related to coupling. The
simplest ones are Afferent Coupling (Ca) and Efferent Coupling (Ce). Basically,
the Ca of a code element is the number of code elements that use it, and the Ce
is the number of code elements that it uses.

You can define Ca and Ce on the graph of assembly dependencies, the graph of
namespace dependencies, the graph of type dependencies and the graph of method
dependencies of a code base. You can also define the Ca metric on the fields of
a program as the number of methods that access the field. This leads to 9
metrics, all supported by the tool NDepend. Note that when computing Ce, NDepend
takes into account code elements defined in third-party code, such as the code
of the .NET framework.
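
To make the two definitions concrete, here is a minimal Python sketch. It is not how NDepend computes its metrics; it only assumes that the dependency graph is available as a dictionary mapping each code element to the set of elements it uses. The element names and the coupling helper are made up for illustration, and ignoring self-references is an assumption of the sketch.

```python
def coupling(uses):
    """Return (ca, ce) dictionaries for a dependency graph.

    `uses` maps each code element to the set of elements it uses.
    Elements that only appear as targets are included with Ce = 0.
    """
    elements = set(uses)
    for used in uses.values():
        elements |= set(used)

    ca = {e: 0 for e in elements}  # number of elements that use e
    ce = {e: 0 for e in elements}  # number of elements that e uses
    for element, used in uses.items():
        # Assumption of this sketch: an element using itself is not counted.
        targets = set(used) - {element}
        ce[element] = len(targets)
        for target in targets:
            ca[target] += 1
    return ca, ce


# Hypothetical type-level dependency graph.
uses = {
    "Cli": {"Parser", "Logger"},
    "Parser": {"Lexer", "Ast", "Logger"},
    "Lexer": {"Logger"},
    "Ast": set(),
    "Logger": set(),
}
ca, ce = coupling(uses)
most_used = max(ca, key=ca.get)   # element with the highest Ca
heaviest = max(ce, key=ce.get)    # element with the highest Ce
```

Every edge of the graph contributes once to the Ce of its source and once to the Ca of its target, so the two metrics always sum to the same total over the whole graph.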

 

With NDepend, if you wish to know which methods of your program are massively
used, you can write the following CQL query:

 

SELECT TOP 10 METHODS ORDER BY MethodCa DESC

 

Being used a lot is not necessarily a problem. However, it is still interesting
to know which parts of your code base are used a lot. For example, if we apply
the CQL query above to the core of the .NET framework (i.e. mscorlib, System,
System.Core, System.Xml…) we obtain the following 10 methods:

 

Method                                              Afferent coupling at method level (MethodCa)
System.Environment.GetResourceString(String)        2632
System.Object..ctor()                               2416
System.ArgumentNullException..ctor(String)          2368
System.String.get_Length()                          1795
System.Type.GetTypeFromHandle(RuntimeTypeHandle)    1762
System.IDisposable.Dispose()                        1119
System.SR.GetString(String)                         1115
System.InvalidOperationException..ctor(String)      1101
System.Object.ToString()                            1056
System.ArgumentException..ctor(String)              1010

 

 

High Efferent Coupling and design flaws

 

If you wish to know which types of your program
are heavy users of other types you just have to write:

 

SELECT TOP 10 TYPES ORDER BY TypeCe DESC

 

High Ce might reveal a design problem. Types with high Ce are entangled with
many other implementations. The higher the Ce, the more responsibilities the
type is likely to have. If we apply the CQL query above to the core of the .NET
framework we obtain the following list:

 

Type                                          Efferent coupling at type level (TypeCe)
Microsoft.CSharp.CSharpCodeGenerator          172
Microsoft.VisualBasic.VBCodeGenerator         161
System.Net.HttpWebRequest                     138
System.Net.Sockets.Socket                     137
System.AppDomain                              131
System.RuntimeType                            128
System.Xml.Xsl.XsltOld.Compiler               125
System.Xml.Xsl.Xslt.QilGenerator              124
System.Xml.Serialization.XmlSchemaImporter    120
System.Diagnostics.Process                    116

 

As expected, this list contains high level classes such as AppDomain, Process or
Socket. Classes of this sort, with high Ce, are needed to implement complex
concepts that span numerous concerns. For example, by selecting the 131 types
used by the AppDomain class I can tell that AppDomain is concerned with assembly
security (Code Access Security, strong naming…), Windows users and file
security, OS environment info, .NET Remoting, .NET Reflection, Threading and
globalization.

 

Could the AppDomain class be split into several smaller classes? I guess not,
because this is such an essential class. But this is the exception. Generally,
classes with high Ce are more like the CSharpCodeGenerator class, which
represents a whole component on its own. I can see that the class
CSharpCodeGenerator deals with a lot of implementation details of the C#
language such as exceptions, type casting, all sorts of members (method, field,
event, property…), comments, indentation etc. This sounds good as long as each
detail has its own implementation and the class CSharpCodeGenerator acts as a
high level mediator. But if I dig further and decompile some methods I can see
that the class CSharpCodeGenerator also copes with file management and contains
a lot of logic (>1000 Lines of Code) to handle some CodeDOM details. This is
likely an indication that this code could have been better designed with the
help of several smaller collaborating classes.

 

Afferent Coupling and Dead Code

 

Ce values are meaningful when it comes to assessing a design and when you have
to re-engineer code. Ca values are also useful, especially when equal to 0. A Ca
value equal to 0 indicates a potential dead code element. A dead code element is
an element that can be discarded because it is not used by the program anymore.
Pruning dead code is a necessary task to make sure that your code is
rationalized. Here also the tool NDepend can help you because it knows about Ca.
However, things get more complicated here because there are numerous cases where
a zero Ca doesn’t mean dead code. For example, entry points (i.e. Main methods),
class constructors or finalizers are methods that will always have a zero Ca.
These methods are not dead code, though, because the CLR will call them at
runtime.

 

Here is the CQL rule that we provide by
default to detect dead methods:

 

// <Name>Potentially unused methods</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE
 MethodCa == 0 AND            // Ca=0 -> No Afferent Coupling -> The method
                              // is not used in the context of this
                              // application.
 !IsPublic AND                // Public methods might be used by client
                              // applications of your assemblies.
 !IsEntryPoint AND            // Main() method is not used by-design.
 !IsExplicitInterfaceImpl AND // The IL code never explicitly calls
                              // explicit interface methods implementation.
 !IsClassConstructor AND      // The IL code never explicitly calls class
                              // constructors.
 !IsFinalizer                 // The IL code never explicitly calls
                              // finalizers.

 

Notice how we consider that public methods should not be considered dead code in
the general case. This rule generally matches a lot of false positives because,
when statically analyzing the IL code, we can see that overridden
implementations are often not statically linked: callers typically reference the
base declaration rather than the override. Hence, to get a first evaluation of
dead code, it is worth adding the restricting condition AND !IsVirtual. This
particular issue will be addressed by future versions of NDepend.
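
For readers who do not use CQL, the filtering logic of this rule can be sketched in Python. This is only an illustration under assumptions: the Method record, its field names and the sample methods are hypothetical, and each flag is supposed to have been computed by some other analysis step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    # Hypothetical record; the flags mirror the CQL conditions of the rule.
    name: str
    ca: int
    is_public: bool = False
    is_entry_point: bool = False
    is_explicit_interface_impl: bool = False
    is_class_constructor: bool = False
    is_finalizer: bool = False
    is_virtual: bool = False


def potentially_unused(methods, skip_virtual=True):
    """Return the names of methods with Ca == 0 that no exclusion explains."""
    result = []
    for m in methods:
        if m.ca != 0:
            continue  # the method is used somewhere in the code base
        if (m.is_public or m.is_entry_point or m.is_explicit_interface_impl
                or m.is_class_constructor or m.is_finalizer):
            continue  # called by clients or by the runtime, not dead code
        if skip_virtual and m.is_virtual:
            continue  # extra condition suggested to limit false positives
        result.append(m.name)
    return result


# Made-up sample data.
methods = [
    Method("Program.Main(String[])", ca=0, is_entry_point=True),
    Method("Cache..cctor()", ca=0, is_class_constructor=True),
    Method("Cache.Evict()", ca=0),
    Method("Shape.Draw()", ca=0, is_virtual=True),
    Method("Api.Run()", ca=0, is_public=True),
    Method("Cache.Get(String)", ca=7),
]
unused = potentially_unused(methods)
unused_with_virtual = potentially_unused(methods, skip_virtual=False)
```

With the skip_virtual switch turned off, the virtual method Shape.Draw() is reported as well, which is the kind of false positive the extra condition is meant to filter out.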

 

Things get easier and more efficient when it comes to detecting dead fields and
dead types. Here are the 2 CQL rules we propose by default, with the particular
conditions they use to avoid false positives:

 

// <Name>Potentially unused fields</Name>
WARN IF Count > 0 IN SELECT TOP 10 FIELDS WHERE
 FieldCa == 0 AND  // Ca=0 -> No Afferent Coupling -> The field is not used
                   // in the context of this application.
 !IsPublic AND     // Although not recommended, public fields might be used
                   // by client applications of your assemblies.
 !IsLiteral AND    // The IL code never explicitly uses literal fields.
 !IsEnumValue AND  // The IL code never explicitly uses enumeration values.
 !NameIs "value__" // Fields named 'value__' are relative to enumerations
                   // and the IL code never explicitly uses them.

 

 

 

// <Name>Potentially unused types</Name>
WARN IF Count > 0 IN SELECT TOP 10 TYPES WHERE
 TypeCa == 0 AND     // Ca=0 -> No Afferent Coupling -> The type is not
                     // used in the context of this application.
 !IsPublic AND       // Public types might be used by client
                     // applications of your assemblies.
 !NameIs "Program"   // Generally, types named Program contain a Main()
                     // entry-point method and this condition avoids
                     // considering such types as unused code.

 

Notice how easy it is to customize these rules thanks to CQL facilities such as
NameIs, NameLike, SELECT OUT OF/FROM etc…

 

 

Ranking Metrics

 

Since NDepend has an efficient in-memory representation of the internal
dependencies of a code base, we got the idea of applying the famous Google
PageRank algorithm to the graph of methods and the graph of types. As a
consequence, the 2 metrics TypeRank and MethodRank indicate which types and
which methods of a code base are the most important. Like a web page for Google,
a code element is considered important if it is used by numerous code elements
that are themselves considered more or less important.
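
The idea can be sketched with a plain power-iteration version of PageRank over a dependency graph. This is a textbook-style sketch, not NDepend's actual TypeRank or MethodRank computation: the damping factor, the iteration count, the handling of elements that use nothing and the normalization (scores sum to 1, whereas NDepend reports values on another scale) are all assumptions, and the element names are made up.

```python
def rank(uses, damping=0.85, iterations=50):
    """Power-iteration PageRank over a 'uses' graph.

    `uses` maps each code element to the set of elements it uses.
    An element passes its score on to the elements it uses.
    """
    elements = set(uses)
    for used in uses.values():
        elements |= set(used)
    n = len(elements)

    scores = {e: 1.0 / n for e in elements}
    for _ in range(iterations):
        new = {e: (1.0 - damping) / n for e in elements}
        for e in elements:
            targets = set(uses.get(e, ())) - {e}
            if targets:
                share = damping * scores[e] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Assumption: an element that uses nothing spreads its
                # score evenly over all elements, so no score is lost.
                share = damping * scores[e] / n
                for t in elements:
                    new[t] += share
        scores = new
    return scores


# Hypothetical type-level dependency graph.
uses = {
    "Cli": {"Parser", "Logger"},
    "Parser": {"Lexer", "Ast", "Logger"},
    "Lexer": {"Logger"},
    "Ast": set(),
    "Logger": set(),
}
scores = rank(uses)
top = max(scores, key=scores.get)  # the most "important" element
```

In this toy graph Logger comes out on top because it is used by three elements, one of which (Parser) is itself used by another. Cli, which nobody uses, gets the lowest score.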

 

When discovering a code base, knowing which types and methods are important is,
well, important, because they likely represent the cornerstones of the code
base, the ones that you will have to understand first in order to dig into the
program structure. If you have the chance to be educated on the code base by one
of its developers, she will likely talk to you first about these important code
elements in order to give you the basics. What is cool is that you can get this
information automatically and objectively thanks to the ranking metrics.

 

For example, suppose you were a complete beginner in .NET programming: what
would be the top 10 most important types to know about? Integer? String? Object?
Bool? Let’s see what the TypeRank metric has to say about the top 10 most
important types of the .NET framework:

 

SELECT TOP 10 TYPES ORDER BY TypeRank DESC

 

Type                                                      TypeRank
System.Runtime.InteropServices.ComVisibleAttribute        409.03
System.Object                                             380.89
System.Runtime.InteropServices.ClassInterfaceAttribute    329.73
System.Void                                               281.02
System.CLSCompliantAttribute                              189.13
System.Int32                                              170.17
System.Boolean                                            168.88
System.Runtime.InteropServices.GuidAttribute              153.64
System.String                                             145.63
System.Runtime.InteropServices.InterfaceTypeAttribute     143.32

 

Reassuringly, we find the types that we considered important, but we also get
some interesting finds that show how pervasive COM and interop things such as
ComVisibleAttribute or GuidAttribute are inside the MS implementation of the
.NET framework.

 

Note also that code that is considered important deserves even more attention
than the rest in terms of test code coverage and design.

 

NDepend also supports several other code metrics related to coupling, such as
the Association Between Classes, the Lack Of Cohesion of Methods (LCOM) and the
Robert C. Martin metrics on assemblies. I will certainly write some thoughts on
all of these in future posts. In the meantime you can read their definitions and
try them on your own code base. You can also have a look at, and print, the
great NDepend Poster Metrics done by Stuart Celarier, Patrick Cauldwell and
Scott Hanselman.

 

  • Pham Huy Anh

    Unused code: Hi experts.
    Could you please tell me a little more about Unused/Dead code?

    Can it discover stupid code such as

    final boolean test = false;
    if (test) doSomething();

    or just explicit unreferenced/called variable/method

    Thanks,
    huyanh

    Anyway, I am talking about Xdepend for Java.

  • http://colinjack.blogspot.com Colin Jack

    Thanks for replying, and it’s the latter case.

    I’ll try and exclude the assemblies in question (framework assemblies and some of our abstract assemblies) but to be honest I’ve not had a lot of success getting that sort of approach to work in other situations.

    It would be useful if NDepend allowed you to tweak things like TypeCe, for example when viewing TypeCe I’m not overly worried on dependencies on things like IEnumerable.

  • http://www.NDepend.com Patrick Smacchia

    Colin, do you mean you need to write something like:
    SELECT TOP 10 TYPES OUT OF ASSEMBLIES "Asm1", "Asm2"… ORDER BY TypeCe DESC

    or do you mean that you want to tweak the Ce values to ignore some assemblies?
    In this last scenario, for now, you need to discard the unwanted assemblies from your NDepend project.
    However, I estimate that this particular task could be better done by using the Dependencies Matrix and just keeping the assemblies you are interested in, in the matrix headers.

  • http://colinjack.blogspot.com/ Colin Jack

    Just to correct myself, I meant….

    I was wondering how I can change this assembly to ignore dependencies to certain assemblies or to framework assemblies:

    SELECT TOP 10 TYPES ORDER BY TypeCe DESC

  • http://colinjack.blogspot.com/ Colin Jack

    I was wondering how I can change this assembly to ignore dependencies from certain assemblies or from framework assemblies:

    SELECT TOP 10 TYPES ORDER BY TypeCe DESC