There is a whole range of interesting code metrics related to coupling. The simplest ones are named Afferent Coupling (Ca) and Efferent Coupling (Ce). Basically, the Ca for a code element is the number of code elements that use it and the Ce is the number of code elements that it uses.
You can define Ca and Ce for the graph of assembly dependencies, the graph of namespace dependencies, the graph of type dependencies and the graph of method dependencies of a code base. You can also define the Ca metric on the fields of a program as the number of methods that access the field. This leads to 9 metrics (Ca and Ce on each of the 4 graphs, plus Ca on fields), all supported by the tool NDepend.
Note that when computing Ce, NDepend takes into account code elements defined in tier (third-party) code, such as the code of the .NET framework.
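To make the two definitions concrete, here is a minimal sketch in Python (not CQL, and not how NDepend computes its metrics internally). It assumes the dependency graph is given as a mapping from each element to the set of elements it uses; the function name `coupling` and the sample type names are hypothetical and only there for illustration.

```python
def coupling(dependencies):
    """Compute (ca, ce) for a graph given as {element: set of elements it uses}."""
    # Collect every element, including those that are only ever used.
    elements = set(dependencies)
    for used in dependencies.values():
        elements.update(used)

    ca = {element: 0 for element in elements}  # afferent: number of users
    ce = {element: 0 for element in elements}  # efferent: number of elements used

    for user, used in dependencies.items():
        targets = set(used) - {user}  # assumption: self-references are ignored
        ce[user] = len(targets)
        for target in targets:
            ca[target] += 1
    return ca, ce


# Hypothetical type-level dependency graph.
DEPENDENCIES = {
    "OrderService": {"Order", "Logger", "Repository"},
    "Repository": {"Order", "Logger"},
    "LegacyExporter": {"Order"},
    "Order": set(),
    "Logger": set(),
}

ca, ce = coupling(DEPENDENCIES)

# Rough equivalent of "ORDER BY ... DESC" on each metric.
most_used = sorted(ca, key=lambda e: (-ca[e], e))
heaviest_users = sorted(ce, key=lambda e: (-ce[e], e))
print(most_used[0], ca[most_used[0]])
print(heaviest_users[0], ce[heaviest_users[0]])
```

The same function works at any granularity (assemblies, namespaces, types or methods) since only the meaning of the nodes changes.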
With NDepend, if you wish to know which methods of your program are massively used, you can write the following CQL query:
SELECT TOP 10 METHODS ORDER BY MethodCa DESC
Being used a lot is not necessarily a problem. However, it is still interesting to know which parts of your code base are used a lot. For example, if we apply the CQL query above to the core of the .NET framework (i.e. mscorlib, System, System.Core, System.Xml…), we obtain the following 10 methods:
| methods | Afferent coupling at method level (MethodCa) |
|---|---|
| System.Environment.GetResourceString(String) | 2632 |
| System.Object..ctor() | 2416 |
| System.ArgumentNullException..ctor(String) | 2368 |
| System.String.get_Length() | 1795 |
| System.Type.GetTypeFromHandle(RuntimeTypeHandle) | 1762 |
| System.IDisposable.Dispose() | 1119 |
| System.SR.GetString(String) | 1115 |
| System.InvalidOperationException..ctor(String) | 1101 |
| System.Object.ToString() | 1056 |
| System.ArgumentException..ctor(String) | 1010 |
High Efferent Coupling and design flaws
If you wish to know which types of your program are heavy users of other types, you just have to write:
SELECT TOP 10 TYPES ORDER BY TypeCe DESC
High Ce might reveal a design problem. Types with high Ce are entangled with many other implementations. The higher the Ce, the more responsibilities the type is likely to have. If we apply the CQL query above to the core of the .NET framework, we obtain the following list:
| types | Efferent coupling (TypeCe) |
|---|---|
| Microsoft.CSharp.CSharpCodeGenerator | 172 |
| Microsoft.VisualBasic.VBCodeGenerator | 161 |
| System.Net.HttpWebRequest | 138 |
| System.Net.Sockets.Socket | 137 |
| System.AppDomain | 131 |
| System.RuntimeType | 128 |
| System.Xml.Xsl.XsltOld.Compiler | 125 |
| System.Xml.Xsl.Xslt.QilGenerator | 124 |
| System.Xml.Serialization.XmlSchemaImporter | 120 |
| System.Diagnostics.Process | 116 |
As expected, this list contains high-level classes such as AppDomain, Process or Socket. Classes like these, with high Ce, are needed to implement complex concepts that span numerous concerns. For example, by selecting the 131 types used by the AppDomain class I can tell that AppDomain is concerned with assembly security (Code Access Security, strong naming…), Windows users and file security, OS environment info, .NET Remoting, .NET Reflection, threading and globalization.
Could the AppDomain class be split into several smaller classes? I guess not, because this is such an essential class. But this is the exception. Generally, classes with high Ce are more like the CSharpCodeGenerator class, which represents a component on its own. I can see that the class CSharpCodeGenerator deals with a lot of implementation details of the C# language such as exceptions, type casting, all sorts of members (method, field, event, property…), comments, indentation etc. This sounds fine as long as each detail has its own implementation and the class CSharpCodeGenerator acts as a high-level mediator. But if I dig further and decompile some methods, I can see that the class CSharpCodeGenerator also copes with file management and contains a lot of logic (>1000 lines of code) to handle some CodeDOM details. This is likely an indication that this code could have been better designed with the help of several smaller collaborating classes.
Afferent Coupling and Dead Code
Ce values are meaningful when it comes to assessing a design and when you have to re-engineer code. Ca values are also useful, especially when equal to 0. A Ca value equal to 0 indicates a potential dead code element. A dead code element is an element that can be discarded because it is not used by the program anymore. Pruning dead code is a necessary task to keep your code base rationalized. Here also the tool NDepend can help you because it knows about Ca. However, things get more complicated here because there are numerous cases where a zero Ca doesn’t mean dead code. For example, entry points (i.e. Main methods), class constructors or finalizers are methods that will always have a zero Ca. However, these methods are not dead code because the CLR will call them at runtime.
Here is the CQL rule that we provide by default to detect dead methods:
// <Name>Potentially unused methods</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE
 MethodCa == 0 AND            // Ca=0 -> No Afferent Coupling -> The method is not
                              // used in the context of this application.
 !IsPublic AND                // Public methods might be used by client
                              // applications of your assemblies.
 !IsEntryPoint AND            // Main() method is not used by-design.
 !IsExplicitInterfaceImpl AND // The IL code never explicitly calls
                              // explicit interface methods implementation.
 !IsClassConstructor AND      // The IL code never explicitly calls class
                              // constructors.
 !IsFinalizer                 // The IL code never explicitly calls finalizers.
Notice how we consider that public methods should not be considered dead code in the general case. This rule generally matches a lot of false positives because, when statically analyzing the IL code, overridden implementations are often not statically linked to their callers. Hence, to get a first evaluation of dead code, it is worth adding the restricting condition AND !IsVirtual.
This particular issue will be addressed by future versions of NDepend.
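The filtering logic of the rule above can be sketched in a few lines of Python. This is only an illustration of the conditions, not NDepend's implementation: the `Method` record, its flag names and the sample methods are all hypothetical, and the `exclude_virtual` switch stands in for the extra AND !IsVirtual condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical record holding the facts the rule needs about a method."""
    name: str
    ca: int
    is_public: bool = False
    is_entry_point: bool = False
    is_explicit_interface_impl: bool = False
    is_class_constructor: bool = False
    is_finalizer: bool = False
    is_virtual: bool = False


def potentially_unused(methods, exclude_virtual=True):
    """Return names of methods with Ca == 0 that no exclusion explains."""
    result = []
    for m in methods:
        if m.ca != 0:
            continue  # used somewhere in the analyzed code
        if (m.is_public or m.is_entry_point or m.is_explicit_interface_impl
                or m.is_class_constructor or m.is_finalizer):
            continue  # zero Ca is expected for these, so not dead code
        if exclude_virtual and m.is_virtual:
            continue  # overrides often show Ca == 0 in a static call graph
        result.append(m.name)
    return result


# Hypothetical sample data.
METHODS = [
    Method("Program.Main", ca=0, is_entry_point=True),
    Method("Api.Export", ca=0, is_public=True),
    Method("Cache..cctor", ca=0, is_class_constructor=True),
    Method("Handle.Finalize", ca=0, is_finalizer=True),
    Method("Circle.Draw", ca=0, is_virtual=True),
    Method("Helper.OldFormat", ca=0),
    Method("Helper.Format", ca=12),
]

strict = potentially_unused(METHODS)
loose = potentially_unused(METHODS, exclude_virtual=False)
print(strict)
print(loose)
```

Comparing the two results shows the kind of false positive the virtual-method condition removes: the override only shows up when virtual methods are not excluded.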
Things get easier and more efficient when it comes to detecting dead fields and dead types. Here are the 2 CQL rules we propose by default, with the particular conditions they use to avoid false positives:
// <Name>Potentially unused fields</Name>
WARN IF Count > 0 IN SELECT TOP 10 FIELDS WHERE
 FieldCa == 0 AND     // Ca=0 -> No Afferent Coupling -> The field is not used
                      // in the context of this application.
 !IsPublic AND        // Although not recommended, public fields might be used
                      // by client applications of your assemblies.
 !IsLiteral AND       // The IL code never explicitly uses literal fields.
 !IsEnumValue AND     // The IL code never explicitly uses enumeration values.
 !NameIs "value__"    // Fields named 'value__' are relative to enumerations
                      // and the IL code never explicitly uses them.
// <Name>Potentially unused types</Name>
WARN IF Count > 0 IN SELECT TOP 10 TYPES WHERE
 TypeCa == 0 AND      // Ca=0 -> No Afferent Coupling -> The type is not
                      // used in the context of this application.
 !IsPublic AND        // Public types might be used by client
                      // applications of your assemblies.
 !NameIs "Program"    // Generally, types named Program contain a Main()
                      // entry-point method and this condition avoids
                      // considering such types as unused code.
Notice how easy it is to customize these
rules thanks to CQL facilities such as NameIs, NameLike, SELECT OUT
OF/FROM etc…
Ranking Metrics
Since NDepend has an efficient in-memory representation of the internal dependencies of a code base, we got the idea of applying the famous Google PageRank algorithm to the graph of methods and the graph of types. As a consequence, the 2 metrics TypeRank and MethodRank indicate which types and which methods of a code base are the most important. As with a web page in Google, a code element is considered important if it is used by numerous code elements that are themselves considered more or less important.
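For readers who have never seen the algorithm, here is a minimal sketch of the classic PageRank power iteration applied to a dependency graph, written in Python. It is not NDepend's code: the damping factor of 0.85, the fixed number of iterations, the handling of elements that use nothing and the normalization of scores to sum to 1 are all assumptions, so the values it produces are not on the same scale as TypeRank or MethodRank. The sample type names are hypothetical.

```python
def page_rank(dependencies, damping=0.85, iterations=50):
    """Rank elements of a graph given as {element: set of elements it uses}."""
    nodes = sorted(set(dependencies) | {t for ts in dependencies.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every element starts each round with the same small base score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = set(dependencies.get(node, ())) - {node}
            if targets:
                # An element passes its importance on to what it uses.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: an element that uses nothing spreads its
                # score evenly, so the total score stays equal to 1.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical type-level dependency graph.
GRAPH = {
    "OrderService": {"Order", "Logger", "Repository"},
    "Repository": {"Order", "Logger"},
    "LegacyExporter": {"Order"},
    "Order": set(),
    "Logger": set(),
}

ranks = page_rank(GRAPH)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

Unlike plain Ca, being used by an important element counts for more than being used by an obscure one, which is why the ranking can differ from a simple ORDER BY on afferent coupling.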
When discovering a code base, knowing which types and methods are important is, well, important, because they likely represent the cornerstones of the code base, the ones that you will have to understand first in order to dig into the program structure. If you have the chance to be educated on the code base by one of its developers, she will likely talk to you first about these important code elements in order to give you the basics. What is cool is that you can get this information automatically and objectively thanks to the ranking metrics.
For example, suppose you were a complete beginner in .NET programming: what would be the top 10 most important types to know about? Integer? String? Object? Bool? Let’s see what the TypeRank metric has to say about the top 10 most important types of the .NET framework:
SELECT TOP 10 TYPES ORDER BY TypeRank DESC
| types | TypeRank |
|---|---|
| System.Runtime.InteropServices.ComVisibleAttribute | 409.03 |
| System.Object | 380.89 |
| System.Runtime.InteropServices.ClassInterfaceAttribute | 329.73 |
| System.Void | 281.02 |
| System.CLSCompliantAttribute | 189.13 |
| System.Int32 | 170.17 |
| System.Boolean | 168.88 |
| System.Runtime.InteropServices.GuidAttribute | 153.64 |
| System.String | 145.63 |
| System.Runtime.InteropServices.InterfaceTypeAttribute | 143.32 |
As hoped, we find the types that we considered important, but we also get some interesting finds that show how pervasive COM and interop things such as ComVisibleAttribute or GuidAttribute are inside the Microsoft implementation of the .NET framework.
Note also that code that is considered important deserves even more attention than the rest in terms of test code coverage and design.
NDepend also supports several other code metrics related to coupling, such as the Association Between Classes, the Lack Of Cohesion of Methods (LCOM) and the Robert C. Martin metrics on assemblies.
I will certainly write some thoughts on all these in future posts. In the meantime you can read their definitions and try them on your own code base. You can also have a glance at, and print, the great NDepend Poster Metrics done by Stuart Celarier, Patrick Cauldwell and Scott Hanselman.