CodeBetter.Com

Patrick Smacchia [MVP C#]


Code metrics on Coupling, Dead Code, Design flaws and Re-engineering

There is a whole range of interesting code metrics related to coupling. The simplest ones are named Afferent Coupling (Ca) and Efferent Coupling (Ce). Basically, the Ca of a code element is the number of code elements that use it, and its Ce is the number of code elements that it uses.




You can define Ca and Ce on the assembly dependency graph, the namespace dependency graph, the type dependency graph and the method dependency graph of a code base. You can also define the Ca metric on the fields of a program as the number of methods that access the field. This leads to 9 metrics (Ca and Ce at each of the four levels, plus Ca on fields), all supported by the tool NDepend. Note that when computing Ce, NDepend takes into account code elements defined in third-party (tier) code, such as the code of the .NET framework.
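To make the two definitions concrete, here is a minimal Python sketch that computes Ca and Ce from a dependency graph. It is not NDepend code: the `coupling` function, the sample graph and the choice to ignore self-references are all assumptions made for illustration.

```python
from collections import defaultdict


def coupling(uses):
    """Compute afferent (Ca) and efferent (Ce) coupling.

    `uses` maps each analyzed element to the set of elements it uses.
    Elements that only appear as targets (e.g. third-party code) get a
    Ca value but no Ce value, since their own dependencies are unknown.
    """
    ca = defaultdict(int)
    ce = {}
    for user, targets in uses.items():
        # Assumption of this sketch: a self-reference is not counted.
        targets = set(targets) - {user}
        ce[user] = len(targets)   # Ce: how many elements it uses
        ca[user] += 0             # elements nobody uses still get Ca = 0
        for target in targets:
            ca[target] += 1       # Ca: how many elements use it
    return dict(ca), ce


# Hypothetical type-level dependency graph, for illustration only.
uses = {
    "Parser": {"Lexer", "Token", "System.String"},
    "Lexer": {"Token", "System.String"},
    "Token": {"System.String"},
    "Unused": set(),
}
ca, ce = coupling(uses)

# Rough analogue of "SELECT TOP 10 ... ORDER BY Ca DESC".
top_by_ca = sorted(ca, key=ca.get, reverse=True)[:10]
print(top_by_ca)
```

The same function works at any level (assemblies, namespaces, types, methods) as long as the graph is expressed as "element uses elements".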

 

With NDepend, if you wish to know which methods of your program are massively used, you can write the following CQL query:

 

SELECT TOP 10 METHODS ORDER BY MethodCa DESC

 

Being used a lot is not necessarily a problem. However, it is still interesting to know which parts of your code base are used a lot. For example, if we apply the CQL query above to the core of the .NET framework (i.e. mscorlib, System, System.Core, System.Xml…) we obtain the following 10 methods:

 

Methods                                           Afferent coupling at method level (MethodCa)
------------------------------------------------  --------------------------------------------
System.Environment.GetResourceString(String)      2632
System.Object..ctor()                             2416
System.ArgumentNullException..ctor(String)        2368
System.String.get_Length()                        1795
System.Type.GetTypeFromHandle(RuntimeTypeHandle)  1762
System.IDisposable.Dispose()                      1119
System.SR.GetString(String)                       1115
System.InvalidOperationException..ctor(String)    1101
System.Object.ToString()                          1056
System.ArgumentException..ctor(String)            1010

 

 

High Efferent Coupling and design flaws

 

If you wish to know which types of your program are heavy users of other types you just have to write:

 

SELECT TOP 10 TYPES ORDER BY TypeCe DESC

 

High Ce might reveal a design problem. Types with high Ce are entangled with many other implementations. The higher the Ce, the more responsibilities the type is likely to have. If we apply the CQL query above to the core of the .NET framework we obtain the following list:

 

Types                                       Efferent coupling at type level (TypeCe)
------------------------------------------  ----------------------------------------
Microsoft.CSharp.CSharpCodeGenerator        172
Microsoft.VisualBasic.VBCodeGenerator       161
System.Net.HttpWebRequest                   138
System.Net.Sockets.Socket                   137
System.AppDomain                            131
System.RuntimeType                          128
System.Xml.Xsl.XsltOld.Compiler             125
System.Xml.Xsl.Xslt.QilGenerator            124
System.Xml.Serialization.XmlSchemaImporter  120
System.Diagnostics.Process                  116

 

As expected, this list contains high-level classes such as AppDomain, Process or Socket. Classes of this sort, with a high Ce, are needed to implement complex concepts that span numerous concerns. For example, by selecting the 131 types used by the AppDomain class, I can tell that AppDomain is concerned with assembly security (Code Access Security, strong naming…), Windows users and file security, OS environment info, .NET Remoting, .NET Reflection, threading and globalization.
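One way to spot such concerns without reading the whole list of used types is to group them by namespace. The Python sketch below illustrates the idea. The `concerns_by_namespace` helper and the `depth` parameter are hypothetical, and the handful of type names is an illustrative sample, not the actual list of 131 types used by AppDomain.

```python
from collections import Counter


def concerns_by_namespace(used_types, depth=2):
    """Count used types per namespace prefix made of `depth` segments."""
    counts = Counter()
    for full_name in used_types:
        segments = full_name.split(".")[:-1]  # drop the type name itself
        prefix = ".".join(segments[:depth]) or "<global>"
        counts[prefix] += 1
    return counts.most_common()


# Illustrative sample only, NOT the real dependency list of AppDomain.
sample_used_types = [
    "System.Security.Policy.Evidence",
    "System.Security.Permissions.SecurityPermission",
    "System.Reflection.Assembly",
    "System.Reflection.AssemblyName",
    "System.Threading.Thread",
    "System.Globalization.CultureInfo",
    "System.Runtime.Remoting.ObjectHandle",
]

for namespace, count in concerns_by_namespace(sample_used_types):
    print(namespace, count)
```

A type whose efferent dependencies spread over many unrelated namespaces is a good candidate for a closer design review.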
 

Could the AppDomain class be split into several smaller classes? I guess not, because it is such an essential class. But this is the exception. Generally, classes with a high Ce are more like the CSharpCodeGenerator class, which represents a component all by itself. I can see that the class CSharpCodeGenerator deals with many implementation details of the C# language, such as exceptions, type casting, all sorts of members (methods, fields, events, properties…), comments, indentation etc. This is fine as long as each detail has its own implementation and the class CSharpCodeGenerator acts as a high-level mediator. But if I dig further and decompile some methods, I can see that the class CSharpCodeGenerator also copes with file management and contains a lot of logic (>1000 lines of code) to handle some CodeDOM details. This is likely an indication that this code could have been better designed with the help of several smaller collaborating classes.

 

Afferent Coupling and Dead Code

 

Ce values are meaningful when it comes to assessing a design and when you have to re-engineer code. Ca values are also useful, especially when they are equal to 0. A Ca value of 0 indicates a potential dead code element. A dead code element is an element that can be discarded because it is not used by the program anymore. Pruning dead code is a necessary task to keep your code base rationalized. Here too the tool NDepend can help you, because it knows about Ca. However, things get more complicated here because there are numerous cases where a zero Ca doesn’t mean dead code. For example, entry points (i.e. Main methods), class constructors and finalizers are methods that will always have a zero Ca. However, these methods are not dead code because the CLR will call them at runtime.

 

Here is the CQL rule that we provide by default to detect dead methods:

 

// <Name>Potentially unused methods</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE
 MethodCa == 0 AND            // Ca=0 -> No Afferent Coupling -> The method
                              // is not used in the context of this
                              // application.

 !IsPublic AND                // Public methods might be used by client
                              // applications of your assemblies.

 !IsEntryPoint AND            // By design, the Main() method is not called
                              // by the code.

 !IsExplicitInterfaceImpl AND // The IL code never explicitly calls
                              // explicit interface method implementations.

 !IsClassConstructor AND      // The IL code never explicitly calls class
                              // constructors.

 !IsFinalizer                 // The IL code never explicitly calls
                              // finalizers.

 

Notice that we consider that public methods should not be treated as dead code in the general case. This rule generally matches a lot of false positives because, when statically analyzing the IL code, we can see that overridden implementations are often not statically linked. Hence, to get a first evaluation of dead code, it is worth adding the restricting condition AND !IsVirtual. This particular issue will be addressed by future versions of NDepend.
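The logic of this rule, including the extra !IsVirtual restriction, can be sketched in Python as a simple filter. This is only an illustration of the conditions described above. The `Method` record, its field names and the sample data are hypothetical and do not reflect NDepend's object model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical record holding the flags the rule relies on."""
    name: str
    ca: int
    is_public: bool = False
    is_entry_point: bool = False
    is_explicit_interface_impl: bool = False
    is_class_constructor: bool = False
    is_finalizer: bool = False
    is_virtual: bool = False


def potentially_unused(methods, exclude_virtual=True):
    """Return methods with Ca == 0 that no exclusion condition explains."""
    return [
        m for m in methods
        if m.ca == 0
        and not m.is_public                   # may be used by client code
        and not m.is_entry_point              # called by the runtime
        and not m.is_explicit_interface_impl  # never called explicitly
        and not m.is_class_constructor        # never called explicitly
        and not m.is_finalizer                # never called explicitly
        and not (exclude_virtual and m.is_virtual)  # overrides: false positives
    ]


# Illustrative sample data only.
methods = [
    Method("Helper.Unused()", ca=0),
    Method("Helper.Used()", ca=3),
    Method("Program.Main()", ca=0, is_entry_point=True),
    Method("Api.Open()", ca=0, is_public=True),
    Method("Shape.Draw()", ca=0, is_virtual=True),
    Method("Cache..cctor()", ca=0, is_class_constructor=True),
]

print([m.name for m in potentially_unused(methods)])
```

Passing `exclude_virtual=False` reproduces the default rule, which also reports virtual methods such as the hypothetical `Shape.Draw()`.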

 

Things get easier and more efficient when it comes to detecting dead fields and dead types. Here are the 2 CQL rules we propose by default, with their particular conditions to avoid false positives:

 

// <Name>Potentially unused fields</Name>
WARN IF Count > 0 IN SELECT TOP 10 FIELDS WHERE
 FieldCa == 0 AND  // Ca=0 -> No Afferent Coupling -> The field is not used
                   // in the context of this application.

 !IsPublic AND     // Although not recommended, public fields might be used
                   // by client applications of your assemblies.

 !IsLiteral AND    // The IL code never explicitly uses literal fields.

 !IsEnumValue AND  // The IL code never explicitly uses enumeration values.

 !NameIs "value__" // Fields named 'value__' are related to enumerations
                   // and the IL code never explicitly uses them.

 

 

 

// <Name>Potentially unused types</Name>
WARN IF Count > 0 IN SELECT TOP 10 TYPES WHERE
 TypeCa == 0 AND     // Ca=0 -> No Afferent Coupling -> The type is not
                     // used in the context of this application.

 !IsPublic AND       // Public types might be used by client
                     // applications of your assemblies.

 !NameIs "Program"   // Generally, types named Program contain a Main()
                     // entry-point method and this condition avoids
                     // treating such a type as unused code.

 

Notice how easy it is to customize these rules thanks to CQL facilities such as NameIs, NameLike, SELECT OUT OF/FROM etc…

 

 

Ranking Metrics

 

Since NDepend has an efficient in-memory representation of the internal dependencies of a code base, we got the idea of applying the famous Google PageRank algorithm to the graph of methods and the graph of types. As a consequence, the 2 metrics TypeRank and MethodRank indicate which types and which methods of a code base are the most important. As with a web page in Google, a code element is considered important if it is used by numerous code elements that are themselves considered more or less important.
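For readers unfamiliar with the algorithm, here is a minimal Python sketch of the textbook PageRank power iteration applied to a "uses" graph. How NDepend actually computes and scales TypeRank and MethodRank is not described here, so the damping factor, the iteration count, the handling of elements that use nothing and the sample graph are all assumptions.

```python
def page_rank(uses, damping=0.85, iterations=50):
    """Textbook PageRank on a graph given as {element: set of used elements}.

    Rank flows from an element to the elements it uses, so heavily used
    elements end up with a high rank. Ranks sum to 1 in this sketch.
    """
    nodes = set(uses)
    for targets in uses.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Elements that use nothing spread their rank evenly (assumption).
        dangling = sum(rank[node] for node in nodes if not uses.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for user, targets in uses.items():
            if targets:
                share = damping * rank[user] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical type dependency graph, for illustration only.
uses = {
    "Program": {"Parser", "Logger"},
    "Parser": {"Lexer", "Token", "Logger"},
    "Lexer": {"Token", "Logger"},
    "Token": set(),
    "Logger": set(),
}
rank = page_rank(uses)

# Rough analogue of "SELECT TOP 10 TYPES ORDER BY TypeRank DESC".
for name in sorted(rank, key=rank.get, reverse=True)[:10]:
    print(name, round(rank[name], 3))
```

Unlike plain Ca, the rank of an element also depends on how important its users are, which is why a type used by a few central types can outrank one used by many peripheral ones.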

 

When discovering a code base, knowing which types and methods are important is, well, important, because they likely represent the cornerstones of the code base, the ones that you will have to understand first in order to dig into the program structure. If you have the chance to be introduced to the code base by one of its developers, she will likely talk to you first about these important code elements in order to give you the basics. What is cool is that you can obtain this information automatically and objectively thanks to the ranking metrics.

 

For example, suppose you were a complete beginner in .NET programming: what would be the top 10 most important types to know about? Integer? String? Object? Bool? Let’s see what the TypeRank metric has to say about the top 10 most important types of the .NET framework:

 

SELECT TOP 10 TYPES ORDER BY TypeRank DESC

 

Types                                                   Type Rank
------------------------------------------------------  ---------
System.Runtime.InteropServices.ComVisibleAttribute      409.03
System.Object                                           380.89
System.Runtime.InteropServices.ClassInterfaceAttribute  329.73
System.Void                                             281.02
System.CLSCompliantAttribute                            189.13
System.Int32                                            170.17
System.Boolean                                          168.88
System.Runtime.InteropServices.GuidAttribute            153.64
System.String                                           145.63
System.Runtime.InteropServices.InterfaceTypeAttribute   143.32

 

As hoped, we find the types that we considered important, but we also get some interesting finds that show how pervasive COM and interop elements such as ComVisibleAttribute or GuidAttribute are inside Microsoft's implementation of the .NET framework.

 

Note also that code that is considered important deserves even more attention than the rest in terms of test code coverage and design.

 

NDepend also supports several other code metrics related to coupling, such as the Association Between Classes, the Lack Of Cohesion of Methods (LCOM) and the Robert C. Martin metrics on assemblies. I will certainly write some thoughts on all of these in future posts. In the meantime, you can read their definitions and try them on your own code base. You can also have a look at, and print, the great NDepend metrics poster made by Stuart Celarier, Patrick Cauldwell and Scott Hanselman.

 

 



Comments

Colin Jack said:

I was wondering how I can change this assembly to ignore dependencies from certain assemblies or from framework assemblies:

SELECT TOP 10 TYPES ORDER BY TypeCe DESC

# February 22, 2008 9:01 AM

Colin Jack said:

Just to correct myself, I meant....

I was wondering how I can change this assembly to ignore dependencies to certain assemblies or to framework assemblies:

SELECT TOP 10 TYPES ORDER BY TypeCe DESC

# February 22, 2008 9:55 AM

Patrick Smacchia said:

Colin, do you mean you need to write something like:

SELECT TOP 10 TYPES OUT OF ASSEMBLIES  "Asm1", "Asm2"... ORDER BY TypeCe DESC

or do you mean that you want to tweak the Ce values to ignore some assemblies?

In this last scenario, for now, you need to discard the unwanted assemblies from your NDepend project.

However, I estimate that this particular task could be better done by using the Dependencies Matrix and just keeping the assemblies you are interested in, in the matrix headers.

# February 22, 2008 12:16 PM

Colin Jack said:

Thanks for replying and its the latter case.

I'll try and exclude the assemblies in question (framework assemblies and some of our abstract assemblies) but to be honest I've not had a lot of success getting that sort of approach to work in other situations.

It would be useful if NDepend allowed you to tweak things like TypeCe, for example when viewing TypeCe I'm not overly worried on dependencies on things like IEnumerable.

# February 23, 2008 7:04 AM
