Hints on how to componentize existing code

Representing the structure of a code base with a DSM (Dependency Structure Matrix) is a great way to perform all kinds of useful tasks, like determining the layering of the code base or pinpointing component dependency cycles. NDepend supports a DSM view with many options, such as:

  • a facility to highlight cycles,
  • generation of ‘Boxes and Arrows’ diagrams,
  • customization of the dependency weight (how many methods/types/namespaces of an assembly are using how many methods/types/namespaces of another assembly),
  • an indirect/transitive dependency mode (for example, you can see that A indirectly uses C if A uses B and B uses C); a minimal sketch of this computation follows the list.

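The indirect mode comes down to computing the transitive closure of the direct-dependency relation. The snippet below is plain Warshall's algorithm on a boolean matrix with made-up element indices; it only illustrates the idea and is not NDepend's actual implementation.

```csharp
using System;

// Minimal sketch of the indirect/transitive dependency idea (not NDepend's code).
// uses[i, j] is true when element i depends directly on element j; Warshall's
// algorithm extends this to "depends directly or indirectly".
static class IndirectDependencies
{
    public static bool[,] TransitiveClosure(bool[,] uses)
    {
        int n = uses.GetLength(0);
        var closure = (bool[,])uses.Clone();
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (closure[i, k] && closure[k, j])
                        closure[i, j] = true;
        return closure;
    }

    static void Main()
    {
        // A uses B, and B uses C => the closure also reports that A uses C.
        const int A = 0, B = 1, C = 2;
        var direct = new bool[3, 3];
        direct[A, B] = true;
        direct[B, C] = true;
        Console.WriteLine(TransitiveClosure(direct)[A, C]); // True
    }
}
```
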
We provide here a 4-minute screencast that explains all these facilities step by step.

There is a use of the DSM that was actually not expected during its development: it can give some hints on how to componentize existing code. For example, below is the DSM for the 97 assemblies of the DotNetNuke 4.1 code base (DotNetNuke is an OSS framework for creating web applications). This square matrix has the interesting property of containing a square in its upper-left corner. Such a square in a DSM means that the elements involved in it are highly cohesive.

[DSM of the 97 DotNetNuke 4.1 assemblies, with a square in the upper-left corner]

Below is a zoomed snapshot of the square. We can now see that the 19 assemblies involved in the square (index 19 to 37 on the picture) are named following the pattern DotNetNuke.Modules.Store.*. We now understand that the developers of the DotNetNuke project decided to spread the code of the DotNetNuke.Modules.Store component across 19 assemblies. Personally, I prefer having a few big assemblies instead of numerous small ones, and I argued this point of view in this article (at the beginning, in the .NET Components section). Maybe in this particular example it makes sense to split the DotNetNuke.Modules.Store code into several assemblies.

[Zoomed view of the square: the 19 DotNetNuke.Modules.Store.* assemblies]

Interestingly enough, the algorithm we use to order the code elements in the matrix headers has been able to pinpoint an intention buried in the code structure. This algorithm can give several results for the same set of code elements, and you can browse all these results with the Triangularize button. Each time you press this button, a new order is computed that tries to form squares. Below is a snapshot of another DSM, still for the assemblies of DotNetNuke. The square we now see is still made of the DotNetNuke.Modules.Store.* assemblies.

[Another ordering of the DotNetNuke assemblies, obtained with the Triangularize button; the DotNetNuke.Modules.Store.* square is still there]
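
Conceptually, an ordering that forms such squares can be obtained by grouping the elements that mutually depend on each other (the strongly connected components of the dependency graph) and listing the groups in dependency order: the members of a group then sit next to each other in the matrix headers and show up as a square on the diagonal. The sketch below illustrates that principle with Tarjan's algorithm on a toy graph; it is an illustration only, not the ordering algorithm NDepend actually ships.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only (not NDepend's algorithm): group mutually dependent elements
// into strongly connected components (Tarjan), then emit the groups in dependency
// order. Members of one group end up contiguous in the matrix headers, which is
// exactly what shows up as a square on the diagonal.
static class SquareOrdering
{
    // uses[i] lists the indices of the elements that element i depends on.
    public static List<List<int>> Components(List<int>[] uses)
    {
        int n = uses.Length, counter = 0;
        var index = Enumerable.Repeat(-1, n).ToArray();
        var low = new int[n];
        var onStack = new bool[n];
        var stack = new Stack<int>();
        var components = new List<List<int>>();

        void Visit(int v)
        {
            index[v] = low[v] = counter++;
            stack.Push(v);
            onStack[v] = true;
            foreach (int w in uses[v])
            {
                if (index[w] == -1) { Visit(w); low[v] = Math.Min(low[v], low[w]); }
                else if (onStack[w]) low[v] = Math.Min(low[v], index[w]);
            }
            if (low[v] == index[v])            // v is the root of a component
            {
                var component = new List<int>();
                int member;
                do { member = stack.Pop(); onStack[member] = false; component.Add(member); }
                while (member != v);
                components.Add(component);     // components come out dependencies-first
            }
        }

        for (int v = 0; v < n; v++)
            if (index[v] == -1) Visit(v);
        return components;
    }

    static void Main()
    {
        // Elements 0 and 1 use each other (a cycle); both use 2; element 3 uses 0.
        var uses = new[]
        {
            new List<int> { 1, 2 }, new List<int> { 0, 2 },
            new List<int>(),        new List<int> { 0 }
        };
        var headerOrder = Components(uses).SelectMany(c => c);
        Console.WriteLine(string.Join(" ", headerOrder)); // "2 1 0 3": 0 and 1 stay contiguous
    }
}
```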

I had the chance to test this algorithm while consulting. My mission was to give hints on how to componentize a giant code base made of more than a million lines of C# code. The client kindly let me talk about this experience. The audited project is made of 549 assemblies. As a consequence, it takes several hours to compile on big 16 GB multi-core servers. I think there is a lot of room for improvement, as I explain in this post on how to benefit from the C# compiler's awesome performance. The first thing to do is to merge the code into fewer assemblies. And here the DSM is absolutely necessary, because the code base is so big and there are so many assemblies that it is humanly impossible to partition them by hand. Below is the DSM made of the 549 assemblies. It clearly pinpoints 6 squares that will certainly be turned into 6 assemblies. Applying the algorithm several times with the Triangularize button reveals several smaller squares.

[DSM of the 549 audited assemblies, showing 6 large squares]

Another common problem: how do you componentize a namespace that contains hundreds of classes? When you are building a framework, namespaces are a good way to partition the public surface you want to present to your clients. When you are building an application, namespaces are a good way to partition your classes into a hierarchy of components. The .NET Framework comes with a super namespace: System.Windows.Forms. It is made of 1,509 types (1,093 of them public), and 688 of these are nested types (i.e., types declared inside another type). This significantly reduces the number of types we might want to componentize: 1,509 – 688 = 821 types. Below is the DSM of these 821 types; we can see several small squares and one giant square in the middle.

[DSM of the 821 non-nested types of System.Windows.Forms]
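
These counts can be reproduced with plain reflection; the sketch below is just a quick check, and the exact figures depend on the version of the System.Windows.Forms assembly that gets loaded.

```csharp
using System;
using System.Linq;
using System.Windows.Forms;

// Counts the types of the System.Windows.Forms namespace via reflection.
// The numbers vary with the .NET Framework version.
static class CountWinFormsTypes
{
    static void Main()
    {
        Type[] types = typeof(Form).Assembly.GetTypes()
            .Where(t => t.Namespace != null && t.Namespace.StartsWith("System.Windows.Forms"))
            .ToArray();

        int nested = types.Count(t => t.IsNested);
        int publicTypes = types.Count(t => t.IsPublic || t.IsNestedPublic);

        Console.WriteLine("All types:        " + types.Length);
        Console.WriteLine("Public types:     " + publicTypes);
        Console.WriteLine("Nested types:     " + nested);
        Console.WriteLine("Non-nested types: " + (types.Length - nested));
    }
}
```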

If we zoom in, we can see that the small squares have red borders. The NDepend DSM uses red borders to highlight a dependency cycle. If you position the mouse on the upper-left corner of a cycle, the info view lists the elements involved in the cycle. Having a cycle among a set of classes is an indication of strong cohesion, and this is a hint that the set could be turned into a component. In the picture below we see that the 16 types involved in the cycle are those related to the DataGrid control. Similarly, the second picture below lists 7 types related to the WebBrowser control.

[Info view listing the 16 types of the DataGrid cycle]

[Info view listing the 7 types of the WebBrowser cycle]

And the picture below shows that the giant square is actually made of 298 types related to View controls, such as DataGridView, ListView or TreeView.

[The giant square: 298 types related to View controls such as DataGridView, ListView and TreeView]
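
In graph terms, the red-bordered cycles above are sets of types that can all reach each other through ‘uses’ links. A simple way to list the types that belong to the same cycle as a given type is to intersect the set it reaches with the set that reaches it. The sketch below runs this on a toy graph whose type names are placeholders loosely inspired by the DataGrid example; it is neither NDepend's implementation nor the real System.Windows.Forms dependency graph.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: the types in the same dependency cycle as a given type are
// exactly those it can reach through 'uses' edges AND that can reach it back.
static class CycleMembers
{
    static HashSet<string> Reachable(string start, Dictionary<string, List<string>> edges)
    {
        var seen = new HashSet<string> { start };
        var todo = new Stack<string>();
        todo.Push(start);
        while (todo.Count > 0)
        {
            string current = todo.Pop();
            if (!edges.TryGetValue(current, out var targets)) continue;
            foreach (var next in targets)
                if (seen.Add(next)) todo.Push(next);
        }
        return seen;
    }

    static void Main()
    {
        // Toy 'uses' graph; the type names are placeholders for the example.
        var uses = new Dictionary<string, List<string>>
        {
            ["DataGrid"]            = new List<string> { "DataGridColumnStyle", "DataGridState", "Control" },
            ["DataGridColumnStyle"] = new List<string> { "DataGrid" },
            ["DataGridState"]       = new List<string> { "DataGrid" },
            ["Control"]             = new List<string>()   // used by DataGrid, but not part of the cycle
        };

        // Reverse the edges to answer "who uses X?".
        var usedBy = new Dictionary<string, List<string>>();
        foreach (var pair in uses)
            foreach (var target in pair.Value)
            {
                if (!usedBy.TryGetValue(target, out var users))
                    usedBy[target] = users = new List<string>();
                users.Add(pair.Key);
            }

        var cycle = Reachable("DataGrid", uses);            // everything DataGrid uses
        cycle.IntersectWith(Reachable("DataGrid", usedBy)); // ... that also uses DataGrid back
        Console.WriteLine(string.Join(", ", cycle.OrderBy(n => n)));
        // Prints: DataGrid, DataGridColumnStyle, DataGridState
    }
}
```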

  • http://www.NDepend.com Patrick Smacchia

    There are two facts to take into account:

    - First, compilation is not a very parallelizable process. If A depends on B, the compilation of A has to be done after the compilation of B.

    - Second, the performance of the C# and VB.NET compilers is just awesome when dealing with a single project: you can expect around 5 to 10 K lines of code per second.

    However, there is still room for optimization with parallelization, and for certain applications it might make sense to split an assembly into N assemblies just because you know that these N assemblies can be compiled simultaneously.

    The error is to componentize code with assemblies for reasons other than physical ones (i.e., parallelization of compilation, lazy loading of code, the add-in and AppDomain paradigms, code sharing and frameworks…)

  • http://codebetter.com/blogs/gregyoung Greg

    Patrick,

    I am curious about your recommendation to make ‘bigger assemblies’. This would at first seem to make things more serially performant, but it seems to lose possible benefits of parallelism. I can, for instance, in a build consisting of 1000 assemblies, fairly easily derive a dependency map and, given enough machines/processors (and hopefully a low amount of coupling), do the majority of my build in parallel.

    Would you comment on any experiences you may have had with parallelizing builds in order to increase performance?

    Cheers,

    Greg