Patrick Smacchia [MVP C#]

Sponsors

The Lounge

Wicked Cool Jobs

News

  • NDepend v3 is fully integrated in Visual Studio, and is now available for download! Software dependencies visualization, 82 .NET software metrics, continuous rule validations, assembly version diff, declarative code queries and more ! http://ndepend.com

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
Getting rid of spaghetti code in the real-world: a Case Study

 
A part of my job as lead developer of the tool NDepend, is to make sure that the product fits real-world development team needs. To do so, I sometime go consulting with some of our clients, see what their problems are and try to propose some solutions. This is a win-win activity: clients can get an outside view about their developments and for my part, I can see how NDepend is used in their shop and what could be done to make their life easier.

 

Description of the project

Last week I went for consulting. Although I am under NDA and can’t reveal who the client is, I can still expose their problems and planned solutions. The company is an ISV that publishes one software. With this software, they are world-wide leader in their business and I am pretty sure you already dealt with their product. The bulk of their code base is made of around 600K lines of C# Code. Because the software exists for a dozen years, it is also made of some ASP, COM+, VB6 and even J# legacy code that will be discarded in the mid-term. The technical team is made of around 60 developers + team leaders + a team of 5 architects.

 

The C# code is partitioned in around 160 assemblies + a dozen of ASP.NET application. This ASP.NET presentation layer is relatively well abstracted from the rest of the code. Each of these ASP.NET applications references a large Core assembly made of 150K lines of C# code. This assembly is made of 144 namespaces. When viewing these namespaces with the NDepend Dependency Structure Matrix in indirect mode, almost all cells are black, meaning that each namespace directly or indirectly depends on all other namespaces (more on this here). In such situation, the term spaghetti code sounds appropriate. This situation is the results of many years of development without active care for dependencies and I have never seen any code base without such entangled code (see my post on NHibernate 2.0 code base for example).

 

This Core assembly contains code relative to all aspects and concerns of the business, but also some low level technical code such as persistence/logging/threading/authentification.... Also, the Core asssembly directly references most of others assemblies. A consequence of all these is that all the ASP.NET applications statically rely on the entire code base. Here is a quick diagram to summarize all this:

 

 

 

 

The Problems

 

Most of problems come from maintenance. It is very hard to know which part of the code is impacted when the code is refactored. Hopefully, the team have a set of automatic tests that help them avoiding regression (although I didn’t get an exact coverage ratio).

 

New feature development is also a problem because from the existing structure, developers have very few guidelines on how to structure new things. The complexity of the spaghetti code (some might say the entropy of the structure of the code base) increases with the time.

 

Last problem but not least, the team is facing over-complex deployment. The entire code base must be deployed for each ASP.NET application. This means some COM+ objects and ASP servers, but also this means some dangerous conflicts like J# code that must run on .NET v3.5 instead of .NET v2.0 for example. This is very unfortunate because each ASP.NET app represent a well defined vertical segment of the business that should be vertically implemented by only a fraction of the code base.

 

 

First thing to do : Levelize Namespaces

 

In this context, the first goal for the architects is to re-structure the code of the Core assembly in order to get a namespaces’ structure layered/levelized. In other words, they need to get rid of dependencies cycles.

 

They also need to re-partition types accross namespaces to better match the vertical separation of business segment. I carefully used the terms re-structure and re-partition and not the term refactor. Indeed, the bulk of the work will consists here in working on type declarations, by inversing some dependencies thanks to the dependency inversion/inversion of control principle, by moving some types from one namespace to another, by creating interfaces to avoid direct dependency on implementations, by creating or discarding some namespaces, by eventually splitting complex classes into simpler smaller classes. Ideally, during this levelization phase, they will change very few logic inside the bodies of the methods. By doing so, they will thus limit the risk of regression bugs introduced, they will limit the need to re-write some existing automatic tests and even more important, they won’t disturb too much current developments.

 

At first glance the levelization phase often seems more daunting than what it is. My theory is that while coding, developers instinctively know that creating a dependency from low-level to high-level code is bad. For example a case study of how we levelized the 15 root namespaces of the VisualNDepend assembly (around 25K LoC at that time) can be found in this article in the section Getting rid of dependency cycles: a case-study. It took us only 3 days.

 

In their particular case, the Core assembly is 50% made of business code and 30% made of persistence code. Although business and persistence code are using each others, the NDepend Dependency Structure matrix clearly shows that there is much more static dependencies from business to persistence than the opposite. The reason is that developers instinctively know that the business is higher level code than the persistence code. This instinct however is not strong enough to prevent all dependencies from the persistence code to the business code. For this reason, each time the architects will get rid of a problematic dependency, they will create a CQL rule to make sure the dependency won’t reappears. In this particular example the rule will look like:

 

  WARN IF Count > 0 IN 
  SELECT NAMESPACES FROM ASSEMBLIES "Product.Core" 
  WHERE NameLike "Product.DB" AND IsDirectlyUsing "Product.Business.*"

Once all namespaces will be layered, the architects will be able to discard all these temporary rules to just keep a CQL rule that will basically says: make sure that all namespaces of this assembly are layered. This could look like:

  WARN IF Count > 0 IN 
  SELECT NAMESPACES WHERE ContainsNamespaceDependencyCycle AND NameIs "Product.Core"

Or also:

  WARN IF Count > 0 IN 
  SELECT NAMESPACES FROM ASSEMBLIES "Product.Core" 
  WHERE !HasLevel

The explanation about why these CQL rules work can be found at the end of this article in the section Levelization.

 

Second thing to do: redefine the assemblies bounds

Once all the namespaces of Core will be levelized, it will be pretty easy for the architects to split the Core assemblies into several smaller assemblies. If you read my blog, you might know that I am against the proliferation of the number of assemblies but don’t take me wrong: assemblies are some physical unit of deployment and if the code A must run on another machine or even another AppDomain than the code B, A and B must reside into 2 distinct assemblies.

 

This phase will also be a good opportunity to re-partition the code in other assemblies than Core. The final result should be less but bigger assemblies (except the Core assembly of course that will be splitted). These assemblies will be well partitioned according to the business segment and as a natural results, the ASP.NET applications outer dependencies will be easy to define and to control. The code will be then much more maintainable because the big Core spaghetti will have been componentized into levelized namespaces and assemblies. Each component can be tested and developed independently of almost all the rest of the code base. Btw, from my experience I would say that the right size for a component lies between 500 and 2000 LoC.

 

Here is a quick diagram of what will look like the re-structured application:

 

 

 

 

 

Continuously Check the Quality From Now!

While the architects main goal is to re-take control over the code base structure and maintenance, they also would be glad to apply some basic quality principles. They already tried FxCop but the tool literally spit many thousands of rules violation. Even if they could freeze new features development (which is impossible for any development shop) they would need months or even years to fix them all. In this context, the Quality From Now! possibility of NDepend sounded appealing to them.

 

The idea behind Quality From Now! is simple: make sure that From Now, all the code added or refactored will respect a minimum quality gap. To do so, the default CQL quality rules must be adjusted to match only code added or refactored From Now. For example this rule matches all methods with more than 20 lines of code…

  WARN IF Count > 0 IN 
  SELECT METHODS WHERE NbLinesOfCode> 20

 …while this CQL rule matches all methods added since a particular point in the past with more than 20 lines of code:

  WARN IF Count > 0 IN 
  SELECT METHODS WHERE WasAdded AND NbLinesOfCode> 20

NDepend comes with several facilities out of the box to define the particular point in the past, such as, a snapshot of the code base made on the 23 September 2008 at 18:57, or a snapshot of the code base made 7 days ago, or the last snapshot available. More details about these facilities here.

 

The same way as the CQL condition WasAdded is used to match added methods, the CQL condition CodeWasChanged can be used to match refactored methods.

 

The decision of the architect team is to use this From Now idea first only on methods added and on basic quality metrics with large thresholds (such as per methods, 30 Lines of code maximum, a Cyclomatic Complexity of 10 maximum, a NestingDepth of 5 maximum and at least 70% coverage by tests). An NDepend analysis will be ran every night and every morning team leaders will ask developers to correct quality flaws detected. There is no doubt that this will generate discussions about coding style between developers.

 

If the experiment is positive, the quality demanded will be then higher and some more quality criteria will be asked, such as number of parameters/variables per methods, a minimum comment to code ratio, specific naming conventions, more immutable state, etc. It will be then also time to ask for quality of refactored code, which can be a lot to ask. Indeed, changing one line in a big method can results in a 2 days refactoring and test writing.

 

Final thoughts

One aspect I didn't mention in the post for clarity is that we also discuss the possibility of regrouping domain classes in a domain namespace or assembly. These class would be POCO (Plain Old CLR Objects) because they will be widely used and per se, must not depend on anything else than plain CLR types (string, list, int, enumeration...). Because coding such domain layer strongly relies on a pure structure, this can only be done when namespaces are levelized.

 

A central point that couldn’t be discussed with the architect during our meeting, is the matter of delay. My opinion is that in software development, delays can only be estimated from the past work, done within a similar context. This is the entire point of the book Software Estimation. With the time, the architecture team will get a solid knowledge on how long they need to levelize X Lines of Code that contains Y dependencies cycles. In the future, I hope that they will feedback me quantitative data about their work and I will then be glad to blog more about how what they are doing.

 

to be continued... 

 

 


Posted Tue, Sep 23 2008 5:43 PM by Patrick Smacchia

[Advertisement]

Comments

Dew Drop - September 23, 2008 | Alvin Ashcraft's Morning Dew wrote Dew Drop - September 23, 2008 | Alvin Ashcraft's Morning Dew
on Tue, Sep 23 2008 3:23 PM

Pingback from  Dew Drop - September 23, 2008 | Alvin Ashcraft's Morning Dew

Reflective Perspective - Chris Alcock » The Morning Brew #186 wrote Reflective Perspective - Chris Alcock » The Morning Brew #186
on Wed, Sep 24 2008 2:59 AM

Pingback from  Reflective Perspective - Chris Alcock  » The Morning Brew #186

DotNetKicks.com wrote Getting rid of spaghetti code in the real-world: a Case Study
on Thu, Sep 25 2008 3:36 PM

You've been kicked (a good thing) - Trackback from DotNetKicks.com

links for 2008-09-25 | the markfr ditherings wrote links for 2008-09-25 | the markfr ditherings
on Thu, Sep 25 2008 5:03 PM

Pingback from  links for 2008-09-25 | the markfr ditherings

SergioTarrillo - RichWeblog wrote analizando codigo maleado?, tienes que usar NDepend
on Fri, Jan 9 2009 1:02 AM

Entiéndase código maleado, a aquel código que todos los miran pero nadie lo quiere tocar, aquel código

Add a Comment

(required)  
(optional)
(required)  
Remember Me?
Devlicio.us