A part of my job as lead
developer of the tool NDepend,
is to make sure that the product fits real-world development team needs. To do
so, I sometime go consulting with some of our clients, see what their problems
are and try to propose some solutions. This is a win-win activity: clients can
get an outside view about their developments and for my part, I can see how
NDepend is used in their shop and what could be done to make their life easier.
Description of the project
Last week I went for
consulting. Although I am under NDA and can’t reveal who the client is, I can still
expose their problems and planned solutions. The company is an ISV that
publishes one software. With this software, they are world-wide leader in their
business and I am pretty sure you already dealt with their product. The bulk of
their code base is made of around 600K lines of C# Code.
Because the software exists for a dozen years, it is also made of some ASP,
COM+, VB6 and even J# legacy code that will be discarded in the mid-term. The
technical team is made of around 60 developers + team leaders + a team of 5
architects.
The C# code is
partitioned in around 160 assemblies + a dozen of ASP.NET application. This
ASP.NET presentation layer is relatively well abstracted from the rest of the
code. Each of these ASP.NET applications references a large Core assembly made of
150K lines of C# code. This assembly is made of 144 namespaces. When viewing
these namespaces with the NDepend Dependency Structure Matrix in indirect mode,
almost all cells are black, meaning that each namespace directly or indirectly
depends on all other namespaces (more
on this here). In such situation, the term spaghetti
code sounds appropriate. This situation is the results of many years of
development without active care for dependencies and I have never seen any code
base without such entangled code (see my post on NHibernate 2.0 code base for
example).
This Core assembly
contains code relative to all aspects and concerns of the business, but also
some low level technical code such as persistence/logging/threading/authentification....
Also, the Core asssembly directly references most of others assemblies. A
consequence of all these is that all the ASP.NET applications statically rely
on the entire code base. Here is a quick diagram to summarize all this:
The Problems
Most of problems come
from maintenance. It is very hard to know which part of the code is impacted
when the code is refactored. Hopefully, the team have a set of automatic tests
that help them avoiding regression (although I didn’t get an exact coverage ratio).
New feature development
is also a problem because from the existing structure, developers have very few
guidelines on how to structure new things. The complexity of the spaghetti code (some might say the entropy of the structure of the code base) increases with the time.
Last problem but not
least, the team is facing over-complex deployment. The entire code base must be
deployed for each ASP.NET application. This means some COM+ objects and ASP
servers, but also this means some dangerous conflicts like J# code that must
run on .NET v3.5 instead of .NET v2.0 for example. This is very unfortunate
because each ASP.NET app represent a well defined vertical segment of the
business that should be vertically implemented by only a fraction of the code
base.
First thing to do : Levelize Namespaces
In this context, the
first goal for the architects is to re-structure
the code of the Core assembly in order to get a namespaces’ structure
layered/levelized. In other words, they need to get rid of dependencies cycles.
They also need to re-partition types accross namespaces to
better match the vertical separation of business segment. I carefully used the terms
re-structure and re-partition and not the term refactor.
Indeed, the bulk of the work will consists here in working on type
declarations, by inversing some dependencies thanks to the dependency inversion/inversion
of control principle,
by moving some types from one namespace to another, by creating interfaces to
avoid direct dependency on implementations, by creating or discarding some
namespaces, by eventually splitting complex classes into simpler smaller
classes. Ideally, during this levelization
phase, they will change very few logic inside the bodies of the methods. By
doing so, they will thus limit the risk of regression bugs introduced, they
will limit the need to re-write some existing automatic tests and even more
important, they won’t disturb too much current developments.
At first glance the
levelization phase often seems more daunting than what it is. My theory is that
while coding, developers instinctively know that creating a dependency from
low-level to high-level code is bad. For example a case study of how we levelized
the 15 root namespaces of the VisualNDepend assembly (around 25K LoC at that
time) can be found in this article in the section Getting rid of dependency cycles: a case-study. It took us only 3 days.
In their particular case,
the Core assembly is 50% made of business code and 30% made of persistence
code. Although business and persistence code are using each others, the NDepend
Dependency Structure matrix clearly shows that there is much more static
dependencies from business to persistence than the opposite. The reason is that
developers instinctively know that the business is higher level code than the
persistence code. This instinct however is not strong enough to prevent all
dependencies from the persistence code to the business code. For this reason,
each time the architects will get rid of a problematic dependency, they will
create a CQL rule to make sure the dependency won’t reappears. In this
particular example the rule will look like:
WARN IF Count > 0 IN
SELECT NAMESPACES FROM ASSEMBLIES "Product.Core"
WHERE NameLike "Product.DB" AND IsDirectlyUsing "Product.Business.*"
Once all namespaces will
be layered, the architects will be able to discard all these temporary rules to
just keep a CQL rule that will basically says: make sure that all namespaces of
this assembly are layered. This could look like:
WARN IF Count > 0 IN
SELECT NAMESPACES WHERE ContainsNamespaceDependencyCycle AND NameIs "Product.Core"
Or also:
WARN IF Count > 0 IN
SELECT NAMESPACES FROM ASSEMBLIES "Product.Core"
WHERE !HasLevel
The explanation about why
these CQL rules work can be found at the end of this article in the section Levelization.
Second thing to do: redefine the assemblies
bounds
Once all the namespaces
of Core will be levelized, it will be pretty easy for the architects to split
the Core assemblies into several smaller assemblies. If you read my blog, you
might know that I am against the proliferation of the number of assemblies but
don’t take me wrong: assemblies are some physical unit of deployment and if the
code A must run on another machine or even another AppDomain than the code B, A
and B must reside into 2 distinct assemblies.
This phase will also be a
good opportunity to re-partition the code in other assemblies than Core. The
final result should be less but bigger assemblies (except the Core assembly of
course that will be splitted). These assemblies will be well partitioned
according to the business segment and as a natural results, the ASP.NET
applications outer dependencies will be easy to define and to control. The code
will be then much more maintainable because the big Core spaghetti will have
been componentized into levelized namespaces and assemblies. Each component can
be tested and developed independently of almost all the rest of the code base.
Btw, from my experience I would say that the right size for a component lies
between 500 and 2000 LoC.
Here is a quick diagram
of what will look like the re-structured
application:
Continuously Check the Quality From Now!
While the architects main
goal is to re-take control over the
code base structure and maintenance, they also would be glad to apply some
basic quality principles. They already tried FxCop
but the tool literally spit many thousands of rules violation. Even if they
could freeze new features development (which is impossible for any development
shop) they would need months or even years to fix them all. In this context,
the Quality From Now! possibility of
NDepend sounded appealing to them.
The idea behind Quality From Now! is simple: make sure
that From Now, all the code added or
refactored will respect a minimum quality gap. To do so, the default CQL
quality rules must be adjusted to match only code added or refactored From Now. For example this rule matches
all methods with more than 20 lines of code…
WARN IF Count > 0 IN
SELECT METHODS WHERE NbLinesOfCode> 20
…while this CQL rule
matches all methods added since a
particular point in the past with more than 20 lines of code:
WARN IF Count > 0 IN
SELECT METHODS WHERE WasAdded AND NbLinesOfCode> 20
NDepend comes with
several facilities out of the box to define the
particular point in the past, such as, a snapshot of the code base made on
the 23 September 2008 at 18:57, or a snapshot of the code base made 7 days ago,
or the last snapshot available. More details about these facilities here.
The same way as the CQL
condition WasAdded is used to match added methods, the CQL condition CodeWasChanged can be used to match
refactored methods.
The decision of the
architect team is to use this From Now
idea first only on methods added and on basic quality metrics with large
thresholds (such as per methods, 30 Lines of code
maximum, a Cyclomatic Complexity
of 10 maximum, a NestingDepth
of 5 maximum and at least 70%
coverage by tests). An NDepend analysis will be ran every night and every
morning team leaders will ask developers to correct quality flaws detected. There is no doubt that this will generate discussions about coding style between developers.
If the experiment is positive, the quality demanded will be then higher
and some more quality criteria will be asked, such as number of parameters/variables
per methods, a minimum comment to code ratio, specific naming conventions, more immutable state, etc.
It will be then also time to ask for quality of refactored code, which can be a
lot to ask. Indeed, changing one line in a big method can results in a
2 days refactoring and test writing.
Final thoughts
One aspect I didn't mention in the post for clarity is that we also
discuss the possibility of regrouping domain classes in a domain
namespace or assembly. These class would be POCO (Plain Old CLR Objects) because they will be widely used and per se,
must not depend on anything else than plain CLR types (string, list,
int, enumeration...). Because coding such domain layer strongly relies
on a pure structure, this can only be done when namespaces are
levelized.
A central point that
couldn’t be discussed with the architect during our meeting, is the matter of
delay. My opinion is that in software development, delays can only be estimated
from the past work, done within a similar context. This is the entire point of
the book Software Estimation.
With the time, the architecture team will get a solid knowledge on how long
they need to levelize X Lines of Code that contains Y dependencies cycles. In
the future, I hope that they will feedback me quantitative data about their
work and I will then be glad to blog more about how what they are doing.
to be continued...
Posted
Tue, Sep 23 2008 5:43 PM
by
Patrick Smacchia