There has recently been some discussion about whether code size is the enemy, starting with (in order of publication):
- Steve Yegge - Code’s Worst Enemy
- Jeff Atwood - Size Is The Enemy
- Frans Bouma - Codebase size isn’t the enemy
- Ayende Rahien – Code base size, complexity and language choice
I would like to add my 2 cents. Code size is not the enemy; the enemy is everything that prevents you from adding new features at a sustainable pace. The top 2 culprits are:
- A bad overall code structure
- A lack of automated tests
Both prevent you from evaluating the impact of any change in the code. Consequently, one cannot know whether a change here introduced a bug there, unless one manually retests the entire application, which is what one ends up doing. I am talking here about side effects.
A proper code structure limits the propagation of side effects. Controlling dependencies and thinking about how you componentize your code base lets you separate concerns and, consequently, contain side effects. The tool NDepend is all about this: making sure that your code structure stays clean.
A good battery of automated tests checks whether the side effects resulting from a change break the correctness of the code.
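As a tiny illustration (the function and its test are hypothetical), a regression test pins down expected behavior so that a side effect introduced by a change elsewhere fails the test battery instead of reaching the user:

```python
# Minimal sketch of a regression test that catches a side effect.
# The function and its expected values are hypothetical examples.

def apply_discount(price, rate):
    """Return price after applying a discount rate in [0, 1]."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return round(price * (1 - rate), 2)

def test_apply_discount():
    # These assertions pin down the expected behavior: if a change
    # elsewhere alters it, the test battery fails instead of a user.
    assert apply_discount(100.0, 0.25) == 75.0
    assert apply_discount(19.99, 0.0) == 19.99

test_apply_discount()
print("all regression tests passed")
```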
If I had to choose a third culprit, it would definitely be copy-pasted/cloned code, which Frans described well in his post.
I would also like to add my 2 cents on the Lines of Code (LOC) numbers being measured. I wrote previously, in this post, about how you should count LOC, using the logical LOC metric. Here are some measures quoted from the other posts:
- NHibernate: 245,749
- Boo: 212,425
- Rhino Tools: 142,679
- LLBLgen: > 300K
The number of logical LOC is the only context-free valid metric, meaning that it doesn't depend on the language, the coding style, or the amount of comments and documentation. You can expect these numbers to shrink by a ratio of 5 to 10 once converted to logical LOC. NDepend measures 53K logical LOC for the NDepend code base itself, and I consider it a challenging and big project.
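To make the physical vs. logical gap concrete, here is a rough sketch (an analogy only; NDepend computes logical LOC differently, from the compiled code) that counts raw lines against statements, with one statement standing in for one logical LOC:

```python
# Rough sketch of the physical vs. logical LOC gap, using Python's
# ast module: raw lines counted against statements, with one
# statement standing in for one logical LOC. Illustration only.
import ast

source = '''
# compute a total with a comment and blank lines

def total(items):
    result = 0
    for price in items:
        result += price
    return result
'''

physical_loc = len(source.splitlines())
logical_loc = sum(isinstance(node, ast.stmt)
                  for node in ast.walk(ast.parse(source)))

# Comments and blank lines inflate the physical count but leave
# the logical count untouched.
print(physical_loc, logical_loc)  # 8 5
```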
I recently consulted for a 4M LOC project. I have heard things like:
- Our code base is so big that it takes several hours to compile, even on powerful servers.
- The .NET framework code base is a small project compared to our project. (I estimate that the .NET framework has around 1M logical LOC if you include all the WPF, WCF… stuff; I got this value because it measures around 6M IL instructions.)
NDepend measured 700K logical LOC for the ‘4M LOC’ project. Saying that you are developing a giant code base is always good for your ego, for getting more credit and a bigger budget, and for having more excuses when something goes wrong. This is unfortunate because, as in every other engineering profession, software needs professional metrics. And even if LOC is not the right metric for quality, complexity, or productivity, it is a good metric for estimating development cost and comparing project sizes (as I explained in this post).
The phenomenon of big things being proportionally harder to maintain than small things is known as diseconomy of scale. It explains why it can take a year to add a tiny feature to a large project such as Vista (70M LOC). The maintenance cost curve is simply not linear in the code base size; it tends to be polynomial or even exponential. Hopefully
things are much better if you have well-structured and well-tested code. I said hopefully because every piece of software needs new features to survive, and new features mean more code. I couldn't agree more with what Ayende wrote:
Features means code, no way around it. If you state that code size is your problem, you also state that you cannot meet the features that the customer will eventually want.
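The nonlinear cost curve described above can be sketched with a COCOMO-style effort model, effort = a · KLOC^b with b > 1 (the coefficients below are the classic basic-COCOMO "semi-detached" values, used here purely as an illustration, not as a measurement of any of the projects mentioned):

```python
# COCOMO-style illustration of diseconomy of scale: effort grows
# faster than code size because the exponent b is greater than 1.
# a=3.0 and b=1.12 are the classic basic-COCOMO "semi-detached"
# values; the two project sizes are hypothetical.

def effort_person_months(kloc, a=3.0, b=1.12):
    return a * kloc ** b

small = effort_person_months(50)    # 50 KLOC project
large = effort_person_months(500)   # 10x more code

# 10x the code costs more than 10x the effort.
print(large / small)  # ≈ 13.2
```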