The blogosphere is full of discussions and arguments on the best way to write
and design software. It might be worth the effort to stop and go back to
first causes — just what quality or qualities do we want in our code?
What are we trying to achieve? For me, as a developer on enterprise software systems, that
answer is easy. As far as I’m concerned, maintainability is the single most important quality of
code. You might be tempted to say productivity, but since most of our time
is spent modifying or extending existing code, our
productivity depends on the maintainability of that code. Productivity over any extended period, even within the initial
project, can only be sustained by creating a maintainable codebase.
Jim Shore puts it well: “A good software design minimizes the time required to create, modify, and maintain the software while achieving acceptable run-time performance.”
Enterprise software systems change. Business rules change, technology
platforms change, third-party dependencies are upgraded. Again from Jim
Shore, “…most software spends far more time in maintenance than in initial
development.” Enterprise systems typically aren’t replaced because they
stop working. An application or system often reaches the end of its life cycle
because it has become too difficult, risky, or expensive
to modify to keep up with evolving needs.
Maintainability has become a near obsession for me because I’ve spent much of the last two years modifying, extending, or flat out rewriting
legacy code. Inevitably, much of that code has proved difficult
to work with. Our efficiency has been hampered on multiple occasions
by bad existing code and poor or nonexistent development infrastructure.
We’ve been noticeably faster when working with the newer code written with
TDD, NAnt builds, and FitNesse tests. Perhaps more revealing, we’ve become
much more efficient with our legacy code since we retrofitted a lot of test and
build automation around it.
Finally, one last quote from Mr. Shore:
“…the goodness of a design is inversely proportional to the cost of change.”
I agree, and this “manifesto” was originally going to be entirely about coding and designing
software for maintainability. I still think that the quality of the code is the single most important
factor in the longevity of an application, but on reflection, some of the biggest gains my team
has made with our legacy systems have come from more comprehensive build
automation, faster builds, better configuration management, and a body of
automated tests. It’s not just the code, it’s the entire ecosystem.
Personally, I think that continuous, adaptive design is the most reliable
mechanism for arriving at a good design, but then again, continuous design is
most easily accomplished when every change gets rapid feedback from
automated builds and tests. So for the moment, here is my
vision for the practices and infrastructure you need around the code itself to
create a maintainable software ecosystem.
Answer these Questions with a Yes
If you want to create a maintainable code environment, you’ll need to be able
to answer all of these questions in the affirmative.
- Can I find the code related to the problem or the requested change?
- Can I understand the code?
- Is it easy to change the code?
- Can I quickly verify the changes, preferably in isolation?
- Can I make the change with a low risk of breaking existing features?
- If I do break something, is it easy to detect and diagnose the problem?
Unsurprisingly, my answers to these questions largely come from
Agile/XP practices: Test Driven Development, Refactoring, Continuous
Integration, and Acceptance Testing. To answer yes to all six questions, I say you need solid, clean, well-structured code and multiple layers of
effective feedback that tell you quickly when something is wrong, and exactly what is
wrong, so you can correct it. Agile development is nothing but a set
of practices to maximize feedback. Maintaining a high quality of code
draws on much older practices and values, but I think that Test Driven Development
and designing for testability are the most effective mechanisms for enforcing
the traditional qualities of good code structure that make code easier
to modify: separation of concerns, high cohesion, and low coupling.
Journey to a Maintainable Software Ecosystem
As a developer extending an existing system you’re often the
protagonist in the old Myst
computer game. In Myst you’re a traveler exploring a strange world where
all of the people have disappeared, and you solve a series of obscure puzzles to continue along
your journey. Now, let’s take a tour through my vision of a maintainable
software ecosystem. Say you’re a developer tasked with making some
extensions to an existing codebase, and the original developers have all
disappeared. Unlike Myst, in this healthy ecosystem there should be a sign
or signs at every point to say “go here next” or “you have a problem right here.”
I start by finding a brief document or Wiki page that tells me what software
has to be installed for the code (IIS, Sql Server/Oracle, etc.) and the all-important
location of the source control repository. When I retrieve the
root of the source repository I also get a copy of everything else that the code
needs to execute, as well as the master build script. I then run the build script,
which compiles the code, sets up all of the necessary environment configuration,
and runs a set of unit and integration tests. As soon as I see the build
finish successfully, I’m reasonably sure that my box is able to execute the
code. Once I have the code opened on my box, I can see that the code is
well factored and largely orthogonal in structure. I’m able to find the
place where my new code should go and the patterns that the existing code
follows so I can maintain some consistency. Because the code is loosely
coupled, I can add my new code and easily unit test it in isolation without
having to deal with much of the existing code. Once the new code is
complete I run the build script again and, assuming it succeeds, I
check the new code back into source control. A continuous integration
server detects the changes, builds the latest code in a clean build
environment, runs a more exhaustive suite of automated tests, and finally
creates a deployment package that can be used to push the code to a testing
environment. I can confidently push the code into production quickly
because a nearly comprehensive suite of automated regression tests
greatly reduces the cost and risk of regression testing. The risk of
propagating code is minimized by a self-diagnostic deployment strategy that can
tell you when something has been deployed incorrectly, and what went wrong.
The previous paragraph doesn’t have to describe an imaginary place. It’s
obviously easier to accomplish with “Greenfield” code, but it might be more
important to get that existing, strategically important codebase to this point.
Invest in Continuous Integration Infrastructure
Of all the practices in Agile development, the one practice that I would
recommend without reservation is
Continuous Integration. If you’re a team brand new to Agile or XP
development and looking for a place to get started, I say start with CI (with
TDD an immediate second). I
recently read an article that aptly describes Continuous Integration as having a
conversation with the system. If you’re not already familiar with CI, it’s
the practice of running a full integration build on the most current version,
usually including environment setup and unit tests, on every single check-in to
the source control repository. The Continuous Integration infrastructure
is most effective when it’s coupled with a developer attitude of frequent
check-ins. A lot of
teams treat an automated build script as overhead, a nice thing to do, but
one that can be skipped in a time crunch. Not so. A comprehensive
automated build script is strategic to maintaining
productivity over the lifecycle of a system. It’s an investment, not a cost.
A good CI infrastructure and process can reduce friction in working with a codebase by:
- Providing faster feedback on any changes made to the system
- Providing better transparency into the changes happening to the system
- Propagating environmental changes and code changes more rapidly
- Easing integration issues by dealing with them earlier, in smaller chunks
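To make the mechanics concrete, here is a minimal sketch of the loop at the heart of a polling CI server, written in Python purely for illustration. It asks the repository for its head revision and runs a full build whenever that revision hasn’t been built yet, keeping a history of results. The `latest_revision` and `run_build` hooks are hypothetical stand-ins for a real source control query and the master build script; products like CruiseControl.Net layer scheduling, notification, and reporting on top of this idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BuildResult:
    revision: int
    succeeded: bool


@dataclass
class PollingCIServer:
    # Hypothetical hooks: a real server would query the source control
    # repository for its head revision and invoke the master build script
    # (compile, environment setup, unit tests) against that revision.
    latest_revision: Callable[[], int]
    run_build: Callable[[int], bool]
    history: List[BuildResult] = field(default_factory=list)
    last_built: Optional[int] = None

    def poll(self) -> Optional[BuildResult]:
        """Build the head revision once per new check-in; otherwise do nothing."""
        head = self.latest_revision()
        if head == self.last_built:
            return None  # no new check-in since the last build
        self.last_built = head
        result = BuildResult(head, self.run_build(head))
        self.history.append(result)  # the record gives transparency into every change
        return result
```

Because the loop builds whatever the head revision is at poll time, several check-ins that land between two polls get built together. That’s one more reason the practice works best with small, frequent check-ins.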
A vexing problem in maintaining an existing codebase is not being able to
exercise the code in your immediate development environment. After all,
how can you really know that the code works if you can’t run it?
Enterprise applications almost always have dependencies on external libraries
and specific server configuration. Too many times I’ve seen developers
stopped in their tracks because of issues with their development environment.
In the past, I’ve spent up to a couple of weeks just trying to get an existing codebase to
function on my workstation before I could begin writing new code. That
time is pure, preventable waste. Even with a build
script I’ve seen developers spend days trying to work out the kinks in their
system to make the build script function.
The best answer in my playbook is a comprehensive automated build
script backed up by a modicum of documentation. Obviously, you’re not going to install Sql Server
or Apache from the build script, but other than big-ticket items like those, the build
script should lay down all of the environmental dependencies.
Our build script builds a local copy of the database, sets up virtual
directories on IIS, registers COM (must die) dependencies, installs the Windows
services, and makes all of the relevant registry entries necessary to execute the
code. Theoretically, we could bring in a brand new developer and get the
entire suite of applications running on the new workstation in a couple of hours
(if we were ever allowed to hire again, anyway). Even with an unchanging
team that matters, because the application itself is always changing.
Every new project we build adds new environmental changes to the system.
Keeping all of the setup in the automated build helps get these changes
propagated to the other developer workstations.
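One way to picture such a build script is as an ordered list of setup steps that stops at the first failure and names the step that broke. The Python sketch below illustrates that shape under stated assumptions; it is not our NAnt build, the step names are hypothetical, and in practice each action would shell out to a compiler, a database script runner, or an IIS, COM, or registry setup utility.

```python
from typing import Callable, List, Tuple

# A step is a description plus an action that raises an exception on failure.
Step = Tuple[str, Callable[[], None]]


class BuildFailed(Exception):
    """Raised with the name of the step that broke the build."""


def run_build(steps: List[Step]) -> List[str]:
    """Run setup steps in order, stopping at the first failure.

    Returns the descriptions of the completed steps so the output
    doubles as a record of what the build laid down.
    """
    completed: List[str] = []
    for description, action in steps:
        try:
            action()
        except Exception as exc:
            raise BuildFailed(f"step '{description}' failed: {exc}") from exc
        completed.append(description)
    return completed


# Hypothetical steps; in a real build each lambda would call a compiler,
# a database script runner, or an IIS/COM/registry setup utility.
EXAMPLE_STEPS: List[Step] = [
    ("compile solution", lambda: None),
    ("create local database from schema scripts", lambda: None),
    ("create IIS virtual directories", lambda: None),
    ("run unit and integration tests", lambda: None),
]
```

Stopping at the first failure with the step’s name is the point: a new developer learns immediately which piece of the environment is missing instead of debugging a half-configured workstation.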
Here’s a scenario that’s unfortunately common in the .Net world. A
developer or a pair builds a new feature that depends on a third-party library.
The third-party library comes with an MSI that puts the assemblies in the GAC.
The developers continue with development on their workstations and
everything works perfectly, until that code is moved to a different box.
If the team uses a Continuous Integration strategy, that problem is going
to be spotted immediately. If the team is diligent about their
Continuous Integration strategy, they’ll automatically add the new environment
setup to the source control tree and build scripts. Even if they don’t,
the CI build is going to give them immediate feedback that there is an environment problem.
One of the most pernicious velocity killers is friction or uncertainty when
migrating code from development to testing and production servers. I’ve
seen shops try to beat the issue with lots of ceremony and paperwork, but
ceremony alone is an inefficient and ineffective mechanism. I watched one team
take up to a week to set up a testing environment for a particular branch of the
code, and they had to do this a dozen times a year. We’ve beaten the problem
by automating everything that moves. We’ve extended our Continuous
Integration infrastructure to include moving successful builds to our testing
servers on demand. We built
environment testing into our code to quickly detect and troubleshoot
problems with an installation of the code. We have not had a single
problem with a testing deployment since. Speeding up the feedback cycle between development and
testing has certainly helped us, but the improved reliability and control over
the testing migrations has made a tremendous difference. The end result
for us is the ability to quickly shift code from development to testing, know
that the installation is valid, and all the while have accurate traceability
from the build products being tested to the exact version of the source code.
It’s a great balance of speed versus control, with relatively little developer
overhead once you’re past the initial creation of the build infrastructure.
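Environment testing of this kind can be as simple as a list of named checks that a deployed installation runs against itself. The Python sketch below illustrates the idea rather than our actual implementation; the check names, the file name, and the environment variable are hypothetical. Unlike the build, it runs every check rather than stopping at the first failure, so a single run reports everything wrong with an installation.

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EnvironmentCheck:
    name: str
    check: Callable[[], bool]  # returns True when the environment is correct
    hint: str                  # where to look when it isn't


def diagnose(checks: List[EnvironmentCheck]) -> List[str]:
    """Run every check and return one message per failure (empty list = healthy)."""
    problems: List[str] = []
    for item in checks:
        try:
            passed = item.check()
            detail = item.hint
        except Exception as exc:  # a check that crashes also counts as a failure
            passed = False
            detail = f"{item.hint} ({exc})"
        if not passed:
            problems.append(f"{item.name}: {detail}")
    return problems


# Hypothetical checks for one installation; real ones might verify IIS
# virtual directories, COM registrations, or database connectivity.
EXAMPLE_CHECKS = [
    EnvironmentCheck("config file deployed", lambda: os.path.exists("web.config"),
                     "copy web.config from the build package"),
    EnvironmentCheck("connection string set", lambda: "APP_DB_CONNECTION" in os.environ,
                     "set APP_DB_CONNECTION on this server"),
]
```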
Take a look at the
Capistrano project from the Ruby world. Envision a world where you can
reliably do production code pushes and rollbacks with a single mouse click.
If you had that ability, and some shops already do, how much faster could you
deliver new features and fixes? How much more incrementally and iteratively
could you work (think Google)? If a production push is scary, or takes an
act of Congress to move through your process, your system isn’t going to be that easy to maintain,
even if the code is pristine. Someday
I’d like to have the one-click production push.
Getting the Source Code Under Control
You have to be able to find the right code, and preferably without spelunking
through an ancient VSS repository. There needs to be a
single, authoritative source for the code and its dependencies.
It should go without saying that using source control software is nearly
mandatory in any professional software endeavor. My colleague and I have
between us given four presentations on Continuous Integration in the last 18
months. After every single presentation somebody has approached us with a horror story of a
multi-developer team working today without any source control. That’s
borderline insane, but just using source control may not be enough. One of
our mission-critical subsystems has its source code scattered across a couple of
different repositories, none of which can be said to be the authoritative master
repository. It’s a major source of concern. A
sadly common anti-pattern in software development is creating build products
directly on a developer workstation and migrating those products to the servers. At that
point it doesn’t matter whether the compiled build products themselves are checked
into source control, because they can’t be traced back to the source. We did this routinely
at one of my previous jobs. Two years after I left I got a call from a
friend of mine who had checked out a VB6 project I’d written to fix a production issue. It wouldn’t compile because a class file was missing. Oops.
Again going back to first causes, what I want to achieve in my software
ecosystem is the ability to accurately trace, at all times, the build products
installed on development, testing, and production environments back to the exact
versions of the code. It’s awfully hard to diagnose problems in production if you’re not sure which version of the code you should be looking at. In the example above we had a formal process
recording production and testing migrations, but no real traceability from the
installed binaries to the exact source code. Good traceability doesn’t
have to be difficult. For us, traceability is almost a byproduct of using
Continuous Integration. The first step is to simply get all of the code
and anything that’s necessary for the code to function into the source control
repository. Continuous Integration is only effective if there is a single,
authoritative repository for the source. Part of our CI build is embedding the
CruiseControl.Net build number into all of the .Net assemblies. It’s just a
task that looks like this from my StructureMap build file:
At the end of each successful CI build (compile, unit tests, integration
tests), CruiseControl.Net tags the source control repository
with the CruiseControl.Net build version. Only successful, versioned
CruiseControl.Net builds are ever migrated to testing or production. The key point is that we can pull a version number off of one
of the assemblies on the test server and immediately find the exact version of
the code from the source control repository.
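As a language-neutral illustration (not the NAnt task from the StructureMap build file), the Python sketch below rewrites the `AssemblyVersion` attribute in an `AssemblyInfo.cs` file from a CI build number, and maps a version back to a source control tag. The four-part version layout, the build number’s position in it, and the `build-` tag naming are assumptions made for the example; `read_version` stands in for reading the version off a deployed assembly.

```python
import re

# Assumption: a four-part .NET-style version "major.minor.build.revision",
# with the CI build number in the third position. The tag naming below is
# also a hypothetical convention, not CruiseControl.Net's own format.
ASSEMBLY_VERSION = re.compile(r'\[assembly:\s*AssemblyVersion\("([^"]*)"\)\]')


def stamp_version(assembly_info: str, major: int, minor: int, build_number: int) -> str:
    """Return AssemblyInfo.cs text with its AssemblyVersion set from the CI build."""
    version = f"{major}.{minor}.{build_number}.0"
    stamped, count = ASSEMBLY_VERSION.subn(
        f'[assembly: AssemblyVersion("{version}")]', assembly_info
    )
    if count != 1:
        raise ValueError(f"expected one AssemblyVersion attribute, found {count}")
    return stamped


def read_version(assembly_info: str) -> str:
    """Extract the version string (stand-in for reading it off a deployed assembly)."""
    match = ASSEMBLY_VERSION.search(assembly_info)
    if match is None:
        raise ValueError("no AssemblyVersion attribute found")
    return match.group(1)


def tag_for_version(version: str) -> str:
    """Map a version to the source control tag created for that CI build."""
    return "build-" + version
```

The same number flows in both directions: the build stamps it into the assembly, and anyone holding the assembly can turn it back into a tag and check out exactly that source.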
Don’t forget about the database, either. Database changes have a bad
tendency to fall through the cracks. Treat the database schema as just another
part of the code. It’s a stupid, unnecessary risk to put database code
through a completely different change management process than the code. It
almost guarantees that the code will not be synchronized with the version of the
database schema.
The idea of a DBA being able to push stored procedure code directly to
production needs to be abandoned. The database schema scripts need to be
completely under source control and part of the automated builds. Changes to stored
procedure code or DDL scripts should not move to production until they’ve been
through a successful integration run of the CI infrastructure. Again, back to
first causes: you want reliable traceability between the version of the
middle tier and the version of the database schema. Continuous Integration with the database can be tricky. I’m more than a little intrigued by the
Ruby on Rails database migrations,
even for non-Ruby development.
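The core of the migrations idea carries over to other platforms: keep numbered schema scripts in source control and let the build apply whichever ones a given database hasn’t seen yet, recording each applied version in the database itself. Here is a minimal sketch using Python’s standard `sqlite3` module; the `schema_version` table and the (number, script) migration format are assumptions for illustration, not the Rails API.

```python
import sqlite3
from typing import List, Tuple

# (version number, SQL script). In a real project each entry would be a
# numbered script file checked into source control alongside the code.
Migration = Tuple[int, str]


def schema_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied migration, creating the tracking table if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0  # MAX() of an empty table is NULL


def migrate(conn: sqlite3.Connection, migrations: List[Migration]) -> int:
    """Apply, in order, every migration newer than the database's recorded version."""
    current = schema_version(conn)
    for number, script in sorted(migrations):
        if number <= current:
            continue  # this database already has the change
        conn.executescript(script)
        # Record the version only after the script ran without error.
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
        conn.commit()
        current = number
    return current
```

Because the applied version lives in each database, the same build step brings a developer’s local database, the CI database, and a test server’s database to the same schema version. One caveat: a script that fails halfway can leave a partial change behind, so migration tools commonly wrap each step in a transaction where the database supports transactional DDL.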
Automated Unit Tests
My experience is that
good unit tests help maintainability immensely. On the other
hand, unit tests that are brittle, hard to understand, and too tightly coupled
with the implementation may only make things worse. If you’re afraid to
change the code because too many unit tests will break, you’ve got some serious
problems in either your tests or the code structure (brittle unit tests are a
code smell). Writing good unit tests is a very large topic in and of
itself, but suffice it to say that it behooves you to spend some time learning
more about writing good unit tests. I suspect that a lot of the failure
stories we see from people trying TDD result from not understanding how to write
good unit tests. Good unit tests aid maintainability in two major ways:
- Providing a solid safety net of regression testing to enable
refactoring. While I think refactoring is necessary to arrive at a
good design in any situation, refactoring is an absolute necessity as the
function of an application evolves. For instance, in my current
project I needed to reuse some large pieces of functionality from the
application in a completely different context. The first thing we did
was to refactor the code so that we could call the smaller pieces of
functionality without running the application workflow as a whole. The only
reason we were able to do this refactoring safely was a series of FIT style
tests we had written as regression tests. We made a series of small
changes and ran the test suite after each change, occasionally backing up to
reverse a code change when a test failed.
- Creating a specification for the usage of each class with readable examples
of the API. If the test is readable, it should act as documentation for
the code that it exercises. I often refer to unit tests to see how to use a
class that I didn’t write. We rely very heavily on a multitude of open
source tools that are notorious for a lack of documentation. In several
cases I’ve been able to pop open the code and read the NUnit tests to discover
how to use a feature. The best thing about making unit tests act as documentation is that, unlike an external document or even NDoc-style comments, the
unit tests cannot diverge from the code without failing.
Both the specification and regression safety net qualities of TDD are
maximized by creating fine-grained tests that are easily understandable.
When we inherited our legacy application last fall, all that came with it
was a series of coarse-grained integration tests that would fail without
any useful failure messages. It was almost impossible to troubleshoot
the tests without putting a debugger on the code and following it from end
to end. Those tests did not aid in refactoring because they didn’t
really diagnose a problem, only report that there was a problem. Over
time we’ve moved to FIT style tests that each exercise a smaller piece of
functionality and are easier to control. These tests have
been far better as a safety net because they give us much more context
around the exact reason for a failure. In our newer code, written
TDD style from the beginning, a failed unit test points to a very
small area of the code, making the cause of the failure much easier to diagnose.
I haven’t internalized
it completely, but I definitely like where the Behavior Driven
Development (BDD) advocates are going. Even if BDD leads to nothing
but unit tests with cleaner syntax, I would call it a success. The slight shift in semantics from “Test-” to “Behavior-” is
important. I think we will be better off when the emphasis is on
creating an executable specification of the expected behavior in the small
rather than on “at some point I need a unit test for each method on each class.” TDD/BDD is supposed to be an exercise in defining what the code
is supposed to do and then verifying that the code produces the expected results.
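Here is a small illustration of tests written as an executable specification, using Python’s standard `unittest` rather than NUnit. The `DiscountCalculator` class and its rule are hypothetical. The point is that the test class and method names read as statements of expected behavior, and each test pins down one small rule, so a failure names the exact behavior that broke and the tests double as usage examples for the class.

```python
import unittest


class DiscountCalculator:
    """Hypothetical class under test: orders of $100 or more get 10% off.

    Amounts are integer cents to avoid floating-point rounding surprises.
    """

    THRESHOLD_CENTS = 10_000

    def discount_for(self, order_total_cents: int) -> int:
        if order_total_cents < 0:
            raise ValueError("order total cannot be negative")
        if order_total_cents >= self.THRESHOLD_CENTS:
            return order_total_cents // 10
        return 0


class WhenCalculatingAnOrderDiscount(unittest.TestCase):
    """Each test name states one expected behavior, so the list reads like a spec."""

    def setUp(self):
        self.calculator = DiscountCalculator()

    def test_orders_under_100_dollars_get_no_discount(self):
        self.assertEqual(self.calculator.discount_for(9_999), 0)

    def test_orders_of_100_dollars_or_more_get_ten_percent_off(self):
        self.assertEqual(self.calculator.discount_for(10_000), 1_000)
        self.assertEqual(self.calculator.discount_for(25_050), 2_505)

    def test_negative_order_totals_are_rejected(self):
        with self.assertRaises(ValueError):
            self.calculator.discount_for(-1)
```

Running `python -m unittest -v` prints each test method name, which reads much like a short requirements list for the class.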
Executable Requirements for Less Expensive Regression Testing
Code isn’t useful unless it does what it is meant to do and continues to do
what it is meant to do. Assuming that you actually have the correct
requirements from the business, what’s the best and most efficient way of
verifying the code against the requirements, now and later? After all,
regression testing is one of the most expensive items in software maintenance.
One answer is to implement the detailed requirements as automated tests.
The obvious benefit is that running the tests ensures, or at least detects, that
the code still fulfills the requirements. Using automated tests as the requirements
document has another significant advantage: it reduces duplication between
a requirements document and the testing plan. Instead of keeping two sets
of documentation synchronized with each other and the code, you have one source
of information that can be automatically reconciled against the code.
Another huge advantage of specification by automated test is the removal of
ambiguity from the requirements. A test succeeding or failing is a binary
decision; there’s no room for ambiguity the way there is in fuzzy “the
system shall…” type requirement documents.
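To show what requirements-as-tests looks like mechanically, here is a small Python sketch of a table-driven check in the spirit of a FIT column fixture: the rule lives in a readable table, and a tiny runner feeds each row to the code and reports mismatches. This is not the FIT or FitNesse API; the table format, the `run_table` helper, and the shipping rule are all hypothetical.

```python
from typing import Callable, List, Tuple

# A requirement written as a table, in the spirit of a FIT column fixture:
# input columns first, expected output in the last column.
# The shipping rule and its figures are hypothetical.
SHIPPING_REQUIREMENT = """
| order total | destination   | shipping charge |
| 25.00       | domestic      | 5.00            |
| 75.00       | domestic      | 0.00            |
| 75.00       | international | 15.00           |
"""


def parse_table(text: str) -> List[List[str]]:
    """Split a pipe-delimited table into rows of trimmed cell strings."""
    return [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]


def run_table(text: str, fixture: Callable[..., str]) -> List[Tuple[int, str, str]]:
    """Feed each data row to the fixture; return (row, expected, actual) per failure."""
    _header, *rows = parse_table(text)
    failures = []
    for number, row in enumerate(rows, start=1):
        *inputs, expected = row
        actual = fixture(*inputs)
        if actual != expected:
            failures.append((number, expected, actual))
    return failures


def shipping_charge(order_total: str, destination: str) -> str:
    """Fixture adapter around the (hypothetical) business rule under test."""
    if destination == "international":
        return "15.00"
    return "0.00" if float(order_total) >= 50 else "5.00"
```

In FitNesse, a table like this lives on a wiki page that business experts can read and edit, with fixture code connecting it to the system; the runner above only illustrates the core idea that every row is a requirement checked automatically.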
Okay, the first objection to automated tests as requirements is usually
that non-developers won’t be able to understand or write the automated tests.
Not true. Personally,
I’m a big fan of expressing
requirements in FIT style automated tests. You can write FIT tests
that are human readable by non-developers (not that developers are non-human,
but…), especially since you can quite happily mix prose with the test tables.
FIT tests used to be limited to table-driven tests, but with the addition of the
“flow” style test fixtures in the
FitLibrary you can effectively write automated tests in English sentences.
Test automation isn’t going to be a silver bullet, but it goes a long, long
way toward enabling change in software systems, especially when the tests are run
automatically as part of your Continuous Integration builds. If you can
catch regression bugs with the automated tests almost immediately upon checking in
the code, you can usually fix them faster. I definitely think that you can turn bug fixes around much faster when you can take care of things completely
on your own workstation without having to go through the formal bug workflow.
Test automation is especially effective when the developers can execute the
tests on their own workstations. That can cut the feedback cycle down dramatically.
That said, badly written automated tests
can even cost more effort and trouble than they save. The same qualities of a
good unit test apply just as much, if not more so, to acceptance testing.
Good automated testing does not automatically equate to FIT, either.
The key point is to create tests that are easy to understand, reviewed by, and
ideally written by the business experts. Ruby or Python scripting seems
to be another alternative for testers to write
readable, automated tests.
How about Documentation?
I haven’t mentioned much about the type of system level documentation
that needs to exist. To be honest, if you engage me in a conversation
about how to make a software ecosystem maintainable I would probably forget to
even talk about documentation. We’ve all heard the mantra “the code is the
documentation”, and I actually believe that, but with some additions.
Ideally, I think that comprehensive “documentation” for a codebase is this troika:
- Intention revealing code
- Solid automated test coverage
- A complete automated build script.
As I mentioned earlier, the automated build scripts should be able to set up
a clean development environment to run the application. If that is really
true, then the build automation script is the single most authoritative source
of information about the required environmental setup for the application or
system. Even better, the build automation script can’t diverge from
reality if it’s being run constantly. The same thing applies to automated tests.
So, back to the question of what documentation do you need? I say just
enough to fill in the gaps between the code, the build script, and the tests.
The big danger of documentation is the risk and overhead of keeping the
documentation synchronized with the current state of the code. If
documentation simply duplicates information that could be gleaned from the code,
I don’t think it’s worth writing. I fall into the camp that says the
benefit of comprehensive documentation doesn’t justify its overhead.
More succinctly, I put a much higher priority on readable code, readable
automated tests, and solid build automation than I do on documentation.