Removing the "Legacy" from your Code

In a conversation last week I was asked for my recommendations on how to retrofit automated testing and build processes into an existing system.  I'm not going to dissemble at all: it's hard.  Moreover, I think rescuing existing code from the brink of a rewrite while still producing new features is the most challenging thing I've ever had to do – which kind of makes it fun in a way.  Much of the 20 months I spent at my previous employer went into trying to recover a very poor application architecture and turn it into a productive environment (we made serious strides, but the office evaporated too early).

I revisited my previous posts on dealing with legacy code, and I still feel like they largely stand on their own.  What I don't have anywhere is a high level gameplan for how I would get started, so here it is with links for more content.

What is Legacy Code? 

In Lessons Learned for Dealing with Legacy Code I stated my definition of Legacy Code:

In his [Working Effectively with Legacy Code] book, Michael Feathers defines legacy code as code without automated tests.  I do think that's valid, but I'm going to broaden my personal definition a little bit.  Legacy code is code that you're afraid of, but is too valuable or big to toss away. 

To summarize, Legacy Code is code that is difficult, inefficient, or risky to change, but too important and useful to throw away or ignore.  You have to deal with it.  You can ignore it and keep going, but it might be a lot smarter to pay down the Technical Debt that you've accrued to remove friction in your development environment.


  1. Where does it hurt? – The most important thing to do is to target the specific areas of code that are causing you the most trouble or just plain inefficiency.  You've only got so many people and so much time at your disposal, so every single thing you do has to add value.  If a module of code doesn't need to change in the near future, leave it alone.  Where are your performance problems?  What areas of the code change most often?  That's where you go first.
  2. Read everything that Michael Feathers has ever written, period.  Start here.  Mr. Feathers has a bust on my Mt. Rushmore of Software Development.
  3. Make it Build – I think the most bang for your buck is to invest in improved build automation first.  A lot of the problems with legacy code I've dealt with have come down to difficulty in getting the system environment configured correctly before I could even start.  I've seen other applications nearly crash and burn just because it takes too long to migrate code between environments.  Dose your legacy code with some NAnt (or Ant or Rake or Maven, etc.).  Add some environment tests to your build to make troubleshooting the environment easier.
  4. Start with the End in Mind – You won't get to a desired endstate in one leap forward, but you still need to create and constantly hone your vision for the structure you want your legacy code to evolve into.  Start an Idea Wall somewhere on your Wiki or a visible chart to record ideas from the team about possible improvements.  You never know when an opportunity to make one of these changes will present itself.  Be ready and have ideas queued up.
  5. Management Visibility – You will not be able to do very much in the dark of the night.  At some point your actions to improve the existing code are going to have to be visible to management.  Any large scale effort to improve the quality of existing code can only succeed with the full blessing of management.  We wrestled with this problem quite a bit at my previous employer.  I wrote an essay about Balancing Technical Improvements versus New Business Features that included the topic of selling technical improvements to your management as necessary precursors to new business functionality.  All I can say is Good Luck.  If you do run into management that just doesn't seem to care about technical quality, you might want to change your organization.
  6. Characterization Tests – The first automated tests you should probably write are characterization tests.  These tests are generally very coarse grained and work by testing the system from the outside in.  In effect, you are recording the system's current behavior as tests.  You really want a good set of characterization tests as a safety net first before you start making structural changes in the code.  In Lessons Learned for Dealing with Legacy Code I recounted one of my team's experiences with characterization tests with a few notes of caution.  To summarize, watch the effort to reward ratio of your characterization tests, and try really hard to make those tests human readable to act as documentation.  As far as a long term safety net for regression testing, you're still going to want granular unit tests.  Big tests tell you something is wrong.  Little tests should tell you exactly where something is wrong.  More in a Taxonomy of Tests.
  7. Cut new Seams – Generally the biggest problem I've seen in retrofitting automated tests into legacy code is tight coupling.  One of the best things you can do is to cut new seams into the code to allow for more isolated unit testing.
  8. Hippocratic Oath – Take the attitude that every time you go into the legacy code to make new changes you will not leave it any worse than it already was.  When the hood goes up on an area of code, see if you can quickly slip in some refactorings to remove complexity, improve readability, or retrofit some better test coverage.
  9. Be Opportunistic – Sometimes the best thing to do is just to pick off low hanging fruit.  Little improvements in readability or reductions in duplication add up into big gains over time.
  10. Be Patient – It's going to take a while.  Keep your eye on the ball.
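
To make step 7 a little more concrete, here is a minimal sketch of one way to cut a seam, in the spirit of what Feathers calls Parameterize Constructor.  It is written in Python with the standard sqlite3 module purely so it runs anywhere.  The `OrderDao` class, the `orders` table, the `production_orders.db` file name, and the `in_memory_orders` helper are all hypothetical stand-ins, not code from any real system.

```python
import sqlite3


class OrderDao:
    """A legacy-style DAO after one seam has been cut into its constructor."""

    def __init__(self, connection=None):
        # Before the change, the constructor always opened the real database.
        # The optional parameter is the seam: existing callers keep working
        # unchanged, while a test can pass in something cheap and isolated.
        if connection is None:
            connection = self._open_production_connection()
        self._connection = connection

    @staticmethod
    def _open_production_connection():
        # Hypothetical stand-in for whatever the legacy constructor really did.
        return sqlite3.connect("production_orders.db")

    def total_for_customer(self, customer_id):
        # The query logic is now testable without touching the real database.
        row = self._connection.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        return row[0]


def in_memory_orders(rows):
    """Test helper: build a throwaway in-memory database holding the given rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    connection.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return connection
```

A unit test would then construct `OrderDao(in_memory_orders([(1, 10.0), (1, 2.5)]))` and assert on `total_for_customer(1)`.  The appeal of this particular seam is that it is a small, mechanical change that leaves every existing caller alone.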

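The environment tests from step 3 can be very small.  The sketch below is one possible shape for them, written as a standalone Python function rather than as NAnt tasks.  The function name `check_environment` and its parameters are assumptions made for illustration, and the caller supplies whatever tool, variable, and directory names its own build actually depends on.

```python
import os
import shutil


def check_environment(required_tools, required_vars, writable_dirs, env=None):
    """Return a list of human-readable problems; an empty list means ready to build.

    All names passed in are supplied by the caller. Nothing here is dictated
    by any particular build tool.
    """
    env = os.environ if env is None else env
    problems = []

    # Every command-line tool the build shells out to must be on the PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"tool not found on PATH: {tool}")

    # Configuration the application reads at startup must be set and non-empty.
    for name in required_vars:
        if not env.get(name):
            problems.append(f"environment variable not set: {name}")

    # Output and scratch directories must exist and be writable.
    for directory in writable_dirs:
        if not os.path.isdir(directory):
            problems.append(f"directory missing: {directory}")
        elif not os.access(directory, os.W_OK):
            problems.append(f"directory not writable: {directory}")

    return problems
```

Run as the first step of the build, a call such as `check_environment(["nant"], ["APP_DB_CONNECTION"], ["build/output"])` (all three names are made up here) turns a mysterious failure halfway through the build into a short list of things to fix on the machine.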

Don't Let Code Become a Legacy

I think the best path, the most economical path, is to studiously stomp out Technical Debt in existing code as you make changes.  Last year in My Programming Manifesto, Michael Lang left a longish comment I always meant to respond to:

Agile proponents will say “refactor, refactor refactor”.  But I think every project reaches a tipping point where major refactoring just can not be justified in business terms, either blowing the budget or causing delivery delays that lead to missed business opportunities.  At that stage to some extent you’re pretty much stuck with what you’ve got.  At that point if you can’t get to the finish line with the architecture you have without major refactoring, your project, even with heroes and miracle workers, is heading for a melt down.

The rejoinder I would fire back at Michael is that the "tipping point" he refers to is largely brought on by putting off refactorings or design corrections for too long.  The more quickly you recognize technical problems in your code, design, or architecture, the easier it is to fix these problems.  I don't care how much UML or how many CRC cards you did upfront, you should still do reflective design as you work.  Constant small refactorings improve efficiency.  Waiting too long and making a refactoring expensive is inefficient.  In other words, don't allow technical debt to build up, and avoid Michael's "Tipping Point."  The interest rates from Technical Debt are a killer. 

By the way Michael, you still do design on Agile projects, and most of the worst architectures I've ever seen have been the result of elaborate designs done completely upfront and then executed without deviation from said design until it was far too late.


Other stuff

Another note of interest: some code is just too nasty to recover.  Full-blown rewrites are fraught with peril.  Take a look at Fowler's idea of a StranglerApplication to find a way to rewrite selective portions of an existing system.  Also, check out Brian Marick on Approaches to Legacy Code.
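
As a rough illustration of the strangler idea, the sketch below puts a thin facade in front of the old system and moves one operation at a time over to new code.  It is only one reading of the pattern, and every name in it (`StranglerFacade`, `LegacyBilling`, `NewInvoicing`, the operation names) is hypothetical.

```python
class LegacyBilling:
    """Stand-in for the existing system that is too risky to rewrite at once."""

    def handle(self, operation, payload):
        return f"legacy:{operation}"


class NewInvoicing:
    """Stand-in for a freshly written, well-tested replacement for one slice."""

    def handle(self, operation, payload):
        return f"new:{operation}"


class StranglerFacade:
    """Single entry point that routes each operation to new code if it exists."""

    def __init__(self, legacy):
        self._legacy = legacy
        self._migrated = {}

    def migrate(self, operation, handler):
        # Moving one operation over is a one-line, easily reversed change.
        self._migrated[operation] = handler

    def handle(self, operation, payload):
        # Anything not yet migrated still falls through to the legacy system.
        target = self._migrated.get(operation, self._legacy)
        return target.handle(operation, payload)
```

Callers only ever talk to the facade, so after `facade.migrate("create_invoice", NewInvoicing())` that one operation runs on new code while everything else still reaches the legacy system.  The legacy code shrinks as operations move, without a big-bang cutover.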

Alberto Savoia has an interesting series going on Characterization Tests.
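
For anyone who hasn't written one, here is a minimal sketch of a characterization test in the "record, then compare" style.  It assumes the legacy behavior can be reached through a plain function call and that its results survive a round trip through JSON.  The `legacy_shipping_quote` function is an invented stand-in for real legacy code, and `characterize` is a hypothetical helper, not part of any testing library.

```python
import json
from pathlib import Path


def legacy_shipping_quote(weight_kg, express):
    """Invented stand-in for tangled legacy code whose rules nobody remembers."""
    cost = 5.0 + 1.25 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost += 7.5
    return round(cost, 2)


def characterize(func, cases, golden_path):
    """Record func's current results on the first run; compare on later runs.

    Returns a list of (case, expected, actual) tuples for every mismatch.
    """
    golden_path = Path(golden_path)
    # Key each result by a readable rendering of its input arguments.
    actual = {json.dumps(case): func(*case) for case in cases}

    if not golden_path.exists():
        # First run: whatever the code does today becomes the expectation.
        golden_path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return []

    expected = json.loads(golden_path.read_text())
    return [
        (case, expected.get(case), result)
        for case, result in actual.items()
        if expected.get(case) != result
    ]
```

The recorded file is plain JSON keyed by the input arguments, so it can double as rough documentation of what the code does today.  Any refactoring that changes a result shows up as a non-empty mismatch list.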

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at
  • Michael Lang

    Hmm, did someone mention my name?

    I think at the time I made the “tipping point” statement, I was working in a development culture in which UML was considered to be a complete waste of time. They thought using agile methods eliminated the need for UML. I was still using UML but almost in a clandestine way, as sharing it with anyone wasn’t received in a positive way; which was a great shame as UML is #1 a design tool, and #2 a means of communicating systems design. In such an environment UML is only half as useful to me as it might otherwise be.

    I think UML modelling is like anything else, there’s a multitude of approaches and ideologies around its use. I don’t think I’ve ever had an experience in a project utilising UML where there wasn’t a need at some stage to go back to class and sequence diagrams and re-factor. Or in other words to adjust for oversights in the systems design.

    When I use UML, generally I don’t attempt to capture the entire system up front. The model is high level to begin with and then I add detail as it becomes necessary throughout the SDLC. The code is being written in combination with the development of the UML model.

    So basically I think we’re both on the same team. I’m saying agile methodologies shouldn’t exclude the use of UML, which I think is exactly what you’re saying.

    When dealing with legacy code bases, I think UML can help give you the big picture needed to help you make the right changes. With UML from my experience you end up re-factoring less, and when you do re-factor with UML you can determine the best way to introduce fundamental new constructs into a systems design whilst maximising the value of your existing code base.

    With UML CASE tools you can reverse engineer class diagrams from an existing code base. More recently (much to my own excitement) with Sparx Systems Enterprise Architect application you can connect EA to a debugger and record a sequence of method calls to create sequence diagrams.

  • Steve


    It’s a good idea to leave procedural tasks as procedural tasks. Static methods to access the database are much preferred to object based ones for simple database tasks (executing a stored procedure, executing a query and returning all rows in a data structure, etc).

    We gauge not only the testability of code but the maintainability of it as well. This means we’d ask questions like: ‘How hard would it be for a developer unfamiliar with the project to add an additional parameter to the stored procedure?’ We’d try to minimize the development effort needed, minimize the code changes needed, and keep those code changes self-contained in a single method if possible. This would help us verify that the change worked, is testable and did not break anything else.

  • programmer grrl

    Thanks for the great post. I’ve worked with a lot of legacy code lately, and everything you say rings true. It’s easy to get discouraged, but it helps to see you’re not alone, and to get honest, practical advice like this. And I agree, the Feathers’ book is an absolute must-have.

  • stevedonie

    Here’s a situation we are dealing with right now – I would welcome suggestions. We have a large legacy code base. We are trying to improve it and add more tests as we go. All new features are being written test-first, but we frequently run into problems getting the system under test. I’m still working my way through the Feathers book, and I suppose it is in there somewhere, but perhaps someone here could give me a shortcut – how does one insert a seam where you have many DAO classes that open a connection to a database in their constructors? We also have many classes that have static methods that will directly access the database – how do you convert procedural code like this?

  • James Taylor

    Great post – love the definition (I shall use it someday). I would only add that re-writing the high-change components using a technology, like business rules, that makes maintenance easier (or even allows business users to make some controlled changes for themselves) is a great way to reduce the threat of your legacy. As a Gartner analyst once said “we have established that you can’t code your way into the future”.

  • Steve

    Legacy code usually exhibits common characteristics:
    1. Difficult to build or get to run
    2. Reliance on multiple third party libraries, frameworks, etc.
    3. Reliance on external build tools
    4. Uses custom build scripts
    5. Build or execution environment hard coded to a particular machine or particular version of some tool (build tool, test tool, etc) or worse, a particular directory.
    6. Reliance on trendy system architecture, patterns, and/or coding styles.

    Generalizing, the common factors that increase risk are:
    – Extra dependencies greatly increase the cost of the system
    – Coupling together distinct parts of software development greatly increases the cost

    This is one of the largest cost factors given that 3/4ths or more of the cost of a production piece of software is after it is put into production.

    I’ve seen these on everything from more recent systems in Java/.NET 2.0 backwards to win32, dos, vax and unix.

    I’ve found that the better systems generally can be easily modified/supported/enhanced by someone with about 2 years of experience out of college. We try to develop systems at that level with minimal integration/coupling of tools/libraries/frameworks in order to minimize the total cost of the system over its expected lifetime. This is done to reduce the risk of os, api, .net framework, etc upgrades breaking our system.

  • Alberto Savoia

    Hi Jeremy,

    Great blog. It looks like we are both big fans of Michael Feathers’ work and crazy enough to tackle the issue of legacy code :-).

    Thank you for mentioning my own little blogs on working with characterization tests. There is so much material on XP, TDD and other “sexier” techniques, but most developers are stuck working with legacy code and more talk and discussion on characterization testing is needed.


  • Andres Taylor

    I really like this list. Very valuable.

    The only thing I would add to it is to use a tool like ndepend or Lattix to visualize the state of the current architecture. It’s great to have a common view to discuss from.

  • rlewallen

    Jeremy, the static code analysis tools for .Net framework already exist. NDepend is the best (IMO) and popular with 65 unique code metrics. NStatic is going to be released soon. NDepend does not plug into VS, but it’s super easy to use. Not sure about NStatic plug-in capabilities. VS Orcas has some simple static analysis that produces five different metrics: Maintainability Index, Cyclomatic Complexity, Depth of Inheritance, Class Coupling, and Lines of Code.

  • jmiller

    Missed one —

    * Use static code analysis tools to help you spot gnarly problems in your code. Static analysis can generally spot classes or modules that are too complex and find severe coupling problems. It’s a good “bang for the buck” exercise. You can already do it in IntelliJ, and I think there are tools coming to do it in VS proper.