Jeremy’s Fourth Law of Test Driven Development: Keep Your Tail Short

One of the first ongoing lessons in making Test Driven Development succeed for you is learning how to write isolated tests with less effort.  My Fourth Law of TDD is all about recognizing the need to do testing piecewise as a vital part of formulating software designs.  One way or another, all design concepts come down to dividing the whole into granular pieces.  Keeping a short tail is yet one more design cue to help guide the design to where you want it to go.


To preclude any confusion, this post is in no way related to the Long Tail from Chris Anderson, it’s only an unfortunate choice of name from me when I originally jotted down my “Laws of TDD” a couple years ago.  Before you ask about the “Fourth,” here are links to the previous three “Jeremy’s Laws of TDD.” 



  1. Isolate the Ugly Stuff (Oct 2005)
  2. Push, Don’t Pull (Mar 2006)
  3. Test Small before Testing Big (May 2006) 
  4. Keep Your Tail Short (this post, Apr 2007 – wow, did I get sidetracked)

5-10 and the Zeroth law will follow — someday.


I’m doing a presentation at DevTeach 2007 on Agile Design that has largely evolved into a code-centric discussion on performing design continuously.  A vital part of making the continuous design philosophy effective is to structure code so that it can easily accept change — i.e. a heavy focus on the design principles that lead to Orthogonal Code.  Hand in hand with orthogonality for me is the usage of Test Driven Development (or Behavior Driven Development) to drive the design of the code in an evolutionary fashion.  TDD brings some important things to the continuous design table like a revved up feedback cycle and an almost mandatory focus on building a system through the systematic creation of small working pieces of code.


I would go so far as to say that TDD is the single most important tool for doing design in the small (it shouldn’t be your only design tool, but that’s a very different post).  That being said, my own experience includes a couple projects where TDD either failed or didn’t really wring out all of the advantages that it should.  Looking back at those projects with 20/20 hindsight, it’s easy to make a diagnosis:  TDD was just too hard because far too much of the code had a long tail of dependencies that just couldn’t be isolated.  If you wanted to test business logic, you had to pull in the database, push around configuration files, and often fire up the user interface for good measure.  TDD just flat out grinds to a halt if you have to tackle too many issues at one time. 


Looking back at my previous writings on Test Driven Development and designing for maintainability there’s an obvious underlying theme poking out from almost all of the “best practice” design concepts I’ve written about – strive to do one thing, and only one thing, at a time in your code.  I want to work on one thing at a time because the human mind is finite in its ability to process multiple threads of logic.  I want to pick off some parts of the system that I understand and start coding and delivering value now instead of getting mired in “Analysis Paralysis” trying to see the whole picture.  I want to shorten the feedback cycle between coding and testing to work faster.  I want the ability to test cohesive areas of the code in isolation from other areas of the code to simplify testing.  And oh yeah, if I’m going to design and build things one thing at a time, I’d like to be able to put the pieces together at the end of the day and actually have them work together.


If you can do one thing at a time in your code the Test Driven Development effort will go much smoother.  One of the first steps to doing one thing at a time is to follow Jeremy's Fourth Law of TDD:  Keep Your Tail Short.  The best description for this general concept I've ever heard comes from Stuart Holloway – "When you pull a class off the shelf, what else is coming with it?"  Think of it this way: if I want to test my business rules, or my display logic for that matter, what other areas of the code do I have to pull in as well?  By and large you want to answer that question with "not much."  Writing test code just to placate my database or web server when I'm testing business logic is pure overhead — both in terms of the mechanics of creating the test and the intellectual overhead of understanding the test.
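
As a quick, hypothetical illustration (none of the class names below come from a real project), compare a piece of pricing logic with a long tail to one that only needs its data handed to it:

    using System.Data.SqlClient;

    // A hypothetical example.  Long tail: testing the pricing rule drags a
    // live database connection along for the ride.
    public class PricingRulesWithLongTail
    {
        public decimal PriceFor(int productId)
        {
            using (SqlConnection connection = new SqlConnection("server=...;database=..."))
            {
                connection.Open();
                // ...load the product row, then apply the discount rule inline...
                return 0m;
            }
        }
    }

    // Short tail: the rule only needs data handed to it, so a test can build
    // a Product in memory and assert directly on the result.
    public class Product
    {
        public bool IsOnSale;
        public decimal ListPrice;
    }

    public class PricingRules
    {
        public decimal PriceFor(Product product)
        {
            return product.IsOnSale ? product.ListPrice * 0.9m : product.ListPrice;
        }
    }

Testing the first version means standing up a database; testing the second is a couple of lines of in-memory setup.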


I’m going to use several examples in this post that violate my Fourth Law to demonstrate why and how keeping a “short tail” is a more effective way to develop software.  These examples might seem contrived, but every single case is something I’ve worked on or caused.  Some of the worst examples are my recollections from my first, painful project using TDD when we did not keep our tail sufficiently short. 


My First Foray into TDD Bombed


My obsession with creating easily testable code dates back to my first official project using TDD.  Unfortunately the whole team was relatively green in regards to Agile engineering practices, and we made the typical mistakes that TDD newbies do by binding the UI and business logic too tightly to the infrastructure.  Specifically, I'm thinking about a workflow feature that I created.  It was a fairly simple state machine in terms of coding, but the automated testing was a different story.  I don't remember the exact details (repressed memory?), but you had to perform the workflow actions in perfectly sequential order.  There wasn't any way to create the state machine object directly in an arbitrary state.  You had to run the state transitions one at a time, and each state transition made calls to stored procedures.  I also had it set up so that the workflow read and wrote directly to the database.  The end result was that the tests could only run against the database and every unit test basically had to include almost every action of the workflow.  The tests were difficult to write, almost impossible to debug, and very brittle since every unit test really depended on every other test.  My workflow logic had a long tail that reached all the way into the database schema — and to add insult to injury we had a single shared Oracle schema for all development and testing.


How would I do that today to drive the workflow through testing?  Actually, I think it would be pretty easy.  Move all of the persistence out into a Database Mapper, or just use NHibernate, so the core workflow logic doesn't even know the database exists.  Isolate the state machine code into a smaller class that basically only knows how to change its own state as a result of workflow state transitions, and maybe direct other services to perform actions.  I would tie the whole thing together with some sort of Controller class that directs both the state machine class and the persistence service.  The Controller would be tested with mock objects in place of the persistence and the actual workflow state machine logic.  A bit like this:



    public interface INotificationService
    {
        void SendEmail(EmailMessage message);
    }

    public class StateMachine
    {
        private Status _status;

        // The real constructor could completely set up the StateMachine
        // to any point in the workflow
        public StateMachine(Status status)
        {
            _status = status;
        }

        public void CreateNew(INotificationService service, string assignedTo, string description)
        {
        }

        public void Approve()
        {
            _status = Status.Approved;
        }

        public void Reject(INotificationService service)
        {
        }
    }

    // The Controller class that just coordinates the services and
    // StateMachine objects
    public class StateMachineController
    {
        private readonly IStateMachineMapper _mapper;

        public StateMachineController(IStateMachineMapper mapper)
        {
            _mapper = mapper;
        }

        public void Approve(int issueId)
        {
            StateMachine machine = _mapper.Find(issueId);
            machine.Approve();
            _mapper.Save(machine);
        }
    }


My goal is to make the StateMachine easy to test by making it easy to setup and completely decoupled from the database.  Most of the heavy duty logic is in StateMachine, so I would focus on isolating that logic and use StateMachineController to handle the coordination with other services.  In my analogy below, StateMachine is Aerosmith and StateMachineController is the roadie for StateMachine.
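
To show what testing the roadie with fakes might look like, here is a rough sketch of a unit test against StateMachineController.  The IStateMachineMapper interface isn't shown above, so its Find and Save members are inferred from the controller's usage, and the hand-rolled fake plus NUnit attributes are just one way to do it:

    using NUnit.Framework;

    // Inferred from StateMachineController's usage above -- not shown in the original snippet
    public interface IStateMachineMapper
    {
        StateMachine Find(int issueId);
        void Save(StateMachine machine);
    }

    // A hand-rolled fake so the test never touches the database
    public class FakeStateMachineMapper : IStateMachineMapper
    {
        public StateMachine MachineToReturn;
        public StateMachine SavedMachine;

        public StateMachine Find(int issueId)
        {
            return MachineToReturn;
        }

        public void Save(StateMachine machine)
        {
            SavedMachine = machine;
        }
    }

    [TestFixture]
    public class StateMachineControllerTester
    {
        [Test]
        public void Approve_finds_the_machine_approves_it_and_saves_it()
        {
            FakeStateMachineMapper mapper = new FakeStateMachineMapper();
            // default(Status) is used only because the real enum members aren't shown above
            mapper.MachineToReturn = new StateMachine(default(Status));

            StateMachineController controller = new StateMachineController(mapper);
            controller.Approve(42);

            // The Controller's only job is coordination: find, delegate, save
            Assert.AreSame(mapper.MachineToReturn, mapper.SavedMachine);
        }
    }

With the database hidden behind IStateMachineMapper, that test runs entirely in memory.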


Aerosmith and Roadies


The one system structure that will never, ever deliver either testability or maintainability is the Big Ball of Mud.  In software design, it’s imperative to define and assign the various responsibilities of your system to discrete pieces of the system.  I’m a huge fan of Responsibility Driven Design (RDD) from Rebecca Wirfs-Brock.  A key concept in Responsibility Driven Design is to think of classes in terms of stereotypes.  The stereotype of a class says a lot about the role and responsibility of a class within the greater system.  We can use the idea of class stereotypes to quickly break down a complex feature into smaller pieces and even give us some immediate ideas for minimizing the tail of dependencies between the smaller pieces.


My own little spin on class stereotypes is what I call the "Aerosmith and Roadies" analogy.  Say you're Wozniak and you feel like putting on your own rock concert: what are the things that have to get done, and who does them?  The first responsibility is easy: Aerosmith plays the concert.  That's a great start, but it's a good bet that Aerosmith isn't going to set up the stage, plug in the instruments, get the green M&M's, and generally make sure that Aerosmith has everything it needs.  You're going to need the roadies and other people to set up the stage for Aerosmith so Aerosmith can concentrate on just playing the music.


I’m working with financial companies now and many projects implement some form of market or trade analysis to support or make trading decisions.  For this case it’s almost imperative to follow the Aerosmith and Roadies division of responsibility. 


The actual analytical code is the obvious rock star.  The only thing the actual analytics code should do is take trade and market data that is handed to it and create knowledge and trends from that data.  The rock star analytics code doesn't know where its data comes from, and it might not even know what's happening downstream to the information that it creates.  It's important to limit the responsibility of the analytics engine for a number of reasons.



  • The analytics code is potentially very complex.  It should be significantly easier to code if you can work on only the analytical code without worrying about the data source and downstream dependencies.

  • The analytics code has to be easy to test, and the easiest possible class (or cluster of classes) to test is one that takes in data through its API and returns a result without calling out into anything else.  To make this happen, I must not have a long tail of dependencies on other systems, databases, and calls to services within the analytical engine.  In the automated tests, I can just set up the market and trade data in memory, run it through the analytical engine, and check the results in memory.  It's critical to make the analytical code testable because it's making decisions about what to do with someone else's money.

  • The analytics code will change over time as the trading gurus tweak their trading algorithms.  The underlying data source and the downstream systems probably won’t change at the same time.

The analytical code has to be fed its data as an input, and something has to actually display or act on the results from the analysis code.  Borrowing terminology from Responsibility Driven Design again, fetching the market and trade data is the responsibility of some sort of Service Provider stereotype.  You would probably also create a separate set of service providers for carrying out the decisions made by the analytics engine after the analysis is complete.  In my Aerosmith concert metaphor, the market data service is like a caterer or a guitar manufacturer, and these people probably don't deal directly with Aerosmith.  You certainly can't expect Aerosmith to call up the caterer and make their orders over the phone; someone else is in the middle.  Wozniak needs some middlemen to go to the caterers and the moving trucks and set things up for Aerosmith to play.


The middleman in the concert is the roadie.  The roadie runs around, gets the food and instruments from the truck, then delivers the food to Aerosmith.  The roadie gets the instruments out of the trucks, sets up the stage, and plugs in all the instruments on the stage.  All Aerosmith has to do is walk up to the stage and put on the show.  They don't have to be distracted by logistics.  The roadie is the Coordinator and/or Controller stereotype from Responsibility Driven Design.  It's a class whose only real responsibility is coordinating the actions of other classes.  The roadie is the glue.
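
Putting rough names on those stereotypes (every type below is invented for illustration), the division of labor might look something like this:

    // Invented names to illustrate the stereotypes -- not code from a real system
    public class TradeData { /* the market and trade inputs */ }
    public class TradeAnalysis { /* the knowledge and trends produced */ }

    // Service Providers: the caterers and the truck drivers
    public interface ITradeDataProvider
    {
        TradeData GetTradesFor(string portfolio);
    }

    public interface ITradeDecisionPublisher
    {
        void Publish(TradeAnalysis analysis);
    }

    // Aerosmith: pure logic -- data in, knowledge out, no idea where either lives
    public class TradeAnalyzer
    {
        public TradeAnalysis Analyze(TradeData data)
        {
            TradeAnalysis analysis = new TradeAnalysis();
            // ...the interesting (and frequently changing) algorithms go here...
            return analysis;
        }
    }

    // The roadie: nothing but coordination, easily tested with fakes
    public class TradeAnalysisCoordinator
    {
        private readonly ITradeDataProvider _provider;
        private readonly ITradeDecisionPublisher _publisher;
        private readonly TradeAnalyzer _analyzer;

        public TradeAnalysisCoordinator(ITradeDataProvider provider,
                                        ITradeDecisionPublisher publisher,
                                        TradeAnalyzer analyzer)
        {
            _provider = provider;
            _publisher = publisher;
            _analyzer = analyzer;
        }

        public void AnalyzePortfolio(string portfolio)
        {
            TradeData data = _provider.GetTradesFor(portfolio);
            TradeAnalysis analysis = _analyzer.Analyze(data);
            _publisher.Publish(analysis);
        }
    }

The TradeAnalyzer gets tested by building TradeData in memory, and the coordinator gets tested with fakes standing in for the provider and the publisher.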


So what does all of this Aerosmith/Roadie/Class Stereotype exercise do for us in designing software?



  1. Applying the concept of class stereotypes to the larger problem quickly suggests some division of responsibilities within the larger trade analysis subsystem.  In effect, we’ve set ourselves up to “divide and conquer” the code so we can mentally deal with fewer variables in our design at any one time.  We’ve reduced the bigger problem into smaller problems to limit the complexity of any one piece of code.

  2. The analytical code has a short tail and can be tested in isolation.

  3. The data provider is only that, a data provider.  We can test the data provider in isolation from the analytics. 

  4. Because the analytical code has no tight dependency on the data source, we can potentially use the analytics engine in a completely new context.  Today you’re building the analytics engine to work on the trades previously created in your system on the server side.  What if tomorrow you want to provide “what if” calculations on potential trades to a user working in a completely different screen on the client side?  With a short tail of dependencies, the analytics engine is reusable.  With a long tail of dependencies on infrastructure, you’re not going to be able to easily reuse the analytics functionality.

Wait Jeremy, couldn’t I just isolate the analytics engine by using mock objects or stubs for the market and trade data?  Yes, absolutely, but setting up mock expectations is also a dependency.  Anytime you find yourself doing the exact same mock object setup “preamble” in multiple unit tests it’s a design cue that there might be more than one responsibility in the class under test.  Think about splitting up the class to test with less mock object setup. 


Scott Bellware absolutely groans when I use this analogy, and that’s more than enough justification to throw it in ;-)


Can you get there from here?


Here’s an analogy for systems that are hard to test.  We live in Stamford, Connecticut.  My extended family is mostly scattered across the area where Missouri, Arkansas, and Oklahoma come together.  When we visit my relatives there simply isn’t an easy way to fly from one place to the other.  We’ve got to detour through Charlotte or Pittsburgh or Houston to get from one place to another, not to mention the effort of driving to the airport. 


A couple years ago I made a first pass at adding diagnostic functionality to StructureMap to create user friendly messages for common configuration errors.  I created a class hierarchy that represented an instance, all the arguments to its constructor, and its dependencies.  This diagnostic class (InstanceGraph) could only be constructed by passing in a .Net Type and a fully formed object containing the configuration of the instance.  Every time I needed to write another unit test I had to jump through hoops to create a fake Type with the correct constructor arguments and set up the configuration for the instance before I could finally build the InstanceGraph object and look at the problems that it detected.  It just took way too long to build tests, especially since I was having to do a lot of setup that really didn't have any semantic meaning to the assertions in the unit tests.  In the end, I realized that the diagnostics could really be modeled as two distinct responsibilities:



  1. Parsing the .Net Types and the configuration into a "design time" model
  2. Analyzing that "design time" model to look for missing, invalid, or inconsistent configuration

In essence, I cut off the reflection "tail" from my diagnostic code.  I created a new model that had absolutely zero dependencies on System.Reflection, and just like that I was able to write unit tests with very little friction.  I could skip the "create fake Type" test setup and go straight to the exact scenario that I wanted to test.  I'd say that I made two improvements:



  1. The unit tests required less mechanical setup work and therefore unit testing was faster
  2. The unit tests were easier to understand because there was less code noise from all of the test data setup.  That’s an important quality because unit tests serve an important secondary role as low level API documentation.

Of course I still had to build code that knew how to use Reflection on .Net Types and the StructureMap configuration to build up the design time model, but that code is now relatively simple.
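
To make the shape of that split concrete, here's a rough sketch in the same spirit (the names below are invented for illustration and aren't StructureMap's actual types):

    using System;
    using System.Collections.Generic;
    using System.Reflection;

    // The "design time" model has no dependency on System.Reflection, so unit
    // tests can build one directly and go straight to the scenario they care about
    public class InstanceModel
    {
        public string ConcreteTypeName;
        public readonly List<string> ConstructorArguments = new List<string>();
        public readonly List<string> ConfiguredArguments = new List<string>();

        public IEnumerable<string> FindProblems()
        {
            foreach (string argument in ConstructorArguments)
            {
                if (!ConfiguredArguments.Contains(argument))
                {
                    yield return "Missing configuration for constructor argument '" + argument + "'";
                }
            }
        }
    }

    // The reflection "tail" lives in one relatively dumb builder
    public class InstanceModelBuilder
    {
        public InstanceModel BuildFor(Type concreteType)
        {
            InstanceModel model = new InstanceModel();
            model.ConcreteTypeName = concreteType.FullName;

            // Assumes a single public constructor for the sake of the sketch
            ConstructorInfo constructor = concreteType.GetConstructors()[0];
            foreach (ParameterInfo parameter in constructor.GetParameters())
            {
                model.ConstructorArguments.Add(parameter.Name);
            }

            return model;
        }
    }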


One of the important things I’ve learned is that excessive test setup work is a code smell that points to a possible problem in your design.


 


Test by Measuring What You are Trying to Test!


A couple years ago I inherited a .Net application from another team that had just finished doing a partial rewrite from VB6 and ASP classic.  I know that team was trying to use Test Driven Development in their daily coding, but I think they hit the exact same problems with testability that I experienced on my first TDD project by letting the business code get tightly coupled to both the database and the user interface.  The application took in submitted invoices, performed copious amounts of business rules validation, and either succeeded and sent the invoice on to the downstream systems or reported the list of validation failure messages.  One of the immediate problems was that the main class that built the user response was a long stretch of procedural code that intermingled the html creation and the business logic like this (it had been a straight port from VB6 & ASP Classic to C# by people who had never done .Net, so cut them some slack here):



    public class InvoiceScreenCreator
    {
        private readonly Invoice _invoice;

        public InvoiceScreenCreator(Invoice invoice)
        {
            _invoice = invoice;
        }

        public string CreateHTML()
        {
            string html = "<h3>" + _invoice.InvoiceId + "</h3>";

            // Run the complex invoice validation logic
            if (someFairlyComplexBusinessLogicDeterminationOnInvoice(_invoice))
            {
                html = html + "<p>The invoice succeeded!</p>";
            }
            else
            {
                foreach (string errorMessage in _invoice.ErrorMessages)
                {
                    // writeInvoiceError() builds the markup for a single validation failure
                    html = html + writeInvoiceError(errorMessage);
                }
            }

            return html;
        }
    }


 


There’s plenty wrong with that general approach, but the killer to me was that there was quite a bit of complexity in the business rules and the only direct way to measure the business logic outcome was to scrape through the html created by the InvoiceScreenCreator class.  You’re effectively testing business logic through side effects, and you’re not able to write isolated unit tests for either the user interface or the business logic.  If something is wrong in the tests you have both UI and business logic to debug.  Testing business logic through the user interface can easily detract from the understandability of the test as user interface verbiage is intermixed with the core business logic.  It works the other way around too.  Creating the user display was nontrivial and it would have been beneficial to test the user display in isolation from the invoice validation logic.


Oh, and needless to say for anyone who wrote or maintained ASP Classic back in the day, intertwining business logic in the middle of concatenating html together almost completely repels any effort to understand the business logic.


Again, the solution is to separate the responsibilities for the user display and the business logic into different classes or subsystems.  The business logic runs against the submitted invoice, makes the validation determinations, and builds some sort of object that reports all of the validation information.  The user interface code could just take the completed validation report and create the display.  Make that separation and now you can test either piece in relative isolation.  In reality, the user interface and business logic are likely to change at different times.  It’s worth your while to be able to change one without either affecting the other or having to understand the other piece while you work.
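
Sketched out with invented names (and reusing the Invoice class from the snippet above), that separation might look something like this:

    using System.Collections.Generic;
    using System.Text;

    // Invented names for illustration.  The validator is pure business logic...
    public class ValidationReport
    {
        public readonly List<string> ErrorMessages = new List<string>();

        public bool Succeeded
        {
            get { return ErrorMessages.Count == 0; }
        }
    }

    public class InvoiceValidator
    {
        public ValidationReport Validate(Invoice invoice)
        {
            ValidationReport report = new ValidationReport();
            // ...all of the copious validation rules live here, tested by building
            // Invoice objects in memory and asserting directly on the report...
            return report;
        }
    }

    // ...and the screen creator only turns a finished report into html
    public class InvoiceScreenCreator
    {
        public string CreateHTML(Invoice invoice, ValidationReport report)
        {
            StringBuilder html = new StringBuilder("<h3>" + invoice.InvoiceId + "</h3>");

            if (report.Succeeded)
            {
                html.Append("<p>The invoice succeeded!</p>");
            }
            else
            {
                foreach (string errorMessage in report.ErrorMessages)
                {
                    html.Append("<p class=\"error\">" + errorMessage + "</p>");
                }
            }

            return html.ToString();
        }
    }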


Of course another lesson is that you pretty well get what you deserve when you do a straight port of dubious quality legacy code.


Isolate the Churn


Some elements of your codebase are going to change much more frequently than the rest of the system.  Some modules may need a very large number of testing permutations to fully cover all of the input possibilities.  In either case it's very advantageous to isolate these areas of your code from everything else.  It's smart to optimize the mechanics of testing for these modules by having a quick path to create test inputs and measure the outcome without involving any other piece of code.  Case in point: working with financial companies now, I'm frequently bumping into systems that perform analysis on trade and market data to determine pricing or trading strategies.  The analysis code, especially if it's a trading strategy, is going to go through a lot of churn as the algorithms change while the backend storage for the trade and market information remains relatively unchanged.  The analysis code will inevitably require a large number of test cases to cover all the permutations of business conditions.  I think it's probably fair to say that the bottleneck in delivering the trade and market analysis is most likely the testing time and overhead.  Making that code easier to test by limiting its tail of dependencies should optimize your time to ship that code.
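
As a tiny, hypothetical example of that quick path (the analyzer and its rule below are invented), notice how cheap each additional permutation becomes once everything runs in memory:

    using NUnit.Framework;

    // Hypothetical names; the point is that a pure, in-memory calculation lets
    // you pile on permutations without any per-case infrastructure setup
    public class SpreadAnalyzer
    {
        private readonly decimal _threshold;

        public SpreadAnalyzer(decimal threshold)
        {
            _threshold = threshold;
        }

        public bool IsSuspicious(decimal bid, decimal ask)
        {
            return (ask - bid) > _threshold;
        }
    }

    [TestFixture]
    public class SpreadAnalyzerTester
    {
        [Test]
        public void Flags_only_the_spreads_over_the_threshold()
        {
            SpreadAnalyzer analyzer = new SpreadAnalyzer(0.5m);

            // Each permutation is one line of in-memory setup
            Assert.IsFalse(analyzer.IsSuspicious(10.0m, 10.3m));
            Assert.IsFalse(analyzer.IsSuspicious(10.0m, 10.5m));
            Assert.IsTrue(analyzer.IsSuspicious(10.0m, 10.6m));
        }
    }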


Of course, if you treat the trade and market analysis code like the rock star it is, then it’s already going to be isolated with a minimal tail of dependencies and you’re good to go.


Whenever you can, keep the Database on the Sidelines


The database cares very deeply about the complete integrity of your data — heck, that’s a large part of a relational database’s very purpose in life.  That very data integrity goodness is a lot of what makes a database a PITA when you’re constructing automated tests.  To write an effective automated test you need to establish a combination of known inputs and expected outputs.  If you’re testing against the database that means loading the database with data.  Sometimes that isn’t that big of a deal, but with any level of database complexity that quickly turns into a pain because:



  • Database access in tests will make the tests run slower than tests that run completely within an AppDomain.  And yes, automated test execution time is a big deal, worthy of serious design consideration.  Enough so that I'd call it a justification by itself for decoupling business logic from the database.
  • You have to make sure that dependent data is loaded first for referential integrity.  You can load a set of known reference data to help with this, but tests are generally much more comprehensible if you can see the test inputs and outputs on the same screen.
  • You have to supply some data to the database for non-nullable fields that isn't relevant to the test.  It's extra mechanical work and it's noise code in the tests.

Think about this very realistic case.  You’re building a screen to edit an existing invoice.  If the invoice has already been paid you want the screen to disable editing.  When I’m testing my presenter/controller for this scenario the only piece of information I need to set on an Invoice object is some sort of IsPaid flag.  Think about the test setup overhead of just creating an Invoice object and setting a single property versus the effort it typically takes to create an invoice in the database tables.
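
Here's roughly what that looks like in a test.  All of the names below are invented for illustration: a cut-down Invoice, a presenter, and a hand-rolled fake view:

    using NUnit.Framework;

    // A cut-down Invoice just for this sketch
    public class Invoice
    {
        public bool IsPaid;
    }

    public interface IInvoiceEditView
    {
        bool EditingEnabled { get; set; }
    }

    public class InvoiceEditPresenter
    {
        private readonly IInvoiceEditView _view;

        public InvoiceEditPresenter(IInvoiceEditView view)
        {
            _view = view;
        }

        public void Display(Invoice invoice)
        {
            // A paid invoice can no longer be edited
            _view.EditingEnabled = !invoice.IsPaid;
        }
    }

    public class FakeInvoiceEditView : IInvoiceEditView
    {
        public bool EditingEnabled { get; set; }
    }

    [TestFixture]
    public class InvoiceEditPresenterTester
    {
        [Test]
        public void Editing_is_disabled_when_the_invoice_is_already_paid()
        {
            Invoice invoice = new Invoice();
            invoice.IsPaid = true;

            FakeInvoiceEditView view = new FakeInvoiceEditView();
            new InvoiceEditPresenter(view).Display(invoice);

            Assert.IsFalse(view.EditingEnabled);
        }
    }

The equivalent setup against the database (a customer row, an invoice header, and whatever non-nullable columns the schema demands) would dwarf that one-line arrangement.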


Don't interpret anything in this section to mean that you shouldn't use referential integrity checks in the database.  Leaving off referential integrity checks in the database is a lot like opening the kitchen door in the summer and letting all the flies in.  It's just asking for really weird bugs in the system.  Moreover, I've often found that a lack of referential integrity makes it harder to write integration tests.  Let the database do its thing and the business logic do its thing, without the two getting in each other's way.


Conclusion 


Sometimes the fastest way to get code working is to write code in more, smaller pieces.  It may seem like more complexity, but I'll argue vociferously that it's better to minimize the complexity of any one part of the system rather than minimize the number of pieces in the system.  Software design and construction is all about divide and conquer.  Big classes and methods spanning multiple responsibilities and multiple concerns will never, ever be as efficient to ship as a well factored system.  Remember when you're deciding how to structure your code that code cannot ship until testing is complete.


I mostly presented the "Keep Your Tail Short" law in terms of testability, but it also goes a long way towards creating more opportunities for reuse and extension of the existing system.  That's crucial for doing continuous design.  Putting off technical complexity and delaying architectural commitment works best when the code is malleable, and that's enabled by minimizing the dependency tails between the classes in the code.


 


Wait, there’s even more!  I’ll write a follow up soon with some more concrete examples of “Keep Your Tail Short” in regards to enterprise development.  I’m using this post, and several others over the next couple weeks, to flesh out my talking points for my “Laws of Agile Design” talk at DevTeach.  Feedback and criticism on this post would be very much appreciated.


 

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
  • http://zhaorui.cnblogs.com zhaorui

    Glad to read these articles.

    Is someday coming ?

    Where are the 5-10 and the Zeroth law?

  • http://blog.softwarearchitecture.com Brian Sondergaard

    But where would we be without all those clouds? ;-)

    Brian
    http://blog.softwarearchitecture.com

  • http://codebetter.com/blogs/jeremy.miller Jeremy D. Miller

    Brian,

    *Shudder,* I know Grady Booch is brilliant and he has a lot to say, but I think his writing style is excruciatingly boring. I'd go so far as to say that the success of XP and Agile in general is in no small part due to the fact that Kent Beck and Martin Fowler are vastly better authors.

    Jeremy

  • http://blog.softwarearchitecture.com Brian Sondergaard

    You reference a couple of my favorite books here. Out of curiosity, have you had a look at the recently released 3rd edition of Object-Oriented Analysis and Design with Applications?

  • http://colinjack.blogspot.com/ Colin Jack

    Phew! I’m glad I wasn’t the only one who found Object Design hard going, I’ll definitely give it another try.

  • http://codebetter.com/blogs/jeremy.miller Jeremy D. Miller

    Colin,

    Yeah, I thought “Object Design” was horribly boring too and I barely managed to slog through it. Then I started to find myself using RDD left and right, so you can’t call the book a total loss.

    Jeremy

  • http://colinjack.blogspot.com/ Colin Jack

    Good stuff but I must admit the whole Aerosmith analogy did lose me a bit, having said that after a re-read it all made perfect sense.

    It’s also interesting to see you referencing “Object Design”. I found it to be a bit of a boreathon but I’ll have to re-read it, this time with lots of caffeine to help me through it.

    Out of interest have you thought about writing a book about RDD, TDD and so on?

  • http://community.ative.dk/blogs/ Martin Jul

    Regarding referential integrity as a roadblock here is a trick we use to keep the test code short and readable: we use a set of factories for generating semantically complete instances for testing (we also use them for stubs). The factories use each other to also keep the factories simple.

    The point is especially to have relevant parts of the reference tables in the database duplicated in the factories so they match. This way, we can write "Country denmark = CountryFactory.CreateDenmark();" in the test and be sure that the ISO country code etc. is populated with the same values as in the database. This way, if-and-when the reference data changes only the factories need to be updated.

    And when we test that saving Users works we don't have to worry about the country codes and other referential integrity stuff – we just do "user.Country = denmark" with the knowledge that we aren't violating any DB constraints on countries when we save.

    Also, the tests are much clearer since I tend to write them so they tell a story, eg. “UserFactory.CreateAliceTheSupervisor()” tells much more than populating a User object in the test fixture with all kinds of stuff that clutters the intention of the test.

    I find that working this way – and relying heavily on inversion-of-control as indicated by Jeremy’s post – testing, including integration testing, is very light touch.

    And that is the key to making TDD work.

  • Carlton

    I'll go out on a limb…I find referential integrity a real roadblock to writing integration tests. You need to populate way more data than is needed, which only adds to the complexity of the tests and makes them harder to understand and more brittle.