TDD/BDD and the lean startup

A post at Hacker News caused a bit of a storm on Twitter by questioning whether a lean startup needs to use Test First, or whether it is an obstacle to delivery.

Build the right thing, or avoid Rework

The poster complains that test-first approaches don't fit with market-driven evaluation of features:

Most of the code in a lean startup are hypotheses that will be tested by the market, and possibly thrown out (even harder to rewrite test cases with slightly different requirements over writing from the beginning).

First let's look at a big reduction in the cost of delivering software: building the right thing in the first place. People who come from organizations with little or no process often mistake Agile's preference for working software over process as support for having no process at all. It is not. In many ways classical Agile is best understood in the context of what went before: heavyweight, often waterfall, processes such as SSADM, in which up to a third of a project's time could be spent in the design phase before any code was written. We don't want the waste associated with that, but we do want just enough process, and we want it at the last responsible moment.

Test-first approaches focus on ensuring that we think about the acceptance criteria for the software we are about to build. The big danger we are trying to avoid is rework; it is estimated that 50% of a project's overrun costs come down to it. Creating a testable specification, through a Given-When-Then scenario or a use case, helps us turn a woolly and vague requirement, such as "we need to show how late a tube train is running", into something more concrete. We are forced to consider the edge cases and the failure conditions. It is cheaper to consider these before the code is written than to write code and adjust it afterwards. There will always be some rework, particularly around UX, but we can remove the obvious communication barriers by having a structured conversation around the requirements.
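
To make that concrete, here is a minimal sketch of what such a scenario might look like once it becomes executable. The ArrivalsBoard class and its methods are invented for illustration, not taken from any real system; the tests use NUnit.

```csharp
using System;
using System.Collections.Generic;
using NUnit.Framework;

// Hypothetical domain code, invented for this example.
public class ArrivalsBoard
{
    private readonly Dictionary<string, TimeSpan> _scheduled = new Dictionary<string, TimeSpan>();
    private readonly Dictionary<string, TimeSpan> _expected = new Dictionary<string, TimeSpan>();

    public void Schedule(string train, TimeSpan arrivesAt) { _scheduled[train] = arrivesAt; }
    public void ReportExpected(string train, TimeSpan arrivesAt) { _expected[train] = arrivesAt; }

    public TimeSpan DelayFor(string train)
    {
        if (!_scheduled.ContainsKey(train))
            throw new ArgumentException("Unknown train: " + train);

        // Edge case: no estimate reported means we assume the train is on time.
        if (!_expected.ContainsKey(train))
            return TimeSpan.Zero;

        var delay = _expected[train] - _scheduled[train];
        return delay < TimeSpan.Zero ? TimeSpan.Zero : delay; // early is not "late"
    }
}

[TestFixture]
public class ShowingHowLateATrainIsRunning
{
    [Test]
    public void A_delayed_train_shows_minutes_behind_schedule()
    {
        // Given a train scheduled to arrive at 10:00
        var board = new ArrivalsBoard();
        board.Schedule("Circle-101", new TimeSpan(10, 0, 0));

        // When it is reported as expected at 10:07
        board.ReportExpected("Circle-101", new TimeSpan(10, 7, 0));

        // Then the board shows it running seven minutes late
        Assert.That(board.DelayFor("Circle-101"), Is.EqualTo(TimeSpan.FromMinutes(7)));
    }

    [Test]
    public void A_train_with_no_estimate_is_assumed_on_time()
    {
        // Given a scheduled train with no estimate reported
        var board = new ArrivalsBoard();
        board.Schedule("Circle-102", new TimeSpan(10, 30, 0));

        // Then it shows no delay: an edge case the conversation surfaces
        Assert.That(board.DelayFor("Circle-102"), Is.EqualTo(TimeSpan.Zero));
    }
}
```

Note that the second test is exactly the kind of edge case ("what if we have no estimate?") that the structured conversation forces into the open before any production code is written.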

BDD and ATDD emphasize avoiding rework by ensuring that you have the acceptance criteria for a story defined before work begins. The process of defining those criteria, often through a story workshop, is much of the value: the creation of a shared understanding of how the product should work. Usually that workshop occurs in the preceding iteration and takes about 10-15% of the time in that iteration. Whilst some folks consider it non-agile to work on the next iteration's requirements, this really is the last responsible moment, because discussing the acceptance detail for a story may raise questions that take time to answer, and you don't want to stall the line waiting for those answers mid-iteration.

So it's important to understand that in this context Test First is about reducing waste in the development cycle, not increasing it.

From Extreme Programming:

Creating a unit test helps a developer to really consider what needs to be done. Requirements are nailed down firmly by tests. There can be no misunderstanding a specification written in the form of executable code.

We still need to address the question of what will work in the marketplace; what if your feature is supposed to "test the water"? Usually the concern here is whether testing is wasted effort if you decide to throw the feature away. Some of that depends on what we think the cost of working test first is (I'll get to that next), but an alternative route to waste reduction here is incremental development: figure out what the minimum marketable feature set is, and release that to gain feedback.

In subsequent iterations, increment the feature set if feedback suggests it is successful. At the heart of this, however, is that a good product depends on a good vision, cleanly and simply executed.

Think about the great products you know: they tend to focus on doing one thing and doing it well, not on being a jack of all trades. To me the scatter-gun approach indicates a lack of product vision, not an effective strategy for uncovering what your users want. Build something simple, and wait for feedback on additional capabilities. But users have to buy your vision.

There are many emerging ideas on how to reduce the cost of discovering the market demand for new features without building them. The Pretotyping guys are gaining some publicity, as is the Design Fiction crowd; both talk about the step before even prototyping, when you try to determine the viability of an idea. Recently at Huddle we have been practicing Documentation Driven Development in an effort to gain feedback on proposed API designs before we build them.

Feedback, and the cost of TDD

The poster on Hacker News complains of the time taken to write tests:

There always seems to be about 3-5x as much code for a feature’s respective set of tests as the actual feature (just takes a lot of damn time to write that much code).

It's crazy talk to suggest that developers ship code without testing it in some way. Before TDD, the best practice I followed was to spin up the debugger and trace every line of code I had written to make sure it behaved as I expected. Spinning up the debugger and tracing is expensive, far more expensive than unit testing, where you hope never to need a debugging session to implement a test (needing one is usually a sign, to use Kent Beck's stick-shift analogy, that you are in the wrong gear in TDD and need to shift down by writing lower-level tests).

In addition, getting code to the point where you can debug it may be hard, requiring a number of steps to put the system into the right state. Unit tests are isolated from these concerns, which makes their context easier to establish; you have no such luck with manual tracing.

This problem only gets worse if you have to repeatedly spin up the debugger to trace through the code as you enhance it. Most folks break a problem down into parts to solve it, adding new capabilities as they go; they are naturally incremental. That means you need to perform this manual run-through repeatedly.

In addition, unit tests are not particularly new; they have been a recommended practice for years. Many developers wrote self-testing code, shipped with a simple harness to establish context, exercise the code, and assert on failure, long before test first capitalized on the approach by switching to writing the tests first. Anyone who cares about their code being correct ought to be unit testing, and if you write the tests last you are in the unenviable position of paying the cost of tracing the code while you develop as well as the cost of writing the tests afterwards.
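
As a sketch of that contrast, assuming a trivial invented Pricing class: the same check expressed first as an old-style self-testing harness, and then as a unit test that re-runs cheaply on every build.

```csharp
using System;
using System.Diagnostics;
using NUnit.Framework;

// The code under test: a trivial, invented pricing rule.
public static class Pricing
{
    public static decimal Discounted(decimal price, decimal percentOff)
    {
        return price - (price * percentOff / 100m);
    }
}

// The older style: self-testing code with a hand-rolled harness that
// establishes context, exercises the code, and asserts on failure.
public static class PricingSelfTest
{
    public static void Run()
    {
        Debug.Assert(Pricing.Discounted(100m, 10m) == 90m, "10% off 100 should be 90");
        Console.WriteLine("Pricing self-test passed");
    }
}

// The test-first style: the same check as a unit test, written before the
// implementation and picked up automatically by the test runner.
[TestFixture]
public class DiscountedPriceTests
{
    [Test]
    public void Ten_percent_off_one_hundred_is_ninety()
    {
        Assert.That(Pricing.Discounted(100m, 10m), Is.EqualTo(90m));
    }
}
```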

So in practice, any halfway responsible developer is doing more work to prove their code correct without test first than with it.

The problem is that we tend to measure the time to write the code, which seems slower with up-front tests, instead of the time to fashion working code, which is often much faster with tests. Because of this we feel that test first takes longer. But we need to look at all the effort we expend to deliver the story end-to-end, and seen that way, dropping test first looks like a local optimum.

In addition, even if test first gained us nothing, or cost a little more, the side-effect of gaining a test suite that you can re-run exceeds any likely savings from not doing it. So the bar that tests must clear to prove their value against the cost of developing a feature is low.

The cost of ownership of tests can cause some concerns. The poster at Hacker News complains that:

When we pivot or iterate, it seems we always spend a lot of time finding and deleting test cases related to old functionality (disincentive on iterating/pivoting, software is less flexible).

One problem with 'expensive' tests is often that the implementers have not looked into best practice. Test code has its own rules and is not a second-class citizen; you need to keep up to date with best practice to keep your test code in good shape. Too many teams fail to understand the importance of mastering the discipline of unit testing in order to make it cost effective. As a starter, I would look at xUnit Test Patterns, The RSpec Book, and Growing Object-Oriented Software, Guided by Tests if you need more advice on current practice.

I may write a longer post on this at some point, but my approach is to use a Hexagonal, or Ports and Adapters, architecture with unit tests focused on exercising scenarios at the port, diving deeper only when I need to shift gears. Briefly though, one reason people find tests expensive to own is an over-reliance on Testcase Class per Class instead of Testcase Class per Feature or Testcase Class per Scenario. Remember that you care about putting your public API under test, which is usually the port level in a Hexagonal architecture, perhaps along with a number of top-level domain classes exercised through those ports. You may want to put internals under test when you need to shift down a gear, in Kent Beck's terminology, because you are unsure of the road ahead, but don't succumb to the temptation to believe Test First is about testing every class in your solution. That will only become an obstacle to refactoring, because changes to the internal structure of your application will break tests, which makes testing itself feel as though it makes the application more brittle.
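
A rough sketch of what testing at the port, organized per scenario, might look like; all the names here (IOrderingService, IWarehouse and so on) are hypothetical.

```csharp
using System.Collections.Generic;
using NUnit.Framework;

// The driving port: the application's public API, and the level we test at.
public interface IOrderingService
{
    void PlaceOrder(string sku, int quantity);
    IReadOnlyList<string> PendingShipments();
}

// A driven port the domain depends on; the test plugs in a fake adapter.
public interface IWarehouse
{
    bool Reserve(string sku, int quantity);
}

public class OrderingService : IOrderingService
{
    private readonly IWarehouse _warehouse;
    private readonly List<string> _pending = new List<string>();

    public OrderingService(IWarehouse warehouse)
    {
        _warehouse = warehouse;
    }

    public void PlaceOrder(string sku, int quantity)
    {
        // Internal detail: how we record pending shipments is free to change.
        if (_warehouse.Reserve(sku, quantity))
            _pending.Add(sku);
    }

    public IReadOnlyList<string> PendingShipments()
    {
        return _pending;
    }
}

// Testcase class per scenario: named for the behaviour under discussion,
// not for a class buried inside the implementation.
[TestFixture]
public class WhenStockIsAvailable
{
    [Test]
    public void The_order_is_queued_for_shipment()
    {
        IOrderingService service = new OrderingService(new AlwaysInStockWarehouse());

        service.PlaceOrder("SKU-1", 2);

        Assert.That(service.PendingShipments(), Does.Contain("SKU-1"));
    }

    // Fake adapter for the driven port, keeping the test isolated and fast.
    private class AlwaysInStockWarehouse : IWarehouse
    {
        public bool Reserve(string sku, int quantity)
        {
            return true;
        }
    }
}
```

Because the test touches only the port, everything behind OrderingService can be restructured without breaking it.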

Often, too many tests breaking is a sign that you have not encapsulated your implementation enough, exposing it all publicly, and that you are over-specifying that implementation through tests.

Acceptance and Unit Tests

Let's agree definitions. Acceptance tests confirm that a story is done-done, whereas a unit test confirms that a part is correct. Or to put it another way: unit tests confirm that we have built the software right; acceptance tests confirm that we have built the right software.

A lot of the above discussion focuses on unit tests, but when we get to acceptance tests the situation is a little more complex. Acceptance testing may be automated, but it may also be manual, and the trade-off on when to automate is more difficult to make than with unit testing.

The clear win tends to come when you have well-defined business rules that you want to automate. Provided you have a Hexagonal, or Ports and Adapters, architecture, you can usually test these at the port, i.e. sub-cutaneously, without touching the UX. So they tend to be similar to unit tests written at the same level; they just go end-to-end.
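
For example, a sub-cutaneous test of a business rule might look something like this sketch, with an in-memory adapter standing in for real persistence; again, all the names are invented.

```csharp
using System;
using System.Collections.Generic;
using NUnit.Framework;

// Driving port: what a UI adapter would call in production.
public interface IAccountsService
{
    void Withdraw(string accountId, decimal amount);
    decimal BalanceOf(string accountId);
}

// Driven port, with an in-memory adapter standing in for the database.
public interface IAccountStore
{
    decimal Get(string accountId);
    void Put(string accountId, decimal balance);
}

public class InMemoryAccountStore : IAccountStore
{
    private readonly Dictionary<string, decimal> _balances = new Dictionary<string, decimal>();

    public decimal Get(string accountId)
    {
        return _balances[accountId];
    }

    public void Put(string accountId, decimal balance)
    {
        _balances[accountId] = balance;
    }
}

public class AccountsService : IAccountsService
{
    private readonly IAccountStore _store;

    public AccountsService(IAccountStore store)
    {
        _store = store;
    }

    public void Withdraw(string accountId, decimal amount)
    {
        var balance = _store.Get(accountId);
        if (amount > balance)
            throw new InvalidOperationException("Insufficient funds");
        _store.Put(accountId, balance - amount);
    }

    public decimal BalanceOf(string accountId)
    {
        return _store.Get(accountId);
    }
}

[TestFixture]
public class WithdrawingCash
{
    [Test]
    public void An_account_cannot_be_overdrawn()
    {
        // Given an account holding 50
        var store = new InMemoryAccountStore();
        store.Put("acc-1", 50m);
        IAccountsService service = new AccountsService(store);

        // When we try to withdraw 80, then the withdrawal is refused...
        Assert.Throws<InvalidOperationException>(() => service.Withdraw("acc-1", 80m));

        // ...and the balance is unchanged
        Assert.That(service.BalanceOf("acc-1"), Is.EqualTo(50m));
    }
}
```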

Automated tests tend to be a little more complex when they go end-to-end, because you have to worry about setting up data in the required state for the test to run. If you find your tests taking too long to write, you may need to break out common functionality, such as setting up context, into a TestDataBuilder. (The TestDataBuilder pattern is also useful in unit tests for setting up complex collaborating objects.)
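
A minimal sketch of the TestDataBuilder pattern, with a hypothetical Order domain object: the builder supplies valid defaults so each test states only the detail it cares about.

```csharp
using System;
using NUnit.Framework;

// A hypothetical domain object with fiddly required state.
public class Order
{
    public string CustomerId { get; set; }
    public string Sku { get; set; }
    public int Quantity { get; set; }
    public DateTime PlacedAt { get; set; }
}

// The builder supplies sensible defaults; tests override only what matters.
public class OrderBuilder
{
    private string _customerId = "default-customer";
    private string _sku = "default-sku";
    private int _quantity = 1;
    private DateTime _placedAt = new DateTime(2012, 1, 1);

    public OrderBuilder ForCustomer(string id) { _customerId = id; return this; }
    public OrderBuilder WithSku(string sku) { _sku = sku; return this; }
    public OrderBuilder WithQuantity(int quantity) { _quantity = quantity; return this; }
    public OrderBuilder PlacedOn(DateTime when) { _placedAt = when; return this; }

    public Order Build()
    {
        return new Order
        {
            CustomerId = _customerId,
            Sku = _sku,
            Quantity = _quantity,
            PlacedAt = _placedAt
        };
    }
}

[TestFixture]
public class OrderBuilderExample
{
    [Test]
    public void Only_the_detail_under_test_appears_in_the_test()
    {
        // Everything except quantity is irrelevant here, so the builder hides it.
        var order = new OrderBuilder().WithQuantity(3).Build();

        Assert.That(order.Quantity, Is.EqualTo(3));
        Assert.That(order.CustomerId, Is.Not.Null); // defaults keep the object valid
    }
}
```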

There is a misunderstanding that acceptance testing is about tooling: using Fitnesse, Cucumber, or the like. It is really about agreeing acceptance criteria up front, the clarity that gives the story preventing rework, and about automating the confirmation so that what is done stays done. The appropriate tooling depends on how you need to communicate the resulting tests; with a Ports and Adapters architecture you just need a testrunner to execute the code, and the tests may be written in natural language or code depending on the audience.

Because much of the value of acceptance tests comes from defining done-done, it would be possible to suggest that automating these tests gives less value back than the process of agreeing the test plan. However, that suite of automated tests gives you quick regression-testing support, and that supports your ability to re-write instead of refactor. Unit tests support preserving behaviour whilst refactoring, but that does not extend to changing the public interface; for that, i.e. preserving the business rules while making large-scale changes, you need acceptance tests.

It is common for teams that do hit the bar of effective unit testing to fail at the bar of acceptance testing.

My previous experience of working both with and without acceptance tests is that without them we often find bugs creeping into the software that our unit test suite does not catch. This is especially true if our acceptance tests are our only integration tests. If we do not enforce a strict policy that existing acceptance tests must be green before check-in, we get builds that introduce regression issues, which may go uncaught for some time. It is this long feedback loop, between an issue being created by a check-in and its discovery, that causes significant pain, because we do not know which change led to the regression.

So by avoiding the cost of acceptance tests we risk the cost of fixing those regression defects, and because the gap between a developer making a check-in and the defect being discovered is long, they may cost more to fix than writing the acceptance tests would have done.

In environments where I have enforced acceptance tests being green before check-in, the quality of the software improved, and when regression issues did emerge it was clear to the developer that their changes had caused the issue, leading to faster resolution (and preserving the current test environments).

Of course, this may depend on how long it takes to regression test your software. The equation seems to many even more marginal when we think about automating UI tests for acceptance. The cost of keeping tests in step with changes to the UI can seem fruitless; particularly in the early stages of a product, the cost of creating and maintaining UX tests can exceed their value. You can end up in a situation where a manual regression test run costs less than maintaining an automated test suite, and finds more defects.

This is especially true if you have a thin UI adapter layer over your ports, with acceptance tests against the ports catching your end-to-end issues. You already have plenty of behaviour-preserving coverage around your business rules, so the UI tests should really only be picking up UI defects. (Of course, if you have a Magic Pushbutton architecture all bets are off, as you can't easily write tests without exercising the UI.)

The problem becomes that this equation holds true, until it doesn't. At a certain point the manual regression-testing cost becomes an obstacle to regular releases: simply put, it takes you too long to confirm the software, and the cost of releasing rises to the point that you often defer it, because a release does not carry enough value to justify the regression-testing effort.

At that point you will wish you had been building those automated UI tests all along.

I think there is a point where you need to begin UX automation, but it is hard to call exactly when. It may depend on your ability to create adequate coverage through acceptance tests written against your ports; it may depend on the complexity of the UI. Up to now I think I have always started too early, regretted it and abandoned the effort, then resumed too late because of that. I need to learn more here before I can make a call on getting this right. Again, I suspect that understanding best practice around this type of testing would help, and I'd be grateful to anyone sharing links in the comments below.

TDD/BDD and the Lean Startup

Overall I think there are two dangers in the simplistic position given by the original poster. The first is to mistake reducing the time taken to get to the point where you can run code for reducing the time taken to get to the point where you can release code. Once you appreciate that the two are not the same, and look at the overall costs and benefits, the value of TDD/BDD becomes more compelling. The second is to mistake a lack of understanding of best practice in TDD for a failure of the process itself. We have come a long way in the last few years in learning how to reduce the costs of writing and owning tests, and TDD/BDD should be judged in that context.

About Ian Cooper

Ian Cooper has over 18 years of experience delivering Microsoft platform solutions in government, healthcare, and finance. During that time he has worked for the DTi, Reuters, Sungard, Misys and Beazley, delivering everything from bespoke enterprise solutions to 'shrink-wrapped' products used by thousands of customers. Ian is a passionate exponent of the benefits of OO and Agile. He is test-infected and contagious. When he is not writing C# code he is also the founder of the London .NET user group. http://www.dnug.org.uk
This entry was posted in Agile, ATDD, BDD, STDD, TDD.

  • Lee Brandt

    “The problem becomes that this equation holds true, until it doesn’t. ”

    Best description ever. It is so true. I find myself trying to explain this to people who are resistant to TDD/ATDD all the time.

    Thanks.

  • http://www.infovark.com Dean Thrasher

    Great post, Ian. Like you, I’ve found much of the pain and cost of unit testing comes from learning how to do it properly. It took us several months of stumbling through unit testing anti-patterns before we discovered some of the tricks. Some of these involved restructuring the production codebase to support testing, and some involved refactoring the test code itself to improve isolation.

    Once we’d gotten past this learning curve, we found testing to be invaluable. Now I can’t imagine working on production code without having a suite of tests. It’s become a natural part of our development process.

    I suppose if you're working on an early-stage consumer web application, where the cost to deploy fixes is low and the cost of alienating customers is low, you might be able to argue that TDD is a drag on productivity. But that's a narrow, special case.

    Even then, as soon as the visibility, reliability, and maintainability of the service become important, you'll need automated tests.

  • Bored

    Good God, too long.

  • Paul

    Ian,

    A really key point in your post is this – “Briefly though, one reason people find tests expensive to own is an over-reliance on Testcase class per class instead of Testcase per class Feature or Testcase per Scenario”.

    And that is where I am at now. I have seen the folly of my ways, which involved testing every class and every method, in the spirit of code coverage. What I ended up with was a loss in productivity, some brittle tests, many useless tests, and frustration. As I have moved to a more feature testing approach, I find that I dive down for granular tests when I need it, not by default. Testing seems to have more immediate value when I follow this approach.

  • guest

    Great post, thanks.