TDD and Hard-to-Test Areas, Part 1

I wanted to talk about the issues people run into when they begin working with TDD, the same issues that tend to make them abandon TDD after an initial experiment. These are the ‘hard-to-test’ areas: the things production code needs to do that the presentations and introductory books just don’t seem to explain well. In this post we will start with a quick review of TDD and then get into why people fail when they start trying to use it. Next time around we will look more closely at solutions.

Review

Clean Code Now

TDD is an approach to development in which we write our tests before writing production code. The benefits of this are:

  • Tests help us improve quality: Tests give us prompt feedback; we receive immediate confirmation that our code behaves as expected. The cheapest point to fix a defect is the point at which you create it.
  • Tests help us spend less time in the debugger: When something breaks, our tests are often granular enough to show us what has gone wrong without requiring us to debug. If they don’t, then our tests are probably not granular or well-authored enough. Debugging eats time, so anything that helps us stay out of the debugger helps us deliver at a lower cost.
  • Tests help us produce clean code: We don’t add speculative functionality, only code for which we have a test.
  • Tests help us deliver good design: Our tests prove not just our code but our design, because the act of writing a test forces us to make decisions about the design of the SUT.
  • Tests help us keep a good design: Our tests allow us to refactor – changing the implementation to remove code smells while confirming that our code continues to work. This allows us to do incremental re-architecture, keeping the design lean and fit while we add new features.
  • Tests help to document our system: If you want to know how the SUT should behave, examples are an effective means of communicating that information. Tests provide those examples.

Automated tests lower the cost of performing these checks. We pay the cost of writing a test once, but because we can then re-run it at marginal cost, it keeps delivering those benefits throughout the system’s lifetime. Automated tests are ‘the gift that keeps on giving’. Software spends more of its life in maintenance than in development, so reducing the cost of maintenance lowers the cost of software.

The Steps

The steps in TDD are often described as Red-Green-Refactor:

  • Red: Write a failing test (there are no tests-for-tests, so this step checks your test for you).
  • Green: Make it pass.
  • Refactor: Clear up any smells in the implementation resulting from the code we just added.
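
To make the cycle concrete, here is a minimal sketch in C# with NUnit; the PriceCalculator class and its discount rule are invented purely for this illustration, not taken from any real codebase:

    using NUnit.Framework;

    [TestFixture]
    public class PriceCalculatorTests
    {
        [Test]
        public void Applies_ten_percent_discount_to_orders_over_one_hundred()
        {
            // Red: this test is written first and fails, because PriceCalculator
            // does not yet exist (or does not yet apply the discount).
            var calculator = new PriceCalculator();

            decimal total = calculator.Total(200m);

            Assert.AreEqual(180m, total);
        }
    }

    // Green: the simplest implementation that makes the test pass.
    public class PriceCalculator
    {
        public decimal Total(decimal orderValue)
        {
            return orderValue > 100m ? orderValue * 0.9m : orderValue;
        }
    }

    // Refactor: with the test green we can rename, extract methods, or remove
    // duplication, re-running the test after every change.

The point is the rhythm: the test is written and seen to fail, the smallest change makes it pass, and only then do we tidy up.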

Where to find out more

Kent Beck’s book Test-Driven Development: By Example remains the classic text for learning the basics of TDD.

Quick Definitions

System Under Test (SUT) – whatever we are testing. This may differ depending on the level of the test: for a unit test it might be a class or a method on that class; for acceptance tests it may be a slice of the application.

Depended-On Component (DOC) – something that the SUT depends on, such as a class or component.
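
As a small, hedged sketch of how the two relate (OrderService and IPaymentGateway are hypothetical names used only for this example): the class the test exercises is the SUT, and the gateway it calls out to is a DOC, which we will often replace with a test double.

    // The DOC: something the SUT depends on. In a unit test we would
    // typically substitute a stub or mock for this interface.
    public interface IPaymentGateway
    {
        bool Charge(decimal amount);
    }

    // The SUT: the class the unit test is actually exercising.
    public class OrderService
    {
        private readonly IPaymentGateway _gateway;

        public OrderService(IPaymentGateway gateway)
        {
            _gateway = gateway;
        }

        public bool PlaceOrder(decimal amount)
        {
            return _gateway.Charge(amount);
        }
    }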

What do we mean by hard-to-test?

The Wall

When we start using TDD we rapidly hit a wall of hard-to-test areas. Perhaps the simple red-green-refactor cycle begins to get bogged down when we start working with infrastructure-layer code that talks to the Db or an external web service. Perhaps we don’t know how to drive our UI through an xUnit framework. Or perhaps we have a legacy codebase, and putting even the smallest part under test quickly becomes a marathon instead of a series of short sprints.

TDD newbies often find that it all gets a bit sticky and, faced with schedule pressure, drop TDD. Having dropped it, they lose faith in its ability to deliver for them while still meeting the schedule. We are all the same: under pressure we fall back on what we know; hit a few difficulties with TDD and developers stop writing tests.

The common thread among hard-to-test areas is that they break the rhythm of our rapid test-and-check-in development cycle, and their tests are expensive and time-consuming to write. The tests are often fragile, failing erratically, and difficult to maintain.

The Database

  • Slow Tests: Database tests run slowly, up to 50 times more slowly than normal tests. This breaks the cycle of TDD; developers tend to skip running all the tests because it takes too long.
  • Shared Fixture Bugs: A database is an example of a shared fixture. A shared fixture shares state across multiple tests. The danger here is that Test A and Test B pass in isolation, but Test B changes the state of the fixture so that Test A fails unexpectedly when run after it. These kinds of bugs are expensive to track down and fix. You end up with a binary-search pattern to resolve shared fixture issues: trying out combinations of tests to see which combinations fail. Because that is so time-consuming, developers tend to ignore or delete these tests when they fail.
  • Obscure Tests: To avoid shared fixture issues, people sometimes try to start with a clean database. In the setup for their test they populate the Db with any values they need, and in the teardown they clean them out. These tests become obscure, because the setup and teardown code adds a lot of noise, distracting from what is really under test. This makes tests harder to read and less granular, and thereby harder to find the cause of a failure in. The Db setup and teardown code is another point of failure. Remember that the only test we have for our tests themselves is to write a failing test; once you get too much complexity into the test itself it can become difficult to know whether the test is functioning correctly. It also makes the tests harder to write: you spend a lot of time writing setup and teardown code, which shifts your focus away from the code you are trying to bring under test, breaking the TDD rhythm.
  • Conditional Logic: Database tests also tend to end up with conditional logic – we are not really sure what we are going to get back, so we insert a conditional check on the result (see the sketch after this list). Our tests should not contain conditional logic; we should be able to predict the behavior of our tests. Among other issues, we test our tests by making them fail first, and introducing too many paths creates the risk that the errors are in our test, not in the SUT.
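
To illustrate that last point, here is an invented sketch (NUnit, with a hypothetical CustomerRepository and connection string) of the conditional style a shared database fixture encourages, alongside the deterministic test we would rather write:

    using NUnit.Framework;

    [TestFixture]
    public class CustomerRepositoryTests
    {
        // Both the repository and its connection string are hypothetical,
        // shown only to contrast the two styles of test.
        private readonly CustomerRepository repository =
            new CustomerRepository("Data Source=.;Initial Catalog=TestDb;Integrated Security=True");

        [Test]
        public void Finds_customer_by_id_conditional_style()
        {
            // Smell: we do not know what state the shared database is in,
            // so the test branches and can 'pass' down either path.
            Customer customer = repository.FindById(42);
            if (customer != null)
                Assert.AreEqual("Jane", customer.FirstName);
            else
                Assert.IsNull(customer); // proves nothing
        }

        [Test]
        public void Finds_customer_by_id_deterministic_style()
        {
            // Better: the test creates the row it needs, so it can predict
            // exactly what it should get back and assert on it directly.
            int id = repository.Insert(new Customer { FirstName = "Jane" });

            Customer customer = repository.FindById(id);

            Assert.IsNotNull(customer);
            Assert.AreEqual("Jane", customer.FirstName);
        }
    }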

The UI

  • Not an xUnit strength: xUnit tools are great at driving an API, but are less good at driving a UI. This tends to be because a UI runs in a framework that the test runner would need to emulate or interact with: testing a WinForms app needs the message pump; testing a Web Forms app needs the ASP.NET pipeline. Solutions like NUnitAsp have proved less effective at testing UIs than scripting tools like Watir or Selenium, often lacking support for features like JavaScript on pages.
  • Slow Tests: UI tests tend to be slow tests because they are end-to-end, touching the entire stack down to the Db.
  • Fragile Tests: UI tests tend to be fragile, because they often fall foul of attempts to refactor our UI. Changing the order or position of fields on the UI, or the type of control used, will often break our tests (see the sketch after this list). This makes UI tests expensive to maintain.
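
As a sketch of that fragility, here is a hypothetical example using Selenium’s WebDriver bindings from C# (the page, its element ids and the XPath locators are all invented): a test that locates fields by their position in the form breaks as soon as the fields are reordered, while one keyed to a stable id survives that kind of refactoring.

    using NUnit.Framework;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    [TestFixture]
    public class LoginPageTests
    {
        [Test]
        public void Logs_in_with_valid_credentials()
        {
            using (IWebDriver driver = new ChromeDriver())
            {
                driver.Navigate().GoToUrl("http://localhost/login");

                // Fragile: coupled to the position of each field in the form,
                // so reordering the fields breaks the test with no change in behaviour.
                driver.FindElement(By.XPath("//form/input[1]")).SendKeys("jane");
                driver.FindElement(By.XPath("//form/input[2]")).SendKeys("secret");

                // Less fragile: coupled to a stable id rather than to layout.
                driver.FindElement(By.Id("login-button")).Click();

                Assert.IsTrue(driver.Title.Contains("Welcome"));
            }
        }
    }

It is also an end-to-end test: it needs a running site, a real browser and everything beneath them, which is why tests like this are slow.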

The Usual Suspects

We can identify a list of the usual suspects that cause problems for successful unit testing; a sketch of what they look like in code follows the list.

  • Communicating across a network
  • Touching the file system
  • Requiring the environment to be configured
  • Making an out-of-process call (this includes talking to the Db)
  • Driving the UI
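
A hedged sketch of what these suspects look like when baked into a class (ReportScheduler and everything it touches are invented for illustration): every line of this method reaches out to something the test cannot control, so a ‘unit’ test for it needs a real file system, a configured environment, a real database and a real mail server.

    using System.Configuration;
    using System.Data.SqlClient;
    using System.IO;
    using System.Net.Mail;

    public class ReportScheduler
    {
        public void SendNightlyReport()
        {
            // Touches the file system.
            string template = File.ReadAllText(@"C:\Reports\nightly-template.txt");

            // Requires the environment to be configured, and makes an
            // out-of-process call to the database.
            var connectionString =
                ConfigurationManager.ConnectionStrings["Reports"].ConnectionString;
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                // query the sales figures used to fill in the template
            }

            // Communicates across the network.
            var smtp = new SmtpClient("mail.example.com");
            smtp.Send("reports@example.com", "ops@example.com", "Nightly report", template);
        }
    }

Nothing here is wrong as production code; it is simply welded to the usual suspects, which is what makes it hard to bring under a fast, isolated test.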

Where to find out more

xUnit Test Patterns: Gerard Meszaros’ site and book are essential reading if you want to understand the patterns involved in test-driven development.

Working Effectively with Legacy Code: Michael Feathers’ book is the definitive guide to test-first development in scenarios where you are working with legacy code that has no tests.

Next time around we will look at how we solve these issues.

 

About Ian Cooper

Ian Cooper has over 18 years of experience delivering Microsoft platform solutions in government, healthcare, and finance. During that time he has worked for the DTI, Reuters, Sungard, Misys and Beazley, delivering everything from bespoke enterprise solutions to ‘shrink-wrapped’ products for thousands of customers. Ian is a passionate exponent of the benefits of OO and Agile. He is test-infected and contagious. When he is not writing C# code he is also the founder of the London .NET user group. http://www.dnug.org.uk
This entry was posted in Agile, TDD.
  • http://colinjack.blogspot.com Colin Jack

    @Tony
Sorry, just re-read your point – you’re not suggesting Ruby at all :)

  • http://colinjack.blogspot.com Colin Jack

    @Tony
    What exactly are you suggesting, Ruby?

  • http://tmorris.net/ Tony Morris

    Just control your side-effects already. Your discussion is only relevant when using languages for people who are afraid of abstraction and high-level programming (and so perpetuate – almost by mandate – their poor technique, which is the core problem, not the language). Not trying to be mean or anything, but I strongly recommend you stop explaining away the symptoms and providing snake oil solutions and solve the *actual problem*.

  • Jeremy Gray

    @Niki – I think the situation you just described is exactly _why_ you want test automation, whether in the form of a strict unit test or an integration or acceptance test. If you have a known set of inputs, each with their known desired result, automated execution of your evolving algorithm against that large set of input is exactly what is going to tell you whether or not any one or series of algorithm changes are heading in the right direction. True, you may run your more frequent during-active-development test cycle using a smaller number of inputs to speed things up, but who cares how long the formal run takes: you want to run as many known inputs through it as possible, and you want to do so on an automated basis.

  • http://www.genericerror.com/Blog/ Barry Dahlberg

    I had a rant about exactly these issues from a web perspective a few days ago:

    http://www.genericerror.com/Blog/2008/07/unit-tests-vs-productivity-right-vs.html

Separation of concerns is fine, but when the parts you can’t test are bigger than the parts you can, there is a problem.

  • Ian Cooper

    @Niki

The key to agile approaches is short feedback loops: design a bit, test a bit; design a bit, test a bit. Putting the test first comes from the idea of eliminating waste by doing some design first, based on hard user requirements driven from stories by tests.

    But if you can’t do any design first, then you can’t. In that case the important thing would be to keep your feedback loop short, which it sounds as though you are doing anyway. So I think this is an agile approach to algorithm discovery, even if not a TDD one.

    Make sense?

  • http://www.focuspocus.org ALB

    Mock frameworks such as EasyMock are a good starting point to a general solution

  • Niki

    @Ian Cooper:

My problem is that the usual development process in this area is completely different: You usually can’t derive an optimal image processing algorithm on a whiteboard using only maths and then implement it. Also, you don’t have a “domain expert” that can tell you what the algorithm should do just by thinking hard. (I’m just guessing, but I can imagine this might be similar for domains like natural language processing or speech recognition.) The more common approach is that you start with a simple algorithm that does more or less “what you think your eyes are doing”, then see where it doesn’t work so well, and see if you can improve it e.g. by using pre-filters, or by modifying parts of the algorithm, or by trying a completely different approach. Once you’re at a point where the recognition results of your algorithm are good enough, you’re done. There is no “implementation” step after that; you already had to do the whole implementation to see _if_ the algorithm is good enough. You had to test it on thousands of images, or maybe even test it in production, before you can be really sure about that.

    Imagine for example that you start by applying a contrast filter, then searching for the brightest pixel. Of course you could write tests for this before you implement it, but those tests would probably be harder to write and more error-prone than the actual code (you’re just calling two library functions FilterContrast and FindBrightestPixel or something, but for the tests you would actually have to calculate the correct results by some different means). And as soon as you see that this first algorithm isn’t good enough, you’ll have to throw that test away. So, yeah, you could probably do it that way. It just doesn’t seem to be very useful.

  • Ian Cooper

    @Niki

    I’m not sure I know enough about image processing to answer this :-)

But I would assume that the image is a set of data and our algorithm searches it for patterns that match, presumably trying to remove false positives by looking at the nearby data at the same time. Our unit test is there to confirm that we have coded the algorithm correctly: given test data that should produce a hit, based upon the heuristics used in this algorithm, have we implemented it correctly?

Now I am assuming you are not trying to discover an algorithm for this (which seems to be more of a maths problem). Could you discover an algorithm using TDD? Sure. Would it be any good? Only acceptance testing would tell you, over a large enough data set. Would it be a good way to uncover such an algorithm? I don’t know, because I don’t know enough about how researchers in that field usually uncover their algorithms. But could you? For sure. Should you? I suspect there might be a more effective technique, but that’s why agile has the domain expert. Would I code up the implementation via TDD? For sure.

  • Niki

    @Ian Cooper:

    Interesting thought. So what would a unit-test for an image-processing example like that look like? At some point in a software like that, you’d have a function that takes an image and says “yes” or “no” (or “tank” or “no tank”). A unit-test for this function would have to call it with some input and assert some kind of output. But what would that assertion be? What this function does “under the hood” is defined nowhere, and will probably change every time the algorithm is improved, so testing that doesn’t really help a lot. (At least that’s the way I see it: The point of a unit-test is that it tells you if your code is still working after a change. So a unit-test that will fail after almost every change is not a great help.)

Another problem with this kind of software is that you don’t really know the optimal algorithm from the start – you have to test different approaches and compare them statistically to see which solves your problem best. That’s quite contrary to the TDD approach, where you have to know what results you expect from your code before you write it.

  • http://blog.quantumbitdesigns.com Kevin Kerr

I’m glad you touched upon the ‘shared fixture’ concept. I had never heard of that, yet it is purposely designed into my test GUI/framework.

  • Ian Cooper

    @Niki

    Be aware though that there is a difference semantically between a unit test – which tests a small unit of code – and an acceptance test – which confirms the software meets the user’s requirements. While I expect you could unit test your algorithm was correctly implemented, you would still want some sort of data-driven acceptance tests to confirm the quality of that algorithm.

    Which agrees with what you are saying but introduces differentiating terminology.

  • Niki

    Sorry for the double-posting, I got a server error message when I posted the first time. Can someone maybe delete this?

  • Niki

I think we have to accept that some things can’t be unit-tested. Imagine for example you had to find tanks in hi-res satellite images. A requirement would be that the detection rate must be >90% and the false detection rate <1%. How do you test this requirement? To get any statistically meaningful results, you’d have to test at least hundreds, better thousands, of sample images, so your unit-tests would take literally hours to run. Even worse, your algorithm would probably need some parameters like thresholds, regions of interest, that the end-user has to enter, depending on what her image quality is like and what she is looking for. So every time you improve the algorithm (e.g. switching from an absolute threshold to an adaptive threshold), you’d have to find the optimal parameter set for your test-images again and change your unit-tests.
Of course you could do some basic smoke tests (e.g. does your algorithm crash for random images/random parameters?), and you might test the functionality of parts of the algorithm (e.g. a thresholding function), but I don’t think you can automate the tests for the actual functionality the user is interested in.

  • Ian Cooper

    @Rolf, @Martin

You’re both hitting the biggest problem with frameworks: they are too often intrusive into your application. They cloud your domain and often require you to spin up the framework itself to test your domain model.

    I’m not going to give you an answer that does not involve abstracting yourself from the pain, but be aware that the reason why the alt.net community pushes back against tool sets like the Entity Framework is exactly this problem of lack of a good separation of concerns.

  • http://rolfeleveld.spaces.live.com Rolf Eleveld

    Martin,
I see exactly where you’re going with this, and I’m on the same wagon: when you’re developing Web Parts for WSS, or for Office, BizTalk, Dynamics, third-party products, etc., you end up wrapping their code in a layer for the product you’re building just so you can actually test your code. Effectively you add one more layer of abstraction, which adds an extra piece of effort and possible faults. I have not found a structural way to effectively test software that uses these server products and is hosted therein. If you’ve thought up a clean way to prove that your code does work as designed, and that it’s not an idiosyncrasy of the hosting software, you could make me a happier man!
    Regards,
    Rolf

  • Ian Cooper

    @Martin

    Once we step outside the domain and start to deal with infrastructure code, then yes things start to get harder. That is why separation of concerns is so valuable, because we can reduce the pain areas around the domain.

  • Martin Laufer

Seems to me as if all non-pure functional areas are hard to test?