Failing Tests Meaningfully – TDD Process and The Karate Kid

Test-Driven Development’s “Red, Green, Refactor” mantra is a mnemonic shorthand for TDD’s development process.

Following Test-Driven Development, a developer first writes a
failing test, then makes the test pass, and then writes the functional
code that the assertions of the test are meant to lock in.

This is an overly simplistic view of TDD, but it’s a level of
understanding that many folks never really see beyond.  If you
stop at such a shallow understanding of TDD, you’ll probably end up
scratching your head and wondering what all the fuss is about. 
Again, “Red, Green, Refactor” is a mnemonic.  The richness of TDD
isn’t in the mnemonic, it’s in the details.

Here is the TDD development process at a glance as oft-quoted in the software development community at large:

1. Write a test

  • Think about how you would like the operation in your mind to appear in code
  • Invent the API you wish you had (in case you missed it, this is software design, not software testing)
  • Include all the elements in the story that you imagine will be necessary to calculate the right answers
  • Write the test to fail

2. Make it run

  • Quickly getting the test to pass dominates everything else
  • If a clean simple solution is obvious, type it in
  • If the clean, simple solution is obvious but will take a
    minute, make a note of it and get back to the main problem – making the
    test pass
  • Quick test success excuses all sins, but only for the moment…

3. Make it right

  • Now that the system is behaving, put the sinful ways of the recent past behind you
  • “Step back onto the straight and narrow path of software righteousness” (Beck)
  • Refactor (remove duplication)

“Red, Green, Refactor” is analogous to The Karate Kid’s “wax on, wax off”.

In the movie “The Karate Kid”, Mr. Miyagi drills some karate moves
into Daniel LaRusso’s muscle memory by finding a way for Daniel to
repeat the movements without focusing on the fact that he’s “doing”
karate.  At the time, Daniel thinks that his new mentor is simply
taking advantage of him as free labor until Miyagi shows Daniel that
the “wax on, wax off” motions are in fact the motions for blocking an
attack.  Miyagi habituated Daniel to the motions by having him use
the motions to wax Miyagi’s small fleet of classic cars.

There’s more to “Red, Green, Refactor” than meets the eye, just as
there is more to “wax on, wax off” than meets the eye.  The
question is whether the practitioner has the discipline to drill the
practice until it is second nature.

TDD is a practice.  It’s something that requires some drilling
at the outset.  TDD isn’t a tool, and there’s no magic wand that a
teacher can wave over your head to transfer practice and knowledge into
you.  TDD is software Kung Fu.  You’re not going to get it
by reading comic books or sitting vegetatively on your ass in front
of an XBox.  On the upside, it’ll take much less time to become
proficient in TDD than to become proficient with Miyagi-Do Karate (not
considering the Karate Kid’s ability to become a champion fighter in
the space of a few weeks).

At first, exercising the TDD process will take up a lot of
attention.  Developers spend a lot of attention on keeping to the
test-first way of doing things, and little attention is spent on
becoming aware of the changes to software design that occur by engaging
in test-first programming.

At this stage, TDD practice may seem as meaningful and fruitful as
waxing cars did to Daniel’s karate training.  You might be tempted
to quit, and it’s at this stage that most undisciplined developers
abandon their TDD practice and abandon themselves to their previous
pathetic programming proclivities.

Part of penetrating the seeming frivolousness of Test-Driven
Development’s practice is getting to an understanding of why we start
with a failing test, and follow up the failing test with a passing test
that doesn’t really validate the code in any serious way.

It’s not valuable to simply accept the TDD dogma at face value and
expect that you’ll keep at your practice faithfully until the “Red,
Green, Refactor” lights come on.  There’s a darn good reason for
starting with a failing test, moving on to a simple passing test, and
then getting down to writing real functional code.

Tests are instruments of measurement.  They measure the
correctness of functional code and prove that functional code has no
defects.  But there’s an inherent problem with tests… they are
made with the same raw material that functional code is made of and
subsequently they are subject to the same defects.  Put another
way, if you are using code to prove that some other code doesn’t have
any bugs, what’s to say that the test code itself doesn’t have any
bugs?  How do you prove that the test code is defect-free?

You can write a defective test just as easily as you can write
defective code.  The tricky thing with defective code is that it’s
likely that you won’t simply catch the defect by reading the
code.  You know this to be true because you’ve often sat down to
fix a defect and stared at it for some time before you realize what the
defect is.  Defects can sit right under your nose and not even be
detected.  Your ability to catch a defect can depend on all kinds
of external influences – how much sleep you’ve had, how many
distractions are in your workspace, the composition of your new allergy
medication, etc.

It’s so easy to write defective code, and thus defective test code,
that TDD prescribes a disciplined process for writing tests that seeks
to insulate you from the human frailties that are often the root cause
of defective code to begin with, namely, programmer
self-overestimation, and inattentiveness or distraction.

Measurement instruments often have to be calibrated before they can
be used reliably.  My desktop scanner came with a white balance
sheet.  The scanner’s software can be calibrated to an objective
measurement of the color white by scanning the white balance
sheet.  Measurement instruments are calibrated to objective, known
states.  This calibration is what we’re doing by going through the
Red and Green phases of the TDD development cycle.

By starting with a failing test, we are calibrating our software
correctness measurement instrument, i.e., our test, to a known failure
state.  This proves that the test detects an invalid state in the
software under test and correctly reports the failure.  We do this
in the most simple way, often just returning a hard-coded invalid
response from a method, or something equally simplistic.

Moving on to the green phase, we calibrate the test to a passing
state.  This proves that the measurement instrument can correctly
detect and report that the software under test is working within its
expected design parameters when it is functioning correctly.  As
with the failing test, we cause the test to pass with very simple code,
often returning a hard-coded valid response from a method.
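
The two calibration steps might look like the following Python sketch.
Everything named here is hypothetical: `next_order_id` stands in for a
method whose contract is to return an integer greater than 0, and the
two stubs are throwaway hard-coded implementations, not functional
code.  The same test is run against each stub to confirm that it
reports a failure for the invalid value and a pass for the valid one.

```python
def check_next_order_id(next_order_id):
    """The test being calibrated: the contract is an integer greater than 0."""
    order_id = next_order_id()
    assert order_id > 0, f"expected an id greater than 0, got {order_id}"


def reports_failure(test, method_under_test):
    """Return True if the test reports an assertion failure for this method."""
    try:
        test(method_under_test)
    except AssertionError:
        return True
    return False


def red_stub():
    # Red phase: hard-coded invalid response (hypothetical stub).
    return -1


def green_stub():
    # Green phase: hard-coded valid response (hypothetical stub).
    return 1


red_detected = reports_failure(check_next_order_id, red_stub)
green_detected = reports_failure(check_next_order_id, green_stub)

print("red stub reported as failure:", red_detected)
print("green stub reported as failure:", green_detected)
```

In practice you would edit a single method in place rather than keep
two stubs around; they sit side by side here only so the sketch runs in
one go.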

We never calibrate tests using real functional code.  Real
functional code might pass or fail for reasons that aren’t
predictable.  For example, the functional code may have a
defect.  If you calibrate test code against defective functional
code, you haven’t actually calibrated your test at all.  In fact,
if you calibrate against defective functional code, you risk
introducing even more defects into your codebase since you will believe
that the calibrated test is in fact a reliable measurement instrument
when in fact, it won’t provide you with reliable measurements at
all.  I don’t calibrate the white balance of my scanner by feeding
it a page of my local newspaper since this would give the scanner
incorrect information to base its color offset algorithm on.

We need to start the refactoring phase of the TDD cycle with
reliable, calibrated tests because we are ultimately going to be
incrementally introducing small changes into the codebase and with each
small change we will execute the calibrated test to make sure that we
are still on track.  We follow this process whether we’re
introducing a new bit of code, or if we’re introducing a change to some
existing code.  The process is always the same – create an
unquestionably calibrated test, and then introduce the functional code
in small increments.
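
As a rough illustration of working in small increments, the Python
sketch below runs one already-calibrated test after each small change
to a hypothetical `word_count` function.  The function, its two
versions, and the test inputs are all invented for this example.  The
first increment fails on the empty-string case, and because the change
was small, there is only one place to look.

```python
def calibrated_test(word_count):
    """The calibrated test: returns True only if both expectations hold."""
    return word_count("wax on wax off") == 4 and word_count("") == 0


def word_count_v1(text):
    # Increment 1: split on single spaces.
    # An empty string splits into [""], so this counts one word too many.
    return len(text.split(" "))


def word_count_v2(text):
    # Increment 2: split on any run of whitespace.
    # An empty string splits into [], so the count is 0.
    return len(text.split())


# Run the same test after every increment.
results = {
    "v1": calibrated_test(word_count_v1),
    "v2": calibrated_test(word_count_v2),
}

for version, passed in results.items():
    print(version, "passes" if passed else "fails")
```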

Many small increments of change are used because it’s easier to tell
at any stage whether the last bit of code you wrote broke the
code.  When a test fails, we fix the code immediately.  If
you introduce change in big chunks, it’s harder to figure out exactly
which new bit of code from that larger chunk of work is actually
responsible for breaking the code.  When you code in large chunks,
you end up spending more time coding as you’ll spend more time figuring
out what went wrong when something goes wrong.  Invariably, you’ll
need to use the debugger to figure this out. Developers with a TDD
practice spend much less time debugging code, and much more time
writing code and refactoring code toward its inherent, optimal design.

Debugging code is a slow, time-consuming process.  Time spent
in a debugger is sloth time.  You might be thinking that you’re
perfectly effective in a debugger and that you don’t have any
objections to doing code validation in a debugger rather than in a
well-factored unit test.  This is merely an assumption fed by how
habituated you are to using a debugger.  Without having a TDD
practice, you have no basis of comparison for how ineffective debugging
is compared to writing well-factored unit tests for well-factored code.

Arriving at a solid understanding of why TDD insists on starting
with a failed test, and then making the test pass will go a long way to
helping you internalize the necessity of the TDD process.  The
unit test is the force driving the design of your functional code; or
the “factoring” of your functional code.  The unit test must be as
defect-proof as possible before we put our trust in it as the single
most influential aspect of our coding and detailed software design.

Understanding why we calibrate the test doesn’t explain how to calibrate the test, and how to do so effectively.

You can still do things during the calibration of your test that by
and large don’t really serve to calibrate the test as much as they do
to provide you with an illusion of having a calibrated test.

For example, you can fail a test by throwing an exception from the
method under test.  If the method under test is never expected to
throw an exception, then this would be an entirely inappropriate way to
calibrate the test.

The first part of the TDD cycle where a failing test is written
should be stated, “write a meaningfully failed test,” or “fail the test
meaningfully.”

If a method is expected to return an integer that is greater than 0,
then fail the test by having the method under test return -1. 
Returning -1 from the functional method is in keeping with actual
functional behavior that would be considered invalid and that the test
should detect and report as a failure.  Throwing an exception from
the method under test will indeed cause the test to fail, but not in a
way that is within the design expectations of the method, and therefore
the test won’t have actually been correctly calibrated toward detecting
a real invalid state.
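
Python’s `unittest` happens to make this distinction visible: it
records an assertion failure under `failures` and an unexpected
exception under `errors`.  The sketch below assumes a hypothetical
method whose contract is to return an integer greater than 0, and runs
the same test against two throwaway stubs, one returning -1 and one
throwing.

```python
import unittest


def make_test(method_under_test):
    """Build the same test case around whichever stub is being tried."""
    class PositiveResultTest(unittest.TestCase):
        def test_returns_positive_integer(self):
            self.assertGreater(method_under_test(), 0)
    return PositiveResultTest


def run(test_case):
    """Run a test case and return the raw unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result


def returns_invalid_value():
    # Meaningful failure: an invalid value the method could plausibly produce.
    return -1


def throws_instead():
    # Not meaningful: the method is never expected to throw.
    raise RuntimeError("not implemented")


meaningful = run(make_test(returns_invalid_value))
not_meaningful = run(make_test(throws_instead))

print("returning -1:", len(meaningful.failures), "failure(s),",
      len(meaningful.errors), "error(s)")
print("throwing:    ", len(not_meaningful.failures), "failure(s),",
      len(not_meaningful.errors), "error(s)")
```

Both runs are red, but only the first exercises the assertion.  The
throwing stub never reaches the comparison, so it says nothing about
whether the test can detect an invalid return value.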

With this in mind, here is a restatement of the first step of the
TDD process, slightly modified to disambiguate the knowledge that TDD
practitioners often take for granted:

1. Write a test

  • Think about how you would like the operation in your mind to appear in code
  • Invent the API you wish you had
  • Include all the elements in the story that you imagine will be necessary to calculate the right answers
  • Write the test to fail meaningfully


“Now use head for something other than target” – Kesuke Miyagi
