Where we are with acceptance testing and our BDD journey today

It’s a long post. Sorry about that. Get yourself a coffee
and a comfy chair.

I have spoken in the past about our path to Behavior-Driven Development (BDD)
through Story Test Driven Development (STDD) and Acceptance Test Driven Development (ATDD). Those
of you who have been paying attention will know that we essentially started
with a vanilla STDD approach to acceptance testing.

I wanted to talk a little bit about how we are authoring,
structuring, and organizing our tests today. Over the past couple of years I
have found that these concerns matter more than the technology you use. The
difference in technologies is perhaps mainly significant to the extent that
they help support the authoring, structuring, or organization of story tests. I am
documenting this partly to help others who might find insight here, but also to seek
feedback. I'm not trying to hold this up as exemplary. We certainly
have our weaknesses and failures which we need to address. However, I thought
some might find it useful if I shared my experience of this today.

From story to test

We use user stories to capture requirements. These do not
have enough detail to develop from and are always a placeholder for further
conversation prior to development.

Years ago, way before any testing TLAs, I delivered software
for primary care to the National Health Service (NHS, the UK’s public health
provider). Every year we received new requirements that we had to meet to get
our software accredited as providing the functionality someone in primary care
needed. We received this in two packages. One was a conventional list of
requirements; the other was a test pack. The test pack had a set of scenarios,
some end-to-end, some covering a specific feature, that needed to pass before
you could get accreditation. Early on we learnt that the test pack was the best
source of information on what we needed to build to be ‘done’. Requirements are
vague and open to interpretation. Scenarios are concrete, specific, and rarely
open to interpretation. Ever get that moment where you built something, and then
had an argument with the person who wrote the requirement because they just
told you that your implementation is nothing like their requirements? Ever had
them point to some sentence that you overlooked, one that changes the whole
perception of how you deliver the story? Scenarios avoid that problem because
they call attention to the correct behavior of the software.

So the ideal is to capture the scenarios that the software
should support. Those give you the conditions for accepting the story as done.
Strictly speaking, acceptance tests for a user story are story tests.

Now the composition of your team will dictate to some extent
who is responsible for the parts of this process. In most cases I have found it
unlikely that the users of your software will be able to express the scenarios,
though sometimes you might get acceptance criteria from them. In our case we
have domain experts who act as proxies for real customers, so we ask them to
define acceptance criteria – usually just a list of things that a successful
implementation will satisfy. Our acceptance criteria look a little like
this:

  • I expect that the Follow Up Date entered can be prior to the date the Follow Up Date is entered
  • I expect that a Follow Up Date can be changed
  • I expect that the Follow Up Date format should follow the format dd-MMM-yyyy
  • I expect to be able to either enter a date or select from a calendar
  • I expect that activity history will record the previous Follow Up Date

In our case we then have a separate QA function, who
turn these acceptance criteria into scenarios that need testing. We rarely
find that the end user looks at the tests, regardless of the medium used for
them. The idea that Fitnesse or Cucumber supports the customer writing the tests is
only really true when you understand Customer in the XP sense, as a role which
includes BAs, QA, etc.

In an ideal world we would get all of these scenarios
before beginning development, ideally by planning. That’s certainly the model
outlined in Fit for Developing Software (which is a great
introduction to Story Test Driven Development). In the actual world we get
some, but for others we both work from the acceptance criteria and communicate
about the scenario when we begin work on the story. The key, though, is that you
are driving with the tests, either as criteria or as fully-fledged scenarios. The
tests tell you what to build and when you are done.

Jim Shore refers to this communication as describe-demonstrate-develop.
The emphasis is on reaching a shared understanding of what we should build and
communicating that, over using a specific tool to test with.

Interestingly we occasionally fall down on getting scenarios
written and end up developing off the acceptance criteria. The result is all too
predictable. What gets developed is not quite right, or not quite what was intended,
and we end up with re-work. We certainly need to get better at this discipline.
A root cause analysis tends to suggest that what fails for us here is the
communication about what we should build. The storytest-first approach forces
the correct behavior of understanding the story before we build it.

Scenarios are where you begin to see the value of
Given-When-Then as an organizing principle. Given I have an application in this
state, when I make this change, then the new state of the application should
be…

So for the above criteria we might write a scenario that
looks something like this (don’t worry if you don’t get the domain):

Given that I have an Application subjectivity against a risk
with a Follow Up Date of 11-MAR-09
When I set a Follow Up Date of 24-MAR-09
Then the Subjectivity Follow Up Date should be 24-MAR-09
and the subjectivity should have an activity history record of New Follow Up Date
with a Previous Follow Up Date of 11-MAR-09

What’s the difference between story and unit tests where I
have both?

In one sense there is no difference. You have a set of
acceptance criteria which you need to conform to. You have scenarios that
exercise these criteria. Each scenario sets up pre-conditions, exercises the
system under test (SUT), and confirms the post-conditions. Both story and unit
tests can confirm this. The difference is that the story test checks the whole, but
I might wish to confirm a portion of the whole, and that is when I need a unit
test. The unit test gives me greater defect localization. When the story test
fails I would hope that there would be a unit test failing somewhere that would
let me identify why the story test failed. If not, I would look at the
interaction between the unit-tested parts as the source of the issue. So
a major difference between story and unit tests is granularity.
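
To make the granularity point concrete, here is the kind of unit test that might sit underneath the Follow Up Date scenario above. It is only a sketch – Subjectivity and its members are illustrative names, not our actual domain model – but it shows the level at which a unit test localizes a failure that the story test can only report as ‘the scenario went red’.

using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

// Illustrative domain types standing in for the real model.
public class ActivityHistoryEntry
{
    public DateTime PreviousFollowUpDate { get; set; }
}

public class Subjectivity
{
    private readonly List<ActivityHistoryEntry> activityHistory = new List<ActivityHistoryEntry>();

    public Subjectivity(DateTime followUpDate)
    {
        FollowUpDate = followUpDate;
    }

    public DateTime FollowUpDate { get; private set; }
    public IEnumerable<ActivityHistoryEntry> ActivityHistory { get { return activityHistory; } }

    public void SetFollowUpDate(DateTime newDate)
    {
        // Record the previous date before overwriting it.
        activityHistory.Add(new ActivityHistoryEntry { PreviousFollowUpDate = FollowUpDate });
        FollowUpDate = newDate;
    }
}

[TestFixture]
public class SubjectivityFollowUpDateTests
{
    [Test]
    public void Changing_the_follow_up_date_records_the_previous_date()
    {
        var subjectivity = new Subjectivity(new DateTime(2009, 3, 11));

        subjectivity.SetFollowUpDate(new DateTime(2009, 3, 24));

        Assert.AreEqual(new DateTime(2009, 3, 24), subjectivity.FollowUpDate);
        Assert.AreEqual(new DateTime(2009, 3, 11),
            subjectivity.ActivityHistory.Last().PreviousFollowUpDate);
    }
}

If the story test goes red, a failure here points straight at the domain object; if this stays green, the problem is more likely in the wiring between the parts.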

There is definitely some pressure here, with folks
occasionally feeling that we are ‘testing twice’: once in the acceptance test
and once in the unit test. Most developers would prefer just to write the unit
test and not have to implement the acceptance test. I believe that acceptance
testing comes into its own for ensuring we have developed the right thing, and
during large-scale refactoring.

However, I think this feeling, that the acceptance tests are
not adding value, occurs most often when you slide into post-implementation
Fitnesse testing because your scenarios were not ready. A lot of the value-add
comes from defining and understanding the scenario up-front. Much of the value
in acceptance testing comes from agreeing what ‘done’ means. That is often
less clear than people think, and defining a test usually reveals a host of
disagreements and tacit assumptions. When we find ourselves questioning
the Fitnesse tests it is often a symptom that we have slipped into ‘test after’.
Interestingly, when Jim Shore posts about stopping acceptance testing, his alternative
seems to be exactly the kind of example-driven approach that we would consider
storytests to produce. Overall I think the direction people are heading is the
same – focus on scenarios that can be automated by developers. Whenever
we slide on acceptance tests and drift toward starting with unit tests we often
find ourselves in re-work because we end up building the wrong thing. The unit
tests mean the wrong thing works, but it’s still the wrong thing. [More than
that, it's just plain helpful. As I write this the acceptance tests are
prompting me to make an aspect of the current story go red that I had forgotten
about earlier; more proof that acceptance tests help you to navigate towards
done.]

It is hard to write unit tests as part of the definition of
the story. You don’t necessarily know how you will build the story at planning
or definition. The classes that implement the functionality probably don’t
exist, and you cannot exercise them from a test fixture at that point. Tools
like Cucumber and Fitnesse de-couple authoring a test from implementing the
test fixture for that test. This enables you to author the scenario in your
test automation tool before you begin implementing the fixture that exercises
the SUT. This is the key to enabling a test-first approach where you need to
communicate the scenario for confirmation prior to commencing work, such as
at planning. The point of tools like Fitnesse or Cucumber is to provide an easy
mechanism to communicate your scenarios. We find
that Fitnesse allows customer and developer to communicate about the scenario
before implementation. You can also see the results of executing it.
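
To make that decoupling concrete: at the point we author the scenario, the fixture behind it can be nothing more than a skeleton. The following is purely illustrative – the class and method names just mirror the Follow Up Date scenario above, and the Fitnesse wiring is omitted – but it compiles, and every step fails until we implement it against the SUT.

using System;

public class ChangeFollowUpDateFixture
{
    // Each statement in the scenario maps to a method; none of them touch the
    // SUT yet, so running the scenario simply goes red until the fixture is written.
    public void GivenThatIHaveAnApplicationSubjectivityAgainstARisk()
    {
        throw new NotImplementedException("fixture not implemented yet");
    }

    public void WithAFollowUpDateOf(DateTime followUpDate)
    {
        throw new NotImplementedException("fixture not implemented yet");
    }

    public void WhenISetAFollowUpDateOf(DateTime followUpDate)
    {
        throw new NotImplementedException("fixture not implemented yet");
    }

    public bool TheSubjectivityFollowUpDateShouldBe(DateTime expected)
    {
        throw new NotImplementedException("fixture not implemented yet");
    }
}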

Because the conversation is more important than the tool, a
tool like Fitnesse or Cucumber may not be appropriate for the acceptance
test. The most likely case is that you do not have any business rules to put
under test. Don’t feel constrained to use a particular tool. The most important
thing is to define the acceptance test and then decide what tool is appropriate
to automate the test. We even have some cases where the value of automation is
too low. Where we are configuring the system, for example, it may be easier to
test the rules that work with the configured system than to test the act of
configuring the system itself.

One interesting aspect not really considered here, though, is
that the coverage provided by the more granular unit tests is still something
of a ‘blind spot’. I’m still not sure how to resolve that, or if the customer
is genuinely concerned at that level anyway. But acceptance testing is as much about
communicating the acceptance intent as about ‘testing’. Hence we separate test and
fixture.

In addition, another difference for us is that our acceptance
tests may be end-to-end tests of the system. There is a trade-off here.
Our acceptance tests need to run within a sufficiently short time window that
developers will exercise them. End-to-end tests often end up expensive in time
to run, so folks stop running the tests. So we sometimes mock ‘expensive’
calls. Using an IoC container can help here, and we have some common code that
allows us to choose which services to mock out when setting up a new acceptance
test fixture. End-to-end tests also create a web of dependencies that may fail
for setup reasons. That can become painful in configuration-heavy systems. At
the same time, if the tests cannot simply set up and tear down, that may itself
be a smell that you need more effort spent on automated build and deployment of
the system. Going end-to-end here does give us confidence that the
system works when the parts are hooked together. While we started with a lot of
end-to-end tests we seem to be moving away from them toward more tests with
‘expensive’ collaborators mocked out. One trick here may be to separate between
those acceptance tests that exercise business rules, which may make
heavier use of mocks for components that do not influence the test, and a few
tracer bullet tests that ensure the whole is working together.
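
A rough illustration of the idea behind that common code follows. This is not our actual container setup – the service names are invented and the registry is hand-rolled to keep the sketch self-contained – but it shows the shape: default to the real wiring, and let a fixture swap fakes in for the collaborators that are expensive and do not influence the rule under test.

using System;
using System.Collections.Generic;

// Hypothetical service, purely for illustration.
public interface IRatingService
{
    decimal RateRisk(string riskId);
}

// The real implementation would call out to an external system; the fake
// returns a canned value so business-rule tests stay fast.
public class FakeRatingService : IRatingService
{
    public decimal RateRisk(string riskId) { return 100m; }
}

public class AcceptanceTestServices
{
    private readonly Dictionary<Type, object> services = new Dictionary<Type, object>();

    // Register the real implementation as the default wiring.
    public void Register<TService>(TService implementation)
    {
        services[typeof(TService)] = implementation;
    }

    // Same mechanism, different intent: a fixture replaces an 'expensive'
    // collaborator with a fake for this test only.
    public void MockOut<TService>(TService fake)
    {
        services[typeof(TService)] = fake;
    }

    public TService Resolve<TService>()
    {
        return (TService)services[typeof(TService)];
    }
}

A business-rule fixture might call MockOut with a FakeRatingService in its setup, while the handful of tracer bullet tests leave the real registrations in place.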

Testing business rules, not workflow

One piece of advice from Rick Mugridge in Fit for
Developing Software is to test the business rules, not the workflow. Early
on we made a lot of mistakes with this. We tended to have verbose column
fixtures setting every single property on each of the entities involved in the
test. Part of the driving force for us on this road was that our system is
configuration-heavy. To use the system you need to set up products, set up user-defined
fields, set up rules, etc. This gives the system significant flexibility but
creates an overhead to working with it. That overhead meant that when we
authored many early tests we needed to set up a lot of configuration
data. This was too much work to write every time, and we quickly appreciated
that we needed a better strategy.

Our first approaches to this problem involved shared setup
for our fixtures. However, given sufficient properties to set, this resolution
is still very fragile. Worse, as refactoring tools do not refactor into the text-based
fixture, the tests have a tendency to break every time you refactor. Tests
should enable refactoring, not become an obstacle to it. Any time the team
stops refactoring because of the rigidity of the tests, you have a quick way to
end any team’s love affair with storytest approaches.

So we needed a better way to factor out all of that setup
code. Our next approach was to provide tooling to allow the system to be set up
into a known state. Essentially we wrote support for exporting setup from a
system as XML and for importing it. This could then be used as part of the
setup of a scenario to put the system into a known state. This also had
benefits for deployment, as it let us automate configuration of environments and
import/export records between systems. So, as is often the case, making it
simpler to test also gave us architectural improvements.
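
As a minimal sketch of the export/import idea – the configuration shape here is invented and far simpler than the real thing – the mechanics are nothing more exotic than serializing the setup to XML and reading it back when a scenario or environment needs to start from a known state:

using System.IO;
using System.Xml.Serialization;

// Illustrative only: a drastically simplified configuration shape.
public class SystemConfiguration
{
    public string[] Products { get; set; }
    public string[] UserDefinedFields { get; set; }
}

public static class ConfigurationPort
{
    public static void Export(SystemConfiguration configuration, string path)
    {
        var serializer = new XmlSerializer(typeof(SystemConfiguration));
        using (var stream = File.Create(path))
        {
            serializer.Serialize(stream, configuration);
        }
    }

    public static SystemConfiguration Import(string path)
    {
        var serializer = new XmlSerializer(typeof(SystemConfiguration));
        using (var stream = File.OpenRead(path))
        {
            return (SystemConfiguration)serializer.Deserialize(stream);
        }
    }
}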

However, our biggest win probably came from using test data
builders to configure the system.

Omitting all that setup code

Some time ago I wrote about how we switched from using Object Mother to Test Data Builder.
That gave us enormous benefits within the domain model, because it made it easy
to do state-based testing. It is also very expressive, because I can highlight the variables that impact the test.
This makes the test much easier to read. You can think of this as using a stub
for those values you do not need to control and overriding those that you do
need to control for the test. So if we want to set up a new entity we might
have some code like the following:

           
submission = new SubmissionBuilder()
    .WithBrokerReference(new BrokerReference { BrokerName = "Flintstones", BranchName = "Flintstones – WA" })
    .WithMarket(new Market("Small Business"))
    .Build();
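
For anyone unfamiliar with the pattern, a sketch of what such a builder might look like internally follows. It is not our production code – the default values and the Submission constructor are assumptions – but the shape is the point: sensible defaults for everything, and With methods to override only what the test cares about.

public class SubmissionBuilder
{
    // Defaults stand in for everything the test does not care about.
    private BrokerReference brokerReference =
        new BrokerReference { BrokerName = "Any Broker", BranchName = "Any Branch" };
    private Market market = new Market("Any Market");

    public SubmissionBuilder WithBrokerReference(BrokerReference reference)
    {
        brokerReference = reference;
        return this;
    }

    public SubmissionBuilder WithMarket(Market newMarket)
    {
        market = newMarket;
        return this;
    }

    public Submission Build()
    {
        // Assumed constructor; the real Submission takes far more than this.
        return new Submission(brokerReference, market);
    }
}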

Many acceptance tests can be handled in a similar fashion:
use defaults for anything in the setup of the fixture that does not have an
impact on what is under test. Essentially we choose to treat the Given
statements in the same way we handle our test data builders. The Given
statements call out specific variables that affect the test, but otherwise rely
on default ‘stub’ values produced by an underlying builder. So the equivalent
of the above code in an acceptance test would be something like:

Given a submission
With a broker named Flintstones and a branch named Flintstones – WA
With a market of Small Business
When…
Then…

The advantage here is that we get a good signal-to-noise
ratio when setting up the fixture to use with our test. We don’t just stub out
values that the test does not depend on, we highlight those that it does depend
on. In the fixture implementation we use the builder pattern to implement the
Given statements for the fixture. We tend to be able to re-use these builders
across a number of tests. Our fixture implementation for Fitnesse looks
something like:

public void GivenASubmission()
{
    // Start a builder with default 'stub' values; re-use the account from an
    // earlier build step if one exists.
    submissionBuilder = new SubmissionPersistentBuilder(ioc);
    if (lastCreatedAccount != null)
    {
        submissionBuilder.WithAccount(lastCreatedAccount);
    }
}

public void WhenTheBrokerNameIs(string brokerName)
{
    submissionBuilder.WithBrokerName(brokerName);
}

public void WhenTheBrokerBranchNameIs(string branchName)
{
    submissionBuilder.WithBrokerBranchName(branchName);
}

public void ThenASubmissionIsCreated()
{
    // Build everything specified by the Given/With steps above.
    lastCreatedSubmission = submissionBuilder.Build();
}

Note that we tend to have an extra Then step to build all
the Given and With statements, before we exercise the SUT. The issue here is
that we wanted to have a statement that says ‘build the above’, so that we could
create a base flow fixture that would allow re-use in derived fixtures of much
of the setup code.

Given a submission
With a broker named Flintstones and a branch named Flintstones – WA
With a market of Small Business
Then a submission is created
When…
Then…

This is also the reason for often checking for Last Built
Foo – to see if a prior build step in the test constructed the object. If you
were only looking to re-use at the builder and not the fixture level, then this
would not be an issue for you. We could probably clean this syntax up, and I
guess that is on the to-do list. The advantage of some re-use here is the speed
boost it gives you in getting the Given portion of your fixture working, letting
you get to the red part of red-green-refactor all the sooner. That means you
can get past the grunt work and into the interesting part – implementing the
desired functionality.

It is worth noting that sometimes you get a parameterized
test, in that you want to test the same scenario repeatedly but for a range of
values. A column-based test fixture can work really well here – columns represent
either the variable inputs or the outputs. Just because we tend to use the
Given-When-Then style does not mean we should not use the opportunity to keep the
intent (Setup, Exercise, Verify, Teardown) without
becoming overly verbose.
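
As a sketch of what that might look like for one of the criteria above – the fixture and rule here are illustrative rather than lifted from our codebase – a column fixture lets each row of the table supply a candidate value and the expected outcome:

using System;
using System.Globalization;
using fit;

// Hypothetical column fixture: each table row supplies a candidate Follow Up
// Date string and the expected result of the dd-MMM-yyyy format rule.
public class FollowUpDateFormatRules : ColumnFixture
{
    // Input column: the date string as the user would type it.
    public string CandidateDate;

    // Output column (rendered as a 'valid?' column in the table).
    public bool Valid()
    {
        DateTime parsed;
        return DateTime.TryParseExact(
            CandidateDate,
            "dd-MMM-yyyy",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);
    }
}

Each row then reads as one example – 24-Mar-2009 passes, 24/03/09 fails – without a Given-When-Then preamble per case.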

How do I find my acceptance criteria in future?

One problem with acceptance tests is how to organize them.
When we started we organized them by story. Find the story, find its acceptance
tests. The stories were organized by iteration. Twenty-seven sprints in, that
early plan looks naive. We have too many stories and sprints to find our
acceptance tests by that organizing principle.

The first problem is that I often want to add to an existing
scenario for new functional requirements instead of always creating a new one.
This becomes hard if you couple story and acceptance tests, because you either
need to duplicate or add new acceptance criteria to the old story.

The second problem is that, as an executable specification of
the code, our tests have a lot of value – but not much if we cannot find them. We
should be able to find from our tests what the behavior of the system is.

Our current solution is to organize the acceptance tests
along the same lines as the site. For a given webpage on the site, we have a
given page in our Fitnesse wiki that tells you what the scenarios supported by
that page are. The mapping cannot be entirely one-to-one in that sometimes we
pop up a dialog, or perform some other workflow that we are not interested in
simulating in business rule focused tests. But it is close enough to make finding
the definition of the behavior easy.

The conceit here is that you could have the site open on one
screen and the acceptance tests open on another, and have effective and
executable documentation for the behavior of that page. It is a little bit
of a conceit, but it works well enough to be a useful organizing principle.

Old tests, cleaning up, and throwing out

Some of our acceptance tests get bent out of shape by
incremental change to the system. The best answer is often to throw them away
and start again. One advantage of the ‘reflect the site’ approach above is that
it gets easier to see when to re-write. With acceptance tests, even more than
unit tests, it often seems better to understand the intent and re-write to reflect
the new state of the site than to try to fix up old tests.

But what about the User Interface!

One obvious point here is that no user interface has been
defined as yet. A related problem is that testing through the UI depends on
the UI being complete before you can define tests. However, it can be worth thinking about
mocking up the UI at the same time as you write the scenarios, using a tool like
Balsamiq.

We had a struggle at the beginning of the project over responsibility for defining
how this should look, before realizing that it needed to be co-operative. In the
end it all comes back to the Agile emphasis on communication. The Customer asks
for features, but the Developers need to negotiate how the features will be
implemented, inputting their understanding of how hard or easy something is, or
what opportunities or approaches present themselves.

The difficulty with many
traditional requirements documents is that they focus on the UI as a mechanism
for defining the requirements. In essence the UI and its behavior becomes the
requirement. By separating out acceptance testing we begin the journey of separating
what we want from how it will be implemented. That gives us more chance to
isolate the business rules. Once we understand what the user is trying to
achieve it is easier to discuss how we will achieve it. That tends to lead us
away from simply providing an electronic version of the paper or legacy system
process they may use today, toward a process they can use tomorrow.

Some folks still like to work from the UI down, whereas I want them to work from the acceptance tests down, then hook up to the UI. It’s probably all tilting at windmills at some point, but for me, whilst both are essential to getting it right, in the majority of cases I see more cost in the rework from not getting the business rule right than in changing the UI.

 

 

 

About Ian Cooper

Ian Cooper has over 18 years of experience delivering Microsoft platform solutions in government, healthcare, and finance. During that time he has worked for the DTi, Reuters, Sungard, Misys and Beazley, delivering everything from bespoke enterprise solutions to 'shrink-wrapped' products for thousands of customers. Ian is a passionate exponent of the benefits of OO and Agile. He is test-infected and contagious. When he is not writing C# code he is also the founder of the London .NET user group. http://www.dnug.org.uk

  • http://codebetter.com/members/Ian-Cooper/default.aspx Ian Cooper

    @Mike

    Agreed. GWT is mainly useful to me as insight on how to capture a scenario using a template that drives certain behaviour through its language. But when actually implementing the test you need to watch for parameterization etc.

  • http://codebetter.com/members/Ian-Cooper/default.aspx Ian Cooper

    @Jonathan

    Not sure, I have never really seen much interest in ‘sign off’ with anything other than ‘working software’

  • http://codebetter.com/members/Ian-Cooper/default.aspx Ian Cooper

    @Richard

    For me Acceptance Tests are about making sure you ‘build the right thing’ over ‘built it right’, so you may not always be end-to-end. There is a need to demonstrate that the whole thing hooks together, via exploratory testing or the like, but it’s not really the sweet spot for acceptance testing. The issue is the cost of fully automated end-to-end tests.

  • http://codebetter.com/members/Ian-Cooper/default.aspx Ian Cooper

    @Paul I agree that using acceptance tests to describe UI interaction can be an anti-pattern for text based fixture tools like Fitnesse. The value of those tools is in testing the business rules not the UI workflow. That said it is possible to create ‘acceptance tests’ for UI even if they are exercised with a different tool like Selenium or a manual test script. The advantage of thinking about a test script is that it can be written before the UI is finalized and forms a definition of expected behavior even if later automated. The issue with UI automation tools is that they are always test-after and may just bake in automation of the wrong functionality.


  • http://thesoftwaresimpleton.blogspot.com/ Paul Cowan

    My problem is that with AT, you are often trying to describe a UI application in text.

    The end customer will much better grasp a UI than a lot of textual assertions. You get sign off with a lot of ambiguity still surrounding the application.

    A wireframed application leaves little room for confusion.

    AT seems more like a developer tool than a means of communication with the customer.

  • mike

    good stuff …

    i’m not trying to fit the ATs into the GWT template nowadays … at some point GWT was forcing me to give up in communication over reuse of steps

    i prefer a declarative way to express the tests over imperative … eg . “It should be possible to create a broker submission” instead of the GWT steps

    haven’t thought of the concept of mocking out the ‘expensive’ collaborators … but haven’t been in the situation that it would improve the speed over trust of coverage

    some other times i’ve had monolithic systems n’ had to test through the ui … lots of pain, nothing to mock there :(

    been grouping tests by feature groups lately … they follow business breakdown of app

    cheers

  • http://tartley.com Jonathan Hartley

    I haven’t used this personally, but if I were ever to return to any sort of enterprise / consultancy development, then I imagine that one extra benefit of acceptance tests as opposed to unit tests is, as their name suggests, in negotiating acceptance with the client. If you can prove at a moment’s notice that the system does what they asked for, and the traceability from those proofs to the original specification is trivial (i.e. an identity mapping) then you’ve gone a long way towards getting a recalcitrant customer to sign off on the delivery.

    Of course, having a recalcitrant customer in the first place is indicative of bigger problems, but still, I’d rather have the acceptance tests in this scenario than not have them.

  • http://richarddingwall.name Richard Dingwall

    With unit tests, integration tests and acceptance tests, I always considered acceptance tests to include the UI and working DB (i.e. full end to end test using some UI automation tool), because user acceptance requires an interface (the business isn’t going to accept a code library, which is all you are effectively testing here). Thoughts?