I’ll be back on LINQ architecture after the holidays, but in the meantime I wanted to share some of the bad: some of the places where we have had bitter experiences with mocks.
When NMock first appeared we embraced the behaviour verification style that it supported. We liked the idea that for ‘unit tests’ we should not interact with other concrete classes. We liked the way that mocks had to derive from interfaces, abstract classes, or override virtual methods. We wanted to depend upon abstractions, not details, and we liked the way that mocks gave us an emergent design that exhibited this quality.
One of our first pushes was against slow tests that talked to the Db. Writing our tests against any kind of shared fixture was painful (note that in NUnit, MbUnit, MSTest et al., class variables are shared state for all tests in the fixture; xUnit tries to fix this), but writing tests against shared state in the Db was especially painful. Either tests influenced each other, or we wrote complex setup and teardown code. Even using tricks like OleTx transactions we still had to pre-populate the Db or set everything up each time. And the tests were slow…
So we mocked out our DataMappers and, freed from the dependency on the Db, testing our Domain proceeded like a dream.
The project inherited a byzantine legacy Db schema that was not amenable to mapping via an ORM tool (at least given the state of the art at the time), so we had to roll our own DataMappers. Given the limited re-use options outside this context, rather than building a reflection-and-generics approach, with its attendant complexity, we opted for a solution that we could eventually just code-gen. Keen to build via TDD, then code-generate once we were sure the design worked, we wanted to drive development of our DataMappers through tests. Extrapolating from our wins with mocks in the domain, we expected to gain similar benefits by mocking out our DataMappers’ interaction with the Db (mea culpa).
So, to persist, we created an abstraction of the Db, an IDatabase. We then mocked that IDatabase in our unit tests and created expectations of the behaviour of our mappers, by expecting calls to the Db to execute SQL. This enabled us to check that we created the stored procedure calls we expected. To create parameter lists for those procedures we created classes (which we intended eventually to generate) that mapped domain objects into SQL parameters.
To materialize objects back out we created classes (which we would again auto-generate) that gave us the ordinals needed to read the fields from the row corresponding to the class. We used a dependent mapping strategy, so that our domain worked with a DataMapper for a root class, and the mapper loaded any child entities and value objects along with it.
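The shape of that arrangement can be sketched roughly as below. IDatabase is the abstraction named above, and AddInParameter/GetStoredProcCommand follow the style of the test snippets later in this post, but the Person and PersonMapper classes and the exact member signatures are illustrative assumptions, not the project’s actual code.

```csharp
using System.Data;

// A hand-rolled abstraction over the Db, so mappers never touch
// ADO.NET connections directly (names are a sketch of the style).
public interface IDatabase
{
    IDbCommand GetStoredProcCommand(string procedureName);
    void AddInParameter(IDbCommand command, string name, DbType type, object value);
    int ExecuteNonQuery(IDbCommand command);
}

public class Person
{
    public string UserName { get; set; }
    public string FirstName { get; set; }
}

// The mapper translates a domain object into a stored procedure
// call; classes like this were the intended code-generation target.
public class PersonMapper
{
    private readonly IDatabase database;

    public PersonMapper(IDatabase database)
    {
        this.database = database;
    }

    public void Insert(Person person)
    {
        IDbCommand command = database.GetStoredProcCommand("usp_InsertPerson");
        database.AddInParameter(command, "@Username", DbType.String, person.UserName);
        database.AddInParameter(command, "@FirstName", DbType.String, person.FirstName);
        database.ExecuteNonQuery(command);
    }
}
```

With this shape, the unit tests mock IDatabase and assert that Insert makes the expected AddInParameter and ExecuteNonQuery calls, which is exactly the coupling discussed below.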
Of course, we still needed integration tests to see if the mappers would actually work, and of course we found that some of the SQL we were generating passed unit tests but failed when run against the Db. Overall, though, we were pretty proud of how we were testing the Db. There were some warnings (and one member of the team expressed doubts), but it seemed to go pretty well. As an aside, as we were only mapping a few classes, the pressure to code-generate never hit us. Believing that we needed to build 2 or 3 mappers before our mapper design would emerge (by refactoring to remove duplication), we never reached the point where we needed to push on to code generation to complete our persistence requirements. Simply put, we did not need to persist enough items for the cost of writing code generation to become less than the cost of implementing the remaining mappers by hand.
The problems began to hit us in maintenance (and that can hit quite early on an agile project with frequent releases). We had a number of issues:
Mocking tools were not strongly-typed
At that time the mocking tools just used strings; there was none of the record-and-replay style seen in tools like Rhino Mocks. This matters not only because the compiler can no longer help you find errors, but because refactoring tools stop helping you make changes when the method call is a string. So Rename Method to express intent became search-and-replace. Unit tests would pass, but integration tests failed, because the names had changed.
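The contrast looks roughly like this. The API shapes are recalled from NMock 1.x and early Rhino Mocks, so treat the details as a sketch; IDatabase and Person are the types discussed in this post, passed in here rather than redefined.

```csharp
using System.Data;
using NMock;        // string-based expectations
using Rhino.Mocks;  // record-and-replay expectations

public class ExpectationStyles
{
    public void StringBased(IDbCommand dbCommand, Person person)
    {
        // "AddInParameter" is just text: neither the compiler nor a
        // Rename Method refactoring can see it, so a rename silently
        // leaves this expectation pointing at a method that no
        // longer exists.
        DynamicMock database = new DynamicMock(typeof(IDatabase));
        database.Expect("AddInParameter",
            dbCommand, "@Username", DbType.String, person.UserName);
    }

    public void RecordAndReplay(IDbCommand dbCommand, Person person)
    {
        // The expectation is a real method call, so the compiler
        // checks it and refactoring tools update it on a rename.
        MockRepository mocks = new MockRepository();
        IDatabase database = mocks.StrictMock<IDatabase>();
        database.AddInParameter(dbCommand, "@Username", DbType.String, person.UserName);
        mocks.ReplayAll();
    }
}
```

Record-and-replay removes the stringly-typed problem, but note that it does nothing for the over-specification problems described next; the expectations are still about how the method is implemented.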
The test code was over-coupled
A lot of our test code contained a dozen lines of setup for mocking along the lines of:
database.Expect("AddInParameter", dbCommand.MockInstance, "@Username", DbType.String, person.UserName);
database.Expect("AddInParameter", dbCommand.MockInstance, "@FirstName", DbType.String, person.FirstName);
database.Expect("AddInParameter", dbCommand.MockInstance, "@MiddleInitial", DbType.String, person.MiddleInitial);
database.Expect("AddInParameter", dbCommand.MockInstance, "@Surname", DbType.String, person.Surname);
The tests were coupled not only to the domain model but also to the schema, and represented a point of resistance to change for us.
Tests predicted the implementation rather than letting it evolve through refactoring
The tests specified the implementation, and as such writing the test constrained the implementation. This broke the normal TDD rhythm of make it pass, then refactor. Writing the specification for the implementation in the tests is expensive and error-prone.
Changes to implementation were Shotgun Surgery
When you change the implementation of a method under test, mocks can break because you now make additional or different calls to the dependent component being mocked. For us, if you needed to add an extra field to a domain class, for example, you had to create the expectation for that parameter in the test. After a while the process of adding a new field became expensive, and the number of changes required to add a new method began to smell of Shotgun Surgery. The trouble was that our tests specified not only the inputs and outputs but also how the method under test was implemented: the order and number of calls.
The mocks began to make our software more resistant to change, more sluggish, and this increased the cost of refactoring. As change became more expensive, we risked becoming resistant to making it, and we risked starting to build up technical debt. A couple of times the tests broke because developers changed the domain, or changed how we were doing persistence, without changing the test first, frustrated at how it slowed their development. The mocks became an impediment to progress.
Mocks had become, for us, fragile tests.
Red, Green, Refactor
Agile methodologies allow a just-in-time design approach because you can refactor existing code at low cost and risk. Unit tests enable this scenario because they protect against changes in the behaviour of the system under test: you can change the implementation, provided the behaviour remains the same. However, when mocks are fragile they risk becoming an obstacle to change, because they can break even when the behaviour remains consistent, and so they increase the cost of refactoring.
Maybe we should have just done integration testing?
By contrast, the effort to check the classes using integration tests turned out to be quite small in this instance, because we only needed to check our ability to insert, update, and delete with each mapper. We had gained a lot by removing the domain’s dependency on the Db with the DataMapper, but our desire to mock out the Db in the implementation of the DataMapper looked as though it cost us more than it saved. It was a bridge too far.
Fragility and Mocks
When I look around now, I see a lot of people using mocks to replace all their dependencies. My concern is that they will begin to hit the Fragile Test issues that mocks present. Gerard Meszaros identifies the issues we hit as two specific smells: Overspecified Software and Behaviour Sensitivity.
Gerard Meszaros classifies any object we use to stand in for another object during a test as a Test Double. It is worth reading what Gerard has to say either on the web site or in his book. The key is to understand that you are replacing a dependency to isolate the object under test from either Indirect Inputs or Indirect Outputs. Mocks are really only a sweet spot for testing indirect outputs. If you have indirect inputs, a Test Stub or Fake Object may be a more maintainable approach than a mock. Even for indirect outputs it is worth considering a Test Spy (we find the Self-Shunt variation particularly simple to use) or Test-Specific Subclass before looking at a mock.
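A minimal sketch of the Self-Shunt style mentioned above: the test fixture itself implements the dependency, recording the indirect outputs so that the assertion covers what was sent, not the order and count of calls. The INotifier and OrderService types here are invented for illustration.

```csharp
using System.Collections.Generic;
using NUnit.Framework;

public interface INotifier
{
    void Send(string message);
}

public class OrderService
{
    private readonly INotifier notifier;

    public OrderService(INotifier notifier) { this.notifier = notifier; }

    public void PlaceOrder(string item)
    {
        // ... persistence elided ...
        notifier.Send("Ordered: " + item);
    }
}

[TestFixture]
public class OrderServiceTests : INotifier   // the fixture shunts itself in
{
    private readonly List<string> sent = new List<string>();

    // The spy: just record the indirect output.
    public void Send(string message) { sent.Add(message); }

    [Test]
    public void PlacingAnOrderSendsANotification()
    {
        new OrderService(this).PlaceOrder("widget");
        Assert.IsTrue(sent.Contains("Ordered: widget"));
    }
}
```

Because the test asserts on the recorded state after the fact, a refactoring that reorders or batches the calls only breaks the test if the observable output actually changes.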
Switching to fakes and stubs
Since that project we have weaned ourselves off our mock dependency and try to use what Fowler calls a classicist approach to TDD more. Where we do replace a depended-upon component, we try to use the appropriate technique, depending on whether our concern is an indirect input or an indirect output of the dependency. In addition, when talking to the outside world we weigh up the point at which the ‘last mile’ should be checked with an integration test over a unit test. So while I want to isolate my domain, I may make different judgements in the service layer. Mocking frameworks are powerful, but ‘with great power comes great responsibility’.