Jeremy D. Miller -- The Shade Tree Developer

Sponsors

The Lounge

News

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
Database per Developer and Environment

Our Continuous Integration talk Monday got a bit rushed at the end and I missed an important point when we talked about CI with databases.  For the love of all that is good and holy in the world create a separate database/schema for:

  • Each development workstation
  • Build environment
  • Test environment 
  • A "Sheep" database for the Business Analysts (baaaa) or Product Manager to use for ongoing demos to the customer

Running automated tests against a database is a complete joke unless the database is in a completely known state.  There are some "hacky" alternatives, but the easiest alternative by far is a completely isolated database with only one user.  Think about going to a hospital and seeing them reuse needles.  A shared development database should make you recoil in horror in much the same way.

"Jeremy, everybody already knows that and it's been a best practice since before you were born."  I didn't know that a couple of years ago on my first CI/TDD project and I know there are people out there who haven't thought about this issue at all.  I've been a pair of projects now where we attempted to run automated testing against shared databases (in both cases it was organizational stupidity and bureaucracy in Fortune 100 internal IT departments).  It was excruciating.  An untrustworthy automated build is completely useless.  If your initial reaction to the build busting is "Oh, it's probably a database conflict.  I'll just force the build again" -- your investment in CI has been squandered.   

It's also more fuel on the fire for being able to test the majority of your code without the database being involved.  You'll also need the ability to automatically create a database structure from scratch inside the build to pull off the separate database scheme.

 


Posted 09-21-2005 4:03 PM by Jeremy D. Miller

[Advertisement]

Comments

Carlton wrote re: Database per Developer and Environment
on 09-21-2005 3:44 PM
I COMPLETELY agree with your sentiments. I have worked on three projects in the last year where this concept was not readily accepted by co-workers (or management) as a "best practice". Not sure how you convince people there is a better way since they have not experienced working in a near frictionless environment - most environments I have encountered have VERY poor development infrastructure. I have observed if the procedures to set-up a developer environment is poor, then that group has problems deploying their software. I truly wonder why people do not complain about these issues sooner - I guess it has hurt for so long, they have become numb to the pain.

What is most sad is it does not have to be this way. A little time spent working out the infrastructure - NAnt build, automated builds with CC.Net and DB scripts - helps improve the software and increases the team's productivity. Forget trying to teach people pair programming, refactoring or whatever Agile technique that takes people out of their comfort zone. If you want the most bang for your buck and IMMEDIATE results, improve the build and make it easy to release code. This stuff is such low hanging fruit and fixing these things builds the team's (and management's) confidence.
Sahil Malik wrote re: Database per Developer and Environment
on 09-21-2005 5:10 PM
There is one issue though. What if the development database turns out to be 40 gigs? Then it just isn't practical enough to have it on each development workstation.

Then what?
Carlton wrote re: Database per Developer and Environment
on 09-21-2005 5:31 PM
Just a question - why is the development database, i.e. the coder's sandbox, so large? I would think that a sandbox environment would be pretty small since you cannot generate 20 GB worth of data, especially if you are dropping it all the time.

Is the development database pulling data out of the production environment? If so, maybe that data should go to staging DB and not development DB (what I consider the integration\automated build environment)? IME, staging should mirror real world more than the developer's sandbox. The sandbox is for bashing the code the way you want it to and should be light.
Sahil Malik wrote re: Database per Developer and Environment
on 09-22-2005 2:19 AM
Carlton,

There are indeed situations where a dev database could get so large.

There are two kinds of data, seed data, and non seed data. Seed data typically refers to as admin data, i.e. application setup data - a good example is US Zipcodes. Now in many cases you may not be able to restrict what day you may be able to exclude out of the dev database.

Of course, the right answer is "Just put the seed data only in a different shared db".

Well the issue is

a) Foreign keys
b) Transactions between dbs.
and
c) Well the last developer just didn't do it, nor was he good at maintaining data scrubbing scripts - and I just inherited this damn system, and I need to get this feature in .. by TOMMOROW.

Then what? :). Real world scenario - you are stuck with development databases that are 40 gigs. Now what?
DevPrime wrote re: Database per Developer and Environment
on 09-22-2005 11:09 AM
There's another valid statement to go with the 40 GB database - writing good tests is something devs don't often do, even in a TDD environment. If you expect a large database or complex queries, then you are also testing performance to a large scale (or SHOULD be). Unacceptable performance is a FAILED test. You can't test for these things on a tiny subset of a database. Unfortunately, while TDD may lead developers to water, it can't make them drink - in other words, devs typically don't write good tests just because they're using TDD.
Carlton wrote re: Database per Developer and Environment
on 09-22-2005 12:19 PM
So what I hear you are saying is you MUST test against a 40 GB database and you need to get it to work yesterday. Then my only advice is make it work in the environment it needs to and worry about the sandbox later.

I once worked in a situation like you described, massive foreign keys, transactions between DB, etc., etc., and I can tell you that it will be many months until you can tease apart all the databases and automate them. In the situation I observed, you will have to work hand-in-hand with the database group (IMO, things only get this bad when you add dedicated DB's in the mix) and the testers to create an independent test environment. No one will want to do it and management will not want to spend the resources to fix the broken infrastructure.

As for performance, I don't think performance should be a goal for TDD. Performance is an important goal, but it is not what a developer should be focused on in the initial TDD - we are writing unit tests in TDD, so by definition we are not talking to outside systems. So where does performance fit in in the mix? I would suggest at the integration level or even perhaps at the customer acceptance test level. Depends on who wants it.
Jeremy D. Miller wrote re: Database per Developer and Environment
on 09-22-2005 2:49 PM
Sahil,

A.) You very likely *don't* need the entire 40GB database for testing

B.) In that exact situation, we just used an Oracle Synonym for the large static tables. That's always an option if the data is read-only.

"Database per developer" doesn't necessarily mandate a separate physical server, only database independence. You always have to compromise. The real goal is to have reliable database testing that doesn't give lots of false negatives because of corrupted database states from simultaneous testing runs.
Jeremy D. Miller wrote re: Database per Developer and Environment
on 09-22-2005 2:55 PM
DevPrime-

Dude, I think you're more than a little off the mark on all accounts. When I'm testing that the intended functionality works I want fast tests. It's brain dead stupid to run *those* tests against a massive database. You cannot do TDD effectively without rapid feedback.

Performance tests are a completely different animal. Of course you run those on a realistic (or 2-3x larger) sized database. Keep this in mind, when you're making coding changes for performance optimization, don't you want to make sure you don't cause new functional defects? Fast unit, integration, and acceptance tests gives you an ability to safely make performance optimizations. All those reasons that we do TDD and CI to make a code base safer to change and evolve also make it safer to optimize.
Sahil Malik wrote re: Database per Developer and Environment
on 09-24-2005 2:16 AM
Jeremy,

There are times when you *need* the 40 gigs dev. database for your app to even run :), much less test.

I know this may sound as a result of "Just scrub the data", but consider this (and this is a real world example out of my own development experience).

An insurance rating engine, which rates auto policies - it needs zipcode data, which is huge, it needs vin#, which is huge, and it needs location data, which is huge, plus other such stuff. You need those tables, in entirety to ensure your tests run.

What we did over there was, yes, the dev database was a shared db, admin (seed) data was in a seperate schema, and developers had their own schemas.

The problem is, that doesn't work with SQL Server. :), and that is a big pain in the arse. (You can't split data into databases, and expect multi db FKs, and Transactions to work properly).

Frankly I don't have a good solution to that much too common problem, so I was wondering if you did. "Have a small DB ain't the solution".

- SM
Jeremy D. Miller -- The Shade Tree Developer wrote Best of the Shade Tree Developer (with actual links!)
on 08-07-2006 4:50 PM
Between being extremely short handed at work, tech' reviewing a new book, a
possible book proposal...
Jeremy D. Miller -- The Shade Tree Developer wrote Best of the Shade Tree Developer (with actual links!)
on 09-01-2006 2:32 PM

Between being extremely short handed at work, tech' reviewing a new book, a possible book proposal

Carmelo Lisciotto wrote re: Database per Developer and Environment
on 07-05-2007 9:42 AM

Nicely prepared article!

Carmelo Lisciotto

www.carmelolisciotto.com

Jeremy D. Miller -- The Shade Tree Developer wrote Resources from my DevTeach talks
on 11-29-2007 12:03 PM

To everybody that attended one of my talks at DevTeach this week. All of the materials are now online

Add a Comment

(required)  
(optional)
(required)  
Remember Me?