Overthrowing the Tyranny of the Shared Database

My team is making some structural improvements to plug some holes in our B2B messaging services. The problem we're starting to run into is a single database structure that is shared by at least a half-dozen applications, administrative applets, and assorted web services. It's not just the database structure itself; the stored procedures are also shared between applications (we think). There does seem to be a naming convention that identifies stored procedures by application, but I know that some of them are used by more than one application. It's gotten to the point where we're afraid to change code or data in one place because of the risk of ripple effects on other pieces of code. Houston, we have a problem (literally, our main office is in Houston). I'd say we have two major problems to solve in order to make this family of systems easier and safer to change:

  1. Establishing some sort of configuration management of the database

  2. Eliminating code duplication and tight coupling

Configuration Management

So here's our dilemma: where and how do you store the custom T-SQL code and DDL scripts? How do you propagate database changes to other copies of the database? How do you test database changes against the other applications to keep from breaking something else? How do I know if I'm using the most current version of the database?

As far as we know there isn't a "master" copy of the database structure anywhere. To some degree we've relied on copying the database with SQL Server Enterprise Manager. That's not good enough in the long term. One of the enablers of Continuous Integration is the "Single Source Point": there should be one and only one well-known place where you go to get the latest, greatest version of the database structure. There should also be a reliable way to build that master database quickly before you do any serious development against the database structure. I've been burned badly doing integration against a shared database that had far too many structural differences between the development, testing, and production versions. I'm never going through that again.

My colleague is investigating the creation of a single Subversion repository for just the database structure and code. It will also include a NAnt script to automate the setup of the database structure with a modicum of data. I think we're going to unilaterally declare this new repository to be the master copy of the database. We're hoping to use the multi-source configuration block in CC.Net to get the database setup and build integrated into every project that touches the shared database, so that any change to the shared database results in an automated build of all the dependent parts. Of course, we have the small problem of not knowing about all the one-off tools that rely on the shared database, and we don't have any CI or automated testing infrastructure for the older projects. "One problem at a time" is going to have to be the mantra for a while.
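
To make this concrete, here is a rough sketch of the kind of baseline script that repository would hold and that the NAnt build would run against a clean server. All of the database, table, and column names below are invented for illustration; the point is simply that the entire structure, plus a modicum of reference data, can be rebuilt from source on demand.

    -- BuildDatabase.sql: rebuild the shared database from source (illustrative names only)
    IF DB_ID('B2BMessaging_Dev') IS NOT NULL
        DROP DATABASE B2BMessaging_Dev;
    GO
    CREATE DATABASE B2BMessaging_Dev;
    GO

    USE B2BMessaging_Dev;
    GO

    -- Every table, view, and stored procedure lives in scripts like these
    CREATE TABLE dbo.MessageStatus (
        Code        VARCHAR(20)  NOT NULL PRIMARY KEY,
        Description VARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.Message (
        MessageId   INT IDENTITY(1,1) PRIMARY KEY,
        PartnerCode VARCHAR(20) NOT NULL,
        StatusCode  VARCHAR(20) NOT NULL REFERENCES dbo.MessageStatus (Code),
        ReceivedOn  DATETIME    NOT NULL
    );
    GO

    -- A modicum of reference data so the applications can actually start up
    INSERT INTO dbo.MessageStatus (Code, Description) VALUES ('RECEIVED', 'Received from a trading partner');
    INSERT INTO dbo.MessageStatus (Code, Description) VALUES ('PROCESSED', 'Successfully processed');
    GO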

Configuration management of databases is not easy, and I doubt that very many shops do it well. I've got a slide on database configuration management in our Continuous Integration presentation that basically says "Database CI is really important, but I don't know the best way to do it. Good luck to you." You simply cannot shirk database configuration management. The database is part of the code and needs to be versioned *with* the rest of the code. I've watched projects lose time to "false" bugs that were really just the database version being out of sync with the middle tier version. That problem goes away fast when the database is updated and rebuilt by the same automated build that produces the middle tier and user interface build products.

It's easy enough to have every piece of Data Definition Language code run as part of the CI build, but what's the best way to handle large amounts of test data? You certainly can't drop and recreate a production database, so how do you write delta scripts to apply new changes to a shared database that is already in production? These aren't easy questions, and I don't think anybody has a definitive answer. Pramod Sadalage has a good paper on evolutionary database techniques, but I think the database is one of the remaining frontiers in Agile development.
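
For what it's worth, the direction we're leaning for databases already in production is small, guarded delta scripts that check the system catalog before they act, so the same script can safely be run against development, test, and production copies. Here's a sketch of the idea; the table and column names are purely illustrative.

    -- 0042_AddPriorityToMessage.sql: a delta script that is safe to run more than once
    IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_NAME = 'Message' AND COLUMN_NAME = 'Priority')
    BEGIN
        ALTER TABLE dbo.Message ADD Priority INT NULL;
    END
    GO

    -- Backfill the rows that existed before the change, then tighten the constraint
    UPDATE dbo.Message SET Priority = 1 WHERE Priority IS NULL;
    ALTER TABLE dbo.Message ALTER COLUMN Priority INT NOT NULL;
    GO

The point is that the script, not a person, decides whether a change still needs to be applied.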

Shared Service Yes, Shared Database No

This is a perfect situation for a service-oriented approach (I'm not sure yet whether it needs to be a web service). The longer-term solution is to get an honest-to-goodness service layer in front of the database. Every application, applet, and little administrative tool has to go through the service layer. As soon as we establish the common service, we can greatly reduce the surface area of code that is intertwined with the database structure. Add a comprehensive battery of automated FitNesse and unit tests against the new service layer, and we'll be able to modify the behavior of the system with confidence. Exposing the business workflow that is represented by the shared database will also lower the cost of creating new client functionality, because we won't have to recreate the database access code.

I strongly believe that business logic should be mated to data access code. Interpreting the business meaning of data is business logic, and SQL WHERE clauses often contain meaningful business logic. Consolidating the business logic that interprets this data with the data access code that reads the shared database will let us eliminate a lot of duplication, and eliminating duplication makes a system easier and safer to maintain and modify. At a minimum, we need the administrative tools using the application service layer instead of doing an end run around it straight to the database, so we can keep the two pieces synchronized.
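
To show the kind of duplication I'm worried about, consider a query like the one below (the table and column names are invented for the example). The WHERE clause is a business rule, the definition of a message that is eligible for retry, and today some variation of it is probably buried in every application that reads the shared database directly.

    -- "Which messages are eligible for retry?" is business logic,
    -- even though it looks like plain data access code
    SELECT MessageId, PartnerCode, ReceivedOn
    FROM dbo.Message
    WHERE StatusCode = 'FAILED'
      AND RetryCount < 3
      AND DATEDIFF(minute, LastAttemptOn, GETDATE()) > 30;

If three applications each carry their own copy of that clause and the retry rule changes, somebody has to find and fix all three. Behind a single service layer, the rule changes in exactly one place.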

How did we get here?

We’re certainly not unique.  I’ve seen several IT shops that were effectively held hostage by a shared database.  Having zero encapsulation around raw database data makes all of the applications sharing the database very strongly coupled.  Changes in one system will ripple into other applications far too frequently.  That’s an avoidable mess. 

The problem is that application integration has historically been difficult. Many, if not most, applications don't have a publicly accessible service strategy for easy integration. Almost every enterprise software developer is comfortable writing low-level data access code, so the temptation is always there to go around an application and straight into its underlying database for integration. Opening a database connection and grabbing or writing data is mechanically easy. Like so many things in life, the easy thing is very often the wrong thing.

I'm frequently skeptical of the SOA and web service hype, but there is definitely some solid fundamental thinking behind SOA.

How are you handling database configuration management?  I’d love to hear suggestions.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
This entry was posted in Continuous Integration, Database and Persistence.
  • Robert

    We had similar experiences working with Oracle DB. We established procedures for adopting SVN and automated scripts (that pull changes from SVN) in order to build patches.

    Please see http://www.scmsupport.com/scm.php?go=home.html and http://scmsupport.wordpress.com/ for more details.

  • Scott Whitmire

    The problem Jeremy describes is not limited to databases shared among several applications. I have made a living designing, building, and supporting large applications that consist of subsystems sharing a common database. We see the same issue among the subsystems, especially when each subsystem is designed and built by a different team.

    The solution that Jeremy proposes, with a service layer between the data and the rest of the world, is a great starting point. It is, as near as I can tell, the only solution that effectively cuts the data itself off from the application universe that uses it. But it's only a first step.

    Every business has a galaxy of domain or business objects about which it is concerned. The object "customer" is a great example, since everyone knows (or thinks they know) what a customer is. Different parts of a given company have different views of the set of properties that a customer embodied in a software application might need. If you collect all of these points of view into one place, you get the requirements for a complete domain object that can represent a customer anywhere in the company. Do this same property collection for the other few dozen domain objects, and you have a galaxy of objects that can cooperate to serve the needs of any application in the business.

    Each object in this galaxy would publish a set of services that represent the public operations it supports. The object itself (OK, it's really a collection of objects in the software sense) would manage its own state and persistence. This notion takes SOA to a logical conclusion (I'm sure there is more than one) and provides some guidance as to the services that need to be published, and how they might be packaged and maintained.

    After the first application creates all of the maintenance functionality for the domain objects, subsequent applications might have to add a service or two, or maybe some new properties, but we all know that 80% of the code related to a domain object is basic maintenance.

    As you may have guessed, this is a problem I've struggled with for a number of years across a large number of applications. I think this solution will work in all cases, but it needs to be batted around the barroom for a while. Have at it.

  • Jeremy D. Miller

    Christopher,

    From a configuration angle we just need to get to where you’re at, but through an automated CI build w/o DBA interference so you’re always synchronized.

    My biggest fear is really around the multiple chunks of code accessing, writing, and *interpreting* the same database. That’s duplication of business logic (and data access code for that matter). Duplicated code/logic == EVIL.

    Thanks for the comment — Jeremy

  • Christopher Wanko

    I don’t think I understand the problem as described. I’ll illustrate what we do, and perhaps you can tell me where we’re falling down or where the problem would be.

    Assume we have a production database. We have nightly exports of PROD_db, because we really need the data. We ship a copy of the export to another datacenter, compressed, nightly.

    Upon request, we will ask the DBA to import the PROD schema into a DEV schema. We now have a baseline upon which to develop.

    All code is managed through PVCS. Developers are told that if it isn’t checked in, it doesn’t count, and DEV should be able to be replaced with PROD at any time, and all DEV code can be applied to it to recreate DEV.

    Once DEV is complete, we apply PROD to STAGE. We then grab all the DEV code, apply it to STAGE. QA then gets a crack at it. Assuming it passes, we promote the code to TEST, then PROD.

    We have developers create their own units. We’re in the process of making developers stock a shared library, more of just a common storage area at this point of evolution. We always test code against our units and production data.

    Where are we weak, what in our system doesn't help you, etc.?

    -C

  • http://codebetter.com/blogs/steve.hebert Steve Hebert

    To Joshua’s point on SqlCompare, I think SqlCompare is a great product and something any database developer should have in their toolbox.

    Whether you can use SqlCompare's script generation function depends on your project; if you can use it, it solves a lot of the things our approach dealt with. You have to temper that with how you upgrade servers, how many you upgrade, and the complexity of the upgrades. I'm sure there are other criteria you may want to consider, but regardless of your situation, IMHO SqlCompare fills a big gap when doing db development.

    -Steve

  • http://codebetter.com/blogs/steve.hebert Steve Hebert

    Jeremy,

    I've used the method of maintaining database objects (SPs, FNs, etc.) in a database project. We also included update scripts for changing the schema. Each update script was responsible for detecting whether it needed to be applied and performing the upgrade when necessary (i.e. if a field doesn't exist, ALTER TABLE…).

    Each developer runs their own copy of the database and they can either use the database project to update their copy or grab the latest automated build’s copy. If someone wants a clean database, they go get it on a central server.

    The build machine referenced a tool I built to assemble the script according to a simple configuration XML document. I blogged the code here:

    http://codebetter.com/blogs/steve.hebert/archive/2004/11/11/31689.aspx

    The downside to using a database project is that VS2003 automation is flawed with that project type; I don't know if VS2005 fixes the problem.

    Let me know if you have any questions.
    -Steve

  • http://flimflan.com/blog Joshua Flanagan

    I haven’t tried it myself (yet), but I wonder if something like SQL Compare (http://www.red-gate.com/products/sql_tools_overview.htm) would help out. When it comes time to do a deployment, compare your production database against your “single source” database, and then execute (and check in?) the resulting DDL scripts.

  • Jeremy D. Miller

    Sam,

    Thanks for the tip. I think it’s about time I quit being one of the unwashed and finally take a long look at Ruby on Rails.

  • Jeremy D. Miller

    Nathan,

    Hell no, that would definitely be a huge WTF. I watched a 125k a year “Architect” design a system that had a web service that accepted SQL statements. No business logic whatsoever, just using a web service instead of ADO or JDBC. Can you say “encapsulation violation?”

    By service layer I mean the Fowler version, with all of the business logic associated with this database behind the service layer in *one* place. My biggest piece of irritation with this whole mess is how much duplicated business logic exists that is associated with this data.

  • http://substantiality.net Sam

    You might find the Rails concept of “Migrations” interesting.

    In your case it would probably amount to adding extra database migration tasks to your build scripts, and then stamping a SchemaVersion table.

    It sounded like one of those “That’s just too simple to work” things when I first read about it, but it’s actually the best method I’ve run across to handle the problem.

    This lets you just run your build script, and as long as nobody has been tinkering with the database outside of the build scripts, only the migration tasks needed to bring it up to the latest version actually run.

    The Rails version has a little more to it, such as the ability to roll back a deployment, or revert or roll forward to a specific version.
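
    At its simplest, the stamping idea is just a version table plus a guard in each migration script. A rough T-SQL sketch, with purely illustrative table and column names, might look like this:

        -- One row per migration that has been applied
        IF OBJECT_ID('dbo.SchemaVersion') IS NULL
            CREATE TABLE dbo.SchemaVersion (
                Version   INT      NOT NULL PRIMARY KEY,
                AppliedOn DATETIME NOT NULL
            );
        GO

        -- Inside migration 12: apply only if the database is currently at version 11
        IF (SELECT MAX(Version) FROM dbo.SchemaVersion) = 11
        BEGIN
            ALTER TABLE dbo.Message ADD LastAttemptOn DATETIME NULL;
            INSERT INTO dbo.SchemaVersion (Version, AppliedOn) VALUES (12, GETDATE());
        END
        GO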

  • Nathan Rivera

    You need to clarify something, though. A "service layer" in front of the database does not mean building a single method that accepts and executes SQL statements. A big WTF, I know, but many developers out there would build such a thing and claim it's SOA.

  • zorkerman

    One way is to do a database backup right after deploying a release, then have all additional changes to the database stored in a directory with a little file that tells you how to take the previous db and turn it into one that runs against the head.

    On a bigger system, you may find yourself having trouble simply checking in a db backup. A hundred gigabyte database may unnecessarily slow down more mundane checkins.