My team is making some structural improvements to plug some holes in our B2B messaging services. The problem we’re starting to run into is a single database structure that is shared by at least a half-dozen applications, administrative applets, and assorted web services. It’s not just the database structure itself; the stored procedures are also shared somewhat between applications (we think). There does seem to be a bit of a naming convention to identify the stored procedures by application, but I know that some of the stored procedures are shared by different applications. It’s gotten to the point where we’re somewhat afraid to make changes to the code or data in one place because of the risk of ripple effects on other pieces of code. Houston, we have a problem (literally, our main office is in Houston). Actually, I’d say that we have two major problems to deal with in order to make the family of systems easier and safer to change:
- Establishing some sort of configuration management of the database
- Eliminating code duplication and tight coupling
Configuration Management
So here’s our dilemma: where and how do you store the custom T-Sql code and DDL scripts? How do you propagate database changes to other copies of the database? How do you test database changes against the other applications to keep from breaking something else? How do you know if you’re using the most current version of the database?
As far as we know, there isn’t a “master” copy of the database structure anywhere. To some degree we’ve relied on copying the database with SQL Server Enterprise Manager. That’s not good enough in the long term. One of the enablers of Continuous Integration is the “Single Source Point.” There should be one and only one well-known place where you go to get the latest, greatest version of the database structure. Hopefully, there is also a reliable way to build the master database quickly before you do any serious development against the database structure. I’ve been burned badly when doing integration to a shared database that had far too many structural differences between the development, testing, and production versions of the database. I’m never going through that again.
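Just to make the “build it quickly” idea concrete, here’s a rough sketch of what that rebuild could look like. The database name and script names are made up for illustration; the point is that a scratch copy gets dropped and recreated, then every versioned DDL script is replayed in order.

```sql
-- Sketch only: rebuild a scratch copy of the master database from nothing.
-- B2BShared_CI and the script names below are hypothetical placeholders.
IF DB_ID('B2BShared_CI') IS NOT NULL
    DROP DATABASE B2BShared_CI
GO
CREATE DATABASE B2BShared_CI
GO
USE B2BShared_CI
GO
-- The versioned DDL scripts run next, in dependency order (e.g. fed to
-- osql -i by the build script):
--   010_tables.sql, 020_constraints.sql, 030_sprocs.sql, 040_seed_data.sql
```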
My colleague is investigating creating a single Subversion repository for just the database structure and code. It’ll also include a NAnt script to automate the setup of the database structure with a modicum of data. I think we’re going to unilaterally declare this new repository to be the master copy of the database. We’re hoping to use the multi-source configuration block in CC.Net to get the database setup and build integrated into all the projects that touch the shared database. Any change to the shared database should result in an automated build of all the dependent parts. Of course, we’ve got the problem of not knowing about all of the little one-off tools that rely on the shared database. We also don’t have any kind of CI or automated testing infrastructure for any of the older projects. “One problem at a time” is going to have to be the mantra for a while.
Configuration management of databases is not easy, and I doubt that very many shops do it well. I’ve got a slide for database configuration management in our Continuous Integration presentation that basically says “Database CI is really important, but I don’t know the best way to do it. Good luck to you.” You simply cannot shirk the database configuration. The database is part of the code and really needs to be versioned *with* the rest of the code. I’ve often seen a drag on project efficiency as we dealt with “false” bugs that were just a result of the database version being out of sync with the middle tier version. That problem goes away fast when the database is updated and rebuilt by the same automated build that produces the middle tier and user interface build products.
It’s easy enough to have every single piece of Data Definition Language code run as part of the CI build, but how best to handle large amounts of test data? You certainly can’t drop and recreate a production database, so how do you deal with making delta scripts to apply new changes to a shared database that is already in production? These aren’t easy questions, and I really don’t think anybody has a definitive answer. Pramod Sadalage has a good paper on evolutionary database techniques, but I think the database is one of the remaining frontiers in Agile development.
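One pattern we’re considering for the delta problem (my own sketch, not Sadalage’s exact recipe, and all the names are illustrative): record every applied change in a version table, so the same delta script can run safely against any copy of the database, production included.

```sql
-- Hypothetical delta-script pattern; table and column names are made up.
-- A version table remembers which deltas have already run, so the same
-- script can be applied to dev, test, and production copies alike.
IF OBJECT_ID('dbo.SchemaVersion') IS NULL
    CREATE TABLE dbo.SchemaVersion (
        ScriptName varchar(128) NOT NULL PRIMARY KEY,
        AppliedOn  datetime     NOT NULL
    )
GO
IF NOT EXISTS (SELECT * FROM dbo.SchemaVersion
               WHERE ScriptName = '0042_add_retry_count.sql')
BEGIN
    -- The actual change, applied exactly once per database copy
    ALTER TABLE dbo.MessageQueue ADD RetryCount int NOT NULL DEFAULT 0
    INSERT INTO dbo.SchemaVersion (ScriptName, AppliedOn)
        VALUES ('0042_add_retry_count.sql', GETDATE())
END
GO
```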
Shared Service Yes, Shared Database No
This is a perfect situation to apply a Service Oriented Architecture (I’m not sure about it being a web service yet, though). The longer term solution is to get an honest-to-goodness service layer in front of the database. Any application, applet, or little administrative tool has to go through the service layer. As soon as we establish the common service, we can greatly reduce the surface area of the code that is intertwined with the database structure. Add a comprehensive battery of automated FitNesse and unit tests against the new service layer, and we’ll be able to modify the behavior of the system with confidence. Exposing the business workflow that is represented by the shared database will lower the cost of creating new client functionality because we won’t have to recreate the database access code.
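We could even enforce the “everything goes through the service” rule inside the database itself. This is just an illustration with made-up login names, but denying the legacy logins direct table access would quickly flush out every tool that’s still sneaking in the back door:

```sql
-- Illustration only; LegacyToolUser and B2BServiceUser are made-up logins.
-- Once the service layer is the front door, direct table access can be
-- denied to everything else, which exposes any holdout tools immediately.
DENY  SELECT, INSERT, UPDATE, DELETE ON dbo.MessageQueue TO LegacyToolUser
GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.MessageQueue TO B2BServiceUser
```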
I strongly believe that business logic should be mated with its data access code. Interpreting the business meaning of data is business logic, and SQL “WHERE” clauses often contain meaningful business logic. Consolidating the business logic that interprets this data with the data access code that reads the shared database will allow us to eliminate a lot of duplication, and eliminating duplication makes a system easier and safer to maintain and modify. At a minimum we need to get the administrative tools using the application service layer instead of doing an end around to the database, so we can keep the two pieces synchronized.
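As a made-up example of what I mean: the rule for “which messages are ready to send” is pure business logic, even though it lives in a WHERE clause. Defining it exactly once, in a single view or stored procedure, means every application reads the same rule instead of carrying its own slightly different copy.

```sql
-- Illustrative names; the point is that this WHERE clause *is* business logic.
-- Defining "ready to send" in exactly one place eliminates the duplicated,
-- slightly different copies of the rule scattered across applications.
CREATE VIEW dbo.SendableMessages
AS
SELECT MessageId, PartnerId, Payload
FROM dbo.MessageQueue
WHERE Status = 'Queued'
  AND RetryCount < 5
  AND ScheduledOn <= GETDATE()
GO
```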
How did we get here?
We’re certainly not unique. I’ve seen several IT shops that were effectively held hostage by a shared database. Having zero encapsulation around raw database data leaves all of the applications sharing the database tightly coupled to one another. Changes in one system will ripple into the other applications far too frequently. That’s an avoidable mess.
The problem is that application integration has historically been difficult. Many, if not most, applications don’t have a publicly accessible service strategy for easy integration. Almost every enterprise software developer is comfortable writing low-level data access code, so the temptation is always there to just go around an application into its underlying database for integration. Opening up a database connection and grabbing or writing data is mechanically easy. Like so many things in life, doing the easy thing is very often the wrong thing.
I’m often skeptical of the SOA and web service hype, but there is definitely some solid fundamental thinking behind SOA.
How are you handling database configuration management? I’d love to hear suggestions.