How do you extend and customize a database?

Here's the scenario:  your application is frequently customized per customer, and that means the database has to accomodate new types of information.  My question to you is what do you think is the best (least bad) way?  Here are the strategems I've used or seen before:

  • Allow custom fields to be added to the database tables.  Slippery slope to Hell in my experience.  As an internal IT architect, customizing 3rd party code filled me with dread because our upgrade path was shot and support was often harder.  Inside a product company that kind of customization adds maintainability costs, especially if you have to keep customer specific branches.  I have seen some systems accomodate this with custom data dictionaries and metadata driven architectures.  Needless to say, it's complex.
  • Use wildcard fields at the end of tables like "string_field_1" or "int_field_1" that can be used for anything.  I refer to this kind of design as the "Mystery Meat" antipattern because it's never obvious what's in there.
  • Store extended information in name/value extension tables.  Flexible, but often clumsy in execution.  More Mystery Meat.  Drives DBA's nearly insane.  Wait, that might be a plus.
  • Serialize the extended properties as Xml or some other sort of Memento representation in a column.  Presupposes an Object Oriented Domain Model approach.  Requires some customization of the persistence layer, but I think some O/R tools support this (ActiveRecord in RoR does at least).  Doesn't work all that well for reporting needs.  Mystery Meat again.

On a tangential note, is anybody using a true OODB?  Do you think it'll ever be feasible/practical/common?  Maybe the XML hybrid databases will end up being better.

Now that Smalltalk and its variants seem to be making a bit of a comeback, I wonder it that ends up making OODB's practical if they can just use continuations to take quick snapshots of an object.  All the OODB's I've ever read about were Smalltalk.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
This entry was posted in Database and Persistence. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://www.agilekiwi.com John Rusk

    Regarding the XML option in SQL server, it also has the advantage that your app can validate the contents against an XML schema before saving the XML packet into the field. That lets you do basic validation, all through the schema.

  • Josh robb

    There’s some new guidance from the MSDN Architecture section related to this in the SaaS (software as a service) section.

    Multi tenant data architectures – http://msdn2.microsoft.com/en-us/library/aa479086.aspx

    HTH,

    j.

  • ammonite54

    As BigJim suggests there really is no silver bullet for this. It just depends on the circumstances and the way in which customer data extensions are to be used. I believe in the KISS principle and have in the past built schemas in which there are secondary tables housing key./value pairs suitably indexed to make sure that they are unique – in your case unique to a customer for instance. You can then as Big Jim says create views which join this xtension table to the base table and allow users to query for all they are worth on that view. This is really quite a flexible and practical approach, especially for those customers out there with spreadsheet skills and minimal SQL knowledge.

  • http://www.lukemelia.com/devblog/ Luke Melia

    We have one app that uses a variation of the name/value pairs approach, and it has worked out well, but only because the extension scenarios are very tightly defined.

    With regard to object databases: I think they could be a good fit for a lot of apps. I saw a demo of db4o at SD Expo in the fall and it looked pretty serviceable.

  • Andrew Norris

    Extension (“sister”) tables suck a lot less than most of the alternatives. We use a lot of cu_[client]_[tablename] tables for this stuff.

    Mystery meat can work up to a point if you carry around separate metadata that identifies these fields (with additional complications if the fields are used different ways in different cases). If you do that, your biggest problem is that it doesn’t scale very far.

    Custom fields (with a naming standard) can work if you’re willing to structure all your data access code (and all your schema upgrades) around the fact that other fields may be inserted in unexpected places. We haven’t done this, but we’ve kicked it around.

    You will regret name/value pairs or XML fields beyond reckoning if you ever need to query or report against them. And you will. I promise. Just say no to these.

  • http://www.developernotes.com Nick Parker

    Ugg…this post brings back horrible memories of customizing GoldMine by FrontRange Solutions.

  • Don Cameron

    Very timely subject for us. We just went through this type of evaluation and for our purposes the sister table with the name/value pair is where we are heading.

    Jeremy your comment “Slippery slope to Hell” is very applicable, as users will want to start adding new data types, validations rules, business rules – almost to the point where I throw my hands up in the air and give them a copy of VS2005 and say “do it yourself”.

    I am curious what are you all doing for:
    - different data types?
    - simple validation rules
    - UI presentation
    - ad hoc reportin

    of this extended field data

  • BigJimInDC

    IMHO, the core decision point revolves around the answers to a couple of questions about the extended fields (XF’s). Like all aspects of software design, there is no silver bullet that addresses all situations.

    #1 – Is the XF going to be queried at a RDBMS level? 99% of the time, the answer is yes, but it has also been no in my experience, and thus can impact your decisions.
    #2 – Will business logic rely on, be impacted by, etc this new XF?
    #3 – Is the XF going to be displayed on custom and/or ad-hoc reports? Typically if the customer takes the time to enter the data, they probably want it in hard-copy some time down the road.

    Personally, I like the key/value pair solution, and I sell it as “tagging”, which nearly everyone gets, whether they’ve used it or not. #1 above is satisfied, and you can generally arbitrarily limit the size of the data being stored if you have performance concerns. You can also add simple data typing to it to handle simple things like proper sort ordering of numbers and dates, while storing everything as strings in the DB. Over the years, I’ve set this up in an “RDBMS loose coupling style”, thus allowing me to “tag” literally any data element in the system in a centralized manner. I’ve also added full-text memo capabilities in a nearly identical manner, both of which have addressed all of the client needs that I’ve been presented, regardless of the type of system being designed and built. You can also have your DBA build views that can easily be joined into queries that denormalize the key/value pairs to allow more straight-forward customer querying and reporting.

    If you’re answer to #2 is yes, the solution will be more involved, but if you know it is yes up front, you’re also probably missing some larger requirements for the system, and should spend some more time diving into that.

    I’m not a big fan of letting the customers arbitrarily add fields to the core tables, and I find that using a limited set of fields to address a potentially unlimited set of XF’s to be too short-sighted (as in the wild-card field approach). The sister table approach keeps the customer’s fingers off of the core system tables and leverages the RDBMS’s power more than the key/value pair solution. But as I’ve described above, you can accomplish a nearly identical result using key/value pairs, without needing to allow DBA level access and/or expertise in the RDBMS.

  • sergiopereira

    I did not know they were called “sister tables” but that’s what I have worked with before. The same system also used a codegen logic that would prefix all the fields that came from the sister tables with something like “Ext” and make them all available in a collection like “ExtendedProperties[ ]“. There were also views that pre-joined the most important tables with their “evil twins”.
    I don’t know if there’s an elegant solution to this problem, but this one at least seamed manageable (and automation-friendly).

  • http://prerit.com/ Prerit

    What I’ve seen/used before is an approach similar to the sister table method but the core tables included the extension columns. They were generally prefixed (i.e. Ext_) to differeiantate them from the core columns. I find that this is a bit better than the sister table method because it didn’t increase the number of joins.

  • http://www.codebetter.com/blogs/eric.wise ewise

    For customizing, I find the XML field approach in SQL Server to be ideal. It puts almost no limitations on the additional data they wish to attach as well as being able to be queried in t-sql.

    One company I did some consulting work for even went so far as to store form schemas in xml fields such that it would be added to the end of an existing form. So they’d specify Second Phonetextbox etc. It was kind of nifty.

  • Chris

    What about a sister table approach? For every core table there could potentially be a sister table where extended properties would reside. This way a core schema would not be touched and a customer’s customizations would be compartmentalized. This would also remove the mystery meat and possibly ease reporting on comstom?

    Downsides…
    -Customization of persistence layer.
    -Addition of “sister table” DB management code for creation and alteration of said sister tables.
    -Performance, referential integrity, DBA concerns of associated core/sister records?