Migratory Compromises

Martin Fowler recently wrote an article about incremental data migration. In it he covers some of the pitfalls of putting off data migration and the benefits of tackling migration iteratively. As a lot of us are doing rewrites or replacement systems in this day and age, it's worth a read.

I think there's an important point either implicit in the piece or missing entirely. Namely: when we put off migration we might be exposing ourselves to the possibility of a large net decrease in the quality and/or functionality of the new system. I'll explain.

When we're developing new systems we're often correcting the errors of our own making or of an inherited past. We are hopefully ensuring the integrity of our new data structures, whether they're produced as a side effect of an object system or domain modeling effort, or whether we're making some kind of database as a primary project artifact. So can we assume that existing (and scarily dirty) data can be brought over into this new, pristine environment? Clearly the answer is "no."

Oftentimes we're working with legacy data from a rat's-nest system that's evolved over the years. I remember a particularly nasty data migration that, postponed to the end of the project, took a good month to do. Not just a month of effort, but a month of toil and drudgery!

Indulge me in a brief war story. The data in question was from a system that had been through several data migrations and patched/in-place replacements. First Sybase, then Access, then SQL Server 6. There were several tables of questionable value, "day of week" and "gender" immediately springing to mind. One could look at rows as kinds of geological strata. Certain fields became out of date, and screens ended up being coded with conditional logic along the lines of "if the record date is less than a certain day, get this semantic value from this field, otherwise…" As if this wasn't enough, there were out-and-out data integrity errors of a particularly egregious nature. There was no way the reports siphoning this data could be correct or counted on. At best they were a relative and probabilistic measure of what was really happening in the business.
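To make that "geological strata" problem concrete, here's a minimal sketch of the kind of conditional logic those screens carried. The cutover date and field names are invented for illustration; the real system's values are lost to history.

```python
from datetime import date

# Hypothetical cutover: before this date, "status" lived in a
# repurposed legacy column; after it, in the proper field.
STATUS_FIELD_CUTOVER = date(1998, 6, 1)

def effective_status(record: dict) -> str:
    """Pick the field that holds 'status' for this row's era."""
    if record["record_date"] < STATUS_FIELD_CUTOVER:
        return record["legacy_code"]   # pre-cutover stratum
    return record["status"]            # post-cutover stratum

old_row = {"record_date": date(1997, 3, 14), "legacy_code": "A", "status": ""}
new_row = {"record_date": date(2001, 9, 2), "legacy_code": "", "status": "ACTIVE"}
print(effective_status(old_row))  # "A"
print(effective_status(new_row))  # "ACTIVE"
```

Multiply this by dozens of fields and every screen that reads them, and you have a sense of what a late migration has to untangle.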

Naturally I made the mistake of not taking this albatross into account. Nightmare.

What can you do to avoid this situation? As Martin shares, making an initial assessment of the current data structure would be a big first step. If the data is messy, you’d be well served to tackle migration incrementally and early. But what about my (maybe not so) extreme case where data “assets” are in awful shape?

Conscript Users

If you have the luxury of leveraging users to fix data issues, use it. Sometimes these issues can be fixed through the legacy application itself. For example, we introduced a feature in our vendor management module that ensures vendors aren't duplicated (by their tax ID). In a client's system there were all kinds of redundant data. We approached them with this issue and worked out a plan of collapsing duplication before counting on migrated data.
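The detection side of that de-duplication work can be sketched simply: group records by tax ID and flag any group with more than one member for users to reconcile. The record shape and field names here are illustrative, not the actual system's schema.

```python
from collections import defaultdict

def find_duplicate_vendors(vendors):
    """Group vendor records by tax ID; any group larger than one
    is a candidate for collapsing before migration."""
    by_tax_id = defaultdict(list)
    for v in vendors:
        by_tax_id[v["tax_id"]].append(v)
    return {tid: vs for tid, vs in by_tax_id.items() if len(vs) > 1}

vendors = [
    {"id": 1, "name": "Acme Supply", "tax_id": "12-3456789"},
    {"id": 2, "name": "ACME Supply Co.", "tax_id": "12-3456789"},
    {"id": 3, "name": "Bolt Works", "tax_id": "98-7654321"},
]
dupes = find_duplicate_vendors(vendors)
# "12-3456789" maps to two records that users should collapse into one
```

A report built from `dupes` is something users can work through in the legacy system, long before migration day.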

Defensive Architecture

Sometimes it's best to raise your guard against imported data in your application's design. By taking an early assessment (which, in the case of product development, might be an educated guess) we can decide if old data is trustworthy. If not, we might build our applications in such a way that handling missing or invalid data is part of the app itself. Taking the vendor de-duplication example, we might have built a feature that let users correct duplication in the new system and just brought the data over as is. Expanding on that feature, we might also have prevented 1099s (tax forms) from being generated for suspected duplicates, providing an exception report for these cases.
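That defensive design might look like the following sketch: generation routes trusted vendors to 1099s and suspected duplicates to an exception report for manual review. Again, the data shapes are hypothetical stand-ins for the real system.

```python
def generate_1099s(vendors, suspected_duplicate_tax_ids):
    """Generate 1099s only for trusted vendors; route suspected
    duplicates to an exception report instead."""
    forms, exceptions = [], []
    for v in vendors:
        if v["tax_id"] in suspected_duplicate_tax_ids:
            exceptions.append(v)        # needs manual review first
        else:
            forms.append({"vendor_id": v["id"], "tax_id": v["tax_id"]})
    return forms, exceptions

vendors = [
    {"id": 1, "name": "Acme Supply", "tax_id": "12-3456789"},
    {"id": 2, "name": "ACME Supply Co.", "tax_id": "12-3456789"},
    {"id": 3, "name": "Bolt Works", "tax_id": "98-7654321"},
]
forms, exceptions = generate_1099s(vendors, {"12-3456789"})
# one clean 1099; two vendors held back on the exception report
```

The extra branch is exactly the "increased effort" cost discussed next: every feature that touches suspect data needs a path like this, plus tests for it.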

The problem with designing for bad data is the increased effort — and therefore cost — involved in design, implementation, and test. This strategy, I’d say, should be used as a last resort and sparingly. All disclaimers aside, sometimes it can’t be avoided; we can all probably tell a data horror story or two.

The Real Risk

We've been cruising on this new project. We're happy with the design, and it's an order of magnitude better than the previous solution. Our client's going to be thrilled! That's a lovely feeling, to be sure, but in reality the data we're bringing forward might be a limiting factor in total success. You might have to make disappointing compromises, like scrubbing new features or extending a project's scope/time/budget, if you've developed a feature that simply isn't compliant with old data.

A thorough initial assessment paired with incremental migration can help you make Agile decisions about architecture and client involvement. Without techniques like this you'll essentially be rolling the dice on how long migration takes or, in the worst case, on whether new features are practical without a whole slew of compensating or enabling features.

This entry was posted in Agile, databases, migration.

4 Responses to Migratory Compromises


  2. Johny Morris says:

    As a data migration consultant of long standing (ten years plus doing nothing but), I've got to say that it's satisfying to see more people coming round to thinking that maybe we should all approach data migration with a little more forethought. Not enough space here to impart more than a passing gem or two, but I always recommend to clients that you run two parallel projects: the main one and a data migration project that feed off one another. And if, say, the development project is 12 months, then the data migration project should start at the same time as the development one and spend the first 6 months doing landscape analysis and data prep on the legacy.

    But for all the latest views and insights on the subject check out datamigrationpro.com.

    Johny Morris

  3. Jeff Tucker says:

    It gets even more fun if you're going to have to migrate from multiple data sources in different formats, such as supporting migrations from three previous versions of the application in both Oracle and SQL Server (2000 and 2005, of course). We left this to the very end, so a bunch of stuff ended up having to be done manually over the course of the next year. One strategy that I found helpful in this case was to define an intermediary format of some sort. It doesn't really matter what format it is (text file, XML, etc.) as long as it's relatively easy to export data to from all your various database versions and types. Then you only need a single import script, and you can change that script as much as necessary as your database evolves. That way, you never have to change the export scripts, ever. If you just transform the data somehow, you'll have to change your transform for every database that you're trying to migrate from. I found this to be moderately successful; however, it was still a painful process because it was left until the last minute.
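[Ed.: Jeff's intermediary-format strategy can be sketched in a few lines. JSON plays the intermediate format here, `sqlite3` stands in for the real Oracle/SQL Server drivers, and the schema and table names are illustrative. Each legacy source gets its own small exporter; only the single importer changes as the new schema evolves.]

```python
import json
import sqlite3  # stand-in for the actual Oracle / SQL Server drivers

def export_vendors(conn, out_path):
    """Per-source export: each legacy database writes the shared
    intermediate format (a JSON list of vendor records)."""
    rows = conn.execute("SELECT id, name, tax_id FROM vendors").fetchall()
    with open(out_path, "w") as f:
        json.dump([{"id": r[0], "name": r[1], "tax_id": r[2]} for r in rows], f)

def import_vendors(in_path, conn):
    """Single import script: reads the intermediate format and loads
    the new schema. Only this side changes as the new database evolves."""
    for v in json.load(open(in_path)):
        conn.execute("INSERT INTO vendor (name, tax_id) VALUES (?, ?)",
                     (v["name"], v["tax_id"]))
    conn.commit()
```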
