CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Eric Wise

Business & .NET

"Transactional" Database Structures

I was meeting with a group of developers the other day and the topic, which I will not go into detail about came up about using a transactional database structure for some data that required a history versus the typical method of keeping current records in one table and on any change writing to a history table.  I've seen some transactional based structures in other projects before and I was slightly surprised that more than half the people in the meeting had never seen one before.  This can mean only one thing: it's blog time!

Disclaimer: While I have a bit of skill with SQL Server, it's not something I usually "teach" on, so if any of this is unclear I apologize in advance.

 

What do you mean by a "Transactional" structure?

In concept, it's very similar to financial accounting principals where you have debits and credits.  You post credits and debits in equal values throughout the accounting system such that as money flows through accounts you end up with a balanced total.  At the end of the day, you have an endpoint where there is a credit or debit with no counter-balance and that ends up representing profit (or loss).  This concept can be similarily applied to transactional history data.

For example, let's imagine an insurance company that issues a Cobra policy to an employee.  For those of you unfamiliar with how Cobra works, basically you make a payment on the first of the month to get insurance coverage for that month.  Now if you don't make your payment, your policy shows as cancelled and if you submit a claim it will be rejected.  However, you have a 30 day period to make your payment, and if at any time in that 30 days you do make the payment then your previous claim can be resubmitted and will be accepted.

What the transactional structure brings us the ability to do is use an effective date such that we can look back in history to see what the state of your account was on a particular date.  This is easier to understand with some hard data: 

We will say you had a policy that started (TransCode ST), on January 1 :

RecordId PolicyId TransCode TransFlag EffectiveDate PolicyDate
1 1000 ST 1 1/1/2006 1/1/2006

First thing you'll notice is that the data is denormalized.  We will use transcode ST for start and SP for stop.  Transflag will always be 1 or -1, which is how we define a valid current record.  If the sum of the trans codes for a particular record is 0, then it is considered to be invalid if you are looking at the "current" view.

Per our example, let's say that the employee does not make a payment on February 1st, and their policy is cancelled.  At this point you will insert a new record into the table to show that their policy has ended:

RecordId PolicyId TransCode TransFlag EffectiveDate PolicyDate
1 1000 ST 1 1/1/2006 1/1/2006
2 1000 SP 1 2/1/2006 2/1/2006

Now we can easily see that if you ask on February 5th if they have coverage the answer is clearly no since they have an active stop date of 2/1/2006.  Here's where the fun comes in, let's say that the employee makes their payment on February 10th.  This means that the insurance company will make their policy retroactively effective so that the claim on the 5th can be valid.  This requires invalidating the stop date that was entered:

RecordId PolicyId TransCode TransFlag EffectiveDate PolicyDate
1 1000 ST 1 1/1/2006 1/1/2006
2 1000 SP 1 2/1/2006 2/1/2006
2 1000 SP -1 2/10/2006 2/1/2006

Now, when we submit the claim on or after the 10th, the effective date kicks in an shows that the sum of the TransFlags for the stop entry is 0, hence is invalid and can be ignored.  So we end up with a start date of 1/1/2006 with no end date which means their policy is currently active.

The neat thing about this is that we have also preserved the history through the use of the effective dates.  If we were to query based on an effective date of 2/5/2006 then the invalidation of the stop date would not have occurred yet and you would be shown the accurate state of this person's account at that specific date and time.

 

What are the advantages of structuring data this way?

  • Simplification
    • There are fewer columns/datapoints/tables than in a more traditional and relational active record + history table setup.
    • By eliminating these extra tables and logs we no longer need to worry about using triggers etc to maintain said logs.
    • We only use inserts.  Existing data is never altered or deleted which puts a great integrity constraint on the data
    • Effective dates allow an easy query interface to view the state of an account at a point in time, stored procedures are structured such that the default view is "today"
  • Flexibiliity
    • Normalized structures in this case will be more performant.
    • Business rules and interfaces can change without changing the data structure (In general, you will write views that perform the transaction flag summations, grouping and filtering)
  • History
    • Pefectly maintained and with an insert-only constraint incorruptable
  • Archiving / DataCleaning
    • This is very easy to do because you have an effective date (it is common since you end up with more datarows than in a traditional structure to periodically archive everything prior to X date, like in accounting when you close the books for the last fiscal year.  When archiving you typically copy the old data into an archive table and then put the aggregate info into a data warehouse type structure for fast and easy historical querying)

What are the disadvantages?

  • Less Readable
    • Structuring for normalization, performance, and data capture gives a base table that is not all that readable or reportable directly.  Reports etc should use prefabricated views or a data warehouse so that the "non-sql elite" can get at information in a good reporting format easily.
    • This also means there is a learning curve to developers who have never seen this structure before.  Many people go crosseyed when first looking at it.  =)
  • Performance
    • Do to aggregates, summations, and denormalization you have to be very careful about how you structure your queries and indexes.  You can really shoot yourself in the foot if you do it wrong.  Luckily fixing it is a matter of performance tuning.

In conclusion, when used in a proper scenario, this technique can give a much more history friendly structure that is reliable and fast to query.  It is certainly not applicable anywhere, and you should be cautious and consult a good DBA before you attempt to implement this type of structure but it really does pay big dividends in the right situations.



Comments

Jason Haley said:

# February 16, 2006 6:16 AM

Jason Haley said:

# February 16, 2006 6:18 AM

Odds and Ends - Zach Nichter said:

I subscribe to Jason Haley's blog and there was an interesting blog reference today on "Transactional" Database Structures. I have to admit I have never worked...
# February 16, 2006 11:09 AM

Eric Wise said:

You'll have to catch up on my original post and Jeremy's informative post before delving into this one....
# May 25, 2006 6:33 PM

Nachiket said:

Thanks,helped a lot for my research paper
# June 19, 2006 2:42 AM

Ayende @ Blog said:

Advanced Uses For OR/M

# January 13, 2007 2:22 PM

transactional data said:

Pingback from  transactional data

# April 26, 2008 7:58 PM

Leave a Comment

(required)  
(optional)
(required)  

Enter the numbers above:
Add
Check out Devlicio.us!