CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Darrell Norton's Blog [MVP]

Fill in description here...

Third normal form (3NF) is not the end goal

SBC recently posted on .NET Weblogs about designing databases prior to development.  While a good post, one statement got me:

"normalization (at least 3NF)..."

While third-normal form is a general guide in designing a database model, it is not the be-all end-all goal. 

The logical model is supposed to be taken to 3NF and then adjustments made for performance, ease-of-use for developers, etc.  These adjustments back away from 3NF, tending towards 2NF.  For example, say you have an address table that includes the following fields: AddressID, StreetAddress, City, State, ZipCode.  Seems reasonable, right?  Well, according to at least 3NF this is wrong.  You see, I can determine the City and State from the ZipCode.  In fact, as the US post office continues with the zip code extensions (12345-6789-01), they do not even need the street address either (that's the 01 in the zip code).  This means that the City, State, and maybe the StreetAddress are facts about the non-key field ZipCode, a clear violation of 3NF.  This is just an example, and my apologies to anyone living outside the US, but I think you can understand the point I am trying to make.

Who knows about the forms after 3NF?  There is Boyce Codd Normal Form (BCNF), a.k.a. fourth normal form, and fifth normal form (5NF).  BCNF and 5NF are rarely used in real-life.  This is not to say that there is anything wrong with them; the principles of removing redundancy are key to the relational model.  It is just that usually the effort-return tradeoff is not favorable to making the design this strict.  Actually BCNF is a good normal form for developers.  BCNF (and also 5NF) attempt to minimize the number of fields involved in a composite key, ideally to 1 field.  This will make developers happy (see my ranting post), and now we have an academic/theoretical reason to back us up!

So what is the right amount of normalization?  Which form: 1NF, 2NF, 3NF, BCNF, 5NF?  The answer, as always, is it depends.  Sometimes for performance reasons you will want to denormalize to 2NF.  Other times, pushing to BCNF is a good idea.  It depends on the project, the effort-benefit tradeoff in maintaining the data, etc.  As a rule of thumb, though, it is better to logically model to 3NF or BCNF, and then back away where the complexity of the model outweighs its usefulness or excessively inhibits performance.

See William Kent's A Simple Guide to Five Normal Forms in Relational Database Theory.

Note: The title of this post might seem to indicate that I do not believe in 3NF or database normalization.  This is not true!  I am against the blind adherence to categorical statements such as this that pervade software development.  Also, I am not saying that SBC wrote his post with this blind adherence in mind.  It happened to be his post that used the statement that I challenge.  Let me reiterate that I think SBC's post is well-thought-out and has some very good ideas.



Comments

Morten Hest said:

Jeg tror at BCNF står for Brown Computer No Frequenzy. Vh Morten
# May 5, 2004 10:50 PM

Svin said:

hey jaa
# May 5, 2004 10:52 PM

Aziz Shamim said:

His example is unreasonable. You can only normalize to data that you have, not to data that exists somewhere.

If you have the eleven digit zip code, AND you have the relationship of that zip code to a street address, a la the Post Office. AND when you collect addresses, you can collect the eleven digit zip code (which I'm sure everyone reading this has memorized for their street address), then you can claim that the Address, City, State, Zip method of storing data does not meet 3NF.

But his overall point about 3NF not being the goal is correct. I'm really just nitpicking.
# December 1, 2004 4:16 PM

Darrell said:

Aziz - that's fine, I'm always open for discussion! It was an example that seemed to work at the time. :)

The interesting thing to note is that the post office is actually transitioning to the long zip code instead of street name, city, state and zip due to the inherent problems of misspelling, etc. So eventually those dependent fields hopefully will be removed.
# December 2, 2004 2:46 AM
Check out Devlicio.us!