Darrell Norton's Blog [MVP]

Sponsors

The Lounge

News

  • Darrell Norton pic

    MVP logo

    View Darrell Norton's profile on LinkedIn

    Currently Reading:

    weewar.com

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
Third normal form (3NF) is not the end goal

SBC recently posted on .NET Weblogs about designing databases prior to development.  While a good post, one statement got me:

"normalization (at least 3NF)..."

While third-normal form is a general guide in designing a database model, it is not the be-all end-all goal. 

The logical model is supposed to be taken to 3NF and then adjustments made for performance, ease-of-use for developers, etc.  These adjustments back away from 3NF, tending towards 2NF.  For example, say you have an address table that includes the following fields: AddressID, StreetAddress, City, State, ZipCode.  Seems reasonable, right?  Well, according to at least 3NF this is wrong.  You see, I can determine the City and State from the ZipCode.  In fact, as the US post office continues with the zip code extensions (12345-6789-01), they do not even need the street address either (that's the 01 in the zip code).  This means that the City, State, and maybe the StreetAddress are facts about the non-key field ZipCode, a clear violation of 3NF.  This is just an example, and my apologies to anyone living outside the US, but I think you can understand the point I am trying to make.

Who knows about the forms after 3NF?  There is Boyce Codd Normal Form (BCNF), a.k.a. fourth normal form, and fifth normal form (5NF).  BCNF and 5NF are rarely used in real-life.  This is not to say that there is anything wrong with them; the principles of removing redundancy are key to the relational model.  It is just that usually the effort-return tradeoff is not favorable to making the design this strict.  Actually BCNF is a good normal form for developers.  BCNF (and also 5NF) attempt to minimize the number of fields involved in a composite key, ideally to 1 field.  This will make developers happy (see my ranting post), and now we have an academic/theoretical reason to back us up!

So what is the right amount of normalization?  Which form: 1NF, 2NF, 3NF, BCNF, 5NF?  The answer, as always, is it depends.  Sometimes for performance reasons you will want to denormalize to 2NF.  Other times, pushing to BCNF is a good idea.  It depends on the project, the effort-benefit tradeoff in maintaining the data, etc.  As a rule of thumb, though, it is better to logically model to 3NF or BCNF, and then back away where the complexity of the model outweighs its usefulness or excessively inhibits performance.

See William Kent's A Simple Guide to Five Normal Forms in Relational Database Theory.

Note: The title of this post might seem to indicate that I do not believe in 3NF or database normalization.  This is not true!  I am against the blind adherence to categorical statements such as this that pervade software development.  Also, I am not saying that SBC wrote his post with this blind adherence in mind.  It happened to be his post that used the statement that I challenge.  Let me reiterate that I think SBC's post is well-thought-out and has some very good ideas.


Posted Tue, Jun 24 2003 12:46 PM by Darrell Norton

[Advertisement]

Comments

Ulla Nillersen wrote re: Third normal form (3NF) is not the end goal
on Wed, May 5 2004 10:49 PM
ya mann
Morten Hest wrote re: Third normal form (3NF) is not the end goal
on Wed, May 5 2004 10:50 PM
Jeg tror at BCNF står for Brown Computer No Frequenzy. Vh Morten
Svin wrote re: Third normal form (3NF) is not the end goal
on Wed, May 5 2004 10:52 PM
hey jaa
Aziz Shamim wrote re: Third normal form (3NF) is not the end goal
on Wed, Dec 1 2004 4:16 PM
His example is unreasonable. You can only normalize to data that you have, not to data that exists somewhere.

If you have the eleven digit zip code, AND you have the relationship of that zip code to a street address, a la the Post Office. AND when you collect addresses, you can collect the eleven digit zip code (which I'm sure everyone reading this has memorized for their street address), then you can claim that the Address, City, State, Zip method of storing data does not meet 3NF.

But his overall point about 3NF not being the goal is correct. I'm really just nitpicking.
Darrell wrote re: Third normal form (3NF) is not the end goal
on Thu, Dec 2 2004 2:46 AM
Aziz - that's fine, I'm always open for discussion! It was an example that seemed to work at the time. :)

The interesting thing to note is that the post office is actually transitioning to the long zip code instead of street name, city, state and zip due to the inherent problems of misspelling, etc. So eventually those dependent fields hopefully will be removed.