SBC recently posted on .NET Weblogs about designing databases prior to development. While a good post, one statement got me:
"normalization (at least 3NF)..."
While third-normal form is a general guide in designing a database model, it is not the be-all end-all goal.
The logical model is supposed to be taken to 3NF and then adjustments made for performance, ease-of-use for developers, etc. These adjustments back away from 3NF, tending towards 2NF. For example, say you have an address table that includes the following fields: AddressID, StreetAddress, City, State, ZipCode. Seems reasonable, right? Well, according to at least 3NF this is wrong. You see, I can determine the City and State from the ZipCode. In fact, as the US post office continues with the zip code extensions (12345-6789-01), they do not even need the street address either (that's the 01 in the zip code). This means that the City, State, and maybe the StreetAddress are facts about the non-key field ZipCode, a clear violation of 3NF. This is just an example, and my apologies to anyone living outside the US, but I think you can understand the point I am trying to make.
Who knows about the forms after 3NF? There is Boyce Codd Normal Form (BCNF), a.k.a. fourth normal form, and fifth normal form (5NF). BCNF and 5NF are rarely used in real-life. This is not to say that there is anything wrong with them; the principles of removing redundancy are key to the relational model. It is just that usually the effort-return tradeoff is not favorable to making the design this strict. Actually BCNF is a good normal form for developers. BCNF (and also 5NF) attempt to minimize the number of fields involved in a composite key, ideally to 1 field. This will make developers happy (see my ranting post), and now we have an academic/theoretical reason to back us up!
So what is the right amount of normalization? Which form: 1NF, 2NF, 3NF, BCNF, 5NF? The answer, as always, is it depends. Sometimes for performance reasons you will want to denormalize to 2NF. Other times, pushing to BCNF is a good idea. It depends on the project, the effort-benefit tradeoff in maintaining the data, etc. As a rule of thumb, though, it is better to logically model to 3NF or BCNF, and then back away where the complexity of the model outweighs its usefulness or excessively inhibits performance.
See William Kent's A Simple Guide to Five Normal Forms in Relational Database Theory.
Note: The title of this post might seem to indicate that I do not believe in 3NF or database normalization. This is not true! I am against the blind adherence to categorical statements such as this that pervade software development. Also, I am not saying that SBC wrote his post with this blind adherence in mind. It happened to be his post that used the statement that I challenge. Let me reiterate that I think SBC's post is well-thought-out and has some very good ideas.
Posted
Tue, Jun 24 2003 12:46 PM
by
Darrell Norton