Composite keys are evil

The title says it all. 

There's a section in Eric Evans' DDD book where he spends about 3-5 pages saying that composite keys are bad and surrogate keys are good (without just coming out and saying that).  At the time I thought he was being exceptionally wordy, but it's an awfully good point to drive home.  Composite keys (multi-column primary keys) make any kind of Object/Relational mapping, and persistence in general, harder.  Life is so much easier with surrogate keys.  You can always add unique constraints where they're necessary. 
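As a rough sketch of that pattern (SQLite syntax here; the OrderLines table and its columns are hypothetical): the autonumber surrogate becomes the primary key, and the old composite key survives as a unique constraint, so the integrity rule doesn't change.

```python
import sqlite3

# Hypothetical OrderLines table: surrogate primary key for the O/R mapper,
# with the old composite key (OrderId, LineNo) kept as a unique constraint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OrderLines (
        Id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        OrderId  INTEGER NOT NULL,
        LineNo   INTEGER NOT NULL,
        Quantity INTEGER NOT NULL,
        UNIQUE (OrderId, LineNo)  -- the old composite key, demoted to a constraint
    );
""")
conn.execute("INSERT INTO OrderLines (OrderId, LineNo, Quantity) VALUES (1, 1, 5)")
try:
    # A duplicate (OrderId, LineNo) pair is still rejected, just as the
    # composite primary key would have rejected it.
    conn.execute("INSERT INTO OrderLines (OrderId, LineNo, Quantity) VALUES (1, 1, 9)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```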

When I've used NHibernate against a legacy database, we just cut our losses and added autonumber columns with a unique constraint to act as the identifier for NHibernate and left the old key alone.  It worked pretty well, but I don't think those classes had any relationships, so I don't know how that would work.  I'd like to recommend to my current client to quietly ditch their homegrown Java persistence tool in favor of Hibernate, but putting an O/R mapper on a legacy database is challenging.  iBatis maybe, but I don't know much about it yet.


From anybody older than me out there: why were composite keys so widely used in older databases?  Was there a real reason?  All I've ever seen from composite keys is pain. 

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at
This entry was posted in Uncategorized. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Anime Butt-Whole

    Why is a preference for ease-of-use over performance not a tradeoff worth considering?

  • L.K.

    Your example is flawed. As soon as a “Sandwich” becomes part of a “Meal” we start having to carry over sandwich ids (composite or surrogate) and as soon as a “Meal” becomes part of any other table we have to further push the previous key.

    Your example SELECT and the “KILL my query” line are also bad examples. If bread had more information than just [Name] you’d need to join onto it anyway. Int comparisons are faster than string comparisons. Before you think “but I meant indexes”, read ahead.

    Take your example and invert the idea. Bread now contains [id, name, calories, carbs, protein]. If users want to look up bread by its name, then yes, having name be the key instead of id makes sense. When users start wanting to search by more and more things, indexes may become necessary. E.g., a user wants to find all breads with a calorie count below 80 / slice. This would more than likely result in table scans instead of seeks, because our clustered index (name or id, depending on how you want to view it) doesn’t keep things ordered by calories. The non-clustered index would point to the reference in the clustered index on disk.

    Yes, more indexes do hurt insert performance, but not that much. Especially on a table named Bread or Meat or Sandwich, where searching will more than likely occur much more often than inserting new items.
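The scan-versus-seek point above can be sketched quickly, with SQLite standing in for the RDBMS (the Bread columns come from the comment; plan details differ per engine):

```python
import sqlite3

# Hypothetical Bread table from the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Bread (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        calories INTEGER NOT NULL,
        carbs    INTEGER NOT NULL,
        protein  INTEGER NOT NULL
    )
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT name FROM Bread WHERE calories < 80"
print(plan(query))  # full table scan: rows are ordered by id, not calories

conn.execute("CREATE INDEX ix_bread_calories ON Bread (calories)")
print(plan(query))  # now a search through ix_bread_calories
```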

  • Nuno

    Composite keys are not evil. Most of the time surrogate keys are evil (not all of the time), and by using them you are sacrificing the application in favor of “simplicity” and “laziness”.
    As Damooter says, composite keys are good practice. By using a “natural” key, the rows will be ordered by that natural key on disk, which means that queries made by that natural key will be much more efficient. Surrogate keys will only be used when querying for associated data (when navigating from foreign key to the principal). On the other hand, natural keys will be useful when querying data according to the application logic and when making joins.
    When using surrogate keys, even if you create indexes on the natural key, queries using the natural key will not be as efficient. I encourage you to run some tests and see it for yourself: with an existing table in an application you are working on, analyze the query plan (and especially the number of reads) of a query that uses the “natural” way to filter information. Then drop the Clustered Index (Primary Key), create a new Clustered Index on the natural key (possibly a composite key, in which case the order of the fields in the index will be very important), and analyze the new query plan.
    If you use an ORM that does not support composite keys, or that makes it hard to work with them, then change your ORM for one that supports them.
    This kind of talk reminds me of the “myth” that normalization hurts performance because it increases the number of joins.

  • Cochise Ruhulessin

    Software developer incompetence is no reason to make tradeoffs in the design of a database.

  • Cochise Ruhulessin

    The only real reason for using surrogate keys is when the natural keys (often composite) are too wide.

  • Cochise Ruhulessin

    “why were Composite keys so widely used in older databases? Was there a real reason?”

    Because the base principle of relational theory is the “information principle”, meaning that a database stores information, and surrogate keys are non-information.

  • Stuart

    One of the reasons natural keys were more popular in the old days was to avoid the cost of a join. Instead, the natural key was typically a ‘human readable’ code which would mean something to a user without the need to do another fetch – this could save on IO and development time (joins in databases like PICK weren’t quite as trivial as they are in modern RDBMS’s).

  • Paul

    I’d just like to mention that the case above is a perfectly good usage of composite keys, but there are many cases where composite keys can cause huge headaches, particularly in UI development. It’s best to know when to use them and when not to; as engineers we need to be able to analyze that, since one approach is not best for all situations.

  • Paul K

    really? never looked at it that way.

  • WayneDB

    Engineering is about trade-offs though. Databases need applications to use them or they would be useless. So, we need to make trade-offs in order that both DB and App can work well.


  • DangerMike

    This post was pretty high up on the Google results for NHibernate composite keys, so I feel like I should respond in case anyone else reads this and thinks it should be taken to heart.

    To be perfectly clear: In a purely associative table defining a many-to-many relationship, composite keys are the correct implementation.  Ephemeral surrogate keys are wasteful and damaging to performance.

    Consider the following silly tables:
    Breads ( id int not null primary key, name varchar(20) not null )
    Meats ( id int not null primary key, name varchar(20) not null )

    I want to store unique combinations of breads and meats to make sandwiches.  There are two ways to do this: I can have a compound primary key or I can create a surrogate primary key:

    Sandwiches ( 
      MeatId int not null, 
      BreadId int not null, 
      constraint pk_sandwiches primary key (MeatId, BreadId), 
      constraint fk_sandwiches_meats foreign key (MeatId) references Meats (id),
      constraint fk_sandwiches_breads foreign key (BreadId) references Breads (id)
    )

    This will allow us to find all the salami sandwiches quickly, but the rye sandwiches will require a scan unless we add:

    CREATE INDEX ix_sandwiches_breadid ON Sandwiches (BreadId)

    If I want to do this with an ephemeral surrogate key, it will look like this:

    Sandwiches_SK (
      SandwichId int not null primary key,
      MeatId int not null,
      BreadId int not null,
      constraint fk_sandwiches_sk_meats foreign key (MeatId) references Meats (id),
      constraint fk_sandwiches_sk_breads foreign key (BreadId) references Breads (id)
    )

    I am going to need a non-clustered index to find the sandwiches by meat:
    CREATE INDEX ix_sandwiches_sk_meatid ON Sandwiches_SK (MeatId)

    And another to find sandwiches by bread:
    CREATE INDEX ix_sandwiches_sk_breadid ON Sandwiches_SK (BreadId) 

    BUT WAIT!  We don’t have any guarantees of uniqueness on my sandwiches.  So I am going to make one of those indices cover both meat and bread and be unique:
    CREATE UNIQUE INDEX ux_sandwiches_sk_meatid_breadid ON Sandwiches_SK (MeatId, BreadId) 

    That looks a lot like the clustered index we were going to have already.  So we have an additional index to maintain, which is going to hurt insert performance.  And it takes up more space.  Let’s ignore page sizes and pretend the rows are stored contiguously.  In other words, this is an underestimate, but it’s close.

    Meat: 48 bytes/row
        clustered: 4 bytes + 20 bytes = 24 bytes/row
        name ix: 20 bytes + 4 bytes = 24 bytes/row
    Bread: 48 bytes/row
        clustered: 4 bytes + 20 bytes = 24 bytes/row
        name ix: 20 bytes + 4 bytes = 24 bytes/row
    Sandwiches: 20 bytes/row
        clustered: 4 bytes + 4 bytes = 8 bytes/row
        breadid ix: 4 bytes + 4 bytes + 4 bytes = 12 bytes/row
            don’t forget that non-clustered indices include the full PK, even if that’s redundant.

    Sandwiches_SK: 24 bytes/row
        clustered: 4 bytes/row
        meat/bread ix: 4 bytes + 4 bytes + 4 bytes = 12 bytes/row
        bread ix: 4 bytes + 4 bytes = 8 bytes/row

    So we are losing 4 bytes/sandwich that we create.  

    BUT WAIT!  It gets much worse.  If I want to know the names of all the meats available on rye:

    SELECT m.Name
    FROM Meats m
        JOIN Sandwiches_SK s ON s.MeatId = m.id
        JOIN Breads b ON b.id = s.BreadId
    WHERE b.Name = ‘Rye’

    If we have a lot of sandwiches with rye I am going to have a lot of bookmark look-ups to get from my SandwichId (which is what I get after resolving through ix_sandwiches_sk_breadid).  That is going to KILL my query.  So I have to decide, based on data density, if I am better off including MeatId in ix_sandwiches_sk_breadid (costing me another 4 bytes/row) or if I should just suffer through the bookmark look-up.

    Using a composite primary key makes this problem literally go away.  You don’t have to think about it.  As engineers, we should take normalization as our default position and deviate from it only after careful analysis.  Surrogate keys are a denormalization and should be viewed as evil in most cases.  

    I haven’t read Eric Evans’s book, but I’m pretty sure that I’ll trust Ted Codd’s advice first.  At least until Eric Evans’s Turing Award comes through.

  • Cooper Wu

    I take this idea to simplify my models in MVC. 
    With a composite key (3 columns), I really don’t know how to write the models.


  • Damooter

    Composite keys are used for performance and to maintain data integrity within your chosen RDBMS (assuming it supports composite keys). Surrogate keys add overhead within the database: they bear no relation to the data you are storing and are stored merely for the convenience of the ORM layer being used. In an OLTP application this can lead to serious performance issues. As a SQL Server DBA I have spent many hours trying to tune NHibernate-generated queries with little success. A lot of the issues arise from the lack of support for composite keys, from a lack of understanding of why composite keys are used, and from the fact that once the ORM layer is in place it is very difficult to make changes to the schema without breaking the application using it. To say that composite keys are evil simply because they cause an inconvenience to developers is not a very convincing argument, but sadly one I hear more and more often. Composite keys are not a symptom of legacy databases; they are good practice.


  • Kresimir Bojcic

    Did you find any good reasons now, after 4 years? I tend to switch camps. It comes down to protecting your data. I’ve noticed that in big enterprise systems composite keys have saved me in some ways. I keep thinking that if the system were better I would not need that kind of protection, but still, sometimes you can only achieve it with composite keys. I have one example here


  • Justin

    Also, surrogate keys have no performance penalty in the average case. A unique id has a lot more uniqueness per bit than all but a few natural keys. You lose space on each primary record, but you win it back on every foreign key.

    I’ve been programming for over 20 years, and I’ve been working with RDBMSs for more than 10 years. Old-school SQL is dead; ORM is, and has been for many years, the future. I can also write assembly for a half dozen processors, but they only make 2 of those these days. Skill/knowledge faces obsolescence in a hurry these days.

    This field sucks because you can’t just settle down — you have to always be learning — but that’s the same reason I like it. It helps that I’m in my 20’s, though, and I can imagine getting bitter in a decade or so.

  • Justin

    Patrick: “3- querying was easier to follow in code and allowed for slicing (where only a subset of the attributes in a composite key was used in a query)”

    I disagree on the importance of “easier to follow” queries, as the purpose of the query is obvious from the higher-level view of the application logic. The actual SQL generated is a little like the machine code a compiler generates — it’s only worth considering when the O/R mapper does something stupid and creates a bottleneck.

    However, the option to slice foreign key queries is the first decent argument for natural keys I’ve ever heard. Of course, even in the instances where that is useful, it’s still easier to use a surrogate key for routine queries and then just duplicate the relevant portion of the natural key in foreign references.

  • schlenk

    The simple fact is, ORMs and relational databases are a bad fit, and composite keys, which are quite a natural fit for RDBMSs, show the weakness of ORMs.

    All those ORM systems would probably be happier if object-style databases actually worked and had adequate performance, instead of bolting OO style onto relational databases.

  • Patrick Wright

    When I started programming professionally, it was 1993, and the client/server architecture was entering the mainstream in a big way; a lot of our work (designing custom business applications) revolved around designing database models. At that time, surrogate keys were not a popular idea, from what I recall. A large part of our modeling revolved around finding what attribute(s) uniquely identified a concept of interest in the business operation. Those were the keys. You could have one, or several attributes in a key, and actually several keys per entity (the primary, and sometime natural and alternate key-sets).

    In retrospect, there was no problem programming in this model, but we ourselves weren’t trying to use an O/R mapping tool. There were several nice side-effects of modeling using keys (composite or not):
    1- led to better understanding of the entities; if you couldn’t decide on a key, it was often a sign there was duplication hidden in the entity, calling for normalization
    2- keys migrate to their children, following strict rules, which meant you could pick up any database model and understand something about any entity you chose just by following the migrated keys
    3- querying was easier to follow in code and allowed for slicing (where only a subset of the attributes in a composite key was used in a query)

    Surrogate (“blind”) keys started to be in vogue a few years later. The discussion as I recall it related to problems of very deep migrated keys, where the primary key of some descendant table would be enormous, leading to large index structures. There were techniques to get around it, but surrogate keys made it somewhat easier to build smaller indexes. On the other hand, generating unique surrogate keys on high-volume OLTP systems led to its own problems.

    Point being–it’s not so long ago that surrogate keys were the exception, rather than the rule. I actually just picked up my copy of Thomas Bruce’s “Designing Quality Databases with IDEF1X Information Models”, which, if you ever come across it, is a great read–and explains why composite keys make sense if you understand them in the culture in which they thrived.


  • Jeffrey Palermo

    In a topic like this, there is not ‘always’ and ‘never’ answer. For instance, my company’s product does have composite keys in the database, but they tend not to be in the tables that make up the aggregate roots of my domain model. In my case (and I am using NHibernate), creating an entity with a dual-key would complicate not only my domain model but also the mapping file. Yes, I’m considering the whole of my application in this decision and not the database schema in isolation. I must say that this is completely orthogonal to the decision to use a natural or surrogate key for the entity. Using the data’s natural key for the entity usually works just fine, but I find it easier to work with a surrogate if the natural key requires two pieces of data.

    Other entities might have tables with a composite key, but it’s a database detail, and my domain model still has a single key, and these entities are not the aggregate roots. For instance, for a Book aggregate root, I might have many Chapter entities that wholly belong to a Book instance. In this case, since I’ll want to maintain order, I’ll use a composite primary key for Chapter that consists of the chapter natural key and a display index that is used to store the order of the chapter in the Book’s Chapters collection. This case of a composite key is easy to deal with because NHibernate’s mapping takes care of persisting the order part of the composite key. What is hard to deal with is a composite natural key for an aggregate root.

    Regardless, a deep understanding of SQL and the relational model has to be in play.

    On another note, I, too, share the frustration that some folks create database tables with no actual natural key (a guid is not a natural key).

  • Chris Holmes

    “In the logical model, a primary key should be defined against durable attributes of the entity.”

    I question the use of the word “durable” here. I could be misinterpreting this, but it seems to imply the wrong criteria for determining candidate keys. “Durability” seems to imply the key shouldn’t change, and that’s not a criterion for determining candidate keys. Nowhere in the literature of relational calculus, relational algebra or normal forms will you find discussion of the “durability” of primary keys.

    Primary keys should be chosen based on qualities that provide uniqueness and nothing more.

    “To address the question of changing key values – this is not a problem because of shortcomings of the RDBMS tool”

    Sure it is. Ever try to change a primary key in a RDBMS that doesn’t support Cascade Update? You can’t, because every FK reference breaks.

  • Chris Teixeira


    I believe I introduced the Student/Class example as an associative entity. And I agree, it is easily handled, just as it is, with a composite primary key. As far as your extension of the data model to handle test scores – the Class in the model should be identified by Course-Section-Semester (Or Trimester, or Period, or whatever). Then it is not a problem for the student to repeat a class. Oops! Another composite key! Now the test scores table has Course-Section-Semester-Student-Test as a primary key! It may seem awkward, but it’s exactly the way the relational model works, based on functional dependency. | Course,Section,Semester,Student,Test -> Score | Plus, when you query the test scores table, look Ma! No joins to get at most of the interesting information.

    I think that the reluctance to use composite keys in a data model is not unlike the fear of adding classes to an object model: it leads to “simplexity”.

    To address the question of changing key values – this is not a problem because of shortcomings of the RDBMS tool: In the logical model, a primary key should be defined against durable attributes of the entity. If there are no durable candidates, an artificial key should be used. Most databases need to use artificial keys in at least some tables – consider a Sales table; there is usually not a natural key available in this case.

    I agree with Jacob; really it is the ORM framework that needs to be able to map relational models to object models: that is its job. The forehead-bashing usually comes up when you encounter the ORM tool’s shortcomings.

  • Jacob Eggleston

    It seems to me that the problems people experience with persistence are more a fault of the O/R mapper than the type of PKs used in the tables. A join is a join, no matter how many key fields are involved. A good O/R framework *should* handle composite keys in exactly the same way as surrogate keys. I’ve really never understood why some frameworks treat them differently.

    So I think it’s not so much that “composite keys are evil” as it is a case of there still being some room for improvement on the tools side.

  • Chris Holmes

    @ Ayende Rahien,

    I don’t want to nitpick, but your example is flawed. The tuples you’ve listed aren’t enough to make a row in test_results unique. You’d need to include the date the test was taken in that table to make a row truly unique.

    If you’re modeling your entities correctly, then it shouldn’t matter if you use a composite key or a surrogate key. If it matters, then something in your design is wrong, and you don’t really have a “surrogate” key.

    “Frankly, I would like to know about why NOT use surrogate keys.”

    If a db developer is using surrogate keys correctly, then there is no problem. All they’ve done is take a natural key and created a surrogate for it, which can make many things easier to do in development.

    The problem comes when developers haven’t been exposed to the relational model and then they start thinking the surrogate key *is* the primary key. Then they design systems using MUCK tables (massively unified code-key tables) and other such database “anti”-patterns. That’s when you run into problems.

  • Chris Holmes

    “I’m much more concerned about composite keys from the perspective of writing persistence code, not so much performance. Composite keys make persistence much harder.”

    And that’s the real issue; I agree with you Jeremy. I come from the E.F.Codd/Relational Calculus school of DB thought. But at the same time, real work has to get done, and as a developer/architect anything I can do to make the code base easier to work with, then that’s the road I’m going to travel.

    That said, I’d like to know in more detail what has caused you the biggest pains when working with natural keys. Is it when key values change? Because that’s an implementation issue on the part of the RDBMS, from my point of view. Or is it something else?

    Where I work, we use LLBLGenPro as our O/R mapper, so all our tables have surrogate keys. I haven’t tried to use it on a test database written using natural keys yet, but that’s on my To-Do list. But the question I have is, what makes surrogate keys easier in terms of persistence? I find the O/R mapper is what makes my life easier, not the keys. Half the time I wish I wasn’t using surrogate keys, because any query on an actual value requires one or more table joins before I can use it in a “where” clause.

    Just curious what your major headaches were with natural keys. I don’t dispute you have them, but I’m curious – more information would probably really shine a light on in my head :-)

  • Ayende Rahien

    Your example of Student / Class is merely an association table.
    There is usually no problem with those.
    Now let us turn Enrollment into an entity, add the date the user registered for the class, and link a fourth table for the grades of the tests. What you have now is a more complex model; you now have:
    /* by the same logic, we can only take one type of test in an enrollment */
    create table test_results(
    student_id int,
    class_id int,
    test_id int,
    grade int
    )

    Now, if a student fails a class and he wants to take it again, how are you going to handle that? Suddenly the change is not just in a single table, but in two.

    For myself, I believe that I should avoid repeating myself, even in the DB. I find that it is easier with surrogate keys than with composite keys.
    I usually create a set of views that give me the data in a human readable form, but the model behind it is using surrogate keys.
    Frankly, I would like to know about why NOT use surrogate keys.
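A sketch of the surrogate-key version of this model (SQLite, hypothetical names): test_results references a single enrollment_id, so a retake only adds a new enrollments row and leaves test_results untouched by design.

```python
import sqlite3

# Surrogate-key version of the enrollment model (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE enrollments (
        id         INTEGER PRIMARY KEY,   -- surrogate key
        student_id INTEGER NOT NULL,
        class_id   INTEGER NOT NULL,
        registered DATE NOT NULL
    );
    CREATE TABLE test_results (
        enrollment_id INTEGER NOT NULL REFERENCES enrollments (id),
        test_id       INTEGER NOT NULL,
        grade         INTEGER NOT NULL,
        PRIMARY KEY (enrollment_id, test_id)
    );
""")
# First attempt at the class: failed.
conn.execute("INSERT INTO enrollments VALUES (1, 42, 7, '2007-01-15')")
conn.execute("INSERT INTO test_results VALUES (1, 1, 40)")
# The retake is simply a second enrollment row for the same student and class.
conn.execute("INSERT INTO enrollments VALUES (2, 42, 7, '2007-06-15')")
conn.execute("INSERT INTO test_results VALUES (2, 1, 85)")
```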

  • Ayende Rahien

    Because often enough, the people that build the DB haven’t been educated in RDBMS practicalities long enough.
    It is a lot more cumbersome to use composite keys than surrogate keys, both from the SQL point of view and from the user point of view.

    I have found that an inappropriate choice of keys can really ruin your week. Specifically, I would really like to abolish DateTime as a PK / composite key forever.

    At any rate, NHibernate does have good support for legacy databases; I am using it on a 30-year-old data model that was originally developed for a mainframe. To say that it is horrible would be high praise.
    It took some work, but it is working well now.

  • Chris Teixeira


    I would like to start by saying that I enjoy your blog immensely. Please keep up the great writing.

    With regard to composite keys – a few things to ponder:

    Most people that work with databases are familiar with the idea of an “associative entity”, sometimes called a “relationship table” or “linking table”.

    For example, consider Students, Classes, and Enrollments. Each Student may be enrolled in some number of Classes, and each Class may have some number of Students, but no Student is enrolled in the same Class twice. To model this relationally, we create an Enrollments entity. This Enrollments entity has a foreign key that references the primary key of Classes, and a foreign key that references the primary key of Students. These two attributes together make up the primary key of Enrollments, ensuring that the same Student may not be enrolled in a Class more than once; i.e. the Enrollment is unique.

    This is not a ‘legacy’ approach; this is the way the relational model works, and is the correct or ‘normal’ way to model this concept in a relational database. The primary key models the functional dependency that | Class,Student -> Enrollment |. This can be read that “a class and a student determines an enrollment”, or “an enrollment is functionally dependent on a class and a student”. This is not unlike a function that takes a Class and a Student as parameters, and returns an Enrollment as a result.

    So this is not about space being at a premium, conversion of manual processes, performance, or some historic difficulty in creating unique scalar values. When a relational database is designed properly, composite keys are likely to be part of the landscape; this is still the correct way to do it.

    An illustrative factoid: of the 5 original “normal forms”, 3 out of 5 apply only in situations dealing with composite keys (2NF, 4NF and 5NF).

    This is why we are stuck with what has been called the “object-relational impedance mismatch”, and it is true that it is a pain in the neck to deal with these varying key structures, and most ORM tools cannot do it.

    The best compromise is probably to keep the proper composite primary keys in place, and add a surrogate single-column key with a unique constraint on it.
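That compromise can be sketched as follows (SQLite, hypothetical names): the composite primary key keeps enforcing the business rule, while the surrogate column gives single-column tools something to hold on to.

```python
import sqlite3

# Composite primary key plus a surrogate column with a unique constraint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Enrollments (
        StudentId   INTEGER NOT NULL,
        ClassId     INTEGER NOT NULL,
        SurrogateId INTEGER NOT NULL UNIQUE,  -- single-column handle for the ORM
        PRIMARY KEY (StudentId, ClassId)
    );
""")
conn.execute("INSERT INTO Enrollments VALUES (42, 7, 1)")
try:
    # The composite key still enforces "no student enrolls in the same class
    # twice", no matter what surrogate value the second row carries.
    conn.execute("INSERT INTO Enrollments VALUES (42, 7, 2)")
except sqlite3.IntegrityError:
    print("duplicate enrollment rejected")
```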


  • jdn

    I’ve been working on a project where the vendor DB doesn’t use surrogate keys, and there’s a good reason for it. The data has a natural key that crosses columns.

    Of course, a data architecture committee wanted us to add surrogate keys to every table. ‘Easier joins’ was one reason.

    Well, when we want to get the data we want, we will be querying the natural key columns anyway, so whether they are part of the join clause or part of the where clause is rather irrelevant. The surrogate key wouldn’t do anything.

    Having said that, every database I’ve ever created has had an identity column as a primary key, so what do I know?

  • jmiller

    Older programmers are missing the boat? Is that what you’re saying? I can write SQL and ADO.Net all day long, but I’ve paid my dues and I’d like to move on to producing functionality now;)

  • Mr_SQL

    Older programmers didn’t mind slinging a little SQL instead of getting hung up on whiz bang O/R mappers.

    Newer programmers aren’t sure what SQL is.

  • sergiopereira

    I can’t say I like composite keys. I’ve done my share of elbow-busting UI and persistence code to figure out that they don’t make our lives any easier.
    On the other hand, there’s one side-effect of using composite keys that I like: when you open a table that has FKs to a few other tables, you don’t have to stare in disgust at a bunch of numeric entries without being able to tell what they mean, or write a query with several joins just to “see” what the data represents. I guess that’s what views are for.

  • jmiller

    I’m much more concerned about composite keys from the perspective of writing persistence code, not so much performance. Composite keys make persistence much harder. Jeffrey’s comment pretty well hits it on the head for me.


  • Sam Smoot

    Bravo Chris!

    I’d like to add that lots of people turn a preference for natural keys into a holy crusade, pointing to the extra index and that extra column as a performance problem.

    The truth is that my current employer uses MSSQL2K with partitioned tables to store *billions* of rows with little effort, and no real DBAs to speak of (not that that’s a good thing; just saying it doesn’t take a lot of skill to pull it off). These rows are also pretty wide, with… let me count here… 45 columns in one example. Many of those columns are VARCHAR(2) for what should be CHAR(2), or VARCHAR(1) [y,n] for what should be BIT fields, etc.

    RDBMS engines, CPUs, RAM: it’s all amazingly fast these days, so the overhead argument for avoiding surrogate keys just isn’t very persuasive, I think. Sure, there’s an exception for every rule, but if you’re using composites for most of your work, more than anything it’s probably just a bad habit that’s burning more money than you really need to _most of the time_.

    You still need to make sure you’re writing efficient queries of course, and that you’ve got halfway decent covering indexes.

  • Chris Holmes

    Composite keys were used because of E.F. Codd, relational calculus, and relational algebra. The problem was that most RDBMSs of the time weren’t mature enough to handle changing key data.

    Keys “changing” isn’t supposed to be an issue. If you study the relational model, keys are expected to change. Not one rule in relational modeling says “the key should never change”. That’s not why you choose a key. The whole point of a key is to uniquely identify a tuple in a table. Pick the right set of attributes to uniquely identify the entity and you’re golden.

    The real issue has always been the implementations. There’s a disconnect between the theory and the products that vendors provide. RDBMSs early on did not handle changing keys well, and I think that’s why the mindset developed that “keys shouldn’t change!” The fact is, all natural keys can change, and nothing should prevent them from doing so.

    However, early on, cascading updates (ON UPDATE CASCADE) weren’t around to make natural keys work well. Still, developers tried their best to stick to the theory because it is sound, in the hope that someday vendors would provide RDBMSs that actually implement the relational model correctly and make changing natural keys a reality.

    So, bottom line: it’s one of those issues where the theory is really sound, but the software available isn’t often good enough to make it work. So people move to surrogate keys to get things done, because making things work is how we all stay employed.

    The problem now is you get developers who actually think the surrogate key *is* the key, and they don’t understand what makes a tuple unique, or why or how you choose keys.

  • Damien Guard

    Traditionally, space was at a premium and people believed that some parts of the data could be used as a key. Today space isn’t a concern, and we’ve all been hit by a scenario where the “data that never changes so can be part of the key” does just that: it changes.

    Composite keys are, however, still useful where your tables contain logical sets of data that might be moved around – e.g. an invoicing system might have a primary key of companyno+invoiceid for invoices so that at a later date you could merge two physical databases into one without clashing on the invoiceids.

    As to why you might want to merge them: report-writing and analysis tools don’t tend to like merging multiple DBs into a single report, and reporting is a big part of many scenarios.
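Damien's invoicing scenario can be sketched as follows. The names (company_no, invoice_id) and the figures are made-up illustrations of the companyno+invoiceid idea, not a real schema:

```python
import sqlite3

# Keying invoices on (company_no, invoice_id) means two per-company
# databases can later be merged into one without invoice ids clashing.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice (
        company_no INTEGER NOT NULL,
        invoice_id INTEGER NOT NULL,
        total      REAL,
        PRIMARY KEY (company_no, invoice_id)
    )
""")

# Both companies happen to start numbering invoices at 1...
company_a = [(1, 1, 99.0), (1, 2, 120.0)]
company_b = [(2, 1, 35.5), (2, 2, 80.0)]

# ...yet merging them into one table raises no key collisions, because
# the company number disambiguates the duplicate invoice ids.
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", company_a + company_b)
count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(count)  # 4
```

Had invoice_id alone been the primary key, the second company's rows would have failed on insert.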


  • Jon Stonecash

    Back in the day (and I am old enough to say that with some conviction), the data systems were manual or only recently converted from manual processing. Creating a unique key was not nearly as simple as it is now; it was much easier to construct a key from the composite elements at hand. Some practices take a LONG TIME to die. There are those who think such traditions are a good thing. I am not one of them.


  • jpalermo

    I think humans should use natural keys, and computers should use machine-generated surrogate keys.

  • Adam Vandenberg

    I don’t know!
    But there is a school of thought with a large following that database keys should be “natural” keys, as in, parts of the actual data, and not synthetic autonumber/identity columns.

    Personally, I have no idea why one would want a database with people keyed on “Name+SSN”, which isn’t even unique enough anyway.

    Though when doing ER modeling, I’ve been guilty of making the composite keys of relation tables out of the foreign keys, without adding an identity column. That was in the past, though.