Don’t Let the Database Dictate Your Object Model

Before you go on: I'm specifically concerned here with "Logic Intensive Systems", meaning systems that perform complex calculations, make optimizations, determinations, decisions, etc.

Many, if not most, enterprise applications have both an object model and a database model.  For the most part, convergent evolution will probably lead the two models to be very similar, but it's potentially dangerous to constrain the two models to match perfectly because they reflect different concerns altogether.

  • Database Model - When you design a database model, you're primarily worried about the best way to structure data for efficient storage and retrieval, while also enforcing data integrity rules.
  • Object Model - The object model is first and foremost concerned with modelling the behavior and business logic of the system.

Ideally, I'd like to work on these two models somewhat independently and allow both models to reflect their different concerns first, and each other second.

O/R mapping of all flavors isn't that difficult to use (authoring an O/R mapper is a totally different story) when the database and the object model are very similar.  The problem is that making the persistence easier by locking the object model to the database model can make writing and consuming the business logic harder.  I'm working with a system that uses business objects that are basically codegen'd one to one from a legacy database with 400+ tables.  The temptation and driver for codegen'ing the business objects is obvious (400+ tables).

The problem I'm seeing, though, in consuming these business objects in the service layer is that they do not really reflect the behavior of the business logic.  Even worse in my mind is the fact that the raw database structure is not encapsulated from the service layer at all.  If we had designed the business classes around the behavioral needs, to make the business logic easier to write (and to test), we would have ended up with quite a different structure.  Just to throw up some examples:

  • Big tables don't map to a single object.  I don't think a class with 100 different properties can possibly be cohesive.  We'd be much better off in terms of writing business logic if that 100-column table were modelled in the middle tier by a half dozen classes, each with a cohesive responsibility.  It may make perfect sense to have only one table for the entire object hierarchy, but big classes are almost always a bad thing.
  • Data Clump and Primitive Obsession code smells.  A database row is naturally flat.  I want to do a bigger post on this later, but think about a database table (or tables) with lots of something_currency/something_amount combinations.  There's a separate Money object wanting to come out.  If you make your business objects pure representations of the database, you could easily end up with a large amount of duplicate logic around currency and quantity conversions.
  • Natural cases for polymorphism in your object model.  I think the roughest part of O/R mapping is handling polymorphism inside the database.  Check out Fowler's patterns on database mappings for inheritance (Single Table Inheritance, Class Table Inheritance, and Concrete Table Inheritance).
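
As a rough sketch of the Money idea from the second bullet: pairing each amount with its currency in one small value object gives the conversion logic a single home.  Everything here is hypothetical (the Money class, the money_from_row helper, the column naming), and the exchange rates are made-up placeholders, not real data.

```python
from dataclasses import dataclass
from decimal import Decimal

# Placeholder rates (assumption): units of USD per one unit of the currency.
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.25")}


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def to(self, currency: str) -> "Money":
        """Convert through USD using the placeholder rate table."""
        usd = self.amount * RATES_TO_USD[self.currency]
        return Money(usd / RATES_TO_USD[currency], currency)

    def __add__(self, other: "Money") -> "Money":
        # Refuse to silently mix currencies; callers must convert explicitly.
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)


def money_from_row(row: dict, prefix: str) -> Money:
    """Lift a flat <prefix>_amount / <prefix>_currency pair out of a row."""
    return Money(Decimal(str(row[prefix + "_amount"])), row[prefix + "_currency"])
```

With something like this in place, the business logic adds and converts Money values and never touches the paired columns directly.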

Back to the 400+ table problem.  Yes, their database model is huge, but they aren't actually consuming most of the generated classes anyway.  Looking at the bigger picture, I think it would probably have been easier to deal with the dissonance between business logic needs and the existing database structure in the database mapping, even though that potentially means more persistence work, instead of exposing the raw database structure to the business domain classes and their consumers.  In this case, I think making the business logic easier to use and consume would more than offset the extra persistence cost.  Since I would expect the business logic to change more often than the database structure, I would also prefer to optimize my ability to modify the business logic in isolation from the database.
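
To make "deal with the dissonance in the database mapping" a bit more concrete, here is a minimal sketch of a hand-written mapper that turns one wide legacy row into a few small, cohesive domain objects.  The column names (CUST_ID, CR_LIMIT, and so on) and every class below are invented for illustration; they are not from the system described above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CreditLine:
    limit: Decimal
    used: Decimal

    def can_cover(self, amount: Decimal) -> bool:
        # Behavior lives with the data it needs, not in the service layer.
        return self.used + amount <= self.limit


@dataclass(frozen=True)
class Contact:
    email: str
    phone: str


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    credit: CreditLine
    contact: Contact


class CustomerMapper:
    """The only class that knows the (hypothetical) legacy column names."""

    def to_domain(self, row: dict) -> Customer:
        return Customer(
            customer_id=row["CUST_ID"],
            name=row["CUST_NM"],
            credit=CreditLine(Decimal(row["CR_LIMIT"]), Decimal(row["CR_USED"])),
            contact=Contact(row["CONTACT_EMAIL"], row["CONTACT_PHONE"]),
        )

    def to_row(self, customer: Customer) -> dict:
        return {
            "CUST_ID": customer.customer_id,
            "CUST_NM": customer.name,
            "CR_LIMIT": str(customer.credit.limit),
            "CR_USED": str(customer.credit.used),
            "CONTACT_EMAIL": customer.contact.email,
            "CONTACT_PHONE": customer.contact.phone,
        }
```

The cost is the extra mapper class.  The payoff is that a renamed column or a restructured table changes CustomerMapper and nothing else.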

Besides, you definitely want to minimize coupling to a legacy database on the off chance that you might get to fix it up or move away from it later. 

 

WARNING: If you are going to use O/R mapping of any kind, it's more important than ever to properly enforce referential integrity rules in your database.  I don't know if it's just my bad luck or what, but the legacy databases I've hit in the past couple of years were all missing a lot of logical referential checks, and unexpected problems with orphan records quite logically ensued.
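
For a small illustration of what those referential checks buy you, the sketch below uses an in-memory SQLite database through Python's standard sqlite3 module.  The customer and orders tables are made up; the point is only that a declared foreign key makes the database itself reject orphan rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")


def try_sql(sql: str) -> bool:
    """Return True if the statement succeeds, False if the database rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order pointing at a customer that does not exist would be an orphan.
orphan_insert_ok = try_sql("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
# Deleting a customer that still has orders would orphan those orders.
parent_delete_ok = try_sql("DELETE FROM customer WHERE id = 1")
```

Both attempts fail with an integrity error, so the mapping layer never gets the chance to load a half-connected object graph.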

Object Relational Mapping is Hard at the Edges

If you haven't already read it, take a look at Ted Neward's seminal paper on the O/R quagmire.  I thought he was exaggerating the problem on my first read, but now I'm not so sure.  Automated, metadata driven O/R mapping (and I'm broadly including the LINQ varieties and codegen tools here too) gets really nasty at edge cases.  There comes a point when the metadata driven approach starts to hurt more than it helps, and it's probably better to revert to hand-rolled database mapper code in these cases.  This issue is part of what drove me to write "Being afraid of your backhand".  One way or another, you will occasionally need the ability to allow the object model to diverge from the database model.

Not to put words into Neward's mouth, but reverse engineering your business domain classes from an existing database definitely fits his analogy comparing O/R mapping to a quagmire.  This especially holds true for a legacy database that isn't, shall we say, pristine in structure.

Heck, the last time I willingly wrote a stored procedure was to take advantage of a PL/SQL feature to pull a logical hierarchy of data out of a flat database table.  It worked beautifully, thank you (of course, the rest of the team threw a fit about using a sproc).
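
The post doesn't say which PL/SQL feature that was, so the following is only a stand-in for the general idea: pulling a hierarchy out of a flat, self-referencing table inside the database.  It uses a recursive common table expression in SQLite, driven from Python; the part table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A flat table where each row points at its parent (NULL for the root).
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO part (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "bike"), (2, 1, "wheel"), (3, 2, "spoke"), (4, 1, "frame")],
)

HIERARCHY_SQL = """
WITH RECURSIVE tree(id, name, depth, path) AS (
    -- Anchor: start from the root rows.
    SELECT id, name, 0, name FROM part WHERE parent_id IS NULL
    UNION ALL
    -- Recursive step: attach each child to the path of its parent.
    SELECT p.id, p.name, t.depth + 1, t.path || '/' || p.name
    FROM part AS p JOIN tree AS t ON p.parent_id = t.id
)
SELECT depth, path FROM tree ORDER BY path
"""

rows = conn.execute(HIERARCHY_SQL).fetchall()
```

Each returned row carries its depth and full path, so the mapping code can rebuild the object tree in a single pass instead of issuing one query per level.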

The Role of the Database

When people think about the role of a database in an enterprise system, I think there's a spectrum of thought with two polar extremes.  The proper place on the spectrum should vary by application, but we all come in with presuppositions, based on our prior experiences, about the best way to write software, and those shape the direction of our designs.  Where you sit on this spectrum has a lot to do with how you will approach application architecture vis-à-vis the database:

  1. The database is paramount, and the system is expressed and understood in terms of the tables and rows in the database.  The application code and even the user interface are just a conduit to get information back and forth into the database.  You design the database first and then build the business and data layers to match the database.  In the .Net world we might just consume raw DataSets in the application, effectively just working with the database tables offline.
  2. The behavior of the system, primarily in the middle tier and user interface, is paramount, and the database is "just" a means to persist the state of the system.  The database is either built to match the business classes or designed somewhat independently.

Reporting applications and simpler data entry applications can happily sit at the #1 data centric end of the spectrum.  I think any system with significant business logic really needs to be edging over to the second end of the spectrum.  The tricky part is recognizing when an application crosses the line from purely data centric to logic centric.  I'm of the opinion that applying data centric development approaches to logic intensive systems leads to a world of trouble.

About Jeremy Miller

Jeremy is the Chief Software Architect at Dovetail Software, the coolest ISV in Austin. Jeremy began his IT career writing "Shadow IT" applications to automate his engineering documentation, then wandered into software development because it looked like more fun. Jeremy is the author of the open source StructureMap tool for Dependency Injection with .Net, StoryTeller for supercharged acceptance testing in .Net, and one of the principal developers behind FubuMVC. Jeremy's thoughts on all things software can be found at The Shade Tree Developer at http://codebetter.com/jeremymiller.
  • http://www.dotnettricks.com/ Fregas

    Mike,

    Jeremy is specifically referring to “business logic intensive” systems. For most simple applications, a 1:1 ratio between tables and classes works fine. In fact, I would go further and say that it works fine for many moderately complex applications. Where it doesn’t work well is:

    - Legacy databases that are not well-normalized and/or have weird schemas. I have a real life example from a health related application. There are 4 tables that all hold diabetes information for patients. The only difference is that each table holds different data based upon a flag. It’s a bad design, and it should all be in one table with an extra column to flag which type of diabetes patient it is. I don’t really want 4 DiabetesInfo classes. Of course in this case, most O/R Mappers won’t do you much good either, except maybe as a wrapper for data access.
    - Applications that have a lot of business complexity. The example I like to use is an Address class. You may have address related fields in your database in an Orders table, an Address table (with a CustomerID relating back to Customers) or an Employees table, all in the same database. However, you don’t want address logic in 3 places if you’re doing more than simple CRUD stuff. What if there is business logic to compare the distance between two addresses based on City, State and Zip? This may need to be rolled into its own class.

    There are a number of ways to get around this:
    -Duplicate (copy & paste) the business logic from one mapped class to another. (BLEH!)
    -Don’t use the O/R Mapper for those specific tables and instead roll your own business objects and data access code.
    -Create “Helper” classes (AddressHelper, DiabetesInfoHelper) in a very procedural fashion, with static methods that grab all the related data from the O/R mapped classes, perform the business logic, and return the results.
    -Create another domain object that the O/R Mapper classes use or refer to that handles the business logic. (NHibernate does something similar to this out of the box with a feature called components.)
    -Create another domain object that wraps the O/R Mapped classes where necessary.

    I’m sure there are more ways, but what I’ve noticed is that almost all the solutions cause your class/object model to be different from the actual tables, IF you don’t want to copy and paste.

    I don’t want to copy and paste.

  • http://www.entityspaces.net Mike Griffin

    I never have bought into this “impedance mismatch” stuff. While you guys are engaged in philosophical debate our customers will be knocking out projects with the very techniques you say don’t work ….

    This sounds like a hit piece by Microsoft, ORM is okay, kind of, but our way is “the way” …

  • http://montpetit.net Claude

    This post is right in line with a decision I must make. On my current project, the data model started bottom up (the DB schema first, since the domain expert is a DB guy). The persistence layer is built with the DAO/entities pattern, with JPA and Hibernate. We are building the domain model on top, following DDD patterns.

    We defined entities with JPA annotations that map to the DB schema. This database will contain a lot of data: it stores electrical meter data from a million homes, with usage snapshots every 5 minutes. The DB schema is therefore optimized for that, which results in annotated entities that are not always well suited to the domain model, where we need to implement a lot of complicated behaviors and data analysis code.

    I am tempted to define DDD domain entities that are different from the persistence JPA entities. This has many implementation consequences: we need to convert between JPA entities and DDD entities back and forth, we can’t use JPQL from the client side, and we can’t take advantage of many of the ORM facilities.

    As a bonus, some of these DDD modules may be used by the legacy application (the current project is a rewrite of this legacy app)

    For these reasons, this post comforts me in the idea of having a clear separation between my domain and the persistence layer.

    Still not sure though… not an easy choice. This has a clear impact in the long run and will be difficult to refactor.

  • http://www.jasonbunting.com Jason Bunting

    Jeremy,

    It has been years since the last time I used it (embarrassed to say that), but NIAM/ORM (object role modeling) is something you should look into. Frans made some good points that you may have missed because they were dependent on understanding the intent of using an NIAM/ORM approach to designing an application. Do you know much about it? I rarely run into people that do. I worked, for a very brief time, with Dr. Terry Halpin, who formalized ORM and continues to move things forward on it, and I had to learn quite a bit about it (mostly forgotten, unfortunately). When you use ORM to model data, you do so by capturing business rules as they relate to the data. From that you can generate a fully-normalized relational data model that correctly represents the business domain’s concerns as they relate to the data. Of course, that is the caveat – you have to *correctly* utilize ORM to model the concerns of the business as they relate to the data in order to get a correct relational data model out of it.

    Anyway, I am probably repeating myself and rambling a bit – but unless you understand NIAM/ORM pretty well, the point Frans is trying to make may not be fully appreciated by anyone reading it. Go look into it at http://www.orm.net – better yet, get the books that Dr. Halpin has published on the subject – the guy knows what he is doing and you might change your mind about a few things. :P

  • RogerBlake

    What is Object Model anyway?

  • http://survic.blogspot.com/ survic

    This blog demos why strongly typed collections and databases, CSLA, and ADM (amazing! and the eNglish is much better, of course ;-)

    http://www.avocadosoftware.com/csblogs/dredge/archive/2007/01/28/681.aspx

    http://www.avocadosoftware.com/csblogs/dredge/archive/2007/02/19/687.aspx

  • http://survic.blogspot.com/ survic

    OODB – It is so interesting: The order I experienced is Poet (an OODB) by C++ and then Java, then, Poet’s OR mapping, then, other OR mapping tools. The history is that in modern C++ age, we have OODB hype; then, during Java age, we moved to more realistic OR; I really hope .Net can revive OODB, so that we can be proud of something new.

  • http://www.buunguyen.net/blog Buu Nguyen

    So far, I have not seen any ORM tool which allows a transparent and simple mapping as well as enables the object model and data model to evolve independently – each tool just sucks in one goal or another (and tools like EJB 2.x suck in both). Given that, whatever spectrum that you go with (database-first, or domain model-first), there can always be a point in which you will find it hard to evolve any of the model without having to do a lot of work in the other model and the mapping layer itself.

    So unless you have to go with relational database (for reasons like legacy data, same data for multiple applications etc.), you can consider a native object database, e.g. db4o, as a way to go – in that case, you only need to care about one single model: the object model.

    I blogged about these topics here:
    http://www.buunguyen.net/blog/the-legend-of-data-persistence-part-1.html

  • http://swcses.eponym.com/ Joe

    So after catching up on all of this (took me way too long to read) I don’t think my mind was changed. There are a million ways to skin a cat. Regardless how you do it, don’t let your database design dictate your object model design and don’t let your object model design dictate your database design. However, quite naturally the two will look a lot alike more often than different.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Hey, keep it nice. Besides, we’re talking about code here, not something life and death like Star Wars vs. Star Trek.

  • survic

    No one really understands half of what Survic is saying. His English is so bad and his thoughts so incomplete that you all should just ignore him instead of responding.

  • Ivan

    @constrain the db design

    only a consultant would say something so profoundly shortsighted.

  • http://survic.blogspot.com/ survic

    the souls of behaviors are data

  • http://survic.blogspot.com/ survic

    By the way, I studied very seriously about CSLA, DDD, Analysis Pattern, and Refactoring. I recommend them highly. “Crap” can be good crap.

    Now, let’s be back to the topic.

    My point is that “database is deep; so, it should be the first”, here is a quote from DDD (Crunching Knowledge > Deep Models):

    “This deeper view of the shipping business did not lead to the removal of the Itinerary object, but the model changed profoundly. Our view of shipping changed from moving containers from place to place, to transferring responsibility for cargo from entity to entity. Features for handling these transfers of responsibility were no longer awkwardly attached to loading operations, but were supported by a model that came out of an understanding of the significant relationship between those operations and those responsibilities.”

    If you start from DB, this is the first step. We all know that, to model a business process, the first thing is to collect the paperwork, and immediately imagine it is a paper-database (normalized or not, we do not care for now).

  • http://www.dotnettricks.com/ Fregas

    Survic,

    Rather than cluttering up Jeremy’s blog, I’ve posted a more detailed response to you here:

    http://dotnettricks.com/blogs/craigbowesblog/archive/2007/03/03/255.aspx

    Thanks,
    Craig

  • http://survic.blogspot.com/ survic

    Do not get me wrong about DDD. I fully understand its political value — I am pushing very hard to make sure management knows it — I guess everybody is doing it, and that explains its popularity.

    All I am saying is that, to developers, nothing new (but nothing wrong either, except some “details”, such as STC).

  • http://survic.blogspot.com/ survic

    Fregas: I read your posts again.

    You said: “For .NET 2.0, i can’t remember a time where i really need my own, custom, strongly typed collection. It doesn’t mean it will never happen, but so far, I just use generics.”

    – we agree here.

    You also said: “”Ah ha!” I thought. SectionList is probably nothing more than a strongly typed collection, with some custom business rules added. And this led me to think of another obvious one: shopping carts! “

    – And this led me to think of yet another even more obvious one ……..

    Do you see the problem? DDD is crap. You have to agree with me, because you said, you did not use “strongly typed collection” in .net20. If DDD is right, you should have used that evil thing all over the places!

    I bet a coke that after one more year, if somehow you are required to use .net 1.1 (why, do not ask me, in M$ world, time can go back easily ;-), you will use casting, instead of the STC. Casting is ugly, but it does not mess up your thinking.

  • http://survic.blogspot.com/ survic

    Sorry for taking off the topic.
    However, strongly typed collection is indeed one of the major obstacles for OR mapping, and, if you use OR mapping, even just conceptually (because of huge amount of de-normalized legacy databases), you will “naturally” and merrily believe that tables and entities are “the same thing”.

    so, we are not really off, yet ;-)

  • http://survic.blogspot.com/ survic

    You may call me a snob, however, because of my continuous VB background (from VB3, then, a lot of VB6, some VB.Net), and because I always have a soft spot toward it because of its lightweight “get things done” attitude (RAD attitude, or, agile attitude), to be fair to myself ;-), it would be a stretch to say that.

    Having said that, when I use .Net, even in VB.net (I try very hard avoiding it), I follow exactly what I do in Java. That is the source that I do not like strongly typed collection so much, because you do not have that kind of thing in java (just like you do not have dataset in java!), and you are used to it. The style is actually just like generics, paying some price on performance and type safety.

    Then some people pushed using strongly typed collections systematically, bragging about it as “best practice”. That basically made me think and pushed me to the edge; that was when “do not like it” became “dislike it”.

    And I justified my “dislikeness” by creating a theory, saying that modern OO does not use the concept of “custom strongly typed collections”. If you need some logic, create a non-collection class that contains a general collection (list or dictionary). I am using a cynical tone to keep myself honest here, because perhaps it is all biased; however, I really believe it is actually true: in modern C++ and Java (I am not sure about Smalltalk, but I suspect it is the case also), the concept of “custom strongly typed collections” is simply dropped.

    So, if you use strongly typed collections, you are messing up people’s brains. Do not do that! It is not about how easy it is, it is about thinking! A typed collection simply throws people off.

    By the way, I observed that most people here treat Martin Fowler and DDD as an unambiguous “source of truth”, so to speak. I do not share that at all. I love Martin’s “Analysis Patterns” (still, I always believe it should be read together with David C. Hay’s book). However, I put his other stuff and DDD in the category of “antique OO”. Surprise? Not at all. By “antique OO”, again call me a snob, I basically mean that a person has no serious experience in modern C++ AND (not or!) Java, or did not learn very hard from modern C++ AND (not or) Java.

    After being in Java and reading Java books, I simply cannot see any value in Martin’s recent work, or in DDD. There is just too much renaming and confusion. Trying to be “neutral” between M$ and Java, or trying to “invent” something by renaming things, is simply not right; also, in enterprise computing, M$ has nothing to offer (except RAD, which, as I said, we should be brave enough to take into enterprise computing), so why even bother to call “dataset” … I resent the renaming scam to such a degree that I do not even remember the name – it is totally ridiculous. I say it again, ridiculous – dataset does not become better or justified by being given a “pattern” name, give me a break!!! To me, the whole thing is simply a strategy to get M$’s contract, that is all there is to it.

    My source of modern OO? Experiences in EJB, and books on J2EE. One of the best among them is “ejb design patterns”.

    With this disclosure, you know what I am going to say about SectionList ;-)

    Another example; however, before I say it, I want to point out that I recommend loudly CSLA to everybody, anybody. However, I use it by tearing it apart, use its good parts, and show its terrible parts as anti-patterns.

    Now back to strongly typed collections. If you like them, take a look of CSLA, then, you will know the real nature of strongly typed collection: pure evil.

  • http://www.dotnettricks.com/ Fregas

    All, forgive me for getting off topic.

    Survic,

    I was not even thinking about collections at all when I came across this post on the DDD site:

    http://domaindrivendesign.org/articles/archive/gordon_bruce_2003_10.html

    In this article, Bruce Gordon talks about his use of domain driven design for a university’s student registration system:

    “By using a common set of objects, across the tiers (except the host), all layers could reuse the objects, the common vocabulary, and the behavior and constraints of the objects. (For example, instead of having simply a collection of Sections, a SectionList could prohibit duplicates, or enforce other application rules).”

    “Ah ha!” I thought. SectionList is probably nothing more than a strongly typed collection, with some custom business rules added. And this led me to think of another obvious one: shopping carts! One could (should) make a shoppingcart class that is merely a strongly typed collection holding cart objects (again, you can inherit from collectionbase, or use generics internally) that avoids having duplicate line items. When a product is added to the cart that has already been added, it should just update the quantity.

    What do you think?
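
Whichever side of the strongly typed collection argument you take, the rule itself is small.  A minimal sketch follows, with hypothetical class names, and with the caveat that it wraps a general-purpose dictionary rather than inheriting from a collection base class as the comment above suggests:

```python
from dataclasses import dataclass


@dataclass
class CartLine:
    product_id: str
    quantity: int


class ShoppingCart:
    """Holds cart lines and enforces the 'no duplicate line items' rule."""

    def __init__(self) -> None:
        # A plain dict does the storage; the cart only adds the business rule.
        self._lines = {}

    def add(self, product_id: str, quantity: int = 1) -> None:
        if quantity < 1:
            raise ValueError("quantity must be positive")
        line = self._lines.get(product_id)
        if line is None:
            self._lines[product_id] = CartLine(product_id, quantity)
        else:
            # Same product again: bump the quantity instead of adding a line.
            line.quantity += quantity

    def lines(self) -> list:
        # Return a copy so callers cannot bypass add() and its rule.
        return list(self._lines.values())
```

Adding the same product twice leaves one line with the combined quantity, which is the behavior described for the cart.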

  • http://www.dotnettricks.com/ Fregas

    Survic,

    I don’t see how strongly typed collections offend you so much. In most cases they were a means to get around the lack of generic support in .NET 1.1. However, in a couple of weird cases, it may STILL make sense to make a custom, strongly typed collection. Perhaps you need an OrderLine collection that will not accept more than 100 lines. You could create a custom collection (strongly typed) that throws an exception or triggers a validation rule or something when the count goes over 100. I realize this is a weird, rare case, but it could happen. There are other ways to accomplish this, but making your own collection (that inherits from CollectionBase so it’s very little custom code) just isn’t very hard. I don’t see how that constitutes “PURE EVIL.”

    Additionally, I wrote that blog article when I was still doing .NET 1.1. For .NET 2.0, I can’t remember a time when I really needed my own custom, strongly typed collection. It doesn’t mean it will never happen, but so far, I just use generics. But if I had to do a .NET 1.1 project, I might use CodeSmith or something to generate my strongly typed collections for me.

    -Craig/Fregas

  • http://udidahan.weblogs.us Udi Dahan

    I think that there are enough issues on the table without dragging in SOA, but that’s just me.

    Also, there are times when implementing complex business rules using OO techniques on the Domain Model works well. O/R mappers help with the persistence. Some OO practices (like polymorphism) are better supported than others (Decorator pattern).

    Performance often requires a denormalization of both the Domain Model and the tables it maps to. For instance, in order to support a business rule stating that “if a customer’s orders total more than $1000 do…” it might make sense to keep that field persistent on the customer class, rather than loading all the orders.
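
The denormalization described above might look roughly like this.  It is only a sketch: the Customer class, the order_total field, and the method names are hypothetical, and the threshold simply mirrors the $1000 in the example rule.

```python
from decimal import Decimal


class Customer:
    def __init__(self, name: str, order_total: Decimal = Decimal("0")) -> None:
        self.name = name
        # Persisted with the customer so the rule below needs no order query.
        self.order_total = order_total

    def record_order(self, amount: Decimal) -> None:
        """Keep the running total in step whenever an order is placed."""
        if amount <= 0:
            raise ValueError("order amount must be positive")
        self.order_total += amount

    def exceeds_order_threshold(self, threshold: Decimal = Decimal("1000")) -> bool:
        # "More than $1000" is read here as strictly greater than the threshold.
        return self.order_total > threshold
```

The trade-off is that every path that adds, changes, or cancels an order now has to keep order_total in sync, so the field is worth having only when loading the orders is genuinely too expensive.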

    I think that the “Entity” view of the world that Frans et al are pushing comes from the Coad Method (correct me if I’m wrong) which focuses on data. The Domain Model pattern (by Fowler) and the following practice of Domain-Driven Design focus more on behavior. Both have their place. It appears that O/R mappers are used in both cases – thus it makes sense to frame an opinion in the appropriate context.

    At the start of this post, Jeremy tries to frame a certain context: “Logic Intensive Systems”. I think that he might have intended to specify “using the Domain Model pattern” – again, correct me if I’m wrong. I do quite a bit of work on these kinds of systems using these patterns and find that I iterate back and forth – the db doesn’t dictate the object model, and the object model doesn’t dictate the db. In order to create a performant solution, both sides must be well understood.

    If I offended anybody in any way by these comments, please be assured that it was not my intent.

  • http://survic.blogspot.com/ survic

    By “Strongly typed collections”, I mean pre-generic ones. Strangely, they are carried over in .Net 20, even when we have generics (i.e., you use both generics and custom collections). People can argue whatever, I will always see it as a clear sign that ‘this guy is still in VB6’!

    The test stone is that, when you were in .Net 1.1, did you use “strongly typed collections” or not. If you did not, then, we know you are a hardcore of modern OO; if yes, you are somehow still in VB6. Consider being warned ;-)

    As for old databases, I certainly agree that in that situation, OR mapping is sometimes more trouble than solution. Actually that is rule number one of OR: the technology you choose must be compatible with manual DAO, and you should feel free to use it anytime! A lot of times, SQL queries are used freely. Who needs another semi-completed query language!

    Then, we have to concede to ADM-only. You can do whatever in DAO, as long as you have ADM. Really, OR or not, it does not matter, the real key is ADM. It is not strange: ADM is the core of SOA, anyway.

    Strangely, I find myself totally agreeing with the “parallel evolution” concept, as long as they are not “separated”. The implementations are certainly in parallel; however, conceptually they are one.

    Because of ADM, down with the “behavior” thing. I keep fighting this thing for so long, and just recently I have realized its nature: antique OO! In modern OO, “behavior” must immediately be split into two: validation or “real” behavior (service-like behavior). Validation is with the ADM, and it does not hurt anything. The “real” behavior is not with the ADM, and rightly so – the antique OO is very misleading here.

    Vega: I like you say “sprocs/triggers do not count”, for me, the reason is that they are just evil. Once you got the idea of OR and application server (it started from “entity bean”, which itself is evil, ironic!), you really begin to treat DB just as DB.

  • vega

    I believe that both the object (domain) model and the relational (db) model should evolve independently using their own methodologies best tailored to their individual purposes. Really they would be working off of the same base *conceptual* business model. The DB will start off with the conceptual but once it goes through the logical and then the physical design model it will eventually lose some of the conceptual pieces that the object model tends to address. Your final “entities” and relationships for the db might not look the same when you first started off as they might change to optimize performance.

    Why separate models? For the exact same classical OO “best practice” reasons: loose coupling, separation of concerns, yada-yada, etc. O/R mappers will always play an essential if not critical role to bridge the two models. I think the problem Miller is discussing is how mappers are sometimes used (or misused) in lieu of a real domain model that can fully handle/solve the business’ process/requirements.

    Also, E/R (a concept shared by both models) is only part of the equation. The other half is what the object model is supposed to handle and where the relational one falls short, which is *behavior*. Sure, you can enforce business rules via constraints (PKs, FKs, nulls, checks, defaults, etc.) on the db side of things, but that cannot cover all the possible scenarios and logic that most business processes need to operate (at least not in a very manageable fashion; sprocs/triggers do not count because they are not part of the relational model; if anything, they violate the model). DBs should really just concern themselves with the storage and retrieval of data, which is what they do best using their own model.

  • http://devauthority.com/blogs/devprime/default.aspx DevPrime

    Survic, you have some good points, and I agree completely with getting away from what I call impeded OO models (mostly due to deficient OO capabilities in certain languages, i.e. VB pre .NET for example). However, with regards to some of your other arguments, the problem is that while what you suggest works well for new projects, there are a lot of databases out there that were not designed to be used with ORM technologies, and it’s difficult (sometimes impossible) to use certain ORM products on these. These companies have been running for years on that data and aren’t willing to so easily migrate. As a developer, there’s nothing you can do about that situation except deal with it.

    Secondly, what’s wrong with type-safe collections? After all, generic collections produce type-safe collections (either at compile time with Java or run-time with C#/VB.NET). A List<Customer> is a Customer-type-safe list. You cannot add an OrderLineItem to it.

  • Vikas Kerni

    In a lot of places, database design is done by DBAs who don’t have OOA&D skills, simply because they don’t have to. Their job is to make sure data is stored and retrieved very efficiently. Hence they make compromises, as I mentioned in my previous post.

    It is interesting to note that Data Centric Design assumes that database should be designed to support application. In my opinion, data is too strategically important for enterprise. Applications may be retired by new technology or replaced by third party vendor’s tools, applications. Data is going to stay forever.

  • http://survic.blogspot.com/ survic

    I read Fregas’s blog, I noticed:

    >>>>I prefer strongly typed collections (or possibly generics in .NET 2.0) so my Content object would have a HistoryItems property which is Read Only and of type ContentHistoryCollection.

    ————–
    My goodness! It is another example of antique OO directly from the time-froze machine VB6. “Strongly typed collections” are pure evil, period.

    Of course, use generics – but it is pure, bare bone generics, no strongly typed collections at all.

    Further, if you use .Net 1.1, then, use casting. Type safe and performance? Give me a break, in Java, people have been doing that for many years, for serious enterprise applications – again, modern OO, not antique pseudo OO from VB6.

  • http://survic.blogspot.com/ survic

    “Anemic Domain Model”(ADM, or, Anemic Domain Objects, i.e. ADO ;-), is not the result of last century’s antique OO. It is the result of modern distributed computing – it started with J2EE entity bean. The key concept of “entity bean” has been carried over by “ADO” (ok, ADM), and then, SOA. As a result, it is ridiculous to say that ADM is a bad thing because it does not comply with antique OO! (Or, because it is a terrible name, but who cares names!).

    Really, ADM is a hard but good compromise in modern distributed computing. Do not stay in the stoneage’s OO, enter modern OO, which by definition, inherits from modern C++ and Java. You cannot get modern OO by “strengthening” antique OO; it is a gestalt shift.

  • http://survic.blogspot.com/ survic

    I know the following can be perceived as offensive, especially for long time experts in a different tradition. However it is the price to be on a platform that is undertaking a huge quantum jump.

    1. OR mapping is the way to go. Java did it, M$ now finally makes up its mind to follow suit. So, do not even try to be different, just follow it.
    2. This means, the so called “anemic domain model” is actually the way to go. J2EE entity bean started it, OR mapping makes it ubiquitous in Java world (come on, in this regard, there is no difference between entity bean and OR mapping). Again, do not even try other way around, just follow it.
    3. This also means stored procedures are gone forever, of course, you can use them in some weird cases, but as a “layer”, it is gone forever. Again, do not even try to complain, just follow it.
    4. Because of all the above, the only reason to emphasize the “first-ness” of “entity classes” over database schema is political – to shut up people who cannot understand “entity classes”, and, it is a good political technique. However, truth being told, it is only a political tool, nothing else.
    5. As for OO, it is odd that so few people realize that the core of OO is indeed ER. I am glad that Frans knows it. I guess the reason for that is that to understand it, you really need to first understand that inheritance and design patterns are easy and not that important. Once you really get that (it takes years, of course – sorry for being offensive, but truth must be told ;-), then, you will really appreciate that the real core of OO is indeed ER, or, normalization. To speed up the process, read Martin Fowler’s Analysis Patterns, together with David C. Hay’s Data Model Patterns: Conventions of Thought.

  • http://devauthority.com/blogs/devprime/default.aspx DevPrime

    Good point Fregas. Actually, the whole concept of encapsulation (one of the pillars of OOP) is that an object encapsulates both the behaviors and the data associated with the behaviors (a slight divergence from your sentence, but the important thing is that it’s more than just a set of data, as you say) :-)

    Entity is really a conceptual thing. Your database can physically implement an entity, but a class can too. You can have tables and classes that both represent entities. The difference we’re all talking about here is how they are physically laid out and how they can map to each other given those differences. And yes, it can get pretty complex as we all seem to be saying, especially if your database wasn’t implemented to deal with ORMs from the start.

  • http://www.dotnettricks.com/ Fregas

    DevPrime,

    One of the things I read in a book called “Object Thinking” is that objects/classes are defined by BEHAVIOR, not DATA (nor where or how that data is stored). This principle has served me well over and over.

    In Frans’ world, everything is modeled on the data, the relational data structure, and your classes HAVE TO follow suit, which really sacrifices a lot in your object model. I believe that Frans is probably correct that the original meaning of the word “Entity” was from the RDBMS world, but language inevitably evolves (for good or ill) and that doesn’t mean classes should be the same as database entities/tables, just because some people seem to think or talk as if they are always the same. In many cases they can be (in simple applications especially), but complicated business logic often dictates that they should differ if you want to avoid code duplication etc., as you and Jeremy have already pointed out. A good ORM product, or a good ORM strategy if you’re hand-coding it all yourself, should allow for these differences.

  • http://devauthority.com/blogs/devprime/default.aspx DevPrime

    Frans, are you always this caustic? Is it not possible to have a reasonable discussion between various developers without resorting to “you have no clue whatsoever”? Sorry, but that’s just a lame stance to take.

    I have no idea where the analysis you make comes from, because I am *not* saying that O/RM is impossible. In fact, the point of my post was to say that it *is* possible, but that most O/RM tools in existence don’t do a very good job and make too many bad assumptions about mappings.

    And yes, there *is* a big rift between entity classes and entity database objects, not because the data they define is all that different (I made no such claim), but because one does not normalize or optimize entity classes in the same manner as database entities. Of course it’s possible to map both types of entities, that was never in question. The problem is the flexibility and amount of work required with existing mapping tools. Now please, settle down before you pop an artery.

  • http://www.dotnettricks.com/ Fregas

    Jeremy,

    I think you’re wasting your time with Frans. His “O/R Mapper” and his whole thought process is that the database is the center of the universe, and that the business classes are merely a by-product of that. He is too arrogant and defensive to consider any other thought process as being worthy.

    I totally agree with you that a good O/R Mapper should let you define business objects separately from the database tables. On simple stuff it doesn’t matter, but like you said, when you get to a system that has lots of business logic, or in your case a (legacy?) database that has over 500 tables, you’ll end up with lots of duplication.

    Think how many tables have address information duplicated in them. If you simply gen the business classes from the tables, you’ll end up with many classes with duplicate address information (and likely duplicate business logic). At the same time, if you start with the domain layer and make “Address” a separate class, then you may not want to generate a separate address database table. There may be some relationships where the information should be in its own table and others where it should be part of a parent table such as customer, because it’s a one-to-one relationship. Instead, having a single address CLASS separate from the various TABLES that store address information is the way to go in complex applications.
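
    A rough sketch of what that could look like, in Python. The table layouts and column names (cust_street, line1 and so on) are made up for illustration. One Address class carries the shared logic, and small mapping functions adapt each table shape to it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Single domain class for addresses, wherever they happen to be stored."""
    street: str
    city: str
    postal_code: str

    def label(self) -> str:
        # Formatting logic written once instead of once per generated class.
        return f"{self.street}, {self.city} {self.postal_code}"


def address_from_customer_row(row: dict) -> Address:
    # Hypothetical CUSTOMER table: address columns stored inline (one-to-one).
    return Address(row["cust_street"], row["cust_city"], row["cust_zip"])


def address_from_warehouse_address_row(row: dict) -> Address:
    # Hypothetical WAREHOUSE_ADDRESS table: address kept in its own table.
    return Address(row["line1"], row["town"], row["postcode"])
```

    The database keeps whatever layout suits storage, and code that consumes an Address never needs to know which table it came from.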

    Your article is similar to one I wrote a while back:
    http://dotnettricks.com/blogs/craigbowesblog/archive/2005/09/05/33.aspx

    Great article Jeremy. Keep up the good work.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Arnon,

    I’d prefer to eat any necessary mismatch between object and data models in persistence mapping rather than sacrifice either model. I really didn’t mean to sound that down on O/R mapping. All I meant to say is that when you do hit an edge case that your O/R mapping tool doesn’t handle, I don’t think it’s that big a deal to hand roll a hard coded database mapper for the exceptions.

  • http://www.rgoarchitects.com/blog Arnon Rotem-Gal-Oz

    I don’t think this dilemma is different from any of the other design dilemmas – the answer is still it depends.

    I would tend to agree that in most systems that aren’t just data entry or reporting you shouldn’t let the database dictate the object model – but many if not most systems are a mix of both data centric tasks and logic centric tasks – so you either mix approaches or you use one and suffer where it isn’t appropriate.

    Regarding Ted Neward’s post- I don’t agree with him. It was true once but O/R Mapping is quite mature now. Yes, it isn’t a panacea but it works well where it fits.
    You can also see a paper I published a while ago on O/R mapping (http://www.rgoarchitects.com/Files/ormappin.pdf)

  • http://survic.blogspot.com/ survic

    By “politically”, I mean, in theory, using “database” scheme as the “entity model” is more friendly to VB6ers, however, they cannot get it anyway, why bother. It actually gives their opportunities to mess things up. “Entity” first, it means a total new start, it is a harsh “shut up, listen, and learn” style. From what I have observed, the latter is a better way.

    This means, even if you are using database scheme as your baseline or first-cut, you will not advertise it. Throw some inheritance on it, mix the entity part with other layers (DAO, façade, command, singleton etc), so that the whole thing looks like an OO model instead of a data model.

    So, technically, I agree with Frans (I even say that the core of OO is actually relational. Come on, what is the foundation of OO, nothing! The core is ER!), politically, I agree with Jeremy. Let’s take over the world from VB6ers by using OO, that is, if those VB6ers refuse to be assimilated to be part of us ;-).

  • http://survic.blogspot.com/ survic

    vikas (http://vikasnetdev.blogspot.com/) mentioned this discussion to me. Obviously, I am on Frans’s side ;-) Take a look at this site: http://www.geocities.com/tablizer/whypr.htm I do not agree with all of it, but it really is an antidote to DDD or OO.

    However, do not get me wrong. My background is NOT VB6; and I am constantly trying to change VB6 culture.

    As a result, as you may guess, I have another side of the story. Recently, I feel that it is very important that we have to insist the domain entity model “first”.

    Note that as Frans pointed out, even if you start from the database schema, you can still do that, because here “first” is logical, not chronological.

    My experience is that a lot of VB6ers simply cannot use a “conceptual model”. It does not matter where that model comes from. My observation is that even they know the database very well, very strangely, they simply cannot leverage that when they do coding. You may think, my goodness, how difficult can that be! However, that is the reality. Poor guys. The paradigm shift is so difficult!

    My guess is that the logic importance of the “model” makes some of us believe it must be chronologically first. That is not true. However, I now do believe that because the “model” is so important, so, “politically”, it is worth it to say that it must be done first.

  • http://vikasnetdev.blogspot.com/ Vikas Kerni

    http://vikasnetdev.blogspot.com/2006/07/domain-driven-design-vs-data-driven.html

    http://vikasnetdev.blogspot.com/2006/08/domain-driven-design-based-on-entity.html

    Common legacy data challenges

    1. A single data field is used for several purposes.
    2. The purpose of a data field is determined by the values of one or more columns.
    3. Inconsistent values are being stored in a single data field.
    4. There is inconsistent/incorrect data formatting within a column.
    5. Important entities, attributes and relationships are hidden and floating in text fields.
    6. Data values can stray from their field descriptions and business rules.
    7. One attribute is stored in several fields.
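
    A small hand-rolled mapper can absorb some of these quirks so the object model does not have to. The sketch below is in Python and assumes a hypothetical legacy table where a contact_type column decides what contact_value means (challenge 2) and phone numbers are formatted inconsistently (challenge 4). None of these names come from a real schema:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    # Clean domain shape: one attribute per meaning.
    email: Optional[str] = None
    phone: Optional[str] = None


def contact_from_legacy_row(row: dict) -> Contact:
    """Translate one hypothetical legacy row into a Contact."""
    kind = row["contact_type"].strip().upper()
    value = row["contact_value"].strip()
    if kind == "E":
        # The shared column holds an e-mail address; normalize the case.
        return Contact(email=value.lower())
    if kind == "P":
        # The shared column holds a phone number; keep digits only.
        return Contact(phone=re.sub(r"\D", "", value))
    # Fail loudly on values that stray from the known codes.
    raise ValueError(f"unknown contact_type: {kind!r}")
```

    The quirks of the legacy column stay inside the mapper, and the rest of the code only ever sees a Contact with a clearly named email or phone.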

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Frans,

    “If you think starting with classes is any different than starting with an NIAM/ORM model which is transferred into E/R for tables you’re clearly mistaken: they both describe the same concepts: entities and relations between them, which attributes are in which entity, which entity inherits from which entity, what are the constraints etc. etc.”

    It’s not completely about data, relationships, and constraints. I’m also worried about cohesive classes, encapsulation, and reducing the duplication of business logic in the domain model. Yes, Frans, I get it, they represent the same logical data. Duh. But the domain model is responsible for the behavior and business logic of the application, and the exact structure of the database may not be the best way to build the domain model to carry out that business logic. What I should probably say more clearly is that the domain model is shaped by a little bit different set of forces and deserves its own design, which might easily need to diverge from the database model.

    I am generally against creating a database model first, or at least making the database the paramount piece, for a logic intensive application because I think it always seems to lead to an anemic domain model, procedural code, duplication, and tight coupling. If all of the early focus is on the database model the domain model tends not to get enough attention.

    When it comes time to choose a persistence tool, I want the flexibility to allow the domain model to diverge from the strict table model and an easy ability to evolve both models. Personally, I dislike any tool or framework that forces me to codegen my domain model classes, but I’m willing to write that off as a preference.

    I wouldn’t necessarily advocate trying to dump the database out of the object model ala NEO either. I’ll compromise with you and say there’s no reason you couldn’t happily shape the domain model and the database model at the same time, but I’ll never agree with database first on anything but reporting applications.

    Jeremy

  • http://weblogs.asp.net/fbouma Frans Bouma

    DevPrime: I don’t understand what you mean with:
    “So there is in fact a big disconnect between domain entity classes and domain entity database objects, bigger than just the relational-to-object rift. It’s a huge mistake to assume that anyone *should* map objects to tables or views one-to-one for many reasons.”
    The reason I don’t understand it is that you apparently use 2 different definitions of ‘entity’ which simply has 1 definition, the one it has had since the late ’70s.

    An entity instance, i.e. the DATA forming the entity instance, is the instance of the single entity definition you use in your SYSTEM: e.g. the customer entity definition. That data is for example stored in a table Customer, and in memory the data is loaded into instances of the Customer class. Or perhaps it’s stored in multiple tables, and stored in a single instance of a single class.

    Does that matter? No not a single bit, because they’re conceptually the same thing. THAT’S why you can map one onto the other. Your argument implies that o/r mapping isn’t even possible.

    Everyone who claims there’s a big disconnect between entity classes and entity data in the DB has no clue whatsoever what an entity is in the first place.

  • http://weblogs.asp.net/fbouma Frans Bouma

    If you think starting with classes is any different than starting with an NIAM/ORM model which is transferred into E/R for tables you’re clearly mistaken: they both describe the same concepts: entities and relations between them, which attributes are in which entity, which entity inherits from which entity, what are the constraints etc. etc.

    So if you reverse engineer a relational model to a model which is at the same level as a NIAM/ORM model (and ORM means ‘Object Role Modelling’ http://www.orm.net) you simply can use that model to create entity classes, which comes down to the same thing as you would have had if you had created the classes by hand first.

    Better yet, if you think you can simply create 500+ entity classes without thinking them through, you’re mistaken as well. No big relational model is created without a proper abstract model, and it’s perfectly doable to get back to that model with a relational schema at hand.

    Furthermore, the concept of the entity doesn’t change if you talk about classes or tables, they’re both containers which are physical representations of the entity definitions.

    I wrote an essay about this once, it’s in Jimmy Nilsson’s ADDD&P patterns book as well:
    http://weblogs.asp.net/fbouma/archive/2006/08/23/Essay_3A00_-The-Database-Model-is-the-Domain-Model.aspx

    Oh that Ted Neward article… What a joke. The thing is that the problem is very simple and it doesn’t matter at which side you’re starting, the END RESULT is the same. At least with the proper tools when you’re starting from the db side and with the proper level of knowledge when you’re starting at the classes side.

    So the situation actually is that you start with a set of abstract definitions of entities, which we could define as an entity model, at the level of a NIAM / ORM diagram for example. With that model you can for example create entity classes (if you’re into S&M, you can write them by hand for example for all 500 tables in your system) and you can also create the tables which will contain the entity instances. (You all do know the difference between an entity definition, entity class, entity class instance, and entity instance?)

    So really, I disagree with what you’re trying to say, Jeremy, because your article breathes ‘Don’t start with the database model!!’, while that’s completely not what most o/r mappers do. They reverse engineer the relational model to a higher level, which is the same level on which you’re thinking when you’re creating the entity classes.

    Because, and this is perhaps the hardest part, what’s in the DB and what’s in memory in your entity class instances, that’s THE SAME THING.

  • http://blah.winsmarts.com sahilmalik

    This cannot be used as a one size fits all approach. There may be instances where your main focus is early delivery and rapid application development. In such cases, you want the two to be as closely aligned to each other, in fact constrain the DB design to a specific set of rules, so the application can be delivered quicker.

    But yes, in an ideal world, with world peace, no hunger, loving spouses, trainable cats, and frictionless surfaces – this is the way to go.

  • http://devauthority.com/blogs/devprime/default.aspx DevPrime

    The problem is vaguely this –
    when designing a system, you have a conceptual model, a logical model, and a physical model (which in a minute, I’ll make multiply, so find a seat :-)). The conceptual model is a thinned version of how the business sees the processes and entities that will eventually make up the system. It’s not rare to see conceptual models only showing bare minimum details to get the major ideas across. The logical model more accurately shows what needs to be done to create real entities in full detail, capable of enacting a real-life system. The physical model is a real-life implementation of the logical model. However, between the conceptual model and the logical model, you have a number of things that go on, for example, database object normalization. So “entities” don’t really map one-to-one between those two models. Now, so far I’ve only talked about data models, so what happens with domain model classes?
    Strictly speaking, domain model classes don’t necessarily care about things like normal forms, at least in the traditional relational data sort of way. In fact, they try to be the conceptual model with full details (after all, they are modeling the business, not the database). So there is in fact a big disconnect between domain entity classes and domain entity database objects, bigger than just the relational-to-object rift. It’s a huge mistake to assume that anyone *should* map objects to tables or views one-to-one for many reasons. Having said that, current O/R tools fall into camps that are mostly described as:
    1) don’t support any model other than one-to-one
    2) support other models, but only with some “backhand” magic and elbow grease (sometimes more, sometimes less)

    To answer another comment, while I can’t comment (yet) on much of the new progress with LINQ entities, I can say that I’m not 100% satisfied that it will be everything we want (and for more reasons than just the O/R mapping aspect), though the functionality has some great ideas and features in it.

    On a more positive note, I am currently working on such a system that allows very rich domain model classes and has an extremely high level of flexibility in terms of persistence (in fact, it will even take entities from service output rather than databases if need be). On the down side, it will be a while before I finish all the tools to go with it that will eventually make this framework as simple to use as current O/R mappers :-) Basically, the technology is more than possible, but current offerings just seem to be coming up a bit short, mostly because of the original intent and design of the tools.

  • http://codebetter.com/blogs/jeremy.miller jmiller

    Dan,

    I haven’t looked hard enough at the LINQ for Entities stuff to have too much of an opinion yet. The query language is obviously pretty cool, but I’m a little dubious about some of the details.

    “something like the Entity Data Modeller in Visual Studio Orcas is a bad thing because it does act as glue between data model and object model? Or is it a good thing because it allows the two to be distinct and then translates between the two?”

    If it works without being a PITA, isn’t optimized strictly for “write once,” and lets me do DDD and TDD without the database getting in the way, then it’s a good thing.

  • Dan Maharry

    Just to clarify here, are you saying that using something like the Entity Data Modeller in Visual Studio Orcas is a bad thing because it does act as glue between data model and object model? Or is it a good thing because it allows the two to be distinct and then translates between the two?

  • Lucas Goodwin

    I just wanted to point out that going the other way, designing your database to reflect your business object model (at least in an RDBMS), is equally bad.

    This has been my long-time struggle with DB design, having come from a C++ environment using binary files (game programs). My natural instinct is to think of the DB as nothing more than a fancy binary object store.

    On another note, for our shop we’re looking at following 2 paths. For small applications use SubSonic (AKA ActionPack). For larger projects, MyGeneration looks like a decent ORM. Obviously hand-rolled solutions can and will be merged into these two other solutions when appropriate.

    The biggest problem I see with most of these ORMs is the goal of removing your ability to modify and extend them outside of the tools. I suspect that’s a good deal of the issue with why people tend to “be afraid of their back hand”.