Before you go on, I'm specifically worried at the moment about "Logic Intensive Systems" here. Systems that perform complex calculations, make optimizations, determinations, decisions, etc.
Many, if not most, enterprise applications have both an object model and a database model. For the most part, convergent evolution will probably lead the two models to be very similar, but it's potentially dangerous to constrain the two models to match perfectly because they reflect different concerns altogether.
- Database Model - When you design a database model, you're primarily worried about the best way to structure data for efficient storage and retrieval while also enforcing data integrity rules.
- Object Model - The object model is first and foremost concerned with modelling the behavior and business logic of the system.
Ideally, I'd like to work on these two models somewhat independently and allow both models to reflect their different concerns first, and each other second.
O/R mapping of all flavors isn't that difficult to use (authoring an O/R mapper is a totally different story) when the database and the object model are very similar. The problem is that making the persistence easier by locking the object model to the database model can make writing and consuming the business logic harder. I'm working with a system that uses business objects that are basically codegen'd one-to-one from a legacy database with 400+ tables. The temptation and driver for codegen'ing the business objects is obvious (400+ tables).
The problem I'm seeing, though, is that the business objects consumed in the service layer do not really reflect the behavior of the business logic. Even worse in my mind is the fact that there is no encapsulation of the raw database structure from the service layer. If we had designed the business classes around the behavioral needs, to make the business logic easier to write (and test), we would have ended up with a quite different structure. Just to throw up some examples:
- Big tables don't map to a single object. I don't think a class with 100 different properties can possibly be cohesive. We'd be much better off in terms of writing business logic if that 100-column table were modelled in the middle tier by a half dozen classes, each with a cohesive responsibility. It may make perfect sense to have only one table for the entire object hierarchy, but big classes are almost always a bad thing.
- Data Clump and Primitive Obsession code smells. A database row is naturally flat. I want to do a bigger post on this later, but think about a database table (or tables) with lots of something_currency/something_amount combinations. There's a separate Money object wanting to come out. If you make your business objects pure representations of the database, you could easily end up with a large amount of duplicate logic around currency and quantity conversions.
- Natural cases for polymorphism in your object model. I think the roughest part of O/R mapping is handling polymorphism inside the database. Check out Fowler's patterns on database mappings for inheritance.
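To make the Money point a bit more concrete, here is a minimal sketch in Python (used only for brevity; nothing here is specific to the system described above). The `Money` class, the `money_from_row` helper, and the `price_*`/`tax_*` column names are all hypothetical, and the exchange rate is simply passed in by the caller.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """Value object pairing an amount with its currency."""
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        # Refuse to silently add amounts in different currencies.
        if other.currency != self.currency:
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)

    def convert(self, target_currency: str, rate: Decimal) -> "Money":
        # The rate is supplied by the caller; looking it up is out of scope here.
        return Money((self.amount * rate).quantize(Decimal("0.01")), target_currency)


def money_from_row(row: dict, prefix: str) -> Money:
    """Build a Money from a flat row's <prefix>_amount / <prefix>_currency pair."""
    return Money(Decimal(str(row[f"{prefix}_amount"])), row[f"{prefix}_currency"])


# Hypothetical flat row, shaped the way a codegen'd class might expose it.
row = {
    "price_amount": "100.00", "price_currency": "USD",
    "tax_amount": "8.25", "tax_currency": "USD",
}
total = money_from_row(row, "price") + money_from_row(row, "tax")
```

The currency check and the conversion rounding now live in one place instead of being repeated wherever an amount/currency pair shows up.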
Back to the 400+ table problem. Yes, their database model is huge, but they aren't actually consuming most of the generated classes anyway. Looking at the bigger picture, I think it would have probably been easier to deal with the dissonance between business logic needs and the existing database structure in the database mapping, even though that potentially represents more work to do persistence, instead of exposing the raw database structure to the business domain classes and their consumers. In this case, I think making the business logic easier to use and consume would more than offset the extra persistence cost. Since I would expect the business logic to change more often than the database structure, I would also prefer to optimize my ability to modify the business logic in isolation from the database.
Besides, you definitely want to minimize coupling to a legacy database on the off chance that you might get to fix it up or move away from it later.
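As a rough illustration of absorbing the mismatch in the mapping layer, the sketch below turns one wide, flat row into a few small classes. Everything in it is an assumption made for the example: the `CUST_*`-style column names, the `Customer`, `Address`, and `CreditTerms` classes, and the credit rule. The point is only that the mapper is the single place that knows the legacy column names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str


@dataclass(frozen=True)
class CreditTerms:
    limit: Decimal
    days_to_pay: int

    def allows(self, outstanding: Decimal, new_order: Decimal) -> bool:
        # The rule lives next to the data it needs, not in the service layer.
        return outstanding + new_order <= self.limit


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    billing_address: Address
    credit_terms: CreditTerms


class CustomerMapper:
    """The only code that knows the (hypothetical) legacy column names."""

    def from_row(self, row: dict) -> Customer:
        return Customer(
            customer_id=row["CUST_ID"],
            name=row["CUST_NM"].strip(),
            billing_address=Address(row["BILL_ADDR1"], row["BILL_CITY"], row["BILL_ZIP"]),
            credit_terms=CreditTerms(Decimal(str(row["CR_LMT"])), row["PAY_DAYS"]),
        )


# Hypothetical row as it might come back from the legacy table.
legacy_row = {
    "CUST_ID": 42, "CUST_NM": "  Acme Corp ", "BILL_ADDR1": "1 Main St",
    "BILL_CITY": "Austin", "BILL_ZIP": "78701", "CR_LMT": "5000.00", "PAY_DAYS": 30,
}
customer = CustomerMapper().from_row(legacy_row)
```

If the legacy table is later cleaned up or replaced, only `CustomerMapper` should need to change; the service layer keeps talking to `Customer` and `CreditTerms`.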
WARNING: If you are going to use O/R mapping of any kind, it's more important than ever to properly enforce referential integrity rules in your database. I don't know if it's just my bad luck or what, but the legacy databases I've hit in the past couple of years were all missing a lot of logical referential checks, and unexpected problems with orphan records quite logically ensued.
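For anyone who hasn't been bitten by orphan records yet, here is a small sketch of what the database can do for you once the constraint is declared. It uses Python's built-in `sqlite3` module with an in-memory database purely for convenience; the `customer` and `invoice` tables are made up for the example, and other database engines declare and enforce foreign keys with their own syntax and defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
    """
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice (id, customer_id) VALUES (10, 1)")


def try_execute(sql: str) -> bool:
    """Return True if the statement succeeded, False if the database rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An invoice pointing at a customer that doesn't exist is rejected...
orphan_insert_ok = try_execute("INSERT INTO invoice (id, customer_id) VALUES (11, 999)")
# ...and so is deleting a customer that still has invoices.
parent_delete_ok = try_execute("DELETE FROM customer WHERE id = 1")
```

Both statements fail at the database, so a bug in the mapping layer can't quietly leave orphaned invoices behind.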
Object Relational Mapping is Hard at the Edges
If you haven't already read it, take a look at Ted Neward's seminal paper on the O/R quagmire. I thought he was exaggerating the problem on my first read, but now I'm not so sure. Automated, metadata-driven O/R mapping (and I'm broadly including the LINQ varieties and codegen tools here too) gets really nasty at edge cases. There comes a point when the metadata-driven approach starts to hurt more than it helps, and it's probably better to revert to hand-rolled database mapper code in these cases. This issue is part of what drove me to write "Being afraid of your backhand". One way or another, you will occasionally need the ability to allow the object model to diverge from the database model.
Not to put words into Neward's mouth, but reverse engineering your business domain classes from an existing database definitely fits his analogy comparing O/R mapping to a quagmire. This especially holds true for a legacy database that isn't, shall we say, pristine in structure.
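As one example of what hand-rolled mapper code can look like at an edge, the sketch below loads a small class hierarchy from a single table using a discriminator column, roughly in the spirit of Fowler's Single Table Inheritance pattern. The `Discount` classes, the `discount_type` column, and its `PCT`/`FLAT` codes are all invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from decimal import Decimal


class Discount(ABC):
    @abstractmethod
    def apply(self, subtotal: Decimal) -> Decimal:
        """Return the subtotal after this discount."""


@dataclass(frozen=True)
class PercentageDiscount(Discount):
    percent: Decimal

    def apply(self, subtotal: Decimal) -> Decimal:
        return subtotal - subtotal * self.percent / Decimal(100)


@dataclass(frozen=True)
class FlatDiscount(Discount):
    amount: Decimal

    def apply(self, subtotal: Decimal) -> Decimal:
        # Never let a flat discount push the total below zero.
        return max(subtotal - self.amount, Decimal(0))


def discount_from_row(row: dict) -> Discount:
    """Pick the concrete class from the (hypothetical) discriminator column."""
    kind = row["discount_type"]
    if kind == "PCT":
        return PercentageDiscount(Decimal(str(row["percent"])))
    if kind == "FLAT":
        return FlatDiscount(Decimal(str(row["amount"])))
    raise ValueError(f"unknown discount_type: {kind!r}")


# Hypothetical rows from one table; unused columns are simply NULL (None).
rows = [
    {"discount_type": "PCT", "percent": "10", "amount": None},
    {"discount_type": "FLAT", "percent": None, "amount": "25"},
]
discounted = [discount_from_row(r).apply(Decimal("200")) for r in rows]
```

The service layer only ever calls `apply`; the branching on the discriminator stays inside the mapper.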
Heck, the last time I willingly wrote a stored procedure was to take advantage of a PL/SQL feature to pull a logical hierarchy of data out of a flat database table. It worked beautifully, thank you (of course the rest of the team threw a fit about using a sproc).
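The general idea of letting the database walk a hierarchy stored in a flat table can be sketched as follows. This is not the PL/SQL from that project (the feature used isn't named above); as a stand-in, it uses a recursive common table expression in SQLite through Python's `sqlite3` module, with a made-up `category` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [(1, None, "Hardware"), (2, 1, "Fasteners"), (3, 2, "Bolts"), (4, 1, "Tools")],
)

# Start at the root rows, then repeatedly join children onto what was found so far.
rows = conn.execute(
    """
    WITH RECURSIVE tree(id, name, depth, path) AS (
        SELECT id, name, 0, name FROM category WHERE parent_id IS NULL
        UNION ALL
        SELECT c.id, c.name, t.depth + 1, t.path || '/' || c.name
        FROM category c JOIN tree t ON c.parent_id = t.id
    )
    SELECT depth, path FROM tree ORDER BY path
    """
).fetchall()

for depth, path in rows:
    print("  " * depth + path.split("/")[-1])
```

Each row comes back with its depth and full path, so the middle tier can rebuild the tree without issuing one query per level.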
The Role of the Database
When people think about the role of a database in an enterprise system, I think there's a spectrum of thought with two polar extremes. I think the proper place in the spectrum should vary by application, but we all come in with presuppositions, based on our prior experiences, about the best way to write software, and those presuppositions shape the direction of our designs. Where you sit on this spectrum has a lot to do with how you will approach application architecture vis-à-vis the database:
- The database is paramount, and the system is expressed and understood in terms of the tables and rows in the database. The application code and even the user interface are just a conduit to get information into and out of the database. You design the database first and then build the business and data layers to match the database. In the .Net world we might just consume raw DataSets in the application, effectively just working with the database tables offline.
- The behavior of the system, primarily in the middle tier and user interface, is paramount, and the database is "just" a means to persist the state of the system. The database is either built to match the business classes or designed somewhat independently.
Reporting applications and simpler data entry applications can happily sit at the first, data-centric end of the spectrum. I think any system with significant business logic really needs to be edging over to the second end of the spectrum. The tricky part is recognizing when an application crosses the line from purely data-centric to logic-centric. I'm of the opinion that applying data-centric development approaches to logic-intensive systems leads to a world of trouble.