After being not so positive towards Microsoft last week, here’s something to be optimistic about:
Daniel Simmons from the Entity Framework team has a post up about fitting “Persistence Ignorance” (PI) into the Entity Framework release plan. I’m happy, but I really do wish there were just a little more representation from the Agile development community inside Microsoft itself so these issues would get more consideration earlier in the tool lifecycle (no, I’m not shilling for the job; my wife wants to stay on the East Coast now).
Unsurprisingly, Ayende has some good thoughts right off the bat: Persistence Ignorance in the Entity Framework.
My two cents
Yes, PI might be a little bit of a performance drag, but I’m more than willing to take that hit in exchange for better development productivity. Extreme performance requirements always lead to approaches that I really wouldn’t consider otherwise. I also don’t think the hit is going to be that bad, especially since the EF uses emitted classes instead of reflection to transfer data between ADO.NET structures and CLR classes.
There’s another concern that change tracking becomes more difficult with a PI solution. Frans Bouma had a post a little while back on why he thought change tracking “had” to be part of the entities themselves. He makes a well-reasoned case, but I still disagree for the majority of systems (I’d only buy his argument for database tables with large numbers of columns and/or extreme performance requirements). The main reason I disagree with Frans is that I’m frankly more worried about a clean separation of concerns and testability than I am about raw performance. I don’t want persistence-related “noise” code in my business logic classes. It’s not just that it violates strict separation of concerns; it can easily make the business code harder to understand and change.
I definitely don’t want to write change tracking code by hand. Lastly, I definitely don’t want to have to codegen my business classes just because it’s a convenient way to bake in all the persistence code. I want to use evolutionary design techniques because I think that’s more effective and efficient than heavy upfront design. Code generation can easily do more harm than good for evolutionary design, plus I think it encourages an Anemic Domain Model. Having PI goes a long, long way towards reducing friction in evolutionary design.
As for change tracking itself, I personally prefer a Unit of Work approach anyway rather than having “IsNew/IsDirty/Deleted” cruft in my domain model classes. To me, having to explicitly say what objects are changed, new, and deleted is actually a good thing. It makes the code relatively easy to understand and debug. Transaction boundaries become significantly easier to test because all you have to do is simply check the contents of the Unit of Work. This might be a personal preference thing, but I want transaction boundaries to be explicit (and easy to test).
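To make the shape of that preference concrete, here’s a minimal Unit of Work sketch (in Java, with hypothetical names; this isn’t the EF API or anyone’s shipping code). The domain objects stay completely flag-free, change registration is explicit, and a transaction-boundary test only has to look at the contents of three lists:

```java
import java.util.ArrayList;
import java.util.List;

// A deliberately minimal Unit of Work. The entities it tracks are plain
// objects with no IsNew/IsDirty/IsDeleted cruft; the caller says explicitly
// what is new, changed, and deleted. All names here are hypothetical.
class UnitOfWork {
    private final List<Object> newObjects = new ArrayList<>();
    private final List<Object> dirtyObjects = new ArrayList<>();
    private final List<Object> deletedObjects = new ArrayList<>();

    void registerNew(Object entity) { newObjects.add(entity); }

    void registerDirty(Object entity) {
        // Registering the same object twice shouldn't double the UPDATE.
        if (!dirtyObjects.contains(entity)) {
            dirtyObjects.add(entity);
        }
    }

    void registerDeleted(Object entity) { deletedObjects.add(entity); }

    List<Object> getNew() { return newObjects; }
    List<Object> getDirty() { return dirtyObjects; }
    List<Object> getDeleted() { return deletedObjects; }

    // commit() would turn the three lists into INSERT/UPDATE/DELETE
    // statements inside a single transaction; omitted in this sketch.
}
```

The testability win is that asserting a transaction boundary is just asserting on those lists, with no database in sight.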
Now a challenge to both Frans and the EF team, can you find a way to bake in the change tracking to an entity without it being the slightest bit obtrusive? I’m wondering if there’s an opportunity to do some sort of IL weaving to push that stuff in after the fact and make both of us happy.
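Until somebody does that weaving, you can approximate the same non-intrusive effect at runtime with a dynamic proxy. Here’s a hedged sketch (in Java, all names invented, and obviously not how the EF or LLBLGen actually work): the entity stays completely ignorant while a proxy flips a dirty flag whenever any setter runs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// A plain, persistence-ignorant entity: no tracking code anywhere in it.
interface Customer {
    String getName();
    void setName(String name);
}

class CustomerImpl implements Customer {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// The tracker intercepts every call to the entity and marks it dirty when
// a setter runs. The entity never knows it is being watched.
class ChangeTracker implements InvocationHandler {
    private final Object target;
    private boolean dirty;

    ChangeTracker(Object target) { this.target = target; }

    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> entityInterface) {
        return (T) Proxy.newProxyInstance(
            entityInterface.getClassLoader(),
            new Class<?>[] { entityInterface },
            this);
    }

    boolean isDirty() { return dirty; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().startsWith("set")) {
            dirty = true;
        }
        return method.invoke(target, args);
    }
}
```

IL weaving would push the equivalent interception into the class itself at build time instead of wrapping it at runtime, which is why it could satisfy both camps.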
Why I want Persistence Ignorance
- As Udi Dahan pointed out on Daniel’s blog, it starts with the Domain Model pattern as a desirable way to organize business logic. You can certainly do a Domain Model approach even with an intrusive Active Record approach, but I think it’s easier to manage with full-blown PI.
- Unit testing. The only way Test Driven Development pays off is to write code that’s efficient to drive with automated tests. A fully Persistence Ignorant domain model is far easier to test than business logic that directly manipulates the database (much less business logic in a sproc). With a PI class, I can instantiate the class with only the data I need and quickly run my unit test. If the database is involved, I’ve generally got a lot more work to do to set up test data (think referential integrity), and usually in a different language (SQL vs. C#). The real killer, and this matters a great deal, is the execution time of the unit tests. A test that runs completely in memory is more than an order of magnitude faster than a test that manipulates the database. If the unit tests are slow to run, I’m either losing time waiting for my build scripts to finish, or I’m not running the tests very often, which effectively negates having a unit test in the first place. Slow builds and tests are a significant drain on productivity (my Java web service takes nearly 10 minutes to build and redeploy with my client’s elaborate Maven stuff. It’s painful). I’ll certainly test my persistence code with both isolated and integration tests as well, but it’s always nice to keep the database out of the way until it’s necessary.
- I want to write business logic and I want to write persistence support, but not at the same time. One thing at a time.
- It’s useful to have the business logic completely divorced from the data storage. I’m not saying it’s likely you’ll want to swap out your database engine (though it’s not out of the realm of possibility), but you could easily find yourself wanting to use pieces of your business logic in completely new ways, ways in which you might not have any durable storage needs whatsoever. I’m thinking specifically of a case where my previous client would love to be able to use their analytics logic from a spreadsheet to provide “what if” scenarios, but can’t because the business logic is too tightly wedded to their database.
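To show what the unit testing bullet above looks like in practice, here’s a hypothetical persistence-ignorant entity (sketched in Java; the Invoice class and its overdue rule are invented for illustration). The whole test runs in memory: no connection string, no test data scripts, no referential integrity to satisfy.

```java
import java.time.LocalDate;
import java.util.List;

// A persistence-ignorant domain class: plain state, plain constructor,
// no persistence base class, no attributes, no database in sight.
// (Invoice and its business rule are hypothetical.)
class Invoice {
    private final LocalDate dueDate;
    private final List<Double> lineAmounts;

    Invoice(LocalDate dueDate, List<Double> lineAmounts) {
        this.dueDate = dueDate;
        this.lineAmounts = lineAmounts;
    }

    double total() {
        return lineAmounts.stream().mapToDouble(Double::doubleValue).sum();
    }

    // The business rule under test: a purely in-memory calculation, so the
    // unit test is fast enough to run after every small change.
    boolean isOverdueAsOf(LocalDate today) {
        return today.isAfter(dueDate) && total() > 0;
    }
}
```

Because the class can be instantiated with only the data the test needs, the test stays fast, and fast tests are the ones that actually get run.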