CodeBetter.Com

Jeffrey Palermo (.com)

Blog moved to www.jeffreypalermo.com

The _real_ reason to shy away from DataSets - level 300

If the title of this post gets you worked up, you probably shouldn't read on.  I'm about to bash your beloved DataSet.

Scott Mitchell has written more on why to avoid DataSets.  While he touches on performance and the mechanics of what it does, I believe the issue is bigger.

Everyone can argue and compare performance numbers in different scenarios to justify one versus the other.  The real argument, I think, is design.  No matter what you do, your application must perform well enough to meet the customer's expectations.  The customer doesn't care about the internal workings.  So what other motivations do we have? 

What about OO design?  What about an object owning state and all behavior that goes with that state?  A DataSet cannot substitute for a domain object.  It has no custom behavior.  It holds records.  As long as DataSets are pulled and bound to the UI, the application will be procedural in nature.  I'm coming from a standpoint where I care about OO and the maintainability and testability gains associated with it, so DataSets don't hold much weight with me.

Now we are talking about another issue altogether.  Are you dragging and dropping a RAD application that will have a short lifespan (because RAD and Maintainability are antonyms)?  If so, you can stop reading now because this discussion has no bearing on your application.

If your intention is to develop a well-factored object-oriented system with loose coupling between layers and high class cohesion, then please read on.  You will want to identify the entities in your system and develop your domain objects first, NOT your database schema.  If you start with the database schema, you are starting with a handicap that is often hard to overcome.  Define the entities that your system will manage.  Each of these entities will own some state and will encapsulate this state with custom behavior through its methods.  For example, if you have a Product object and you try to set the Price field to a negative number, the object shouldn't allow it.  The object is responsible for protecting its state.  If you have a dumb data container (DataSet) floating around, then you have to jerry-rig some validating logic before you “Update” the changes back to the database.
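
The Product example can be sketched roughly as follows.  This is a minimal illustration written in Python rather than .NET code; the class name `Product`, its `price` property, and the choice of `ValueError` are all assumptions made for the sketch, not taken from any real codebase.

```python
from decimal import Decimal


class Product:
    """Hypothetical domain object that guards its own state."""

    def __init__(self, name: str, price: Decimal) -> None:
        self._name = name
        self._price = Decimal("0")
        # Route the initial value through the setter so the rule
        # also applies at construction time.
        self.price = price

    @property
    def name(self) -> str:
        return self._name

    @property
    def price(self) -> Decimal:
        return self._price

    @price.setter
    def price(self, value: Decimal) -> None:
        # The object, not the caller, enforces the rule.
        if value < 0:
            raise ValueError("Price cannot be negative")
        self._price = value


product = Product("Widget", Decimal("9.99"))
try:
    product.price = Decimal("-1")
except ValueError as error:
    print(f"Rejected: {error}")
```

Because the check lives inside the object, no caller can skip it, and the price keeps its last valid value after the rejected assignment.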

If you have a well-factored OO design, you will always be able to extend the system with new functionality as well as change existing functionality with minimal impact.  This is possible because each small responsibility is housed in its own object instead of being spread across the logical layers.

I really don't care about convincing DataSet users to enroll in Rehab (unless they are on my team - which they aren't).  Good design isn't easy, but the benefits are worth the effort.  DataSets are easy.  Speaking of easy. . . here are some other things that are easy:

  • Maxing out your credit card
  • Not paying your bills
  • Sleeping in
  • Settling for mediocrity
  • Posting a nonobjective comment to this blog post

For the record, I have used a DataSet (and a strongly-typed one), but not any more.

ADDITION:

I must say that the use of a DataSet object in an application doesn't condemn its design to the lake of fire, but relying on that as your _business object_ does.  If there is a situation where the overhead is acceptable, use it _inside_ your business object or somewhere else, but protect your entities.  Don't pass around naked entities in a DataSet exposed to the world.  Use encapsulation.  Protect your data.
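
One way to picture "use it inside your business object" is the sketch below.  It is written in Python, with a plain dictionary standing in for a DataSet row; the `Customer` class, its `change_email` method, and the simplistic "@" check are hypothetical and exist only to show the shape of the idea.

```python
class Customer:
    """Hypothetical business object that keeps its raw record private."""

    def __init__(self, row: dict) -> None:
        # Copy the row so callers cannot mutate our state
        # through the reference they still hold.
        self._row = dict(row)

    @property
    def email(self) -> str:
        return self._row["email"]

    def change_email(self, new_email: str) -> None:
        # The only path to changing the email goes through this check.
        if "@" not in new_email:
            raise ValueError(f"Invalid email address: {new_email!r}")
        self._row["email"] = new_email

    def to_row(self) -> dict:
        # Hand the data layer a copy, never the internal container.
        return dict(self._row)


raw = {"id": 1, "email": "pat@example.com"}
customer = Customer(raw)
raw["email"] = "garbage"  # mutating the caller's row does not touch the entity
customer.change_email("pat@example.org")
print(customer.email)
```

The raw container still exists, but nothing outside the business object ever holds a reference to it.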

Perhaps my main beef is with the use of DataSets as promoted by the VS.Net drag and drop tools.



Comments

Jeffrey Palermo said:

That's all well and good, as I posted about my mixed use of datasets. My question for you is the same as my question for Scott. How do you pass around aggregate/summary data?

Are you passing a datareader to the UI? If you did you just broke the barrier as datareaders are connected objects while datasets are disconnected.

Are you creating a new business object for every aggregate report? That would seem to me to be the coding equivalent of killing a mosquito with a bazooka.
# May 17, 2005 6:20 AM

Jeffrey Palermo said:

Good posts on both.

Fowler has a good discussion of this in Patterns of Enterprise Application Architecture. He has a full chapter on the plusses and minuses of a domain model vs a table module approach for structuring your middle tier. Read it in the bookstore like everyone else;)

If your app is primarily a reporting application with minimal business logic, of course you use the built in capabilities of the DataSet. Even Fowler (Mr. Domain Model) says .NET has a compelling case for this. In this case your "Domain Model" can be a set of classes that perform aggregations, transformations, and mapping to and from a DataSet.

Of course you never pass a DataReader to the UI layer. Not to put words into Jeffrey's mouth, but I know he's thinking about using NHibernate or some other external mapping OR solution for persistence with his domain classes.

As goofy as it sounds, I've gone the unusual way of creating a DataSet from an array of business objects for the purposes of data binding. Not defending that approach, but it let us have the best of both worlds. Domain classes in the business and service layers, DataSets in the controller and view layers.



# May 17, 2005 8:09 AM

Jeffrey Palermo said:

Jeremy,
That's a very interesting case. You are protecting your data and then using the DataSet to spit out a read-only report. Probably the biggest mistake people make is using the DataSet to make updates.
# May 17, 2005 8:24 AM

Jeffrey Palermo said:

I love discussions about DataSet vs. Business Objects. It's a subject where people from both sides can have valid points. I believe that there's no silver bullet.

I agree with Jeffrey that if you're developing an enterprise application that's going to be enhanced and maintained for a long period of time, a domain model is the winner. However, if you're working on a small application that you're probably never going to touch again, a DataSet should be considered.
# May 17, 2005 8:47 AM

Jeffrey Palermo said:

Eric,
I read your post on the subject, http://www.codebetter.com/blogs/eric.wise/archive/2005/05/02/62743.aspx, and all you said was well and good. There is no danger there of data changing and breaking domain rules since you are using it for read-only purposes. If it can help databinding, then it becomes a UI technique. What I wanted to hit on was the importance of a solid OO design, and too often I find people wanting to use the DataSet as a bridge from the database to the UI.
# May 17, 2005 8:48 AM

Jeffrey Palermo said:

No assembly should have a reference to ADO.NET except the data layer, so that would preclude any ADO.NET class from crossing that boundary.

If I need an aggregate object, I'll create an aggregate object, or maybe just a smart collection of my domain objects. My domain objects are nothing more than holders of their members with methods to protect them. Some people assume that business objects are "heavy" and that a DataSet is "light". If you look at the class structure of the DataSet, you'll find layers and layers of objects. The footprint of a collection of domain objects would be less than the same information in a DataSet. If a requirement of the app is aggregates, then make an object that knows how to handle aggregates.
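
A "smart collection" along these lines might look like the sketch below. It is written in Python rather than .NET, and the `OrderLine` and `OrderLines` names, along with their methods, are hypothetical; it only shows the shape of an object that knows how to handle aggregates and makes no claim about memory footprint.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical domain object: one line of an order."""
    product_name: str
    quantity: int
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        return self.unit_price * self.quantity


class OrderLines:
    """Hypothetical smart collection that can summarize itself."""

    def __init__(self, lines) -> None:
        # Store an immutable snapshot so the aggregates stay consistent.
        self._lines = tuple(lines)

    def __iter__(self):
        return iter(self._lines)

    def __len__(self) -> int:
        return len(self._lines)

    def grand_total(self) -> Decimal:
        return sum((line.total for line in self._lines), Decimal("0"))

    def quantity_by_product(self) -> dict:
        # Sum quantities per product name across all lines.
        totals = {}
        for line in self._lines:
            totals[line.product_name] = totals.get(line.product_name, 0) + line.quantity
        return totals


lines = OrderLines([
    OrderLine("Widget", 2, Decimal("9.99")),
    OrderLine("Gadget", 1, Decimal("24.50")),
    OrderLine("Widget", 3, Decimal("9.99")),
])
print(lines.grand_total())
print(lines.quantity_by_product())
```

The report code asks the collection for its totals instead of looping over raw rows itself, so the aggregation rules live in one place.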

As Jeremy pointed out, the DataSet can be useful but only to augment a strong domain model.

Eric, your usage of them is also good because you use the databinding features for better UI manipulation, but you _don't_ use them as the owner of the information to be changed and sent back to the database.
# May 17, 2005 8:52 AM

Jeffrey Palermo said:

Well said. Another reason is that it is not possible to mock up a strongly-typed DataSet with NMock, which takes the app further away from automated testing.

My $0.02
# May 17, 2005 9:05 AM

Jeffrey Palermo said:

I think these opinions arise from considering the DataSet to be some kind of business logic object, whilst it is just a data-holding or data-carrying object. So if you treat it as a "smarter" array, you won't fall into the traps of OO design. You still have to organize your business logic and objects well, and if you like arrays or structs, use them, but the impedance between the DataSet and databases is pretty low in .NET. I am not speaking about performance: usually it's a trade-off between dev performance and run performance.
# May 17, 2005 9:42 AM

Jeffrey Palermo said:

OK, I'll step up and be the sacrificial whipping boy! :) I happen to be one of the despised who use (strongly-typed) DataSets in certain (note that I'm not saying all!) enterprise solutions. Domain model vs. table module/recordset is a foolish argument in the first place, because, like so many other things, there is no "one true way". If the choice were so black/white I doubt this blog entry would exist! Many roads lead to the promised land -- choose the one that makes sense for your particular journey. But, nonetheless, I'll take the bait:

"What about OO design. What about an object owning state and all behavior that goes with that state? A DataSet cannot substitute for a domain object."

We're talking apples-oranges here -- if you're pursuing a table module pattern then the DataSet is acting strictly as a data transfer object and is not an entity! Any business logic lives in a separate layer and operates on your DTOs. If you must have your behavior attached to your data then you'll probably never be able to take off your OO shades in the first place.

"I'm coming from a stand-point where I care about OO and the maintainability and testability gains associated with it, so DataSets don't hold much weight with me."

So am I. Let's just examine for a moment the code I have to write/maintain with a domain model solution:

- Binding (IListSource, IList, IBindingList and IEditableObject)
- O/R mapping layer
- Sorting/filtering (i.e. are you going to do an OPath-like solution or use some predicate buckets)
- State change tracking (oh yeah, if I have a collection of entity objects I need some way to track updates/deletes/inserts...)

That sure sounds like a lot of code for me to maintain (I know it is because I've done it). O/R mappers help with codegen, but which one do I choose, or do I roll my own? There are no standards here, and the complexity incurred by the object-relational impedance mismatch necessarily dictates I'm going to have a very complicated solution with many moving parts (which I translate to "hard to maintain" for the less-than-Jedi average staff programmers!)

"If you have a dumb data container (DataSet) floating around, then you have to jerry-rig some validating logic before you “Update” the changes back to the database."

I don't see this as an inherent liability and hooking the DataSet events to run data validation logic is hardly "jerry-rigging" -- that's a primary reason Microsoft put them in there!

"If you have a well-factored OO design, you will always be able to extend the system with new functionality as well as change existing functionality with minimal impact."

This is a red herring. If your DBA extends the schema, presumably it is to add new functionality which you would have to project into either your domain model or table module design -- there is no free lunch!

# May 20, 2005 12:20 PM

Jeffrey Palermo said:

My problem with DataSets is simple. They are conceived and designed around an XML view of data, not a fully relational view. They handle one to many relationships well, but fail horribly with one to one relationships. Basically, if all you ever normalize to is 3rd Normal Form, you will not have problems with DataSets. But if you normalize beyond 3NF (and yes, I need to. It's called entity super-type/sub-type) then you will have problems with DataSets (and most other data access solutions).

DataSets are too XML oriented. They primarily support hierarchies, not relationships (especially when you get into databinding against a DataGrid).
# May 23, 2005 5:28 AM

Shubhabrata DE said:

Wonderfully said. Those who do not agree really need to be in rehab. There will always be people who take the short and easy way out. The point that I really like in this article is "No matter what you do, your application must perform well enough to meet the customer's expectations.  The customer doesn't care about the internal workings.  So what other motivations do we have? "

# November 20, 2007 11:08 PM

About Jeffrey Palermo

Jeffrey Palermo is a software management consultant and the CTO of Headspring Systems in Austin, TX. Jeffrey specializes in Agile coaching and helps companies double the productivity of software teams. Jeffrey is an MCSD.Net, Microsoft MVP, Certified Scrummaster, Austin .Net User Group leader, AgileAustin board member, INETA speaker, INETA Membership Mentor, Christian, husband, father, motorcyclist, Eagle Scout, U.S. Army Veteran, and Texas A&M University graduate.
