MSDN Mag 6/2009 Editor’s Note Sneak Peek – On Aggregates

One new thing I want to start doing on this blog is releasing early drafts of my editor’s notes. There are a couple of reasons for this.

  • I want my ednotes to start being more technical anyway, and you good folks are the right people to help ensure that
  • Most of the early drafts are going to be too long to fit on the 1 page I’m allotted, so here you’ll get a sense of what I’m trying to say (and can possibly give me thoughts on how to cut it down)
  • It gives me one more blog post per month :)

At any rate, here’s the draft for the July issue’s note.  Enjoy…


As you can see from the article lineup, we’re continuing our focus this month on issues related to software architecture with a subtle emphasis on how architectural decisions are impacted by the rise of cloud computing. And while it follows along those lines, I want to take advantage of this editor’s note to talk about something that I’ve spent some time thinking about recently – aggregates.

Now, those of you whom I have had the pleasure of meeting (or those of you to whom I am related) can attest to at least two things about me. Firstly, I’m always really excited to talk about software development – and software architecture in particular. Secondly, in such conversations, I tend to talk louder and faster as the conversation goes on – this is amplified by adding someone like my good friend Glenn Block to the mix.

So on the topic of architecture and aggregates: “aggregate” is a term used heavily by practitioners of domain-driven design (DDD) and is defined as “A cluster of associated objects that are treated as a unit for the purpose of data changes. External references are restricted to one member of the AGGREGATE, designated as the root. A set of consistency rules applies within the AGGREGATE’S boundaries.” The canonical example used when describing aggregates is the order/order details example. In this case, an order aggregate is defined which encompasses both the Order and OrderDetail classes. Order is defined as the aggregate root, meaning that OrderDetail instances can only be acquired via some behavior exposed by the root.
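To make the order/order details example concrete, here is a minimal sketch in Python. Only the Order and OrderDetail names come from the example above; the credit limit rule, the method names, and the fields are assumptions I made up purely to have a consistency rule to enforce, so treat it as an illustration rather than a reference implementation.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderDetail:
    """A line item; it is only ever created and read through its Order."""
    sku: str
    quantity: int
    unit_price: Decimal


class Order:
    """Aggregate root: the single entry point for changing the order."""

    def __init__(self, order_id: str, credit_limit: Decimal) -> None:
        self.order_id = order_id
        self._credit_limit = credit_limit  # hypothetical consistency rule
        self._details: list[OrderDetail] = []

    def add_detail(self, sku: str, quantity: int, unit_price: Decimal) -> None:
        """Add a line item, enforcing the aggregate's rules before mutating."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        line_total = quantity * unit_price
        if self.total() + line_total > self._credit_limit:
            raise ValueError("order would exceed its credit limit")
        self._details.append(OrderDetail(sku, quantity, unit_price))

    def total(self) -> Decimal:
        """Sum of all line totals currently in the order."""
        return sum((d.quantity * d.unit_price for d in self._details), Decimal("0"))

    @property
    def details(self) -> tuple[OrderDetail, ...]:
        """Read-only snapshot; callers cannot append to the internal list."""
        return tuple(self._details)
```

The point of the shape is that nothing outside Order holds a mutable reference to the detail list, so every change passes through the root and the rule is checked in one place.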

Now let me say that I am not only a believer in this way of decomposing and thinking about a model, but I further believe that the failure to create strong aggregate boundaries is one of the major drivers behind spaghetti-like application designs that are hopelessly coupled to a relational database schema, and too intertwined to easily evolve to support the message-based architectural paradigm that cloud computing brings.

So then why bring it up, other than to suggest that you start thinking about your model in terms of aggregates? Because in following a lot of the discussions on DDD, and on aggregates more specifically, I think we may be going about the process of defining aggregate boundaries in a sub-optimal way. I think that at the root of the problem is that regardless of whether you design a database first or an object model first, both data representations yield a structure of fine-grained logical entities – and this similar way of thinking about the abstractions can inadvertently create tunnel vision where we overlook opportunities to simplify.

To put this into an example, I’ve been working on developing a Microsoft Word add-in to tie centrally-managed article metadata more tightly to the manuscripts themselves. The article metadata is managed in a SharePoint list – which, as those of you who work in SharePoint know, is hardly relational. In fact, in database terms, it much more closely resembles the star topology of a Kimball-style data warehouse – and it was this difference in data store schema that forced me to rethink my aggregate definition.

Now, like many of you, I started designing my application by creating the domain model. The initial scenario that I wanted to support was browsing the list of articles and selecting the article that the Word document should be associated with. Because I wanted to not overwhelm my users with a giant list of articles, I added navigation to the scenario using a natural hierarchy from my domain, as seen here.

[Figure (clip_image002): initial domain model with the navigation hierarchy]

So what’s the problem here? This model looks similar to models we’ve all seen countless times over. And it is also with this type of model that people run into trouble defining aggregate boundaries. Why? Because it is based on data structure rather than behavior. To recap my scenario, I want to enable my users to navigate a set of articles using a natural hierarchy. By that logic, my hierarchy is simply a projection over the set of articles – and this changes my model such that I have a clearly defined, single article aggregate, as shown below.

[Figure (clip_image004): revised model with a single article aggregate]
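As a rough sketch of what “the hierarchy is a projection over the articles” can mean in code, the snippet below keeps Article as the only modeled type and computes the navigation tree on demand. The `issue` and `section` attributes are hypothetical levels I picked for illustration; the real hierarchy in my add-in may use different ones.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    title: str
    issue: str    # hypothetical hierarchy level, e.g. "2009-07"
    section: str  # hypothetical hierarchy level, e.g. "Features"


def project_hierarchy(articles):
    """Group a flat set of articles into issue -> section -> [titles]."""
    tree = defaultdict(lambda: defaultdict(list))
    for article in articles:
        tree[article.issue][article.section].append(article.title)
    # Convert to plain dicts so callers get an ordinary read-only style view.
    return {issue: dict(sections) for issue, sections in tree.items()}
```

Nothing in the hierarchy is stored or owned by a separate class here; it is rebuilt from the article set whenever navigation needs it, which is why no Issue or Section entity has to sit inside (or outside) an aggregate boundary.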

This is not to say that you should never have traditional hierarchies in your object models. Instead, what I am trying to say is that rather than defining aggregates by attempting to draw a line around a group of classes in a model, first define the coarse-grained aggregates based on behavior – then let your classes emerge from within the aggregate definition. You will avoid the analysis paralysis of trying to get the aggregate definitions “right” and you’ll be starting out on the right foot, building your system based on desired behavior.

About Howard Dierking

I like technology...a lot...
This entry was posted in architecture, DDD.
  • http://codebetter.com/members/Greg/default.aspx Greg

    “However, would love to chat more with you about this, and can go into the specifics of the code if you would like.”

    That would be fun. I will be in Vancouver the beginning of June, will you be up that way for alt.net canada and/or devTeach?

    btw: the aggregate root being tied to OO concepts is an interesting question: I posted it on the ddd list.

  • http://codebetter.com/members/Greg/default.aspx Greg

    Sorry it took me a while to reply

    “Wrt aggregate roots vs. UML aggregation associations, I’m pretty confident that I get the difference. Also, I’m not passing my live domain objects across the wire, but the way I define the aggregate on the client shapes the needs I have of the messages that will go across the wire.”

    No, it doesn’t. Your aggregate root has everything to do with a consistency boundary and transactional behaviors. It has nothing to do with what things look like on the wire whatsoever.

    I can’t stress this enough! The client model will diverge from the domain model in any but the most trivial (CRUD) of applications. Said differently, if the model is not different, one should not be using a domain; it would have taken less time to just use a DAL (but if so, the “domain” is probably just a DAL anyway).

    “At a high level, I wonder if your objections have to do with my using a DDD concept independently of the DDD corpus. I will absolutely acknowledge that I am doing this (though I would argue less so than you are asserting). I believe that concepts like aggregates can still be highly beneficial without requiring all of the tenets of DDD – and certainly should be able to be beneficial without the ceremony of object-orientation. However, would love to chat more with you about this, and can go into the specifics of the code if you would like.”

    I am not saying that you are not “following the tenets of DDD”; I am saying that what you are doing is not defining aggregate boundaries as defined by DDD. Simply putting uni-directional relationships in is not that process. Analyzing how you want “query results” is not part of that process.

    The process is about analyzing the behavior of the objects involved (how do they mutate?) and identifying consistency/transactional boundaries.

    I guess you could call that definition of aggregate roots a “ddd tenet” but if you aren’t following them are you even using aggregate roots anymore? What would differentiate aggregate roots from an aggregate if not that?

    btw: using an aggregate root “without object orientation” is an interesting proposition. At the least you would need

    objects
    encapsulation

    you would probably need abstraction and polymorphism as well …

    I am not sure I agree that the aggregate root pattern can exist outside of an object oriented system.

  • http://codebetter.com/members/hdierking/default.aspx hdierking

    “This is not behavior nor dealing with an aggregate, it is a report but that’s another conversation.”

    would love to know more on this one, because I don’t see it that way at all.

    Wrt aggregate roots vs. UML aggregation associations, I’m pretty confident that I get the difference. Also, I’m not passing my live domain objects across the wire, but the way I define the aggregate on the client shapes the needs I have of the messages that will go across the wire.

    “Well established models tend to stop exposing their state and to only expose behaviors … at this point they look nothing like data models.”

    I completely agree here – and I would argue that defining proper aggregate boundaries (even around what starts off looking like a data model) is a path towards this end. I’m not sure I’m completely following your second comment.

    At a high level, I wonder if your objections have to do with my using a DDD concept independently of the DDD corpus. I will absolutely acknowledge that I am doing this (though I would argue less so than you are asserting). I believe that concepts like aggregates can still be highly beneficial without requiring all of the tenets of DDD – and certainly should be able to be beneficial without the ceremony of object-orientation. However, would love to chat more with you about this, and can go into the specifics of the code if you would like.

  • http://codebetter.com/members/Greg/default.aspx Greg

    “Now, like many of you, I started designing my application by creating the domain model. The initial scenario that I wanted to support was browsing the list of articles and selecting the article that the Word document should be associated with. Because I wanted to not overwhelm my users with a giant list of articles, I added navigation to the scenario using a natural hierarchy from my domain, as seen here.”

    This is not behavior nor dealing with an aggregate, it is a report but that’s another conversation.

    “My point here is that even if this were merely a data operation, defining the article aggregate *first* as the unit of data being managed across the service boundary (and then understanding which entities fit within) actually eliminated quite a bit of complexity.”

    Let’s be careful to distinguish between Aggregate Roots (Evans) and Aggregates (diamonds in UML).

    Aggregates only exist with behavior. You have not actually put any behavior into your aggregate, so it is not an aggregate root. Making relations uni-directional is a good start towards an aggregate root, but you really need to be focusing on how you are modeling your behaviors.

    Also why are you passing aggregates across service boundaries? Certainly the putting of “domain objects on the wire” or using their schemas over a service boundary has been looked at as a worst case.

    The reason I bring this up is …

    “I think that at the root of the problem is that regardless of whether you design a database first or an object model first, both data representations yield a structure of fine-grained logical entities – and this similar way of thinking about the abstractions can inadvertently create tunnel vision where we overlook opportunities to simplify.”

    I have 2 comments to this. The first is that I would wager a guess that you are creating a domain that looks like an entity model. You don’t have to do that. Well established models tend to stop exposing their state and to only expose behaviors … at this point they look nothing like data models.

    The second is that all of these decisions in your domain are based on behaviors. If you are modeling from the data model out, you don’t know what those behaviors will be yet (they change the model).

  • http://codebetter.com/members/hdierking/default.aspx hdierking

    I could go in depth explaining the details of the application in an attempt to show how there is an actual domain (I will say that were the problem really just a simple data sync problem, I would have leveraged built-in functionality and not written a custom add-in) – but that’s not really the point of this post – nor do I think that defining solid aggregate boundaries has value only in the ‘full’ DDD context.

    My point here is that even if this were merely a data operation, defining the article aggregate *first* as the unit of data being managed across the service boundary (and then understanding which entities fit within) actually eliminated quite a bit of complexity.

  • http://codebetter.com/members/Greg/default.aspx Greg

    I guess that I am failing to find the “domain” here. I see data going back and forth in terms of “lists”. Sync’ing is not really behavior; it’s a data operation.

    I am not trying to put down your efforts; this just does not sound like a place where DDD is very appropriate as I first hear about it.

  • http://codebetter.com/members/hdierking/default.aspx hdierking

    @Greg – not read-only access – bi-directional sync. For example, updating the title in the Word doc updates it in SharePoint.

  • http://codebetter.com/members/Greg/default.aspx Greg

    “To put this into an example, I’ve been working on developing a Microsoft Word add-in to tie centrally-managed article metadata more tightly to the manuscripts themselves. The article metadata is managed in a SharePoint list – which, as those of you who work in SharePoint know, is hardly relational. In fact, in database terms, it much more closely resembles the star topology of a Kimball-style data warehouse – and it was this difference in data store schema that forced me to rethink my aggregate definition.”

    It sounds like this is read-only access … how exactly are you defining any behavior?

  • http://codebetter.com/members/WarePhreak/default.aspx WarePhreak

    I usually like to read articles in a series together. Would that tie into the models to help accent the point or just complicate the issue?