One new thing I want to start doing on this blog is releasing early drafts of my editor’s notes. There are a couple of reasons for this.
- I want my ednotes to start being more technical anyway, and you good folks are the right people to help ensure that
- Most of the early drafts are going to be too long to fit on the 1 page I’m allotted, so here you’ll get a sense of what I’m trying to say (and can possibly give me thoughts on how to cut it down)
- It gives me one more blog post per month
At any rate, here’s the draft for the July issue’s note. Enjoy…
As you can see from the article lineup, we’re continuing our focus this month on issues related to software architecture, with a subtle emphasis on how architectural decisions are impacted by the rise of cloud computing. And while it follows along those lines, I want to take advantage of this editor’s note to talk about something that I’ve spent some time thinking about recently – aggregates.
Now, those of you who I have had the pleasure of getting to meet (or those of you to whom I am related) can attest to at least two things about me. Firstly, I’m always really excited to talk about software development – and software architecture in particular. Secondly, in such conversations, I tend to talk louder and faster as the conversation goes on – this is amplified by adding someone like my good friend Glenn Block to the mix.
So on the topic of architecture and aggregates, an aggregate is a term which is used heavily by practitioners of domain-driven design (DDD) and is defined as “A cluster of associated objects that are treated as a unit for the purpose of data changes. External references are restricted to one member of the AGGREGATE, designated as the root. A set of consistency rules applies within the AGGREGATE’S boundaries.” The canonical example used when describing aggregates is the order/order details example. In this case, an order aggregate is defined which encompasses both the Order and OrderDetail classes. Order is defined as the aggregate root, meaning that OrderDetail instances can only be acquired via some behavior exposed by the root.
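To make the order/order details example a little more concrete, here is a minimal sketch in Python. Only the Order and OrderDetail names come from the example above; the fields, the add_detail method, and the positive-quantity rule are assumptions I made up for illustration, not a prescribed DDD implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Tuple


@dataclass(frozen=True)
class OrderDetail:
    """A line item. Callers never construct or store these directly."""
    product_id: str       # hypothetical field
    quantity: int         # hypothetical field
    unit_price: Decimal   # hypothetical field


@dataclass
class Order:
    """Aggregate root: the only entry point for reading or changing details."""
    order_id: str
    _details: List[OrderDetail] = field(default_factory=list, init=False)

    def add_detail(self, product_id: str, quantity: int, unit_price: Decimal) -> None:
        # An assumed consistency rule, enforced inside the aggregate boundary.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._details.append(OrderDetail(product_id, quantity, unit_price))

    @property
    def details(self) -> Tuple[OrderDetail, ...]:
        # Hand out an immutable snapshot so outside code cannot bypass the root.
        return tuple(self._details)

    @property
    def total(self) -> Decimal:
        return sum((d.quantity * d.unit_price for d in self._details), Decimal("0"))
```

Outside code holds a reference to an Order and nothing else; anything it wants to know about or do to a detail goes through the root, which is where the consistency rules live.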
Now let me say that I am not only a believer in this way of decomposing and thinking about a model, but I further believe that the failure to create strong aggregate boundaries is one of the major drivers behind spaghetti-like application designs that are hopelessly coupled to a relational database schema, and too intertwined to easily evolve to support the message-based architectural paradigm that cloud computing brings.
So then why bring it up, other than to suggest that you start thinking about your model in terms of aggregates? Because in following a lot of the discussions on DDD, and on aggregates more specifically, I think we may be going about the process of defining aggregate boundaries in a sub-optimal way. I think that at the root of the problem is that regardless of whether you design a database first or an object model first, both data representations yield a structure of fine-grained logical entities – and this similar way of thinking about the abstractions can inadvertently create tunnel vision where we overlook opportunities to simplify.
To put this into an example, I’ve been working on developing a Microsoft Word add-in to tie centrally-managed article metadata more tightly to the manuscripts themselves. The article metadata is managed in a SharePoint list – which, as those of you who work in SharePoint know, is hardly relational. In fact, in database terms, it much more closely resembles the star topology of a Kimball-style data warehouse – and it was this difference in data store schema that forced me to rethink my aggregate definition.
Now, like many of you, I started designing my application by creating the domain model. The initial scenario that I wanted to support was browsing the list of articles and selecting the article that the Word document should be associated with. Because I wanted to not overwhelm my users with a giant list of articles, I added navigation to the scenario using a natural hierarchy from my domain, as seen here.
So what’s the problem here? This model looks similar to models we’ve all seen countless times over. And it is also with this type of model that people run into trouble defining aggregate boundaries. Why? Because it is based on data structure rather than behavior. To recap my scenario, I want to enable my users to navigate a set of articles using a natural hierarchy. By that logic, my hierarchy is simply a projection over the articles set – and this changes my model such that I have a clearly-defined, single article aggregate, as shown below.
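In code, the idea of the hierarchy as a projection might look something like the sketch below. It is written in Python for brevity rather than as the add-in's actual code, and the issue and column levels are hypothetical stand-ins for whatever natural hierarchy your domain has. The point is only that the navigation tree is computed from the article set instead of being modeled as its own classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Article:
    """The single aggregate; hierarchy levels are just attributes on it."""
    title: str
    issue: str    # hypothetical hierarchy level
    column: str   # hypothetical hierarchy level


def project_hierarchy(articles: Iterable[Article]) -> Dict[str, Dict[str, List[str]]]:
    """Derive an issue -> column -> titles tree purely for navigation."""
    tree = defaultdict(lambda: defaultdict(list))
    for article in articles:
        tree[article.issue][article.column].append(article.title)
    # Convert to plain dicts so callers get an ordinary, lookup-only structure.
    return {issue: dict(columns) for issue, columns in tree.items()}
```

Nothing in the tree owns an article. If the navigation needs change, say to browse by author instead, that is a new projection function and the aggregate stays untouched.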
This is not to say that you should never have traditional hierarchies in your object models. Instead, what I am trying to say is that rather than defining aggregates by attempting to draw a line around a group of classes in a model, first define the coarse-grained aggregates based on behavior – then let your classes emerge from within the aggregate definition. You will avoid the analysis paralysis of trying to get the aggregate definitions “right” and you’ll be starting out on the right foot, building your system based on desired behavior.