DDD: Specification or Query Object

One of the nice benefits of a Specification is that one could write some code like the following:

IEnumerable<Customer> customers = CustomerRepository.AllMatching(CustomerSpecifications.IsGoldCustomer);

Writing code like this allows the developer to reuse a specification from the domain within the repository as a way of querying. While this may seem like a good thing at the outset, this mentality introduces a host of problems.

Performance
The first and largest problem that one will run into when dealing with this type of API is that the Repository is necessarily a leaky abstraction. The GoldCustomerSpecification is a piece of code; it represents a predicate for whether a single customer is or is not a gold customer. In order to return the set of all customers matching the GoldCustomerSpecification, the repository will need to run the specification on every customer.

Before looking at other types of infrastructure, consider the ramifications of needing to run the specification on every customer in an “in memory” repository. The operation is necessarily O(n). As most learned in university, O(n) per query is a bad place to be as n gets large.

Even more unfortunate is that once real infrastructure is involved, the constants we deal with grow. If one were using, say, a database, the work involved is no longer just running the specification on an existing object; the objects (and likely many of their children) must first be hydrated before the predicate can even be tested against them.

When dealing with such queries, one prefers to be able to EXPOSE and then INDEX the criteria, so that the operations can be served from a tree or hash index with algorithmic costs of O(log n) or less.
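The difference can be sketched in a few lines. This is a minimal illustration in Java (the snippet above is C#), not code from any real repository library: `Customer`, `InMemoryCustomerRepository`, `allMatching` and `goldCustomers` are all hypothetical names, and a plain hash map stands in for whatever index a real data store would maintain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical domain object; "gold" is the criterion being queried on.
class Customer {
    private final String name;
    private final boolean gold;

    Customer(String name, boolean gold) {
        this.name = name;
        this.gold = gold;
    }

    String name() { return name; }
    boolean isGold() { return gold; }
}

// Hypothetical in-memory repository offering both query styles.
class InMemoryCustomerRepository {
    private final List<Customer> all = new ArrayList<>();
    // Index kept on the exposed criterion: gold flag -> customers.
    private final Map<Boolean, List<Customer>> byGold = new HashMap<>();

    void add(Customer c) {
        all.add(c);
        byGold.computeIfAbsent(c.isGold(), k -> new ArrayList<>()).add(c);
    }

    // Opaque predicate: the repository cannot look inside it, so it has
    // to test every customer. The cost is O(n) in the number of customers.
    List<Customer> allMatching(Predicate<Customer> spec) {
        List<Customer> result = new ArrayList<>();
        for (Customer c : all) {
            if (spec.test(c)) {
                result.add(c);
            }
        }
        return result;
    }

    // Exposed criterion: answered with one hash lookup on the index
    // instead of a scan over all customers.
    List<Customer> goldCustomers() {
        return List.copyOf(byGold.getOrDefault(true, List.of()));
    }
}

class ScanVersusIndexDemo {
    public static void main(String[] args) {
        InMemoryCustomerRepository repo = new InMemoryCustomerRepository();
        repo.add(new Customer("Ann", true));
        repo.add(new Customer("Bob", false));
        // Both calls return the same customers; only the work done differs.
        System.out.println(repo.allMatching(Customer::isGold).size());
        System.out.println(repo.goldCustomers().size());
    }
}
```

Both methods return the same customers. The point is only that the second can be answered from an index because the repository knows what is being asked, whereas the first receives a black box.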

Weak Contracts
Beyond the performance issues, there are further issues that will affect the development team. By exposing a method such as an “AllMatching” from the repository, the contract offered by the repository is weakened.

Repositories represent a contract to a data store. If one uses only named operations on the repository (as explained in [The Generic Repository]), one creates a strong contract with the data source. When it comes time to analyze the system from a database performance perspective, it is extremely easy to get a list of the queries that a given repository can run, because they all originate within the repository itself as opposed to in other code.
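As a rough illustration of what a strong contract buys, here is a small Java sketch. The interface, its two methods and the `RepositoryContract` helper are invented for this example; the idea is only that when every query is a named operation, the complete list of queries can be read straight off the repository type.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain type for this sketch.
class Customer {
    final String name;
    final String region;
    final boolean gold;

    Customer(String name, String region, boolean gold) {
        this.name = name;
        this.region = region;
        this.gold = gold;
    }
}

// Every query the domain may run is a named method on the contract.
// There is deliberately no allMatching(predicate) escape hatch.
interface CustomerRepository {
    List<Customer> findGoldCustomers();
    List<Customer> findByRegion(String region);
}

class RepositoryContract {
    // Lists the queries a repository interface exposes, sorted by name.
    static List<String> queriesOf(Class<?> repositoryType) {
        return Arrays.stream(repositoryType.getDeclaredMethods())
                .filter(m -> !m.isSynthetic())
                .map(Method::getName)
                .sorted()
                .collect(Collectors.toList());
    }
}

class ContractDemo {
    public static void main(String[] args) {
        // Prints the full set of queries this repository can ever issue.
        System.out.println(RepositoryContract.queriesOf(CustomerRepository.class));
    }
}
```

With an “AllMatching” method on the interface, the same listing would say nothing useful, since the actual queries would live in whatever code constructs the predicates.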

Sometimes …
On the read side of your domain (a different layer if you use CQS) you want clients to be able to pass query objects directly to your repositories. Keep in mind that these are not the repositories on the transactional side (read: domain) but ones that support the complex reporting behaviors needed. It is often not possible to completely isolate every type of report you may want to run (but you should still try to do this where possible, as the strong contract has benefits).

I wanted it for a reason!
Having looked at the negatives, there are two major positives to being able to use specifications within the domain.

Being forced to create both a query object and a specification for every predicate that is used in both places causes code duplication. Any time there is duplication of code in a system, there is a chance for the implementations to diverge. Someone may change the IsGoldCustomerSpecification to mean something slightly different but forget that there is also a query object being used by a repository.

The creation of the two objects also creates a problem in the ubiquitous language. There now exist two concepts in the ubiquitous language, for purely technical reasons, representing what the domain expert considered to be a single concept. This may not sound like a big problem at first, but it artificially changes the domain language, and in doing so it opens the door for the domain expert to deliberately treat the two concepts differently. If the terms diverge, conversations become very awkward and at least one of the terms will need to be renamed.

The Solutions?
As has been explained, the Specification or Query Object problem can be quite troubling but there are many possible solutions that have been implemented in the past. All of the solutions share a common theme.

Composite Specification
The basic issue is that what is “inside” of the executable specification object cannot be easily accessed to be translated into a query object. By representing a specification as a tree of composite objects one can easily write a translator to convert the tree to the query language of choice.

This solution, while good, can be fairly expensive, either in initial cost if you build it yourself or in the complexity of learning a tool if you use someone else’s, as you will generally be forced to build up specifications using some form of “fluent” API. This may evolve into an internal DSL.
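A minimal sketch of the composite idea, in Java and under loud assumptions: the node types (`Eq`, `AtLeast`, `And`), the `SpecTranslator` class, the column names and the definition of a gold customer are all made up for illustration, and rows are modelled as plain maps to keep the example short. The same tree is translated once into a parameterized SQL WHERE fragment and once into an executable in-memory predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// A specification represented as data (a tree), not as opaque code.
interface Spec {}

class Eq implements Spec {
    final String field;
    final Object value;
    Eq(String field, Object value) { this.field = field; this.value = value; }
}

class AtLeast implements Spec {
    final String field;
    final int min;
    AtLeast(String field, int min) { this.field = field; this.min = min; }
}

class And implements Spec {
    final Spec left;
    final Spec right;
    And(Spec left, Spec right) { this.left = left; this.right = right; }
}

class SpecTranslator {
    // Walks the tree and emits a WHERE fragment with "?" placeholders,
    // appending the bind values to params in placeholder order.
    static String toSql(Spec spec, List<Object> params) {
        if (spec instanceof Eq) {
            Eq eq = (Eq) spec;
            params.add(eq.value);
            return eq.field + " = ?";
        }
        if (spec instanceof AtLeast) {
            AtLeast atLeast = (AtLeast) spec;
            params.add(atLeast.min);
            return atLeast.field + " >= ?";
        }
        if (spec instanceof And) {
            And and = (And) spec;
            String left = toSql(and.left, params);
            String right = toSql(and.right, params);
            return "(" + left + " AND " + right + ")";
        }
        throw new IllegalArgumentException("Unknown specification node: " + spec);
    }

    // Walks the same tree and builds an executable in-memory predicate.
    static Predicate<Map<String, Object>> toPredicate(Spec spec) {
        if (spec instanceof Eq) {
            Eq eq = (Eq) spec;
            return row -> eq.value.equals(row.get(eq.field));
        }
        if (spec instanceof AtLeast) {
            AtLeast atLeast = (AtLeast) spec;
            return row -> {
                Object v = row.get(atLeast.field);
                return v instanceof Number && ((Number) v).intValue() >= atLeast.min;
            };
        }
        if (spec instanceof And) {
            And and = (And) spec;
            return toPredicate(and.left).and(toPredicate(and.right));
        }
        throw new IllegalArgumentException("Unknown specification node: " + spec);
    }
}

class CompositeSpecDemo {
    public static void main(String[] args) {
        // Invented definition of "gold": active and at least 10000 spent.
        Spec gold = new And(new Eq("status", "ACTIVE"), new AtLeast("total_spend", 10000));
        List<Object> params = new ArrayList<>();
        System.out.println(SpecTranslator.toSql(gold, params)); // (status = ? AND total_spend >= ?)
        System.out.println(params);
    }
}
```

The specification is defined once; the translators decide whether it becomes a query for the data store or a check against a single object. The cost mentioned above shows up immediately, though: every new kind of criterion needs a node type and a case in each translator.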

A Custom External DSL
A natural continuation of the composite object solution is to no longer represent specifications as code within the language of choice but to instead represent all specifications within a DSL. This solution hits the core of what is needed in any solution: an abstract representation of what the specification means, so that one can look through the metadata and dynamically translate the specification into a query object when necessary.

This solution does, however, have some distinct disadvantages. The largest is that it requires a large amount of technical expertise to create and maintain an external DSL. Since the goal of the DSL is to allow access to an AST representing the code, as opposed to the code itself, and since it will be “compiled” against multiple targets, the DSL would necessarily have to be external, as most modern languages (C# and Java, for example) did not support such things.

The technical issues of creating the DSL could have been mitigated by the creation of an open source project to maintain such a DSL. There would still have been problems, however, in that developers would now need to understand two languages (and likely two or more processes of code generation). Requiring developers to understand more tools is generally best avoided in favor of solutions that do not require learning new tools.

Linq?!
Recent developments at Microsoft have produced LINQ, which is essentially an internal DSL that does the same job as the previously discussed external DSL. Because it is an internal DSL, however, it gets around many of the issues of an external one.

To begin with, there is no technical hurdle to implementing LINQ. Microsoft has already done it and has stepped up to ensure that it is supported for the foreseeable future, making it a safe choice to bring in on projects. Microsoft has also added it directly to languages such as VB.NET and C#, which means developers do not need to learn a new tool in order to use it.

LINQ basically works by allowing an expression to be captured as an expression tree (a type of AST). As in the DSL example, this expression tree can be translated to multiple formats.

Hint: Query Objects and Specifications, although closely related, are quite different and have different needs. Although they are often confused, it is imperative that their differences be understood. If available, language tools such as an internal/external DSL or LINQ can be used to express specifications as expression trees, allowing them to be translated into query objects as opposed to only into executable code.


10 Responses to DDD: Specification or Query Object

  1. taowen says:

    There is a way to combine both approaches. I did an experiment some days ago, for this syntax:

    List inProgressTasks = tasks.find(anyTask.startDate).le(new Date()).find(anyTask.endDate).isNull().findAll();

    under the hood it is:

    Query query = tasks.find(anyTask.startDate).le(new Date());
    query = query.find(anyTask.endDate).isNull();
    List inProgressTasks = query.findAll();

    So, the query is an immutable object. One difficult thing is “find(anyTask.startDate)”: in the find method we only get the value of startDate, with no clue whether it is the value of startDate or of something else. I solve it with

    Task anyTask = QueryParamterFactory.any(Task.class);

    Then we static import the any method to make it look nicer. Under the hood, it uses reflection to set a fake value in startDate, and saves the value and field info in a thread-local hash map. Then we use System.identityHashCode to get the field info back in the find method. Just a small trick, not important to the topic.

    this is the first step

    —-

    I soon found out that this model is flawed, as fields should be private. So, unless we are querying inside the domain object, we cannot access those fields. Then why not encapsulate the query building process as part of the domain? After some refactoring, it becomes:

    Query query = task.inProgress(query);

    So, inProgress refines the query to filter out tasks that are not in progress. If we further put the current query in an implicit context (i.e. stored in a thread local), then it becomes

    List inProgressTasks = task.inProgress().findAll();

    and we can even

    List startedTasks = union(task.inProgress(), task.finished()).findAll();

    This way we can build a big query incrementally, by refining the query through those “domain query builders”.

    this is second step
    —-

    After I read about how one model cannot fit all, I began to realize that we can have a separate domain model for queries, and then it makes perfect sense here. The domain model for queries is the natural home for those query builders. Previously, those query builders stood out too much among the other transaction processing methods.

    All of this began from one thing: when I saw a big SQL statement, I could not help wondering whether SQL could be broken down into small pieces and refactored into objects. Many times, the SQL repeats some domain structure, or even logic, over and over again, because we cannot reuse those small criteria.

  2. taowen says:

    A query object, specification or (detached) criteria does not have to be monolithic or built in one place. They can be composite, and can be broken down into smaller query objects, specifications or criteria. It would be very interesting to find a home for those small pieces of criteria. For example, a task with a start time earlier than now and no end time set is considered InProgress. So normally we would have a TaskRepository with a method findInProgressTasks. But why not let the task itself tell you what is considered InProgress? Then we can have task.InProgress return criteria, which we can pass to the repository. How about that?

  3. Ian Cooper says:

    I wrote a long post on how to write specifications using an expression builder on my old blog:

    http://iancooper.spaces.live.com/blog/cns!844BD2811F9ABE9C!451.entry

    It’s old but mostly still valid

  4. Michael Hart says:

    (OT: Has been a few hours since I posted a second reply and it’s yet to show, so just double checking that it’s a moderation issue rather than a comment-sucking vortex)

  5. Ian Chamberlain says:

    I decided to write my view and post it in total. We can discuss in detail on Alt.Net.
    http://systemfutures.spaces.live.com/blog/cns!AD5058A4F6569231!242.entry

  6. Michael Hart says:

    Right, got that – my question was more whether you’d have a general translator that would cover most ASTs, and then use custom translators for certain specifications where the general translator produces sub-optimal queries? You may have two specifications that have the same AST structure, but require different query structures due to infrastructure-specific details. Or am I still missing something?

  7. Greg says:

    @Michael:

    Keep in mind that I would be translating the “AST” of a specification into a specific query object. The translator can do whatever it wants in that translation. Perhaps I was not clear on this…

    Cheers,

    Greg

  8. Michael Hart says:

    Just to clarify, and not that you were suggesting otherwise, but there’d still need to be the ability to create context-specific mappings between the Specification and the Query objects wouldn’t there?

    The DSL translation tool may be able to handle most criteria, but wouldn’t there be circumstances where the index would need to be hand picked, or the persistence-specific query massaged? (in a way that cannot be expressed by the DSL that is)

    Also, out of interest, are your Query objects specific to your persistence method, or are they datastore-agnostic? The pattern in P of EAA suggests the former (ie, by specifically mentioning databases and SQL), but I’m not sure if that reduces their usefulness.

  9. BjartN says:

    I have found that creating my specifications using Linq expressions, and implementing the repository using Linq To NHibernate, gives me a lot of choices when it comes to the actual database. If you’re set on an MSSQL db you could also implement the repository using Linq To SQL, but personally I’m not a big fan of all the generated code.

    The only problem is that Linq To NHibernate isn’t a very active project…
