One of the nice benefits of a Specification is that one could write some code like the following:
IEnumerable<Customer> customers = CustomerRepository.AllMatching(CustomerSpecifications.IsGoldCustomer);
Code like this lets the developer reuse a specification from the domain as a way of querying the repository. While this may seem to be a good thing at the outset, this mentality introduces a host of problems.
Performance
The first and largest problem that one will run into when dealing with this type of API is that the Repository is necessarily a leaky abstraction. The GoldCustomerSpecification is a piece of code; it represents a predicate for whether a single customer is or is not a gold customer. In order to return the set of all customers matching the GoldCustomerSpecification, the repository will need to run the specification on every customer.
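To make the problem concrete, here is a minimal sketch of what such a repository ends up doing. The `ISpecification<T>` interface, the `InMemoryCustomerRepository` class, and the 10,000 threshold for a gold customer are assumptions made for illustration; they are not taken from any particular library or from the original code base.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain type, for illustration only.
public class Customer
{
    public string Name { get; set; } = "";
    public decimal TotalPurchases { get; set; }
}

// A specification is an executable predicate over a single object.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

// Assumed business rule: a gold customer has spent at least 10,000.
public class GoldCustomerSpecification : ISpecification<Customer>
{
    public bool IsSatisfiedBy(Customer candidate)
    {
        return candidate.TotalPurchases >= 10000m;
    }
}

public class InMemoryCustomerRepository
{
    private readonly List<Customer> _customers = new List<Customer>();

    public void Add(Customer customer)
    {
        _customers.Add(customer);
    }

    // The specification is opaque to the repository, so the only
    // option is to test every customer one by one.
    public IEnumerable<Customer> AllMatching(ISpecification<Customer> specification)
    {
        foreach (var customer in _customers)
        {
            if (specification.IsSatisfiedBy(customer))
            {
                yield return customer;
            }
        }
    }
}
```

Because the repository can only call `IsSatisfiedBy`, it has no way to learn which fields the rule depends on, and so it cannot hand the work to an index.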
Before looking at other types of infrastructure, consider the ramifications of needing to run the specification on every customer in an “in memory” repository. The operation is necessarily O(n). As most learned in university, O(n) is a bad place to be as n gets large.
Even more unfortunately, once infrastructure is taken into account, the constants that we deal with grow. If one were using, say, a database, the work involved is not just running the specification on an existing object but also hydrating the objects (and likely many of their children) in order to even be able to test the predicate against them.
When dealing with such queries, one prefers to be able to EXPOSE and then INDEX criteria, allowing the operations to be treed/hashed, resulting in algorithmic costs of O(log n) or less.
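As a rough sketch of what exposing and indexing a criterion could look like in memory, the repository below keeps a hash index keyed on an explicit `Tier` property. The `CustomerTier` enum, the `Tier` property, and `IndexedCustomerRepository` are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;

// Hypothetical: the "gold" criterion is exposed as data rather than hidden in code.
public enum CustomerTier { Standard, Gold }

public class Customer
{
    public string Name { get; set; } = "";
    public CustomerTier Tier { get; set; }
}

public class IndexedCustomerRepository
{
    // Hash index from the exposed criterion to the customers that have it.
    private readonly Dictionary<CustomerTier, List<Customer>> _byTier =
        new Dictionary<CustomerTier, List<Customer>>();

    public void Add(Customer customer)
    {
        if (!_byTier.TryGetValue(customer.Tier, out var bucket))
        {
            bucket = new List<Customer>();
            _byTier[customer.Tier] = bucket;
        }
        bucket.Add(customer);
    }

    // Finding the bucket is a single hash lookup; no customer is tested individually.
    // Note: this sketch does not re-index a customer whose Tier changes after Add.
    public IReadOnlyList<Customer> GetByTier(CustomerTier tier)
    {
        return _byTier.TryGetValue(tier, out var bucket) ? bucket : new List<Customer>();
    }
}
```

A database index plays the same role: the criterion has to be visible as data before anything can be built on top of it.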
Weak Contracts
Beyond the performance issues, there are further issues that will affect the development team. By exposing a method such as “AllMatching” from the repository, the importance of the contract offered by the repository is minimized.
Repositories represent a contract to a data store. If one uses only named operations upon the repository (as explained in [The Generic Repository]), one creates a strong contract to the data source. When it becomes time to analyze the system from a database performance perspective, it is extremely easy to get a list of the queries that a given repository can run, because they all originate within the repository itself as opposed to in other code.
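A strong contract of this kind might look roughly like the following sketch. The interface name, its two methods, and the gold threshold are assumptions chosen for illustration; the in-memory implementation exists only to make the example self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain type, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal TotalPurchases { get; set; }
}

// Every query the domain may ask appears as a named member of the contract,
// so the full list of possible queries can be read off this interface.
public interface ICustomerRepository
{
    Customer FindById(int id);
    IEnumerable<Customer> GetGoldCustomers();
}

// In-memory stand-in; a database-backed implementation would map each
// named operation to one known, tunable query.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers;

    public InMemoryCustomerRepository(IEnumerable<Customer> customers)
    {
        _customers = customers.ToList();
    }

    public Customer FindById(int id)
    {
        return _customers.Single(c => c.Id == id);
    }

    public IEnumerable<Customer> GetGoldCustomers()
    {
        // Assumed rule: gold means at least 10,000 in total purchases.
        return _customers.Where(c => c.TotalPurchases >= 10000m);
    }
}
```

There is no general-purpose `AllMatching` here, so callers cannot introduce a query that the repository's owner has never seen.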
Sometimes …
On the read side of your domain (a different layer if you use CQS) you want clients to be able to pass query objects directly to your repositories. Keep in mind that these are not the repositories on the transactional side (read: domain) but are supporting the complex reporting behaviors needed. It is often not possible to completely isolate every type of report you may want to run (but you should still try to do this where possible, as the strong contract has benefits).
I wanted it for a reason!
Having looked at the negatives, there are also two major positives to being able to reuse the domain's specifications for querying.
Being forced to create both a query object and a specification for every predicate that is used in both places causes code duplication. Any time there is duplication of code in a system, there is a chance for the implementations to drift apart. Someone may change the IsGoldCustomerSpecification to have a slightly different meaning but forget that there is also a query object being used by a repository.
The creation of the two objects also creates a problem in the ubiquitous language. There now exist two concepts in the ubiquitous language, introduced for technical reasons, to represent what the domain expert considered to be a single concept. This may not sound like a big problem at first, but it artificially changes the domain language, and in doing so opens the door for the domain expert to deliberately let the two concepts diverge. If the terms diverge, conversations become very awkward, and at least one of the terms will need a further divergence (renaming).
The Solutions?
As has been explained, the Specification or Query Object problem can be quite troubling, but there are many possible solutions that have been implemented in the past. All of the solutions share a common theme.
Composite Specification
The basic issue is that what is “inside” of the executable specification object cannot be easily accessed to be translated into a query object. By representing a specification as a tree of composite objects one can easily write a translator to convert the tree to the query language of choice.
This solution, while good, can be fairly expensive, either in the initial cost of building it or in the complexity of learning a tool if you use someone else’s, as you will generally be forced to build up specifications using some form of “fluent” API. This may evolve into an internal DSL.
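The composite approach can be sketched as follows: each node of the tree can both evaluate itself against an object and be inspected by a translator. All type names, the `IsActive` rule, and the SQL column names are assumptions for illustration. A real translator would emit query parameters rather than inline values.

```csharp
using System;
using System.Globalization;

// Hypothetical domain type, for illustration only.
public class Customer
{
    public decimal TotalPurchases { get; set; }
    public bool IsActive { get; set; }
}

public abstract class CustomerSpec
{
    public abstract bool IsSatisfiedBy(Customer customer);
}

// Leaf node: exposes its threshold as data.
public sealed class TotalPurchasesAtLeast : CustomerSpec
{
    public decimal Minimum { get; }
    public TotalPurchasesAtLeast(decimal minimum) { Minimum = minimum; }
    public override bool IsSatisfiedBy(Customer customer)
    {
        return customer.TotalPurchases >= Minimum;
    }
}

// Leaf node: no parameters.
public sealed class ActiveCustomer : CustomerSpec
{
    public override bool IsSatisfiedBy(Customer customer)
    {
        return customer.IsActive;
    }
}

// Composite node: combines two child specifications.
public sealed class AndSpec : CustomerSpec
{
    public CustomerSpec Left { get; }
    public CustomerSpec Right { get; }
    public AndSpec(CustomerSpec left, CustomerSpec right) { Left = left; Right = right; }
    public override bool IsSatisfiedBy(Customer customer)
    {
        return Left.IsSatisfiedBy(customer) && Right.IsSatisfiedBy(customer);
    }
}

// Walks the tree and emits a SQL-like WHERE clause (column names are assumed).
public static class SqlTranslator
{
    public static string ToWhereClause(CustomerSpec spec)
    {
        if (spec is TotalPurchasesAtLeast atLeast)
            return "TotalPurchases >= " + atLeast.Minimum.ToString(CultureInfo.InvariantCulture);
        if (spec is ActiveCustomer)
            return "IsActive = 1";
        if (spec is AndSpec both)
            return "(" + ToWhereClause(both.Left) + " AND " + ToWhereClause(both.Right) + ")";
        throw new NotSupportedException("Unknown specification node: " + spec.GetType().Name);
    }
}
```

For `new AndSpec(new TotalPurchasesAtLeast(10000m), new ActiveCustomer())`, the same object answers `IsSatisfiedBy` inside the domain and translates to `(TotalPurchases >= 10000 AND IsActive = 1)` for the repository.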
A Custom External DSL
A natural continuation of the composite object solution is to no longer represent specifications as code within one's language of choice but instead to represent all specifications within a DSL. This solution hits the core of what is needed in any solution: an abstract version of what the specification means, so that one can look through the metadata and dynamically translate the specification into a query object when necessary.
This solution does, however, have some distinct disadvantages. The largest is that it requires a large amount of technical expertise at the time of implementation to create and maintain an external DSL. Since the goal of the DSL is to allow access to an AST representing the code, as opposed to the code itself, and it will be “compiled” against multiple targets, the DSL would necessarily have to be external, as most modern languages (C# and Java, for example) did not support such things.
The technical issues of creating the DSL could have been mitigated by the creation of an open source project to maintain such a DSL. There would still have been problems, however, in that developers would now need to understand two languages (and likely two or more processes of code generation). Solutions that require developers to learn new tools are generally best avoided in favor of solutions that do not.
Linq?!
Recent developments at Microsoft have produced LINQ, which is essentially an internal DSL that does the same task as the previously discussed external DSL. By being an internal DSL, however, it gets around many of the issues with an external DSL.
To begin with, there is no technical hurdle to implementing LINQ. Microsoft has already done it and has stepped up to ensure that it is supported for the foreseeable future, making it a safe choice to bring in on projects. Microsoft has also added it directly to languages such as VB.NET and C#, which saves developers from needing to learn a new tool in order to use it.
LINQ works by allowing an expression to be represented as an expression tree (a type of AST). As in the DSL example, this expression tree can be translated to multiple formats.
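A small sketch of this idea using `System.Linq.Expressions`: the same specification is held as an `Expression<Func<Customer, bool>>`, which can be compiled into an executable predicate or inspected as a tree. The `Customer` type, the gold threshold, and the `SpecificationReader` helper are assumptions for illustration. A real LINQ provider walks the tree far more generally than this helper, which only handles a single "member compared to constant" shape.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical domain type, for illustration only.
public class Customer
{
    public decimal TotalPurchases { get; set; }
}

public static class CustomerSpecifications
{
    // Declared as an expression tree, not as a plain delegate.
    // Assumed rule: gold means at least 10,000 in total purchases.
    public static readonly Expression<Func<Customer, bool>> IsGoldCustomer =
        c => c.TotalPurchases >= 10000m;
}

public static class SpecificationReader
{
    // Use 1: compile the tree into an ordinary predicate for the domain.
    // (Compile is relatively costly; real code would cache the delegate.)
    public static bool IsSatisfiedBy(Expression<Func<Customer, bool>> spec, Customer customer)
    {
        Func<Customer, bool> predicate = spec.Compile();
        return predicate(customer);
    }

    // Use 2: inspect the tree, as a query translator would.
    // Only handles the shape "member <comparison> constant".
    public static string Describe(Expression<Func<Customer, bool>> spec)
    {
        var comparison = (BinaryExpression)spec.Body;
        var member = (MemberExpression)comparison.Left;
        var constant = (ConstantExpression)comparison.Right;
        return member.Member.Name + " " + comparison.NodeType + " " + constant.Value;
    }
}
```

Here `Describe(CustomerSpecifications.IsGoldCustomer)` yields `TotalPurchases GreaterThanOrEqual 10000`, which is exactly the kind of information a repository needs in order to build a query instead of running the predicate on every object.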
Hint: Query Objects and Specifications, although closely related, are quite different and have differing needs. They are often confused, so it is imperative that their differences be understood. If available, language tools such as an internal/external DSL or LINQ can be used to express specifications as expression trees, allowing them to be translated into query objects rather than existing only as executable code.
Posted Tue, Jan 20 2009 2:06 PM by Greg