DDD: The Generic Repository

This discussion came up on the alt.net list last night. I had this post about halfway done, so I agreed to push it to the top of my stack and finish it today given the timeliness of the information. I apologize that I lied about what the next post would be; the follow-up to the last DDDD post will come next.

Consider the following code:

Repository<Customer> repository = new Repository<Customer>();
foreach (Customer c in repository.FetchAllMatching(CustomerAgeQuery.ForAge(19))) { }

The intent of this code is to enumerate all of the customers in my repository that are 19 years old. The code expresses that intent readably, even to someone with little experience of the codebase. It is also highly factored, allowing for aggressive reuse.
    
Largely because of this aggressive reuse, code like the above is commonly seen in domains. Developers are trained that reuse is good and therefore tend towards designs that maximize it. The reuse here is two-fold. The first part is the definition of an IRepository<T> interface, something like:

interface IRepository<T> {
    IList<T> FindAllBy(IQuery<T> query);
    void Add(T item);
    void Delete(T item);
}

The second is that people using Object-Relational Mappers such as Hibernate tend to write a generic implementation of this interface, e.g. Repository<T>, since the ORM does most of the heavy lifting for them.

Show me the polymorphism!

A main reason to favor a generic contract that may then be specialized, such as IRepository<T>, is that one could write code that operates on IRepository<T> directly, perhaps something like a “generic object editor”. That is, it uses the various repositories in a polymorphic fashion.

Quite simply: where, and more importantly why, would someone want to do this? Finding a place or a reason for this polymorphism is extremely difficult within domain driven design. Perhaps in some sort of admin interface, but even that would fail the forms-over-data complexity test and is likely better done with another approach such as Active Record.

As if the utter lack of necessity for a shared interface were not enough, introducing such an interface actually causes further issues.

C(?)R(?)U(?)D(?)
Some objects have different requirements than others. A customer object may not be deleted, a PurchaseOrder cannot be updated, and a ShoppingCart object can only be created. When one is using the generic IRepository<T> interface this obviously causes problems in implementation.

Those implementing the anti-pattern will often implement the full interface and then throw exceptions from the methods they don’t support. Aside from violating numerous OO principles, this breaks their hope of using the IRepository<T> abstraction effectively, unless they also add methods to it that report whether a given operation is supported, and then implement those too.
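As a rough sketch of what this looks like in practice (my own illustration, not code from any particular project): the IQuery<T> shape with an IsSatisfiedBy method and the in-memory list are assumptions made only so the example stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string FirstName { get; set; } = "";
}

// Hypothetical query shape; the post never defines IQuery<T>.
public interface IQuery<T>
{
    bool IsSatisfiedBy(T item);
}

public interface IRepository<T>
{
    IList<T> FindAllBy(IQuery<T> query);
    void Add(T item);
    void Delete(T item);
}

// Customers may not be deleted, yet the generic contract forces a Delete method.
public class CustomerRepository : IRepository<Customer>
{
    // In-memory list stands in for the real data store.
    private readonly List<Customer> customers = new List<Customer>();

    public IList<Customer> FindAllBy(IQuery<Customer> query)
    {
        return customers.Where(c => query.IsSatisfiedBy(c)).ToList();
    }

    public void Add(Customer item)
    {
        customers.Add(item);
    }

    public void Delete(Customer item)
    {
        // The contract promises Delete; the implementation refuses at runtime.
        throw new NotSupportedException("Customers may not be deleted.");
    }
}
```

Anything written against IRepository<Customer> compiles happily and only finds out at runtime that Delete is not really part of the deal.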

A common workaround is to move to more granular interfaces such as ICanDelete<T>, ICanUpdate<T>, ICanCreate<T>, etc. While this works around many of the OO problems that have sprung up, it also greatly reduces the amount of code reuse, as most of the time one will no longer be able to use the concrete Repository<T> instance.
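A minimal sketch of the granular approach, assuming method names (Create, Update, Delete) that I have made up for illustration, with in-memory lists standing in for real persistence:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical member names; the post only names the interfaces.
public interface ICanCreate<T> { void Create(T item); }
public interface ICanUpdate<T> { void Update(T item); }
public interface ICanDelete<T> { void Delete(T item); }

public class ShoppingCart { }
public class Customer { }

// A ShoppingCart can only be created, so that is all its repository offers.
public class ShoppingCartRepository : ICanCreate<ShoppingCart>
{
    private readonly List<ShoppingCart> carts = new List<ShoppingCart>();

    public int Count => carts.Count;

    public void Create(ShoppingCart item) => carts.Add(item);
}

// A Customer can be created and updated, but never deleted.
public class CustomerRepository : ICanCreate<Customer>, ICanUpdate<Customer>
{
    private readonly List<Customer> customers = new List<Customer>();

    public void Create(Customer item) => customers.Add(item);

    public void Update(Customer item)
    {
        if (!customers.Contains(item))
            throw new InvalidOperationException("Unknown customer.");
        // A real implementation would persist the changed state here.
    }
}
```

The type system now tells the truth about what each repository supports, but each repository is written by hand: a single Repository<T> that implemented all three interfaces could not be handed out for ShoppingCart without promising Update and Delete again.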

Revisiting the intent.
What exactly was the intent of the repository pattern in the first place? Looking back to [DDD, Evans] one will see that it is to represent a series of objects as if they were a collection in memory so that the domain can be freed of persistence concerns. In other words the goal is to put collection semantics on the objects in persistence.

The key here is that, as a rule, there should be no persistence logic within the domain. This allows the domain to be tested more easily, tested independently of persistence, and moved easily from one persistence mechanism to another (this matters more as a long-term maintainability goal than as “we want to use XML files now”).

Simply put, the contract of the Repository represents the contract of the domain to the persistence mechanism for the aggregate root that the repository supports. The realization that the Repository is less of an object and more of a contract to infrastructure is both subtle and important.

The importance of the “Contract”
The Repository represents the domain’s contract with a data store (another common word for this is the seam). This is extremely important, as one can tell every possible way that the domain interacts with the data store by looking at the contract. When it comes time to optimize the database for performance, for example, one can look at the repositories and figure out what the domain requires of the data store.

This unfortunately is only useful if the contract is narrow and specific. Consider the conceptual difference between the following:

Repository<T>.FindAllMatching(QueryObject o);
CustomerRepository.FindCustomerByFirstName(string);

In terms of figuring out what the contract to the data store actually is, the first example gives us no information. It could literally be running any query that could be expressed within the QueryObject (read: any possible query). To figure out what the contract actually entails, one would now have to look through the domain (and possibly the UI (ugh), depending on where the query objects originate).

Simply put: allowing query objects to be passed into the repository widens the contract to a point of uselessness.

But reuse is good?
None of us like writing the same code over and over. However, a repository contract is an architectural seam, and as such it is the wrong place to widen the contract to make it more generic.

You will note that nothing has precluded the use of Repository<T>, only really of IRepository<T>. So the answer here is to still use a generic repository, but to use composition instead of inheritance and not expose it to the domain as a contract. Consider:

public interface ICustomerRepository {
    IEnumerable<Customer> GetCustomersWithFirstNameOf(string _Name);
}

In the customer repository composition would be used.

public class CustomerRepository : ICustomerRepository {
    private Repository<Customer> internalGenericRepository;
    public IEnumerable<Customer> GetCustomersWithFirstNameOf(string _Name) {
        // delegate to the generic repository; the query could be hql or whatever
        return internalGenericRepository.FetchByQueryObject(new CustomerFirstNameOfQuery(_Name));
    }
}

The key here is that our seam as exposed to the domain is a very specific architectural seam as opposed to the open/generic seam that allows us to do anything. Composition is used as opposed to inheritance to gain reuse while minimizing the width of the repository contract within the domain.
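One payoff of the narrow seam is that it is trivial to stand in for during tests. A small sketch under my own assumptions: InMemoryCustomerRepository and CustomerLookup are hypothetical names, and Customer is reduced to a first name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string FirstName { get; }
    public Customer(string firstName) { FirstName = firstName; }
}

// The narrow contract the domain sees.
public interface ICustomerRepository
{
    IEnumerable<Customer> GetCustomersWithFirstNameOf(string name);
}

// Hypothetical in-memory stand-in for tests; no ORM, no database.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> customers;

    public InMemoryCustomerRepository(IEnumerable<Customer> seed)
    {
        customers = new List<Customer>(seed);
    }

    public IEnumerable<Customer> GetCustomersWithFirstNameOf(string name)
    {
        return customers.Where(c => c.FirstName == name).ToList();
    }
}

// Hypothetical domain-side consumer that only knows the contract.
public class CustomerLookup
{
    private readonly ICustomerRepository repository;

    public CustomerLookup(ICustomerRepository repository)
    {
        this.repository = repository;
    }

    public int CountNamed(string name)
    {
        return repository.GetCustomersWithFirstNameOf(name).Count();
    }
}
```

Because the contract has one explicit method, the fake has one explicit method. Faking a repository that accepts arbitrary query objects would mean interpreting arbitrary queries.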

Hint: Minimize the complexity at the entry/exit seams of your domain by making the seams as explicit as possible.


47 Responses to DDD: The Generic Repository

  1. gregyoung says:

    yes lots of different vocabulary for them. Narrowing contracts is the important part.

  2. Alex says:

    I’ve stumbled across this problem before and solved it by narrowing the contract to specific cases. Remember, repositories are not DBALs: they shouldn’t be provided with queries. It is the repositories themselves which, based on domain requirements, construct the abstract data queries and pass them into the DBAL (what you called the “generic repository” here).

  3. gregyoung says:

    I would not use inheritance and use composition instead because if you did that the contract of your repository would include GetByCriteria which I don’t want

  4. Sean Sartell says:

    Greg, not sure if you’re still reading this, but why would you use composition for the generic repository instead of using inheritance like the following:

    public class CustomerRepository : GenericRepository, ICustomerRepository { }

    GenericRepository will implement general methods defined in ICustomerRepository without any extra work, but you still get the freedom to explicitly implement any methods in CustomerRepository.

  5. Pingback: Event oriented databases : a new kind of persistence paradigm - Design Matters - ZenModeler's blog

  6. Pingback: planetgeek.ch » What is that all about the repository anti pattern?

  7. Pingback: Interesting links from 2011-01-22 (Kanban) | What is going on under my hat

  8. rusl says:

    thanks for response:

    the question is I saw some implementation whereas IList _repository maintained. say when I am constructing my Repository I am filling it with _repository = LoadAllItems() in ctor. Then in my Find(lambda expression) function I do _repository.Where(lambdaEx).

    my main headache actually comes when I have a subcategory, as my customers have orders.. so what is the convenient way of loading? my aggregate root is customer; should I keep a reference to order as an object or just an id number? I know there is a way of achieving lazy loading, however I just want a pretty plain solution. I mean if I want to follow DDD and do everything through the Root then my Customer should actually hold all orders as a list of objects.. assume that I have thousands of customers? :( thank you in advance

  9. Greg says:

    @rusl:

    I am not sure I understand your question. You mention an internal collection, if your repository is over say SQL server why would you need an internal list?

  10. rusl says:

    hi Greg,

    one question, maybe a bit off topic or naive… I just was wondering should I have to maintain internal collection in my repository. I mean when I have a function like getallcustomers() should I keep in my memory and further refer to this list.. or my function should only return a list and thats it. in that way my repository in loading data acts just like some DAL function. is not it? could you please help me with this… I got tons of other question however this one of essential ones.thanks.

  11. Greg says:

    Besnik,

    What is the actual contract? What do they produce back? How wide is that contract?

    Have you looked at CQRS? Why would you want to do this in your domain, that is pretty obviously a report.

  12. Besnik says:

    Hi Greg,

    I’m trying to solve this problem (and querying of the data in respect to Open-Closed principle) in open source project http://code.google.com/p/genericrepository/.

    The idea is to separate specifications (queries) from the repository interface.

    You would load the domain entity like this:
    IList = customerRepository.Specify()
    .NameStartsWith(“Greg”)
    .OlderThan(18)
    .ToResult()
    .Take(3)
    .ToList();

  13. JohnA says:

    Very nice distinction and explanation — for someone just getting started in DDD.

    I like the more specific v. generic repositories but share Christian’s worry about a very large contract.

    What if we defined a middle ground with constants in the domain repository interface instead of a multitude of methods, ex.
    findStudentsBy(FIRST_NAME)?

    It would keep the contract simple and the implementation in the infrastructure.

  14. kitchai yong says:

    Can’t we just define our query using the Specification pattern and centralized them into the repository assembly. Then have the Application Service assembly pass the specification query to the Find method on the repository? So, that we have less permutation of Find** methods …

    Thanks

    Regards

    kitchai yong

  15. Greg,

    Thanks for writing this. I agree with most of the things you’ve pointed out.

    I do have one question, however, about specifying the contract/seam between a domain model and a data store layer.

    From what I understand, we do not want to have a “generic” contract (like the example you gave) because it becomes too “wide” and renders it useless as a contract. I agree. However, if we have methods such as “findStudentByFirstName”, “findStudentByLastName”, “findStudentByOrganization”, “findStudentBasedOnHairColor”, etc, etc, won’t we end up with a rather large contract? For a sufficiently large domain aggregate, it’s possible to have hundreds of those very specific “find*” methods.

    What would be an elegant solution for something like this?

  16. Greg says:

    first 3 letters of the title DDD

    Certainly most of DDD is overkill for pure CRUD but you shouldn’t be using it for CRUD either

  17. ryzam says:

    But do you think too many layers of encapsulation, with application service functions just wrapping what the repository has to do, are useless for a CRUD scenario?

    I don’t think we get much benefit from having a custom repository.

  18. Greg says:

    ryzam

    I believe the reasons why have been answered in this post.

    An application service is not a replacement for proper encapsulation of your domain.

  19. ryzam says:

    Hi Greg
    Why don’t we let our service control that behaviour? For example, instead of having every custom repository handle the related persistence job, we move that to services to control the behavior

    Ex:

    public class LecturerService : ILecturerService
    {
    void CanCreateLecturer(Lecturer lecturer)
    {
    genericRepo.Save(lecturer);
    }
    }

  20. Good post! I have written a similar item on my blog a while ago.

    http://zoomblab.blogspot.com/2009/01/data-access-using-command-pattern.html

    My only “objection” is that I find a bit too much having to keep all the custom repositories only to constrain the amplitude of the generic one. Services do that for me, and clients only have direct access to them but never to the repositories. So instead of a CustomerRepository I would have a CustomerService making the calls to the generic repository.

  21. Scott Muc says:

    Interestingly enough I was coming to the same conclusion as you!

    I kept on looking at my Repository classes thinking about how I could generalize them, but ended up fighting that urge to generalize because I like the fact that my UserRepository had some methods on it specific to User concerns.

    The only difference is that I am using inheritance rather than composition. (eg UserRepository : BaseRepository, IRepository)

  22. Ian Chamberlain says:

    I thought it better to write up my view as a whole and then we can discuss it on Alt.Net.

    http://systemfutures.spaces.live.com/blog/cns!AD5058A4F6569231!242.entry

  23. inexperienced developer says:

    I read the alt.net mails too, and I think people are confused when they see the word Repository in both CustomerRepository and Repository. What I understand is that Repository is a contract between the DataMapper and the Domain Model, while CustomerRepository is a contract between Repository and the Presentation Layer (UI). The repositories are different in terms of their purpose, right?

  24. I recommend everyone that reads this article to make sure they click on “Think Before Coding” to see what he has to add to this pattern.

  25. Greg,

    This was a great write up, I just recently came across the Repository pattern while looking through the S#arp Architecture project and I definitely saw the strengths of it (and the perhaps over zealousness of it in some aspects) but using it in delegation seems like a perfect fit.

  26. This is a manual pingback…
    Repositories and IQueryable, the paging case.
    This is an extension of the discussion about whether or not to let IQueryable flow outside of repositories.

  27. I just wrote a post where I came to a similar conclusion, but using granular repository traits (ICanDelete, ICanSave etc) to build up an interface that exposes a subset of methods from a mainly-generic concrete implementation.

    “A common workaround to this issue is to move to more granular interfaces such as ICanDelete<T>, ICanUpdate<T>, ICanCreate<T> etc etc. This while working around many of the problems that have sprung up in terms of OO principles also greatly reduces the amount of code reuse that is being seen as most of the time one will not be able to use the Repository<T> concrete instance any more.”

    Not true – my concrete ProjectRepository subclasses Repository, so it uses all generic public methods. Common methods we do want like GetById() and Save() pass straight through to the generic base. Common methods we don’t want like Remove() aren’t exposed by IPersonRepository, so the domain doesn’t know about them. Heaps of re-use, and without having to write a whole load of little bridge methods from your CustomerRepository to the Repository inside.

    http://richarddingwall.name/2009/01/19/irepositoryt-one-size-does-not-fit-all/ (last )

  28. Tom Dean says:

    I feel the same way in so much as internal repositories should never be exposed to the client (e.g. the one developing in the domain). However, I still see plenty of merit in an inheritance-based public repository model instead of composition.

    I’m a firm believer in building for the rule and not the exception, and the examples you provided (non-updatable domain objects, the shopping cart) are the exception – it feels unnecessary to change the repository API for a few objects which may not conform to the rule.

    For most data-driven applications, a concrete inheritance-based repository model with an internal generic repository is a perfectly suitable solution. It’s not too advanced for developers unaccustomed to S.O.L.I.D, and it’s flexible enough to support more advanced scenarios while giving developers their generic juice.

    As an example, let’s take a newsfeed style service into consideration. Each story involves 1) an author, and 2) a target domain object that the story is about. We’ll need the target domain object of each story, and this is incredibly easy to do in a simple loop over any number of domain objects when your public repositories are of Repository.

    Without the support of polymorphism, I might have to rely on reflection, or still revert back to IReadRepository style of inheritance (which always feels like smell to me).

    Also you made a very important point when you said “[make] the seams as explicit as possible”. If you’re going to use query objects, don’t expose them in your contract! Use them the same way as you would an internal repository – the repository exists to say “this is how you can find instances of me”, and providing an arbitrary query mechanism can mean an arbitrary death to your persistent storage (especially if it’s a relational database).

    At the end of the day, either solution is equally workable, and there is no one-size-fits-all pattern. My experience is generally against entirely mutable domain objects, with a few that are not (e.g. categories).

  29. I completely agree with this. It has always bothered me when people implement generic repositories in this way because you are now shifting potentially database specific concerns back out of the repository. By exposing a query object and allowing them to be built anywhere within the domain it makes it impossible to control (and limit) what kind of queries are formed.

    Your solution through composition is a nice middle ground that would still reduce a huge amount of repetitive code. Excellent post.

  30. Colin Jack says:

    Good stuff, as you know I feel exactly the same. Few comments:

    1) Reuse – Couldn’t agree more, I’ve had massive reuse using concrete repositories for years using exactly this sort of technique (though I consider the composed repository to really be just a persistence helper class, an artifact of the implementation).
    2) Polymorphism – I agree its not necessarily that useful and if you need it you can get it with concrete repositories (IDelete and ISave etc). I found the polymorphism made my persistence tests cleaner, for example SaveTestHelper took an ISave, but as you said on the forums delegates might have worked too (though I find they can lead to less readable test code and had complaints from other devs about my existing use of them in test code).
    3) C(?)R(?)U(?)D(?) – ICanDelete<T> etc, agree you can’t use Repository<T> but you can still get lots of reuse. I think its a valid approach.
    4) The importance of the “Contract” – Nicely put, agree completely. When you say “every possible way that the domain interacts with the data store” its also the way clients of the domain do, which is very useful.
    5) Specifications – I think passing them in can be useful, but only if they are constrained (so not just allowing clients to pass in Specification<T> or QueryObject).

    On DDDD, it might be worth backtracking and explaining why we need it and where. Understanding the problems you see it fixing for the average DDD project would be very useful.

  31. Ed McPadden says:

    Great post. I think the point you made about making the seams explicit is a great one. This point is really made clear in this post when talking about repositories. Thanks!!!

  32. “I avoid using sql databases as my transactional store and as such don’t have these issues.”

    Are you saying that when using a relational database, the wheels are turning and that a generic repository is a viable solution in case one can use fetching strategies à la Udi for optimizing load?

    I agree with your options for an RDBMS, but for me, selling an OODB or distributed hashtable is mission impossible.

    Again, great mind provoking post and looking forward to the one(s).

  33. Steve Bohlen says:

    Greg:

    We have long-since used BOTH the Repository implementation to get our reuse AND an ICustomerRepository to get the best of both worlds.

    ICustomerRepository would have the ‘GetCustomerByFirstName(…) method defined on it and all consumers of customers would use just this repository interface only.

    CustomerRepository *derives* from Repository but *implements* ICustomerRepository. This way, it gets the benefit of the reuse of the generics-enabled repository base-class but since consumers depend only on the ICustomerRepository interface, they aren’t ‘aware’ of the .FindAll(…) etc. methods that are actually there on the class.

    This gives us both the benefit of the reusable generics-enabled Repository base class AND the intention-revealing methodnames hanging off the interface for the consumers of the repository to access.

    Curious your opinion on this approach…?

  34. Greg says:

    yes by readonly I mean in terms of operations you are performing to it (it is updated by some process that keeps it in sync but as far as you are concerned you see it as being readonly).

  35. Matt Gilbert says:

    @Greg, re: “…hitting a read only reporting model…”

    Describing the reporting model as read-only surprised me. As changes are made to the transactional model, in many cases you would want those changes reflected in the reporting model as well correct?

    Can I assume that you mean read-only in the transactional context of DDD/repositories?

  36. Greg says:

    Me thinks another post will be required on that? here are some options..

    Object Database
    Unstructured storage (think bigtable/couch db)
    Event stream persistence (reassemble object based on their histories).

    Greg

  37. Jiho Han says:

    What, if not sql database, do you use as your transactional store?

  38. Nick says:

    “I avoid using sql databases as my transactional store and as such don’t have these issues.”

    I don’t get this, but I expect this is a whole series of blog posts unto itself.

    Oh you silly Habs fans ;)

  39. Shaun says:

    Are your domain repositories ignorant to the persistence/ORM technology? Do they just rely on your infrastructure repositories to handle the details or are the infrastructure repositories just for code re-use?

  40. Greg says:

    Nick I believe the problem you are running into is a function of the impedance mismatch.

    I avoid using sql databases as my transactional store and as such don’t have these issues.

    In general Udi’s solution (I have seen it before) is a pretty good one to the problem though still painful in general.

  41. Greg says:

    Shaun a better question would be why are you running reports in your domain. Note that by report here I am referring to something that sounds like it should be hitting a read only reporting model as opposed to your transactional model.

    Adhoc queries have no place in a domain that is modeling transactional behaviors.

    Greg

  42. Shaun says:

    I’m curious if your domain repositories are ignorant of the persistence/ORM technology and just rely on your infrastructure repositories to handle the details? If so, how do you effectively abstract ad-hoc queries to your infrastructure repositories?

  43. alberto says:

    “So the answer here is to still use a generic repository but to use composition instead of inheritance and not expose it to the domain as a contract.”

    Hehe, I was thinking of suggesting just that while reading your post.

  44. Nick says:

    This is how I’ve been doing things until now for pretty much the reasons you’ve posted, but have always had trouble figuring out how to optimize queries (which is why I started the thread on alt.net). E.g. I can have a CustomerRepository.GetCustomerByLastName used in many different use cases. Some use cases are more performant if the customer’s orders are eager loaded and others more performant if not. How do I handle this? The options I can think of are:

    1. Udi’s solution of interfaces. (Sorry his site seems to be down so can’t get the link.)
    2. One repo method per use case.
    3. Having a parameter on the repo method indicating the fetch strategy.

    Anyway, still stumped.

  45. Greg says:

    huh?

  46. Bill Barry says:

    Does this mean that the method Find on List is just as bad?

    What exactly is the difference between
    internalGenericRepository.FetchByQueryObject(new someclass(_name))
    and
    List l = getallcustomers();
    var c = l.Find(new someclass(_name).finder)

    ???
