DAOs and Repositories
One of the concerns we want to separate our domain from is how we persist the domain model. The domain should not need to know where (file, database, service, etc.) or how. In the specific context of an RDBMS, because we do not want the domain to be coupled to the relational database or its schema, we do not want any SQL statements or ADO.NET objects within our domain. It is not just that the external schema may change but also that there may be a mismatch when mapping between the two: the impedance mismatch problem.
DAO
A Data Access Object (DAO) is a pattern for encapsulating data access. A DAO is an abstraction that provides services for retrieving, inserting, updating, and deleting objects from a persistent store. Fowler calls a DAO a Data Mapper. We will use this term from now on, because the term DAO means different things to different people.

Because the Data Mapper knows the details of the RDBMS (how to connect to a DB, relational schemas, SQL, etc.), it cannot live in the domain; instead it is an infrastructure service. Because we do not want our domain to depend on concrete classes, we need to provide an interface for the Data Mapper, so that we can depend on an abstraction from the domain. We want to be insulated from change in either the persistence mechanism or medium.
Test-first development will drive out this abstraction, because we want to replace the dependency on the DB at runtime (to prevent both issues with shared fixtures and slow tests). The domain, or an application service, uses the Data Mapper for persistence.
Most hand-rolled or code-generated Data Mappers take the simplest form, using hard-coding, configuration files, or reflection to map between a class in the domain and a table in the RDBMS. Within .NET, a Data Mapper often contains ADO.NET code, using dynamic SQL or stored procedures. Internally a Data Mapper tends to use a DataReader for performance. Of course the Data Mapper does not have to be abstracting a DB, and we might be using XmlDocument etc. under the hood.
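As a minimal sketch of the abstraction described above: the domain depends on an interface, and tests swap in an in-memory implementation instead of one that talks to the DB. The names Customer, ICustomerMapper and InMemoryCustomerMapper are hypothetical and chosen only for illustration; they do not come from any library.

```csharp
using System.Collections.Generic;

// Hypothetical domain entity, for illustration only.
public class Customer
{
    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
}

// Hypothetical Data Mapper abstraction that the domain depends on.
public interface ICustomerMapper
{
    Customer Find(string customerId);
    void Insert(Customer customer);
    void Update(Customer customer);
    void Delete(Customer customer);
}

// In-memory implementation that a test can use in place of a
// database-backed mapper. It keeps objects in a dictionary keyed by ID.
public class InMemoryCustomerMapper : ICustomerMapper
{
    private readonly Dictionary<string, Customer> rows =
        new Dictionary<string, Customer>();

    public Customer Find(string customerId)
    {
        // Returns null when no customer with that ID has been inserted.
        Customer found;
        return rows.TryGetValue(customerId, out found) ? found : null;
    }

    public void Insert(Customer customer)
    {
        // Dictionary.Add throws ArgumentException on a duplicate key.
        rows.Add(customer.CustomerID, customer);
    }

    public void Update(Customer customer)
    {
        if (!rows.ContainsKey(customer.CustomerID))
            throw new KeyNotFoundException(customer.CustomerID);
        rows[customer.CustomerID] = customer;
    }

    public void Delete(Customer customer)
    {
        rows.Remove(customer.CustomerID);
    }
}
```

A production implementation of the same interface would contain the ADO.NET code; the domain would not need to change when one is exchanged for the other.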
The biggest advantages of the Data Mapper pattern are:
- Simplicity: it leaves us with a clean domain model where the orthogonal concern of persistence is separated out from the domain classes.
- Extensibility: By depending on an abstraction, instead of a concrete type, we can swap implementations (through a factory or IoC container), which gives us the flexibility to meet new needs, such as support for multiple vendors' SQL implementations.
- Maintainability: Having all of the data access code for a class in one place reduces duplication and shotgun surgery when we need to update our data access code.
There are a number of complementary patterns we can add when we use a Data Mapper:
- Identity Map: When we load an entity we want future loads of that same entity to return the same reference. This is not primarily for performance (that is a happy side effect); it ensures that we don’t lose any pending changes the next time we reload, and allows us to compare for equality by address.
- Unit of Work: Records the deltas between an object when loaded and when persisted, so that we update only what has changed rather than everything. The unit of work also allows us to bundle up changes and make them together, so as to improve performance.
- Query Object: We don’t want to pollute our domain space with SQL when we want to perform dynamic queries, so we need an alternative way of representing a query.
Instead of rolling our own Data Mapper we can adopt the expedient of buying an off-the-shelf product in the RDBMS space called an Object Relational Mapping (ORM) tool.
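The first two of these patterns can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular ORM implements them: Customer, CustomerIdentityMap and UnitOfWork are hypothetical names, the store is represented by plain delegates, and the unit of work is simplified to explicit registration of changed objects rather than automatic delta tracking.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain entity, for illustration only.
public class Customer
{
    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
}

// Identity Map: at most one in-memory instance per key, so repeated
// loads of the same entity return the same reference.
public class CustomerIdentityMap
{
    private readonly Dictionary<string, Customer> loaded =
        new Dictionary<string, Customer>();
    private readonly Func<string, Customer> loadFromStore;

    // loadFromStore stands in for the real data access call (assumption).
    public CustomerIdentityMap(Func<string, Customer> loadFromStore)
    {
        this.loadFromStore = loadFromStore;
    }

    public Customer Get(string customerId)
    {
        Customer customer;
        if (!loaded.TryGetValue(customerId, out customer))
        {
            // First request for this key: go to the store and remember the result.
            customer = loadFromStore(customerId);
            loaded.Add(customerId, customer);
        }
        return customer;
    }
}

// Unit of Work (simplified): collects changed objects and writes them
// together when Commit is called.
public class UnitOfWork
{
    private readonly HashSet<Customer> dirty = new HashSet<Customer>();
    private readonly Action<Customer> save;

    // save stands in for the real update call (assumption).
    public UnitOfWork(Action<Customer> save)
    {
        this.save = save;
    }

    public void RegisterDirty(Customer customer)
    {
        // A HashSet ignores repeat registrations of the same instance.
        dirty.Add(customer);
    }

    // Writes every registered object once and returns how many were written.
    public int Commit()
    {
        int written = dirty.Count;
        foreach (Customer customer in dirty)
        {
            save(customer);
        }
        dirty.Clear();
        return written;
    }
}
```

Because the identity map hands back one instance per key, registering that instance twice with the unit of work still results in a single write.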
Repository
Whether we have a hand-rolled Data Mapper or an ORM, and although we can make calls to it from the domain, we still want to encapsulate that interaction.
There are a number of reasons for this but the most important are:
- DRY: We don’t want repeated code (setting up a Query object to do a common query, for example), so we encapsulate it in a single method.
- Shotgun Surgery: We don’t want to have to modify our calls to the Data Mapper in many places, so by adding the code to a single class we don’t have to go searching for it within our application.
- Gateway: We might want to change our ORM at some point in the future. Encapsulating the calls to the ORM allows us to limit the modifications we have to make to the classes that are implemented using the ORM.
The Repository pattern looks and feels like a collection to the domain. This makes it clear and simple to work with. The unit of work often remains outside of the repository, because we may want to update multiple elements on a repository or multiple repositories at once. The repository is implemented by calls to the Data Mapper. Building a repository with LINQ is almost trivial:
using System.Data.Linq;
using System.Linq;

public class NorthwindRepository
{
    private IQueryable<Customer> customers;

    public NorthwindRepository(DataContext context)
    {
        // Table<Customer> implements IQueryable<Customer>
        customers = context.GetTable<Customer>();
    }

    public Customer FindCustomer(string customerId)
    {
        // Single throws unless exactly one customer matches
        return (from c in customers
                where c.CustomerID == customerId
                select c).Single<Customer>();
    }
}
There are two approaches to testability with a repository. The simplest is to have the repository implement an abstraction (i.e. an interface), for which you then provide an in-memory stub for testing purposes. The other is to provide an abstraction for the Data Mapper that your repository depends upon and provide a test stub for that. The advantage of the second approach is that you can confirm the querying logic within your repository.
using System.Linq;

public class CustomerRepository
{
    // IUnitofWork is our own abstraction over the Data Mapper;
    // its Customers property is assumed to be an IQueryable<Customer>
    private IUnitofWork workspace;

    public CustomerRepository(IUnitofWork workspace)
    {
        this.workspace = workspace;
    }

    public Customer FindCustomer(string customerId)
    {
        return (from c in workspace.Customers
                where c.CustomerID == customerId
                select c).Single<Customer>();
    }
}
Jimmy Nilsson shows an example of this in his book, Applying Domain-Driven Design and Patterns, and I talk more about testability in Being Ignorant with LINQ to SQL.
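The simpler of the two approaches, an interface on the repository with an in-memory stub, might look like the following sketch. ICustomerRepository, InMemoryCustomerRepository and CustomerGreeter are hypothetical names introduced only for this example, and the Customer class is a stand-in for the mapped entity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the mapped entity, for illustration only.
public class Customer
{
    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
}

// Hypothetical repository abstraction that the domain depends on.
public interface ICustomerRepository
{
    Customer FindCustomer(string customerId);
}

// In-memory stub for tests: LINQ to Objects over a list, no database.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> customers;

    public InMemoryCustomerRepository(IEnumerable<Customer> customers)
    {
        this.customers = customers.ToList();
    }

    public Customer FindCustomer(string customerId)
    {
        // Single throws InvalidOperationException unless exactly one customer matches.
        return customers.Single(c => c.CustomerID == customerId);
    }
}

// Hypothetical domain service, written against the interface only.
public class CustomerGreeter
{
    private readonly ICustomerRepository repository;

    public CustomerGreeter(ICustomerRepository repository)
    {
        this.repository = repository;
    }

    public string Greet(string customerId)
    {
        return "Hello, " + repository.FindCustomer(customerId).CompanyName;
    }
}
```

A test constructs CustomerGreeter with the stub, while production code passes in a repository backed by the ORM. The trade-off is that the stub does not exercise the real repository's query logic, which is what the second approach addresses.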
Roles in LINQ to SQL
LINQ to SQL provides an ORM tool for use with .NET. Components within LINQ to SQL map to ORM patterns: LINQ to SQL’s DataContext contains an Identity Map that holds objects already loaded from the database. It also acts as a unit of work: you call the DataContext to submit your changes and it figures out the SQL statements necessary to persist what has changed from the version last loaded. Note that LINQ to SQL optimizes queries that are against the primary key, by returning directly from the map rather than re-querying. For anything else, it cannot know what the returned set will be in advance, so it has to go to the database to retrieve them.
My first piece of advice around LINQ to SQL, based on existing ORM practice, would be: LINQ to SQL is an ORM and may be safely called from within the domain layer. Calls to LINQ to SQL do not need to be placed in the infrastructure layer (sometimes called the data access layer in this role). LINQ to SQL already fulfils the role of a Data Mapper within the infrastructure layer, so this would just be a repetition of the abstraction. However, do consider wrapping interaction with the LINQ to SQL ORM within a Repository, instead of using it throughout the domain, to simplify testing and maintenance.
Next time we'll talk about dynamic queries.
Posted Sun, Dec 2 2007 11:31 PM by Ian Cooper