What specific issue does the repository pattern solve?

Dave · Nov 1, 2012 · Viewed 9.9k times

(Note: My question raises concerns very similar to those of the person who asked this question three months ago, which was never answered.)

I recently started working with MVC3 + Entity Framework and I keep reading that the best practice is to use the repository pattern to centralize access to the DAL. This is usually accompanied by explanations that you want to keep the DAL separate from the domain and especially the view layer. But in the examples I've seen, the repository is (or appears to be) simply returning DAL entities, i.e. in my case the repository would return EF entities.

So my question is, what good is the repository if it only returns DAL entities? Doesn't this add a layer of complexity that doesn't eliminate the problem of passing DAL entities around between layers? If the repository pattern creates a "single point of entry into the DAL", how is that different from the context object? If the repository provides a mechanism to retrieve and persist DAL objects, how is that different from the context object?

Also, I read in at least one place that the Unit of Work pattern centralizes repository access in order to manage the data context object(s), but I don't grok why this is important either.

I'm 98.8% sure I'm missing something here, but from my readings I didn't see it. Of course I may just not be reading the right sources... :\

Answer

Eric King · Nov 2, 2012

I think the term "repository" is commonly thought of in the way the "repository pattern" is described by the book Patterns of Enterprise Application Architecture by Martin Fowler:

A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.

On the surface, Entity Framework accomplishes all of this, and can be used as a simple form of a repository. However, there can be more to a repository than simply a data layer abstraction.

According to the book Domain Driven Design by Eric Evans, a repository has these advantages:

  • They present clients with a simple model for obtaining persistent objects and managing their life cycle
  • They decouple application and domain design from persistence technology, multiple database strategies, or even multiple data sources
  • They communicate design decisions about object access
  • They allow easy substitution of a dummy implementation, for unit testing (typically using an in-memory collection)

The first point roughly equates to the paragraph above, and it's easy to see that Entity Framework itself easily accomplishes it.

Some would argue that EF accomplishes the second point as well. But commonly EF is used simply to turn each database table into an EF entity and pass it through to the UI. It may be abstracting the mechanism of data access, but it's hardly abstracting away the relational data structure behind the scenes.

In simpler applications that are mostly data-oriented, this might not seem to be an important point. But as the application's domain rules / business logic become more complex, you may want to be more object-oriented. It's not uncommon that the relational structure of the data contains idiosyncrasies that aren't important to the business domain, but are side effects of the data storage. In such cases, it's not enough to abstract the persistence mechanism; you also need to abstract the nature of the data structure itself. EF alone generally won't help you do that, but a repository layer will.
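To make the "abstract the data structure, not just the mechanism" point concrete, here is a minimal sketch. It is in Python rather than C#/EF purely to keep it short and self-contained, and every name in it (the row layouts, status_cd, Customer, CustomerRepository) is made up for illustration. The idea is only that the repository hands back a domain-shaped object, so storage quirks such as status codes and a separate address table never leak out.

```python
from dataclasses import dataclass

# Hypothetical storage shape: two flat "tables" carrying storage-only details
# (a status code column, a separate address table joined by key).
CUSTOMER_ROWS = [{"cust_id": 1, "full_name": "Ada Lovelace", "status_cd": "A"}]
ADDRESS_ROWS = [{"addr_id": 10, "cust_id": 1, "line1": "12 Example St", "postal_cd": "AB1 2CD"}]


@dataclass
class Customer:
    """Domain object: shaped by the business, not by the tables."""
    customer_id: int
    name: str
    is_active: bool
    postal_code: str


class CustomerRepository:
    """Translates the relational shape into the domain shape."""

    def __init__(self, customer_rows, address_rows):
        self._customer_rows = customer_rows
        self._address_rows = address_rows

    def get(self, customer_id):
        row = next(r for r in self._customer_rows if r["cust_id"] == customer_id)
        addr = next(a for a in self._address_rows if a["cust_id"] == customer_id)
        return Customer(
            customer_id=row["cust_id"],
            name=row["full_name"],
            is_active=(row["status_cd"] == "A"),  # storage code -> domain meaning
            postal_code=addr["postal_cd"],
        )


customer = CustomerRepository(CUSTOMER_ROWS, ADDRESS_ROWS).get(1)
print(customer)
```

Callers only ever see Customer; if the tables were later merged or the status encoding changed, only the repository would need to change.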

As for the third advantage, EF will do nothing (from a DDD perspective) to help. Typically DDD uses the repository not just to abstract the mechanism of data persistence, but also to provide constraints around how certain data can be accessed:

We also need no query access for persistent objects that are more convenient to find by traversal. For example, the address of a person could be requested from the Person object. And most important, any object internal to an AGGREGATE is prohibited from access except by traversal from the root.

In other words, you would not have an 'AddressRepository' just because you have an Address table in your database. If your design chooses to manage how the Address objects are accessed in this way, the PersonRepository is where you would define and enforce the design choice.
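A rough sketch of that design choice, again in Python for brevity and with hypothetical class and field names: Person is the aggregate root, Address lives inside it, and the only way to reach an address is to load the person and traverse.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Address:
    street: str
    postal_code: str


@dataclass
class Person:
    """Aggregate root: addresses are only reachable through the person."""
    person_id: int
    name: str
    addresses: List[Address] = field(default_factory=list)


class PersonRepository:
    """The only repository for this aggregate; there is deliberately no AddressRepository."""

    def __init__(self):
        self._people: Dict[int, Person] = {}

    def add(self, person: Person) -> None:
        self._people[person.person_id] = person

    def get(self, person_id: int) -> Person:
        return self._people[person_id]


repo = PersonRepository()
repo.add(Person(1, "Dave", [Address("1 Main St", "12345")]))

# Traversal from the root, as the quoted passage describes.
addresses = repo.get(1).addresses
```

Nothing here technically stops someone from writing an AddressRepository later; the point is that the set of repositories you do provide documents which objects are meant to be entry points.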

Also, a DDD repository would typically be where certain business concepts relating to sets of domain data are encapsulated. An OrderRepository may have a method called OutstandingOrdersForAccount which returns a specific subset of Orders. Or a Customer repository may contain a PreferredCustomerByPostalCode method.
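As an illustration only, a named query like that might look as follows in a Python sketch. The Order fields and the rule that "outstanding" means a positive balance_due are assumptions invented for the example; the real rule would come from the business.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    order_id: int
    account_id: int
    balance_due: float


class OrderRepository:
    def __init__(self, orders: List[Order]):
        self._orders = list(orders)

    def outstanding_orders_for_account(self, account_id: int) -> List[Order]:
        # Assumed rule for illustration: "outstanding" means money is still owed.
        return [
            o for o in self._orders
            if o.account_id == account_id and o.balance_due > 0
        ]


repo = OrderRepository([Order(1, 7, 0.0), Order(2, 7, 25.0), Order(3, 8, 10.0)])
outstanding = repo.outstanding_orders_for_account(7)
```

The benefit is that the definition of "outstanding" lives in one named place instead of being repeated as an ad hoc filter wherever orders are queried.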

Entity Framework's context classes (DbContext or ObjectContext) don't lend themselves well to such functionality without the added repository abstraction layer. They do work well for what DDD calls Specifications, which can be simple boolean expressions passed to a general-purpose method that evaluates the data against the expression and returns the matches.
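For comparison, here is a small Python sketch of the Specification idea, with hypothetical names throughout. In this sketch a specification is just a function from an object to a boolean, evaluated in memory; with EF the equivalent would normally be an expression that the provider translates into a query, which this example does not attempt to model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Customer:
    name: str
    postal_code: str
    lifetime_spend: float


# A specification here is any function Customer -> bool.
Specification = Callable[[Customer], bool]


def in_postal_code(code: str) -> Specification:
    return lambda c: c.postal_code == code


def spends_at_least(amount: float) -> Specification:
    return lambda c: c.lifetime_spend >= amount


def both(a: Specification, b: Specification) -> Specification:
    """Combine two specifications with logical AND."""
    return lambda c: a(c) and b(c)


class CustomerRepository:
    def __init__(self, customers: List[Customer]):
        self._customers = list(customers)

    def matching(self, spec: Specification) -> List[Customer]:
        """Single general-purpose query method: return everything the spec accepts."""
        return [c for c in self._customers if spec(c)]


repo = CustomerRepository([
    Customer("Ann", "12345", 900.0),
    Customer("Bob", "12345", 50.0),
    Customer("Cy", "99999", 2000.0),
])
preferred_nearby = repo.matching(both(in_postal_code("12345"), spends_at_least(500.0)))
```

A named method such as PreferredCustomerByPostalCode could then be a thin wrapper around matching with a fixed specification, which keeps the business term visible while reusing the generic machinery.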

As for the fourth advantage, while I'm sure there are strategies that would let one substitute a fake for the context itself, wrapping it in a repository makes it dead simple.
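A minimal sketch of why the substitution is easy, in Python with made-up names (AccountRepository, can_withdraw): the business logic depends only on a small repository interface, so a test can hand it a dictionary-backed fake and never touch a database.

```python
from typing import Dict, Optional, Protocol


class AccountRepository(Protocol):
    """The narrow interface the business logic depends on."""

    def get_balance(self, account_id: int) -> Optional[float]: ...


class InMemoryAccountRepository:
    """Dummy implementation for tests, backed by a plain dict."""

    def __init__(self, balances: Dict[int, float]):
        self._balances = dict(balances)

    def get_balance(self, account_id: int) -> Optional[float]:
        return self._balances.get(account_id)


def can_withdraw(repo: AccountRepository, account_id: int, amount: float) -> bool:
    """Business rule under test; it neither knows nor cares where balances come from."""
    balance = repo.get_balance(account_id)
    return balance is not None and balance >= amount


fake = InMemoryAccountRepository({1: 100.0})
result_ok = can_withdraw(fake, 1, 40.0)
result_too_much = can_withdraw(fake, 1, 400.0)
result_unknown = can_withdraw(fake, 2, 1.0)
```

A production implementation would satisfy the same interface using the real data access layer; the rule itself would not change.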

Regarding 'Unit of Work', here's what the DDD book has to say:

Leave transaction control to the client. Although the REPOSITORY will insert into and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
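The division of labour in that passage can be sketched like this. It is a toy in-memory version in Python with hypothetical names, not how EF implements it: repositories only register changes with a shared unit of work, and the client decides when to commit. In EF the context plays a broadly similar role, with changes held until SaveChanges is called.

```python
class UnitOfWork:
    """Collects pending changes; nothing reaches the store until commit()."""

    def __init__(self, store: dict):
        self._store = store
        self._pending_adds = {}
        self._pending_removes = set()

    def register_add(self, key, value):
        self._pending_removes.discard(key)
        self._pending_adds[key] = value

    def register_remove(self, key):
        self._pending_adds.pop(key, None)
        self._pending_removes.add(key)

    def commit(self):
        # Apply everything registered so far in one step.
        self._store.update(self._pending_adds)
        for key in self._pending_removes:
            self._store.pop(key, None)
        self.rollback()  # clear the pending lists once applied

    def rollback(self):
        self._pending_adds.clear()
        self._pending_removes.clear()


class Repository:
    """Registers inserts and deletes but never commits on its own."""

    def __init__(self, uow: UnitOfWork, kind: str):
        self._uow = uow
        self._kind = kind

    def add(self, entity_id, entity):
        self._uow.register_add((self._kind, entity_id), entity)

    def remove(self, entity_id):
        self._uow.register_remove((self._kind, entity_id))


store = {}
uow = UnitOfWork(store)
orders = Repository(uow, "order")
invoices = Repository(uow, "invoice")

orders.add(1, {"total": 25.0})
invoices.add(1, {"order_id": 1})
snapshot_before_commit = dict(store)  # still empty: the repositories did not commit

uow.commit()  # the client commits both changes together
```

Because both repositories share one unit of work, the order and its invoice are either both written or both discarded, which is the kind of decision only the calling code has the context to make.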