What purpose does the Repository Pattern have?

Thursday, April 24, 2008

I have watched Rob Conery’s great screencasts about MVC Storefront. If you haven’t seen them, you should take a look. They are really interesting: he builds an app using Agile, "TDD" etc. I have some comments about his implementation that I want to share. If you don't agree with me, that's fine; I'm not an expert, and this post is based on my own experience and knowledge ;)

Feel absolutely free to criticize me, but please suggest what could be done better, and give a reason why. A comment like "I don't agree with you!" is of no use if you don't say why.

Something that I don’t like about his implementation so far is his use of the Repository Pattern. I don't know which pattern Rob refers to; I assume it's Fowler's Repository Pattern. Just so you know, this post is based on my experience and interpretation of the Repository pattern.

Rob creates a Repository with a simple interface. For example, it has a method GetCategories which returns an IQueryable<Category> object.

public interface ICategoryRepository
{
    IQueryable<Category> GetCategories();
}

He also uses the Service layer to implement commonly used queries such as GetCategories.

public class CategoryService
{
    //...

    public IList<Category> GetCategories()
    {
        return _repository.GetCategories().ToList();
    }
}

Another method that Rob puts into the Service Layer is GetProductsByCategory(int categoryId).

This is the part I don’t like. I will try to explain why, based on my knowledge and experience of the Repository pattern.

The Repository has a responsibility to return entities. What Rob does is return an IQueryable object: a query, nothing else. So his Repository basically doesn’t return any entities. Only when he calls ToList() is the query executed and the entities retrieved, so it’s the object returned from the Repository that gives him the entities, not the Repository itself. For me the Repository he uses is sort of meaningless; it works only as a query provider, not as a Repository according to the definition of the Repository Pattern.

“The Repository will delegate to the appropriate infrastructure services to get the job done. Encapsulating the mechanisms of storage, retrieval and query is the most basic feature of a Repository implementation”

“With a Repository, client code constructs the criteria and then passes them to the Repository, asking it to select those of its objects that match. From the client code’s perspective, there’s no notion of query “execution”; rather there’s the selection of appropriate objects through the “satisfaction” of the query’s specification.”

“Most common queries should also be hard coded to the Repositories as methods.”

Source: PoEAA [Fowler] and DDD [Evans]
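
To make the second quote a bit more concrete, here is a minimal sketch of a Repository that receives the criteria from the client and hands back the matching entities. The names ISpecification<T>, ProductInCategory, InMemoryProductRepository and Find are my own assumptions for the illustration, as is the in-memory list used as storage; none of them come from Rob's code or from the books.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, reduced to what the example needs.
public class Product
{
    public int Id { get; set; }
    public int CategoryId { get; set; }
}

// The criteria that client code constructs and passes to the Repository.
public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

public class ProductInCategory : ISpecification<Product>
{
    private readonly int _categoryId;

    public ProductInCategory(int categoryId)
    {
        _categoryId = categoryId;
    }

    public bool IsSatisfiedBy(Product candidate)
    {
        return candidate.CategoryId == _categoryId;
    }
}

// In-memory stand-in for a real Repository (the storage is an assumption).
public class InMemoryProductRepository
{
    private readonly List<Product> _products;

    public InMemoryProductRepository(IEnumerable<Product> products)
    {
        _products = products.ToList();
    }

    // The Repository does the selection itself and returns materialized
    // entities, so the caller never sees a query that still has to run.
    public IList<Product> Find(ISpecification<Product> specification)
    {
        return _products.Where(p => specification.IsSatisfiedBy(p)).ToList();
    }
}
```

A caller would write something like repository.Find(new ProductInCategory(1)) and get a list back; how the selection is carried out stays inside the Repository.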

If I were to slavishly follow the definition of the Repository pattern, the Repository interface I would use is:

public interface IProductRepository
{
    //...

    IList<Product> GetProductsByCategory(int categoryID);

    IList<Product> GetProducts();
}

Which infrastructure service the Repository should use is something I will decide later in my project (Rob also mentions this in his screencasts). First I will make sure my Domain Model is correct; then, based on the model, I decide which infrastructure service to use to persist it. The way I persist my model is something I will probably never change. To follow YAGNI, which is an important part of working Agile, I shouldn't write code that makes it easy to replace the infrastructure service a Repository uses just because I think it may change or I may need it later. But if I decide to use LINQ to SQL, the implementation of my Repository will probably look like this:

public class ProductRepository : IProductRepository
{
    public IList<Product> GetProductsByCategory(int categoryID)
    {
        using (MyDataContext dataContext = new MyDataContext())
        {
            return (from p in dataContext.Products
                    where p.CategoryId == categoryID
                    select p).ToList();
        }
    }
}

Something to observe here is that I dispose the DataContext after I execute my query. By doing that I lose the change tracking of my entities, my Unit of Work and also my Identity Map, all of which are handled by the context. But since I write this code in an Agile way, I don't need them at the moment.

My Mock object for the ProductRepository would probably look like this:

public class MockProductRepository : IProductRepository
{
    // Returns ten in-memory products; no database is involved.
    public IList<Product> GetProducts()
    {
        var products = new List<Product>();

        for (int i = 0; i < 10; i++)
            products.Add(new Product(.....));

        return products;
    }

    public IList<Product> GetProductsByCategory(int categoryID)
    {
        var products = this.GetProducts();

        // ToList() rather than Single(): the method returns every match.
        return (from p in products
                where p.CategoryId == categoryID
                select p).ToList();
    }

    //...
}

If we return an IQueryable<> instead of an IList<> from our Repository and decide to use LINQ to SQL, there is something we need to keep in mind. Correct me if I'm wrong, but the GetTable method used by LINQ to SQL returns a Table<> object, which implements the IQueryable<> interface. The Table<> object holds a reference to the DataContext, so a call to ToList() on the IQueryable<> requires the context. Because the query is not executed until ToList() is called, which happens after the using block has already disposed the context, the following code will not work:

public class ProductRepository : IProductRepository
{
    public IQueryable<Product> GetProductsByCategory(int categoryID)
    {
        using (MyDataContext dataContext = new MyDataContext())
        {
            return dataContext.Products;
        }
    }
}
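
Here is a minimal sketch of why this fails, without any database. FakeContext is a hypothetical stand-in that I made up for MyDataContext; it is not the LINQ to SQL API, and it only imitates the behaviour that matters here: the rows are read when the query is enumerated, and reading after Dispose throws. As far as I know LINQ to SQL reports a similar ObjectDisposedException, but treat that as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for MyDataContext; not the real LINQ to SQL class.
public class FakeContext : IDisposable
{
    private bool _disposed;
    private readonly int[] _rows = { 1, 2, 3 };

    // Iterator method: nothing in the body runs until enumeration starts.
    private IEnumerable<int> ReadRows()
    {
        if (_disposed)
            throw new ObjectDisposedException("FakeContext");

        foreach (int row in _rows)
            yield return row;
    }

    public IQueryable<int> Products
    {
        get { return ReadRows().AsQueryable(); }
    }

    public void Dispose()
    {
        _disposed = true;
    }
}

public static class DeferredExecutionDemo
{
    // Mirrors the Repository method above: the query escapes the using block.
    public static IQueryable<int> GetProducts()
    {
        using (FakeContext context = new FakeContext())
        {
            return context.Products;
        }
    }

    public static bool FailsWhenEnumerated()
    {
        IQueryable<int> query = GetProducts(); // no exception yet
        try
        {
            query.ToList(); // the query executes here, after Dispose
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

Getting the query back from GetProducts succeeds; the failure only shows up later, at the ToList() call in the consuming code, which is what makes this kind of bug easy to miss.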

To make it work, the implementation of the Repository can for example look like this:

public class ProductRepository : IProductRepository
{
    private MyDataContext dataContext = new MyDataContext();

    public IQueryable<Product> GetProductsByCategory(int categoryID)
    {
        return dataContext.Products;
    }
}

What does this code really do? Well, at the moment it only exposes Table<> objects and serves more as a query provider than as a Repository.

If we keep this implementation we need to make sure the DataContext gets disposed, so we don't add too many unnecessary entities to the Identity Map etc., right!? This is something the Service layer needs to do if we use the solution Rob uses in his project. If we don't want to reuse the same context for all methods in our Repository, we can use the following implementation:

public class ProductRepository : IProductRepository
{
    public IQueryable<Product> GetProductsByCategory(int categoryID)
    {
        MyDataContext dataContext = new MyDataContext();

        return dataContext.Products;
    }
}

The problem here is that each call to one of the Repository's methods creates a new instance of the DataContext in memory, each with its own Identity Map etc. Those features can probably be turned off, so there won't be unnecessary copies of entities in memory. Still, I assume there are more things to be concerned about when returning an IQueryable<>, maybe not at an early stage but later. Most of them can be avoided by not returning the IQueryable<>.

One last thing that I'm not a fan of is the following code:

return _repository.GetProducts().WithId(10).ToList();

It breaks the Law of Demeter. If we instead make a call like this:

return _repository.GetProductsWithId(10);

we don't break the law, and this is the kind of method a Repository should have if fetching a product by its id is a common query that we need.

No one says that the Repository pattern Fowler and Evans talk about is a silver bullet, and Rob had a good point when he told me:

“One thing I’ll suggest is that with a new feature set (.NET 3.5) comes some new ways of doing things.”

Even within the computer world there is evolution, and we shouldn’t be afraid of change or of testing new ways to solve things.

It will be interesting to see what the final version of Rob's application will look like; maybe I will change my mind, or his implementation will change ;)

28 Comments

I have to say I quite agree with Rob: many of the design patterns need to be revised when something game changing such as Linq comes along. Linq frees the query language from the store's implementation and brings querying back from the depths of the DAL to the domain. And the way it's done (by deferring execution until we know as much as possible about the query and abstracting the low-level query), it's a good thing. The domain had to implement quasi-querying methods anyway every time, which we can now do away with.
You do raise good points about the data context, but maybe there's a way to solve those problems without throwing away the wonderful freedom that Linq gives here.

Bertrand Le Roy - Thursday, April 24, 2008 4:37:32 PM

Is some of this more an issue with how Linq to SQL was architected vs. the Repository pattern?

(PS. I think the Repository pattern is Eric Evans - DDD concept is it not?)

Steve - Thursday, April 24, 2008 4:55:57 PM

I’m too colored by Domain Driven Design and the components used in DDD, and I have those thoughts in my mind. I’m also far from calling myself an expert when it comes to DDD; I wonder how many people can say "I’m a DDD expert" ;)
Rob doesn’t claim that he even uses Domain Driven Design, only the Repository Pattern and a service. But from a Domain Driven perspective (my interpretation), I would call his Service (the current implementation of it) a Repository; it fits so well with the description of what a Repository is and how it should work. What he calls a Repository is for me a cool version of some kind of pluggable provider, created to make it easy to replace the infrastructure services his Service uses to get entities etc. A Service for me should help with coordination etc. For example, moving money from one account to another should be implemented in a Service. The Service can use the Repository to get the correct entity, and also to update it, remove it etc. A Service can also be part of different layers, for example the Application, Domain and Infrastructure layers.
What I don’t like is that the term Repository Pattern is used in this context, when the Repository only returns query objects and doesn’t serve as a “repository” of entities. To change the Repository Pattern to be something different from what Martin Fowler and Evans describe is not fair. It’s like changing another pattern into something else and copying the name; it will only confuse people. In this case I would give it a new name. I think Rob has invented a new and awesome pattern for use with LINQ, which shouldn’t be mixed up with the name Repository Pattern. But that is what I think.

Fredrik Normén - Thursday, April 24, 2008 5:35:12 PM

>"Linq frees the query language from the store's implementation and brings querying back from the depths of the DAL to the domain. And the way it's done (by deferring execution until we know as much as possible about the query and abstracting the low-level query), it's a good thing."
Apart from the deferred execution, this is what I have done when using Domain Driven Design. But I define a Query by using a specification object, so I still define the query within the domain model, not in the data access layer (which doesn't really exist when using DDD). But to use separation of concerns, the query is passed to the Repository and it will handle the execution and delivery of entities. But I can agree, with LINQ there come new ways to do stuff.. we got a new player in the field!

Fredrik N - Thursday, April 24, 2008 8:59:04 PM

I have no problem with Rob's implementation of repository. When using Linq-to-sql it makes sense to return queries instead of collections because it allows some powerful chaining options later (for paging and the like). The only difference between a query and an instantiated collection is that the query simply has not been executed yet.

Liam McLennan - Thursday, April 24, 2008 9:05:14 PM

Liam McLennan:
It’s a wonderful implementation, by all means. Maybe Rob’s Repository will end up being an anti-pattern, who knows. But by definition a Repository executes the query and returns entities, not query objects. Take a look at the following implementation:
public interface ICategoryRepository
{
    IQueryable<Category> GetCategories();
}
public class CategoryService
{
    //...
    public IList<Category> GetCategories()
    {
        return _repository.GetCategories().ToList();
    }
}
If it instead looked like this:
public interface ICategoryQueryProvider
{
    IQueryable<Category> GetCategories();
}
public class CategoryRepository : ICategoryRepository
{
    private ICategoryQueryProvider _categoryQueryProvider;
    //..
    public IList<Category> GetCategories()
    {
        return _categoryQueryProvider.GetCategories().ToList();
    }
}
then the CategoryRepository would by definition be a Repository.

Fredrik N - Thursday, April 24, 2008 10:13:24 PM

You raise a valid question. Why can't the repository do the listing of entities on its own and not rely on the Service? I feel the solution Rob has done is a nice one. What's the difference between IList and IQueryable is the question here. You can do almost precisely the same with both; the difference is more that IQueryable is a Proxy for IList, so you get your Entities when needed. I don't feel that this violates DDD terms. I'm not the expert here but I feel that it's only syntactic sugar.

Your last comment gives a nice solution to the problem though and is maybe a better way to do it.

BennyXNO - Friday, April 25, 2008 1:04:24 PM

I switched back to NHibernate after using Linq to Sql.
NHibernate integration with Spring.NET is a perfect fit. I'd rather have Linq over NHibernate vs. Linq over Sql.

Steve - Saturday, April 26, 2008 2:46:16 AM

I agree with you Fredrik. In my opinion Rob's implementation has mingled layers and, as I see it, it would cause trouble in a distributed environment. What if the consumer of his returned repository-query-objects resides on a different machine? What will happen when that consumer tries to query the result set? I guess the DataContext will have some trouble accessing its source (or am I wrong here?)

wwfDev - Sunday, April 27, 2008 8:51:21 AM

Rob's repository makes more use of LINQ. Fredrik's is more classic, with clearer layering. I hope some giant can figure out a better Repository.

Derek - Wednesday, June 11, 2008 4:49:29 PM

"change the Repository Pattern to be something different from what Martin Fowler and Evans describe it, is not fair."

I agree with all you've stated and this statement in particular.

"Apart from the deferred execution, this is what I have done when using Domain Driven Design. But I define a Query by using a specification object, so I still define the query within the domain model, not in the data access layer (which doesn't really exist when using DDD). But to use separation of concerns, the query is passed to the Repository and it will handle the execution and delivery of entities."

Amen to that, and if you want to chain them together chain the specifications (And/Or/Not) before passing them in (covered in DDD).

Colin Jack - Wednesday, July 2, 2008 12:36:35 PM

I very much like the CategoryRepository : ICategoryRepository implementation. CategoryRepository will also do things like add, update, delete. Now the client (a web page with a data grid) can grab the ICategoryQueryProvider from the service layer and do its paging and sorting on the client. I like having a presentation layer with something like a CategoryPresenter class which can be used as an ObjectDataSource. That class will expose Count(), GetCategories(int maxrows, int startIndex) and GetCategories(int maxrows, int startIndex, string sortBy) methods. This CategoryPresenter will grab the ICategoryQueryProvider in its ctor and the methods will make use of it.

Altug - Wednesday, July 30, 2008 11:28:03 PM

@Bertrand Le Roy: Agree with Fredrik here. I see your point regarding the freedom the Linq language extension gives us. However, it means that your service will be bound to use Linq to get entities from the repository, and hence will have to know too much about the repository. That reduces maintainability if, or rather when, future implementations of the interface change in such a way that they no longer use Linq to do the job.

Returning an IList, on the other hand, leaves the CRUD responsibilities where they belong, with the implementation of the repository. If in the future a language extension named “UFO” appeared that did the job far more efficiently than Linq, all you would need to do is change the repository, and all your other systems would remain intact and function like nothing happened.

Christian Schiffer - Thursday, October 23, 2008 7:28:43 AM

Nice article. I agree with your thoughts that the repository should handle the intricacies of building the result set but return entities. I have a small suggestion. I personally liked the quote that you had on your blog from Martin Fowler, which is as follows: "Any fool can write code that a computer can understand. Good programmers write code that humans can understand - Fowler".
Following the same quote, I would like to say that many developers and blog writers tend to use short forms when demonstrating code snippets, especially the ones with lambda expressions. Take for example
return (from p in _dataContext.Products
                   where p.CategoryId == categoryID
                   select p).ToList();
It would be lot more readable and helpful to understand as well if we refactor the code something like
return (from product in _dataContext.Products
                   where product.CategoryId == categoryID
                   select product).ToList();

vn_nilesh@hotmail.com - Monday, September 7, 2009 2:15:42 PM

I agree with your thoughts Fredrik, though I would like to see some proper examples on how to combine the Repository Pattern with LINQ. It's somewhat essential that the repository returns real object instances and not queries. Want to avoid too much obvious duplication of methods, such as "GetCustomerByName" and "GetCustomerById" by enabling LINQ in the picture.

SondreB - Monday, October 5, 2009 2:35:01 PM

Hey!
very interesting
Using the repository pattern, how can you switch the data access from native ADO to Entity to nHibernate? I presume a separate implementation for each, but some sort of Query objects ??

peter - Wednesday, November 4, 2009 11:46:22 PM

@Peter:
If you don't want the Repository to be dependent on a specific data access infrastructure, you can of course use providers:
Repository -> data access provider -> database

Fredrik N - Wednesday, November 4, 2009 11:54:08 PM

@Peter:
That is one reason why I think POCO is so important. Most ORMs support POCOs, like nHib and Entity Framework 4.0. If you use pure ADO.NET, you get a reader back and fill your entities yourself (this is what an ORM will help you with).

Fredrik N - Thursday, November 5, 2009 7:40:40 AM

Reading the article (great read and perspective) I believe you summed it up with one of your last comments.

Repository -> data access provider -> database [or datasource]

I love the idea of being able to return a query for Linq to work its magic, but when sticking to DDD methodologies, it's not "pretty" and breaks a lot of the rules. I believe the data access provider is exactly where the "repository pattern" mentioned all throughout this article should reside. And given the initial definition of a true repository pattern, it makes perfect sense to rename it or shift its implementation.

And "Repository -> data access provider -> database [or datasource]" seems to be the exact formula I've been working on. It's always nice and encouraging to see I'm not alone in this world ;-)

-D

DANewell - Tuesday, January 19, 2010 11:08:52 PM

@DANewell:
Thanks!
We got a new player and that is LINQ.. so lately I almost always have a method in my Repository called GetQuery. This method returns an IQueryable<TEntity>. BUT! I never let my GetAll methods etc return the IQueryable<T>; they return IEnumerable<T>. The reason why I have GetQuery is to enable the use of LINQ if I must, and LINQ can and will in some cases help a lot.
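A minimal sketch of that shape, assuming an in-memory list in place of a real data source. IRepository<TEntity> and InMemoryRepository<TEntity> are names made up for the example and are not taken from any real project:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IRepository<TEntity>
{
    // Everyday access: returns entities that are already materialized.
    IEnumerable<TEntity> GetAll();

    // Escape hatch for the cases where composing a LINQ query really helps.
    IQueryable<TEntity> GetQuery();
}

// In-memory implementation, only there to make the sketch runnable.
public class InMemoryRepository<TEntity> : IRepository<TEntity>
{
    private readonly List<TEntity> _items;

    public InMemoryRepository(IEnumerable<TEntity> items)
    {
        _items = new List<TEntity>(items);
    }

    public IEnumerable<TEntity> GetAll()
    {
        // Return a copy so callers cannot change the Repository's own list.
        return _items.ToList();
    }

    public IQueryable<TEntity> GetQuery()
    {
        return _items.AsQueryable();
    }
}
```

With this split, most callers use GetAll and never deal with query execution, while a caller that needs paging or a projection can opt in through GetQuery.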

Fredrik N - Wednesday, January 20, 2010 6:50:05 AM

Linq is my friend ;-)

One thing I've been wanting to run by some design pattern experts is the following:

"Sometimes, I don't want to return all the fields of an object, I only want FirstName and LastName from "Customer." Customer may have 30 fields (just using sample numbers) and the repository pattern means I have to return the entire row for each customer object. Why can't I generate the query, then run a "Select(p => new { FirstName = p.FirstName, LastName = p.LastName })" on the returned query in my presentation layer?"

I found that doing this in practice speeds up my web apps considerably.

var query = myRepository.GetQuery();
var customerNames = query.Select(p => new { FirstName = p.FirstName, LastName = p.LastName });

this.someControl.DataSource = customerNames;
this.someControl.DataBind();

This way, the query only returns a small amount of data and the control doesn't have to reflect over the entire object.

Does this make sense? Have you (or anyone else) tried this? Is this bad practice? What is the sound of one hand clapping?

Any input on this is appreciated!
-D

DANewell - Friday, January 22, 2010 6:46:43 PM

@DANewell:
It depends ;) I have added a GetQuery to my latest Repository as a way to define queries.

Fredrik N - Friday, January 22, 2010 6:53:50 PM

public class MockProductRepository : IProductRepository
{
    public IList<Product> GetProducts()
    {
        var products = new List<Product>();

        for (int i = 0; i < 10; i++)
            products.Add(new Product(.....));

        return products;
    }

    public IList<Product> GetProductsByCategory(int categoryID)
    {
        var products = this.GetProducts();

        return (from p in products
                where p.CategoryID == categoryID
                select p).ToList();
    }

    //...
}

In this part there is something I can't get: in this case you load all the products and then start filtering them. I am wondering, if I have a million records in the product table and the result I expect after the filter is 10 products, why should I load all of them to apply my filter? Can you give an explanation?

Amgad Fahmi - Tuesday, February 23, 2010 11:44:45 AM

@Amgad Fahmi:
You shouldn't, just make sure you only get the 10 products. It's up to you to create the filter.

Fredrik N - Tuesday, February 23, 2010 4:28:59 PM

I agree with Fredrik here.
As far as I know, using the pattern as is (with the IQueryable interface), you'd be restricted to using LINQ in other parts of the app as well. If your customer decides to change to NHibernate, for example, you get stuck because of this kind of implementation, the LINQ to SQL way.
This would be avoided by using Fredrik's implementation.

Johan B - Saturday, April 3, 2010 4:49:24 PM

Dot Net supports deferred execution. The caller might not be interested in the whole collection all the time and may even query the result further to extract a part of it. In such a case, especially with chained options, it is better to return a query rather than a collection, as it supports deferred execution. Moreover, we can create an instance of the context in the constructor rather than creating and disposing it after each use.

I believe that dependency injection pattern in conjunction with repository pattern is more helpful as it will ease our test automation too.

Santosh Arisetty - Tuesday, April 13, 2010 9:51:50 AM

Man, your grammar is terrible, some sentences are impossible to decipher.

Slapshot - Friday, June 11, 2010 1:39:39 PM

Every time you create an instance of DataContext you should dispose it right away, like this:
using (MyDataContext context = new MyDataContext()) {
...
}
and you won't have any problems with memory overflow

sanjar - Tuesday, September 7, 2010 8:36:16 PM
