Returning IQueryable from Repository Pattern

Saturday, October 15, 2016

In this post I am going to make the case why returning IQueryable from a Repository Pattern is an anti-pattern and why you should not do it.

I assume that you have some experience on building applications using the Layered Architecture and are familiar with various design patterns like Unit of Work and Repository pattern.

You can have a look at this post if you want to find out more about the Repository Pattern.

In this post I am talking the Repository pattern and Entity Framework and explain why we still need to use the Repository Pattern even if we use EF in our persistence layer.

One thing we should keep in mind is that at some point our methods in the Persistence layer should return something that can be consumed by the client classes in upper layers.

Clean code architecture dictates that we should program against clean interfaces and the various layers that exist in our application should communicate with each other trough interfaces so there is separation of concerns and the one layer is decoupled from the other. In other words we need to have low coupling between the layers.

The layers in an enterprise ASP.Net Core (ASP.Net 5 MVC 6.0) application should be the following:

1) User Interface layer (View models, View Components, Views, partial Views)

2) The domain layer which models the problem at hand, a series of classes like Customer, Order, e.t.c.

3) The Application or Services Layer which basically orchestrates all the classes in the lower levels (Domain layer, Persistence layer) and passes the information to the UI layer.

4) The Persistence layer where we have our ORM entities and the repository pattern which is responsible to simplify data access and enforce separation of concerns.

In the simplest words the repository pattern provides a clear separation between the application and the database access.

There will be instances of the Persistence layer classes in the Application/Services layer and there will be instances of the Application/Services in the UI layer, usually in the constructor of the controller.

Usually these classes are injected through the Dependency Injection pattern in the upper layers. This is of course something I will write in another post.

So now let's look into why we should not return IQueryable from a Repository class method.

By visiting this link from MSDN we can have an excellent explanation of what IQueryable<T> is.

public interface IQueryable<out T> : IEnumerable<T>, IEnumerable, 
	IQueryable

First of all it is an interface that inherits from IEnumerble<T> and IEnumerable.

Basically the IQueryable<T> interface is to be implemented by query providers. The query provider is a low level mechanism.

LINQ to Entities queries are translated into expression trees and then in raw SQL queries by the query provider. So you have a base query provider and on top you have things like the LINQ query provider.

So that is the purpose of the query provider. A query provider is not to be used to build a query object to execute and retrieve results from Linq to Entities.

I a nutshell IQueryable<T> allow us to operate and build queries that can be sent to a query provider, e.g LINQ to Entities query provider or Linq2SQL provider which in their turn will translate this to SQL and send it to the connected database.

IEnumerable<T> allow us to execute queries in memory against the objects, models, entities that the repository methods return.

Let me demonstrate with some code because a lot of people are still confused.

I will begin with a simple demonstration of IQueryable<T>.

Repository Layer

public IQueryable<Doctor> GetDoctors()

{

return context.Doctors;

}

Services Layer

//we assume the repository class(myrepo) is injected in the Services Layer

public void GetDoctorswithMostAppointments()

{

var getDoctorswith50Appointments = myrepo.GetDoctors().Where(d=>d.Appointments.Count >50);

var onlyAfewDoctors = getDoctorswith50Appointments.Skip(5).Take(10);

var returnedDoctors = onlyAfewDoctors.ToList();

}

the database is called in the line below

var returnedDoctors = onlyAfewDoctors.ToList();

Only the data that we want comes back from the database.

Now let me rewrite the code above using the IEnumerable<T>.

Repository Layer

public IEnumerable<Doctor> GetDoctors()

{

return context.Doctors.Include("Appointments");

}

Eager loading (by using Include) is the process whereby a query for one type of entity also loads related entities as part of the query.

Services Layer

//we assume the repository class(myrepo) is injected in the Services Layer

public void GetDoctorswithMostAppointments()

{

var getDoctorswith50Appointments = myrepo.GetDoctors().Where(d=>d.Appointments.Count >50);

var onlyAfewDoctors = getDoctorswith50Appointments.Skip(5).Take(10);

//actual call to the database

var returnedDoctors = onlyAfewDoctors.ToList();

}

All the operations above are done in the memory.

Let's say that you write something like that in your Services layer which sits on the top of the domain layer.

var BestSellerBooks = context.Books.Where(b=>b.isPublished && b.AuthorLastName="Hemingway");

Let's examine what we get back here. Without a shadow of a doubt we get back an IQueryable<Book> which is an entity from the entity framework, meaning we use the Entity Framework inside our Applications or Services layer.

That is a leaky abstraction and a violation of separation of concerns principle. We have the Services layer tightly coupled to the underlying ORM which is EF in our case.

This is clearly wrong since the Services layer is no more technology agnostic.

What we should pass to the other layers from the persistence and repository classes and methods is a business object, a model. By returning IQueryable you do not get back a business entity but you get back a query builder which specifies how to get the data back. Repository should be about what to get and not how to get it.

This is the job of the repository, persistence layer of how to get it. What we want back in order to pass it to the layers above is the model, the data, the business entity.

A very important stage of the development process is testing. In large ASP.Net applications we use Unit testing to test the domain and persistence logic of the application.

We cannot test data persistence or domain logic of repository classes and methods that return IQueryable.

In this post we argued why we should return IEnumerable<T> from our repository classes and methods and not IQueryable<T> because if we do we couple tightly the various layers of our application, break separation of concerns priniciple and make the application impossible to test.

Hope its helps!!!

5 Comments

public IEnumerable<Doctor> GetDoctors()

{

return context.Doctors.Include("Appointments");

}

The database call took place in the line above. All doctors and their appointments have been already loaded to the memory.

---
This is incorrect. The database call *does not* happen in the line above. It only happens when the IEnumerable<T>/IQueryable<T> is actually enumerated (e.g. via ToList() or foreach).

Rex - Sunday, October 16, 2016 11:13:04 AM

Good points.

There is one more big problem in returning IQuerable. The problem is not in EF itself as pointed in the article, we can make our own ICustomQuerable<TBusinessModel> interface, which will be a wrapper on top of EF.

The underlying problem is that for many years nobody can provide similar implementation of this interface, and, indeed, it is a very hard task.

* EF has its own partial implementation, where it raises exception if cannot produce SQL (e.g. old EF 4.1 http://stackoverflow.com/questions/6414058/why-does-ef-4-1-not-support-complex-queries-as-well-as-linq-to-sql).
* OData has its own, again, limited implementation
* It also can vary for MySQL, MSSQL, Postgres and so on

If tomorrow part of the data will be moved to SAP services, NoSQL data base, or simply to text files, can we guaranty proper implementation of IQueryable interface? All the possible Joins, Filters? Very doubtful.

But with 'IEnumerable<Doctor> GetDoctors()', as pointed in the article, the implementation becomes much more simple.

Artur - Sunday, October 16, 2016 2:50:19 PM

Your statement "The database call took place in the line above. All doctors and their appointments have been already loaded to the memory." is actually incorrect. Try removing the .ToList(), in your second code example, and you'll see that the returnedDoctors variable is actually an IQueryable<Doctor>. You would have to call .ToList() in the GetDoctors method, for it to not return an IQueryable<Doctor>, however, that means, as you have said, ALL doctors will be loaded into memory. What if you have tens of thousands of doctors? In that case you're retreiving and instantiating tens of thousands of Doctor objects when you only want 10. I encounter this very problem at an old job. A report page was running very, very slowly (returning hundreds of thousands of records) and all it took to speed up the report was to move the .ToList() to a later part in the code.

SB - Friday, October 21, 2016 8:28:08 PM

Hello Rex,

yes you are right.

thanks for your input.

nikolaosk - Sunday, December 11, 2016 8:26:46 PM

thank SB,

you can see your comment now and i changed my code.

nikolaosk - Sunday, December 11, 2016 8:33:02 PM

Comments have been disabled for this content.