Entity Framework Performance optimization patterns-part II
This is the second post in a series of posts where I talk about good coding practices when it comes to using Entity Framework as our data access layer when building our applications.
You can read the first post of the series here. The main things to take away from that first post is to use projection whenever possible (do not query sql server for data that you do not want on the client) and to filter on the server (meaning SQL Server) which has a powerful engine to do that and not on the client.
In this post I am going to provide you with a hands-on example on how to avoid writing your LINQ to Entities queries in a way that will hinder performance of your application. The aim of this post (hopefully a series of posts on performance and Entity Framework) is to highlight bad coding practices when architecting an applications that uses EF as the ORM to fetch and manipulate data from the data store. I am going to point out some practises and patterns that very often developers use and cause EF to create poor-performing T-SQL statements.
First, a quick word on what Entity Framework is. Entity Framework is an object-relational mapping (ORM) framework for the .NET Framework.EF addresses the problem of Object-relational impedance mismatch. I will not be talking about that mismatch because it is well documented in many sites on the Internet.
Through that framework we can program against a conceptual application model instead of programming directly against a relational schema-model. By doing so we can decrease the amount of code we do write to access a data storage and thus decrease maintenance time. You can find many posts regarding Entity Framework in this blog.
A lot of people wonder why we should use Entity Framework in the first place. We could still keep using good old T-SQL in our applications.
The obvious answer is that EF addresses the Object Relation impedance mismatch and it bridges those two different worlds. Entity Framework creates an object oriented model for accessing the data tier. In an object oriented development environment, it makes working with the data tier much more seamless for the developer. It allows developers to spend time writing code for their application rather than dealing with the tedious tasks of opening connections to the database e.t.c. The abstraction that is offered by EF by generating the intermediate code, which in this case is T-SQL, it's much easier to migrate code to another platform, such as Oracle or Postgres or some other ODBC source. We get incredible flexibility by doing this.
Using EF does not mean we should forget about SQL Server, T-SQL, relationships, foreign keys and performance. We should keep in mind that SQL Server is based on set theory and relational algebra and it thrives when acting on sets of data, updating a set of rows rather than each row of data at the time.
1) Create an empty ASP.Net Application (Web Forms Application) and give it the name EFoptimisation2. I am using Visual Studio 2013 Ultimate edition.
2) Add a new web forms page in the application. Leave the default name. The application I am going to build is very simple web forms application. The user will enter a last name and will get back the first name(s) for that last name.
3) I will use the AdventureWorks2014 database (You can download it here) for this application and more specifically the Person.Person table. I have installed SQL Server 2014 Enterprise edition in my machine.
4) I will add an ADO.Net Entity data model using Database First paradigm. Follow the wizzard steps, create the connection string and then import into the conceptual model only the Person and EmailAddress tables which will become an entities in the domain model. If you want to look at those detailed steps if you are new to EF and Database First have a look at this post.
5) Add a textbox and a button to the page. The user will enter the first name in the textbox and will hit enter and then the results (the email addresses for that first name) will be printed on the page.We will navigate to the EmailAddress entity throug the navigation property EmailAddresses
This is the code inside the Page_Load event handling routine.
protected void Page_Load(object sender, EventArgs e)
{
using (var ctx = new AdventureWorks2014Entities())
{
string FirstName = TextBox1.Text;
var query = from p in ctx.People
where p.FirstName.Equals(FirstName)
select p;
foreach (var person in query)
{
foreach (var email in person.EmailAddresses)
{
Response.Write(email.EmailAddress1);
Response.Write("<br/>");
}
}
}
}
The code above is pretty straight forward.
6) Now we are ready to run the application. Before we do that I will launch SSMS and connect to my local instance of SQL Server. Then I will also launch the SQL Profiler and create a new trace. The trace will listen only for the RPC:Completed event. I activate te trace so the trace is running.
7) I build and run my web app. The results I get back when typing "Alex" as first name is 51 email addresses.
8) Let me see what the trace recorded in my SQL Profiler and the T-SQL that was generated.
We have an individual statement for every email address that we retrieved. This is a not a set based operation since we issue many transactions to the SQL Server.
9) Now we will rewrite our code above in order for EF to work better with the SQL Engine. I am going to use the "Include" method in my code.
We do inform Entity Framework that not only we want all the columns from the People object specified in the from clause, but also want all those columns in the path specified as a parameter of the include method-- EmailAddress in our scenario.
protected void Page_Load(object sender, EventArgs e)
{
using (var ctx = new AdventureWorks2014Entities())
{
string FirstName = TextBox1.Text;
var query = from p in ctx.People.Include("EmailAddresses")
where p.FirstName.Equals(FirstName)
select p;
foreach (var person in query)
{
foreach (var email in person.EmailAddresses)
{
Response.Write(email.EmailAddress1);
Response.Write("<br/>");
}
}
}
}
The Profiler is still running on the background.
I build and run my web app. The results I get back when typing "Alex" as first name is 51 email addresses.
This is what I get back from the Profiler.
Now as you notice it's a pretty extensive query but there is a problem with this approach.We are not allowed to use projection. We can only use the columns of the People object.
10) We need to rewrite our code again. I will use explicit joins this time.The code follows.
protected void Page_Load(object sender, EventArgs e)
{
using (var ctx = new AdventureWorks2014Entities())
{
string FirstName = TextBox1.Text;
var query = from p in ctx.People
join email in ctx.EmailAddresses
on p.BusinessEntityID equals email.BusinessEntityID
where p.FirstName.Equals(FirstName)
select new { email.EmailAddress1};
foreach (var item in query)
{
Response.Write(item.EmailAddress1);
Response.Write("<br/>");
}
}
}
The Profiler is still running on the background.
I build and run my web app. The results I get back when typing "Alex" as first name is 51 email addresses.
This is what I get back from the Profiler.
As we can see this is a T-SQL statement that we could type in an SSMS Query window. We have one query that results in one set based operation thus improving greatly the performance of our application by getting rid off the unecessary round trips.
11) We could rewrite the code above in a more object oriented way using lambda expressions.
protected void Page_Load(object sender, EventArgs e)
{
using (var ctx = new AdventureWorks2014Entities())
{
string FirstName = TextBox1.Text;
var query = ctx.People
.Where(p => p.FirstName.Equals(FirstName))
.SelectMany(email => email.EmailAddresses)
.Select(theemail => theemail.EmailAddress1);
foreach (var item in query)
{
Response.Write(item);
Response.Write("<br/>");
}
}
}
The Profiler is still running on the background.
I build and run my web app. The results I get back when typing "Alex" as first name is 51 email addresses.
This is the T-SQL statement I got back from the Profiler. This is what it was executed against the database.
EXEC sp_executesql N'SELECT
[Extent2].[EmailAddress] AS [EmailAddress]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[EmailAddress] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
WHERE [Extent1].[FirstName] = @p__linq__0', N'@p__linq__0 nvarchar(4000)',
@p__linq__0 = N'Alex'
If we typed that query ourselves in an SSMS query window we would type something like this:
SELECT Person.EmailAddress.EmailAddress
FROM Person.EmailAddress
INNER JOIN Person.Person ON Person.EmailAddress.BusinessEntityID = Person.Person.BusinessEntityID
WHERE Person.Person.FirstName = 'Alex'
As you can see those two queries are pretty much the same.
Entity framework abstracts the T-SQL creation from us, the developers. Having said that we are still in charge of the overall performance of our application. Performance plays always a big role in any application. We do know that SQL Server thrives on set based operations.We should write our Linq to Entities queries in a way that set based T-SQL statements are generated.
Hope it helps!!!