Associations in EF Code First: Part 6 – Many-valued Associations

Tuesday, May 17, 2011

This is the sixth and last post in a series that explains entity association mappings with EF Code First. I've described these association types so far:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

Support for many-valued associations is an absolutely basic feature of an ORM solution like Entity Framework. Surprisingly, we’ve managed to get this far without needing to talk much about these types of associations. Even more surprisingly, there is not much to say on the topic: these associations are so easy to use in EF that we don’t need to spend a lot of effort explaining them. To get an overview, we first consider a domain model containing different types of associations and explain each of them in turn. Since this is the last post in this series, I'll also show you two tricks at the end that you might find useful in your EF Code First development.

Many-valued entity associations

A many-valued entity association is by definition a collection of entity references. One-to-many associations are the most important kind of entity association that involves a collection. We go so far as to discourage the use of more exotic association styles when a simple bidirectional many-to-one/one-to-many will do the job. A many-to-many association may always be represented as two many-to-one associations to an intervening class. This model is usually more easily extensible, so we tend not to use many-to-many associations in applications.

Introducing the OnlineAuction Domain Model

The model we introduce here is for an online auction system. The OnlineAuction site auctions many different kinds of items. Auctions proceed according to the “English auction” model: users continue to place bids on an item until the bid period for that item expires, and the highest bidder wins. A high-level overview of the domain model is shown in the following class diagram:

Each item may be auctioned only once, so we have a single auction item entity named Item. Bid is associated directly with Item.

The Object Model

The following shows the POCO classes that form the object model for this domain:

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
 
    public virtual ICollection<Item> BoughtItems { get; set; }
}
 
public class Item
{
    public int ItemId { get; set; }
    public string Name { get; set; }
    public double InitialPrice { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public int? BuyerId { get; set; }
    public int? SuccessfulBidId { get; set; }
 
    public virtual User Buyer { get; set; }
    public virtual Bid SuccessfulBid { get; set; }
    public virtual ICollection<Bid> Bids { get; set; }
    public virtual ICollection<Category> Categories { get; set; }
}
 
public class Bid
{
    public int BidId { get; set; }
    public double Amount { get; set; }
    public DateTime CreatedOn { get; set; }
    public int ItemId { get; set; }
    public int BidderId { get; set; }
 
    public virtual Item Item { get; set; }
    public virtual User Bidder { get; set; }
}
 
public class Category
{
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public int? ParentCategoryId { get; set; }
 
    public virtual Category ParentCategory { get; set; }
    public virtual ICollection<Category> ChildCategories { get; set; }
    public virtual ICollection<Item> Items { get; set; }
}

The Simplest Possible Association

The association from Bid to Item (and vice versa) is an example of the simplest possible kind of entity association. You have two properties in two classes. One is a collection of references, and the other is a single reference. This mapping is called a bidirectional one-to-many association. The property ItemId in the Bid class is a foreign key to the primary key of the Item entity, something that we call a Foreign Key Association in EF 4. We defined the type of the ItemId property as an int, which can't be null, because we can’t have a bid without an item; a NOT NULL constraint will be generated in the SQL DDL to reflect this. We use the HasRequired method in the fluent API to create this type of association:

class BidConfiguration : EntityTypeConfiguration<Bid>
{
    internal BidConfiguration()
    {
        this.HasRequired(b => b.Item)
            .WithMany(i => i.Bids)
            .HasForeignKey(b => b.ItemId);
    }
}
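A configuration class like this takes effect only once it is registered with the model builder. The following is a minimal sketch of that registration, assuming a DbContext-derived class named Context (the name used later in this post); the DbSet property names are my own illustrative choices and are not taken from the downloadable sample:

```csharp
using System.Data.Entity;

public class Context : DbContext
{
    // DbSet property names are illustrative assumptions.
    public DbSet<User> Users { get; set; }
    public DbSet<Item> Items { get; set; }
    public DbSet<Bid> Bids { get; set; }
    public DbSet<Category> Categories { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Register the EntityTypeConfiguration<Bid> so its mapping is applied.
        modelBuilder.Configurations.Add(new BidConfiguration());
    }
}
```

Any other configuration class would be registered the same way, with one Add call per class.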

An Optional One-to-Many Association Between User and Item Entities

Each item in the auction may be bought by a User, or might not be sold at all. Note that the foreign key property BuyerId in the Item class is of type Nullable<int>, which can be NULL because the association is in fact to-zero-or-one. We use the HasOptional method to create this association between User and Item (with this method, the foreign key must be a Nullable type or Code First throws an exception):

class ItemConfiguration : EntityTypeConfiguration<Item>
{
    internal ItemConfiguration()
    {
        this.HasOptional(i => i.Buyer)
            .WithMany(u => u.BoughtItems)
            .HasForeignKey(i => i.BuyerId);
    }
}

A Parent/Child Relationship

In the object model, the association between User and Item is fairly loose. We’d use this mapping in a real system if both entities had their own lifecycle and were created and removed in unrelated business processes. Certain associations are much stronger than this; some entities are bound together so that their lifecycles aren’t truly independent. For example, it seems reasonable that deletion of an item implies deletion of all bids for the item. A particular bid instance references only one item instance for its entire lifetime. In this case, cascading deletions make sense. In fact, this is what the composition (the filled diamond) in the above UML diagram means. If you enable cascading delete, the association between Item and Bid is called a parent/child relationship, and that's exactly what EF Code First does by default on associations created with the HasRequired method.

In a parent/child relationship, the parent entity is responsible for the lifecycle of its associated child entities. These are the same semantics as a composition using EF complex types, but in this case only entities are involved; Bid isn’t a value type. The advantage of using a parent/child relationship is that the child may be loaded individually or referenced directly by another entity. A bid, for example, may be loaded and manipulated without retrieving the owning item. It may be stored without storing the owning item at the same time. Furthermore, you can reference the same Bid instance in a second property of Item, the single SuccessfulBid (take another look at the Item class in the object model above). Objects of value type can’t be shared in this way.

Many-to-Many Associations

The association between Category and Item is a many-to-many association, as can be seen in the above class diagram. A many-to-many association mapping hides the intermediate association table from the application, so you don’t end up with an unwanted entity in your domain model. That said, in a real system you may not have a many-to-many association at all. My experience is that there is almost always other information that must be attached to each link between associated instances (such as the date and time when an item was added to a category), and that the best way to represent this information is via an intermediate association class. In EF, you can map the association class as an entity and map two one-to-many associations, one for either side.

In a many-to-many relationship, the join table (or link table, as some developers call it) has two columns: the foreign keys of the Category and Item tables. The primary key is a composite of both columns. In EF Code First, a many-to-many association mapping can be customized with fluent API code like this:

class ItemConfiguration : EntityTypeConfiguration<Item>
{
    internal ItemConfiguration()
    {
        this.HasMany(i => i.Categories)
            .WithMany(c => c.Items)
            .Map(mc =>
            {
                mc.MapLeftKey("ItemId");
                mc.MapRightKey("CategoryId");
                mc.ToTable("ItemCategory");
            });
    }
}
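The alternative mentioned above, mapping the link itself as an entity with two one-to-many associations, might look like the following sketch. The class name CategorizedItem, its surrogate key, and the AddedOn property are hypothetical names introduced for illustration; the sketch also assumes that the Categories and Items collections would be removed from Item and Category (or replaced by collections of the new class), since the link is no longer hidden:

```csharp
using System;
using System.Data.Entity.ModelConfiguration;

// Hypothetical association class that replaces the hidden ItemCategory join table.
public class CategorizedItem
{
    public int CategorizedItemId { get; set; }   // surrogate key, found by convention
    public int ItemId { get; set; }
    public int CategoryId { get; set; }
    public DateTime AddedOn { get; set; }        // extra data attached to the link

    public virtual Item Item { get; set; }
    public virtual Category Category { get; set; }
}

class CategorizedItemConfiguration : EntityTypeConfiguration<CategorizedItem>
{
    internal CategorizedItemConfiguration()
    {
        // First many-to-one: each link row points at exactly one Item.
        this.HasRequired(ci => ci.Item)
            .WithMany()
            .HasForeignKey(ci => ci.ItemId);

        // Second many-to-one: each link row points at exactly one Category.
        this.HasRequired(ci => ci.Category)
            .WithMany()
            .HasForeignKey(ci => ci.CategoryId);
    }
}
```

The parameterless WithMany() keeps both associations unidirectional, so Item and Category need no new properties for this sketch to compile. The trade-off is that navigating from an item to its categories then requires a query over the link entity.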

SQL Schema

The following shows the SQL schema that Code First creates from our object model:

Get the Code First Generated SQL DDL

A common process, if you’re starting with a new application and new database, is to generate DDL with Code First automatically during development; at the same time (or later, during testing), a professional DBA verifies and optimizes the SQL DDL and creates the final database schema. You can export the DDL into a text file and hand it to your DBA. The CreateDatabaseScript method on the ObjectContext class generates a data definition language (DDL) script that creates schema objects (tables, primary keys, foreign keys) for the metadata in the store schema definition language (SSDL) file (in the next section, you'll see where this metadata comes from):

using (var context = new Context())
{
    string script = ((IObjectContextAdapter)context).ObjectContext.CreateDatabaseScript();
}

You can then use one of the classes in the .NET file I/O API, such as StreamWriter, to write the script to disk.
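As a rough sketch of that last step, the script can be written out with a StreamWriter. The class name SchemaExporter, the method name Export, and the path parameter are hypothetical; Context is the context class used in the snippet above:

```csharp
using System.Data.Entity.Infrastructure;
using System.IO;

public static class SchemaExporter
{
    // Writes the Code First generated DDL to the given file path.
    public static void Export(string path)
    {
        using (var context = new Context())
        using (var writer = new StreamWriter(path))
        {
            string script = ((IObjectContextAdapter)context)
                .ObjectContext
                .CreateDatabaseScript();

            writer.Write(script);
        }
    }
}
```

Calling SchemaExporter.Export("schema.sql") would then leave a file that can be handed to the DBA.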

Note how Code First enables cascade deletes for the parent/child relationship between Item and Bid.

Get the Runtime EDM

One of the benefits of Code First development is that we don't need to deal with the Edmx file. That doesn't mean, however, that the concept of EDM doesn't exist at all. In fact, at runtime, when the context is used for the first time, Code First derives the EDM (CSDL, MSL, and SSDL) from our object model, and this EDM is even cached in the app-domain as an instance of DbCompiledModel. Having access to this generated EDM is beneficial in many cases. At the very least, we can add it to our solution and use it as a class diagram for our domain model. More importantly, we can use this EDM for debugging when there is a need to look at the model that Code First creates internally. This EDM also contains the conceptual schema definition language (CSDL), which drives the EF runtime behavior. The trick is to use the WriteEdmx method of the EdmxWriter class, as in the following code:

using (var context = new Context())
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
 
    using (XmlWriter writer = XmlWriter.Create(@"Model.edmx", settings))
    {
        EdmxWriter.WriteEdmx(context, writer);
    }                            
}

After running this code, right-click on your project, select Add Existing Item..., and then browse to the Model.edmx file and add it to the project. Once you have added the file, double-click on it and Visual Studio will show the edmx file in the designer:

Note how cascade delete is also enabled in the CSDL for the parent/child association between Item and Bid.

Source Code

Click here to download the source code for the OnlineAuction site that we have seen in this post.

Summary

In this series, we focused on the structural aspect of the object/relational paradigm mismatch and discussed one of the main ORM problems relating to associations. We explored the programming model for persistent classes and the EF Code First fluent API for fine-grained classes and associations. Many of the techniques we’ve shown in this series are key concepts of object/relational mapping and I am hoping that you'll find them useful in your Code First developments.

Associations in EF Code First: Part 5 – One-to-One Foreign Key Associations

Sunday, May 1, 2011

C# Code First Entity Framework 4.1

This is the fifth post in a series that explains entity association mappings with EF Code First. This series includes:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

In the third part of this series we saw the limitations of shared primary key association and argued that this type of association is relatively rare and in many schemas, a one-to-one association is represented with a foreign key field and a unique constraint. Today we are going to discuss how this is done by learning about one-to-one foreign key associations.

Introducing the Revised Model

In this revised version, each User always has two addresses: one billing address and another one for delivery. The following class diagram demonstrates the domain model:

One-to-One Foreign Key Association

Instead of sharing a primary key, two rows can have a foreign key relationship. One table has a foreign key column that references the primary key of the associated table. (The source and target of this foreign key constraint can even be the same table; this is called a self-referencing relationship.) An additional constraint enforces this relationship as a real one-to-one. For example, by making the BillingAddressId column unique, we declare that a particular address can be referenced by at most one user, as a billing address. This isn’t as strong as the guarantee from a shared primary key association, which allows a particular address to be referenced by at most one user, period. With several foreign key columns (which is the case in our domain model since we also have a foreign key for DeliveryAddress), we can reference the same address target row several times. But in any case, two users can’t share the same address for the same purpose.

The Object Model

Let's start by creating an object model for our domain:

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public int BillingAddressId { get; set; }
    public int DeliveryAddressId { get; set; }
        
    public Address BillingAddress { get; set; }
    public Address DeliveryAddress { get; set; }
}
 
public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}
 
public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }
}

As you can see, the User class now has two new scalar properties, BillingAddressId and DeliveryAddressId, as well as their related navigation properties (BillingAddress and DeliveryAddress).

Configuring Foreign Keys With Fluent API

BillingAddressId and DeliveryAddressId are foreign key scalar properties representing the actual foreign key values that the relationships are established on. However, Code First will not recognize them as the foreign keys for the associations, since their names do not match the conventions it uses to infer foreign keys. Therefore, we need to use the fluent API (or Data Annotations) to let Code First know about the foreign key properties. The following fluent API code shows how:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>()
                .HasRequired(a => a.BillingAddress)
                .WithMany()
                .HasForeignKey(u => u.BillingAddressId);
 
    modelBuilder.Entity<User>()
                .HasRequired(a => a.DeliveryAddress)
                .WithMany()
                .HasForeignKey(u => u.DeliveryAddressId);
}

Alternatively, we can use Data Annotations to achieve this. EF 4.1 introduced a new attribute in System.ComponentModel.DataAnnotations namespace called ForeignKeyAttribute. We can place this on a navigation property to specify the property that represents the foreign key of the relationship:

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public int BillingAddressId { get; set; }
    public int DeliveryAddressId { get; set; }
 
    [ForeignKey("BillingAddressId")]
    public Address BillingAddress { get; set; }
 
    [ForeignKey("DeliveryAddressId")]
    public Address DeliveryAddress { get; set; }
}

That said, we won't use this data annotation and will go with the fluent API way for a reason that you'll soon see.

Creating a SQL Server Schema

The object model seems to be ready to give us the desired SQL schema. However, if we try to create a SQL Server database from it, we will get an InvalidOperationException with this message:

The database creation succeeded, but the creation of the database objects did not. See InnerException for details.

The inner exception is a SqlException containing this message:

Introducing FOREIGN KEY constraint 'User_DeliveryAddress' on table 'Users' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. Could not create constraint. See previous errors.

As you can tell from the type of the inner exception (SqlException), it has nothing to do with EF or Code First; it has been generated purely by SQL Server when Code First was trying to create a database based on our object model.

What's a Multiple Cascade Path Anyway?

A Multiple Cascade Path happens when a cascade path goes from column col1 in table A to table B and also from column col2 in table A to table B. For example in our case Code First attempted to turn on cascade delete for both BillingAddressId and DeliveryAddressId columns in the Users table. In fact, Code First was trying to use Declarative Referential Integrity (DRI) to enforce cascade deletes and the problem is that SQL Server is not fully ANSI SQL-92 compliant when it comes to the cascading actions. In SQL Server, DRI forbids cascading updates or deletes in a multiple cascade path scenario.

A KB article also explains why we received this error:

"In SQL Server, a table cannot appear more than one time in a list of all the cascading referential actions that are started by either a DELETE or an UPDATE statement. For example, the tree of cascading referential actions must only have one path to a particular table on the cascading referential actions tree".

And it applies exactly to our example: the Users table appeared twice in a list of cascading referential actions started by a DELETE from the Addresses table. Basically, SQL Server does simple counting of cascade paths and, rather than trying to work out whether any cycles actually exist, it assumes the worst and refuses to create the referential actions (cascades). Therefore, depending on your database engine, you may or may not get this exception.

Overriding The Code First Convention To Resolve the Problem

As you saw, Code First automatically turns on cascade delete on a required one-to-many association based on the conventions. However, in order to resolve the exception that we got from SQL Server, we have no choice other than overriding this cascade delete behavior detected by convention. Basically we need to switch cascade delete off on at least one of the relationships and as of EF 4.1, there is no way to accomplish this other than using fluent API. Let's switch it off on DeliveryAddress association for example:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>()
                .HasRequired(a => a.BillingAddress)
                .WithMany()
                .HasForeignKey(u => u.BillingAddressId);
 
    modelBuilder.Entity<User>()
                .HasRequired(a => a.DeliveryAddress)
                .WithMany()
                .HasForeignKey(u => u.DeliveryAddressId)
                .WillCascadeOnDelete(false);
}

One-to-One Foreign Key Associations in EF Code First

As you may have noticed, both associations in the fluent API code have been configured as many-to-one, not one-to-one as you might have expected. The reason is simple: Code First (and EF in general) does not natively support one-to-one foreign key associations. In fact, EF does not support any association scenario that involves unique constraints at all. Fortunately, in this case we don’t care what’s on the target side of the association, so we can treat it like a to-one association without the many part. All we want is to express “This entity (User) has a property that is a reference to an instance of another entity (Address)” and use a foreign key field to represent that relationship. EF (of course) still thinks that the relationship is many-to-one. This is a workaround for the current EF limitation, and it comes with two consequences. First, EF won't create any additional constraint for us to enforce this relationship as a one-to-one; we need to create it manually ourselves. The second consequence is more important: one-to-one foreign key associations cannot be bidirectional (e.g. we cannot define a property for the User on the Address class).

Create a Unique Constraint To Enforce the Relationship as a One to One

We can manually create unique constraints on the foreign keys in the database after Code First creates it for us. But if you are like me and prefer to create your database in one shot, there is a way to have Code First create the constraints as part of its database creation process. For that we can take advantage of the new EF 4.1 ExecuteSqlCommand method on the Database class, which allows raw SQL commands to be executed against the database. The best place to invoke ExecuteSqlCommand for this purpose is inside a Seed method that has been overridden in a custom initializer class:

protected override void Seed(Context context)
{
    context.Database.ExecuteSqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Billing UNIQUE(BillingAddressId)");
    context.Database.ExecuteSqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Delivery UNIQUE(DeliveryAddressId)");
}

This code runs right after Code First creates the database and adds unique constraints to the BillingAddressId and DeliveryAddressId columns of the Users table.
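For context, here is a minimal sketch of how such a Seed override might sit inside a custom initializer and how that initializer could be registered. The class name ContextInitializer and the choice of DropCreateDatabaseIfModelChanges as the base class are my assumptions, not something taken from the downloadable sample:

```csharp
using System.Data.Entity;

// Hypothetical initializer; the base class choice is an assumption.
public class ContextInitializer : DropCreateDatabaseIfModelChanges<Context>
{
    // Called by EF after it has created the database.
    protected override void Seed(Context context)
    {
        context.Database.ExecuteSqlCommand(
            "ALTER TABLE Users ADD CONSTRAINT uc_Billing UNIQUE(BillingAddressId)");
        context.Database.ExecuteSqlCommand(
            "ALTER TABLE Users ADD CONSTRAINT uc_Delivery UNIQUE(DeliveryAddressId)");
    }
}

public static class Program
{
    public static void Main()
    {
        // Register the initializer before the context is used for the first time.
        Database.SetInitializer<Context>(new ContextInitializer());

        using (var context = new Context())
        {
            // Run the initializer now instead of waiting for the first query.
            context.Database.Initialize(false);
        }
    }
}
```

With this base class, Seed runs only when the database is created or recreated, so the constraints are not added a second time on later runs.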

SQL Schema

The object model is ready now and will result in the following database schema:

It is also worth mentioning that we can still enforce cascade deletes for the DeliveryAddress relationship. SQL Server allows enforcing referential integrity in two different ways. DRI, which we just saw, is the most basic yet least flexible way. The other way is to use triggers. We can write a delete trigger on the primary table that either deletes the rows in the dependent table(s) or sets all corresponding foreign keys to NULL. In our case the foreign keys are non-nullable, so the trigger has to delete the dependent rows.

Source Code

Click here to download the source code for the one-to-one foreign key association sample that we have built in this post.

Summary

In this post we learned about one-to-one foreign key associations as a better way to create one-to-one relationships. We saw some limitations, such as the need to create unique constraints manually and the fact that this type of association cannot be bidirectional, all due to the lack of unique constraint support in EF. The good news is that the ADO.NET team is working on enabling unique constraints in EF. However, that support requires changes to the whole EF stack, which won't happen until the next major release of EF (EF 4.1 is merely layered on top of the current .NET 4.0 functionality). Until then, the workaround that I showed here is going to be the way to implement one-to-one foreign key associations in EF Code First.

Associations in EF Code First: Part 4 – Table Splitting

Sunday, April 24, 2011

C# Code First Entity Framework 4.1

This is the fourth post in a series that explains entity association mappings with EF Code First. This series includes:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

In the second part of this series we saw how to map a special kind of one-to-one association—a composition with complex types. We argued that this is usually the simplest way to represent one-to-one relationships which comes with some limitations. We addressed the first limitation (shared references) by introducing shared primary key associations in the previous blog post. In today’s blog post we are going to address the third limitation of the complex types by learning about Table Splitting as yet another way to map a one-to-one association.

The Motivation Behind this Mapping: A Complex Type That Can be Lazy Loaded

A shared primary key association does not expose us to the third limitation of the complex types regarding lazy loading: we can of course lazy (defer) load the Address information of a given user. At the same time, it does not give us the same SQL schema as the complex type mapping. After all, it adds a new table for the Address entity to the schema, while mapping the Address with a complex type stores the address information in the Users table. So the question still remains: how can we keep everything (e.g. User and Address) in one single table yet be able to lazy load the complex type part (Address) after reading the principal entity (User)? In other words, how can we have lazy loading with a complex type?

Splitting a Single Table into Multiple Entities

Table splitting (a.k.a. horizontal splitting) enables us to map a single table to multiple entities. This is particularly useful when we have a table with many columns, where some of those columns are needed less frequently than others or are expensive to load (e.g. a column with a binary data type).

An Example From the Northwind Database

Unlike the other parts of this series, where we start with an object model and then derive a SQL schema afterwards, in this post we are going to do the reverse. For a reason that you'll see, we will start with an existing schema and try to create an object model that matches it. For that we are going to use the Employees table from the Northwind database. You can download and install the Northwind database from here if you don't already have it installed on your SQL Server. The following shows the Employees table from the Northwind database that we are going to use:

As you can see, this table has a Photo column of image type, which makes it a good candidate for lazy loading rather than being fetched each time we read an Employee from this table.

The Object Model

As the following object model shows, I created two entities: Employee and EmployeePhoto. I also created a unidirectional association between these two by defining a navigation property on the Employee class called EmployeePhoto:

public class Employee
{
    public int EmployeeID { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Title { get; set; }
    public string TitleOfCourtesy { get; set; }
    public DateTime? BirthDate { get; set; }
    public DateTime? HireDate { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
    public string Region { get; set; }
    public string PostalCode { get; set; }
    public string Country { get; set; }
    public string HomePhone { get; set; }
    public string Extension { get; set; }        
    public string Notes { get; set; }
    public int? ReportsTo { get; set; }        
 
    public virtual EmployeePhoto EmployeePhoto { get; set; }
}
 
public class EmployeePhoto
{
    [Key]
    public int EmployeeID { get; set; }
    public byte[] Photo { get; set; }
    public string PhotoPath { get; set; }
}
 
public class NorthwindContext : DbContext
{        
    public DbSet<Employee> Employees { get; set; }
    public DbSet<EmployeePhoto> EmployeePhoto { get; set; }     
}

How to Create a Table Splitting with Fluent API?

As also mentioned in the previous post, by convention, Code First always takes a unidirectional association as one-to-many unless we specify otherwise with the fluent API. However, the fluent API code that we have seen so far in this series won't let us create a table splitting mapping. If we mark the EmployeePhoto class as a complex type, we can no longer lazy load it; if we create a shared primary key association, Code First will look for a separate table for the EmployeePhoto entity, which we don't have in the Northwind database. The trick is to create a shared primary key association between the Employee and EmployeePhoto entities but then instruct Code First to map them both to the same table. The following code shows how:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<Employee>()
                .HasRequired(e => e.EmployeePhoto)
                .WithRequiredPrincipal();
        
    modelBuilder.Entity<Employee>().ToTable("Employees");
    modelBuilder.Entity<EmployeePhoto>().ToTable("Employees");
}

Note how we made both ends of the association required by using the HasRequired and WithRequiredPrincipal methods, even though both the Photo and PhotoPath columns have been defined to allow NULLs.

See the Lazy Loading of the Dependent Entity in Action

Now it's time to write a test to make sure that EF does not select the Photo column each time we query for an employee:

using (var context = new NorthwindContext())
{
    Employee employee = context.Employees.First();
    byte[] photo = employee.EmployeePhoto.Photo;
}

The following screen shot from the SQL Profiler shows the query that has been submitted to SQL Server as the result of reading the first employee object:

Accessing the EmployeePhoto navigation property of the employee object on the next line causes EF to submit a second query to the SQL Server to lazy (implicit) load the EmployeePhoto (By default, EF fetches associated objects and collections lazily whenever you access them):
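When the photo is known to be needed up front, the second round trip can be avoided by loading the dependent entity eagerly. The following is only a sketch under that assumption: the class name PhotoQueries and the method name LoadFirstPhotoEagerly are hypothetical, and I have not captured the SQL that EF generates for this query against a split table:

```csharp
using System.Data.Entity;   // brings the lambda-based Include extension into scope
using System.Linq;

public static class PhotoQueries
{
    // Loads the first employee together with its EmployeePhoto in one query.
    public static byte[] LoadFirstPhotoEagerly()
    {
        using (var context = new NorthwindContext())
        {
            Employee employee = context.Employees
                                       .Include(e => e.EmployeePhoto)
                                       .First();

            // No lazy load is triggered here; the photo was fetched by Include.
            return employee.EmployeePhoto.Photo;
        }
    }
}
```

This keeps lazy loading as the default for ordinary reads while leaving eager loading as an explicit, per-query choice.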

Where to Use this Mapping?

I recommend using Table Splitting only for mapping legacy databases; that's the reason we started this post from an existing database like Northwind. For green-field development scenarios, consider using a shared primary key association instead. There are several reasons why you may want to split the Employee table into two tables when designing a new physical data model for your application. In fact, it is very common for most applications to require a core collection of data attributes of any given entity, and then a specific subset of the noncore data attributes. For example, the core columns of the Employee table would include the columns required to store their name, address, and phone numbers, whereas noncore columns would include the Photo column. Because Employee.Photo is large, and required only by a few applications, you would want to consider splitting it off into its own table. This would help to improve retrieval times for applications that select all columns from the Employee table yet do not require the photo. This also works pretty well for EF, since it doesn't support lazy loading at the scalar property or complex type level.

Summary

In this post we learned about mapping a one-to-one association with table splitting. It enabled us to have lazy loading for the EmployeePhoto entity, something that we would have missed had we mapped it with a complex type. We saw that on the database side it looks like a complex type mapping, but in the object model it is not a complex type, since we mapped EmployeePhoto as an entity with an object identifier (EmployeeID). In fact, it's a special kind of shared primary key association where both the principal and dependent entities are mapped to one single table. This somewhat exotic one-to-one association mapping should be reserved for the mapping of existing legacy databases.

Associations in EF Code First: Part 3 – Shared Primary Key Associations

Thursday, April 14, 2011

C# Code First Entity Framework 4.1

This is the third post in a series that explains entity association mappings with EF Code First. This series includes:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

In the previous blog post I demonstrated how to map a special kind of one-to-one association—a composition with complex types. We argued that the relationship between User and Address is best represented with a complex type mapping and we saw that this is usually the simplest way to represent one-to-one relationships but comes with some limitations.

In today’s blog post I’m going to discuss how we can address those limitations by changing our mapping strategy. This is particularly useful for scenarios where we want a dedicated table for Address, so that we can map both User and Address as entities. One benefit of this model is the possibility of shared references: another entity class (let’s say Shipment) can also have a reference to a particular Address instance. If a User has a reference to this instance, as her BillingAddress, the Address instance has to support shared references and needs its own identity. In this case, User and Address classes have a true one-to-one association.

Introducing the Revised Model

In this revised version, each User could have one BillingAddress (Billing Association). Also a Shipment always needs a destination address for delivery (Delivery Association). The following shows the class diagram for this domain model (note the multiplicities on association lines):

In this model we assumed that the billing address of the user is the same as her delivery address. Now let’s create the association mappings for this domain model. There are several choices, the first being a One-to-One Primary Key Association.

Shared Primary Key Associations

Also known as one-to-one primary key associations, these are associations where two related tables share the same primary key values. The primary key of one table is also a foreign key of the other. Let’s see how we can create a primary key association mapping with Code First.

How to Implement a One-to-One Primary Key Association with Code First

First, we start with the POCO classes. As you can see, we've defined BillingAddress as a navigation property on the User class and another one named DeliveryAddress on the Shipment class. Both associations are unidirectional since we didn't define the corresponding navigation properties for User and Shipment on the Address class.

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
 
    public virtual Address BillingAddress { get; set; }
}
 
public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}
        
public class Shipment
{
    public int ShipmentId { get; set; }     
    public string State { get; set; }
 
    public virtual Address DeliveryAddress { get; set; }
}
 
public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }
    public DbSet<Shipment> Shipments { get; set; }
}

How Code First Sees the Associations in our Object Model: One-to-Many

Code First reads the model and tries to figure out the multiplicity of the associations. Since the associations are unidirectional, Code First takes this as if one Address has many Users and many Shipments and will create a one-to-many association for each of them. In other words, a unidirectional association is always inferred as one-to-many by Code First. So what we were hoping for, a one-to-one association, is not in line with the Code First conventions.
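To make the inferred mapping more tangible, here is a sketch of roughly what the convention amounts to if we spelled it out ourselves with the fluent API. This is my reading of the convention, not something Code First generates, and it assumes a project that references the EntityFramework package. The classes are trimmed to what the illustration needs:

```csharp
using System.Data.Entity;

public class User
{
    public int UserId { get; set; }
    public virtual Address BillingAddress { get; set; }
}

public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
}

public class Shipment
{
    public int ShipmentId { get; set; }
    public virtual Address DeliveryAddress { get; set; }
}

public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }
    public DbSet<Shipment> Shipments { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Assumption: this mirrors what the conventions infer for a
        // unidirectional reference, i.e. many Users may point at one Address.
        modelBuilder.Entity<User>()
                    .HasOptional(u => u.BillingAddress)
                    .WithMany();

        // Likewise, many Shipments may point at one Address.
        modelBuilder.Entity<Shipment>()
                    .HasOptional(s => s.DeliveryAddress)
                    .WithMany();
    }
}
```

With either the conventions or this explicit version, the Users and Shipments tables each get a nullable foreign key column pointing to Addresses, which is not the shared primary key schema we are after.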

How to Change the Multiplicity of the Associations to One-to-One by Using the Conventions

Obviously, one way to turn our associations into one-to-one is to make them bidirectional, that is, to add a new navigation property of type User to the Address class and another one of type Shipment. By doing that we simply signal to Code First that we want one-to-one associations since, for example, User has an Address and Address also has a User. Therefore, Code First will change the multiplicity to one-to-one and this will solve the problem.

Should We Make the Associations Bidirectional?

As always, the decision is up to us and depends on whether we need to navigate through our objects in that direction in the application code. In this case, we’d probably conclude that the bidirectional association doesn’t make much sense. If we call anAddress.User, we are saying “give me the user who has this address”, not a very reasonable request. So this is not a good option. Instead we'll keep our object model as it is and will explicitly ask Code First to make our associations one-to-one.

How to Change the Multiplicity to One-to-One with Fluent API

The following code is all that is needed to make the associations one-to-one. Note how the multiplicities in the UML class diagram (e.g. 1 on User and 0..1 on Address) have been translated to fluent API code by using the HasRequired and HasOptional methods:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress)
                               .WithRequired();
    
    modelBuilder.Entity<Shipment>().HasRequired(s => s.DeliveryAddress)
                                   .WithOptional();
}

It is also worth noting that when we map a one-to-one association with the fluent API, we don't need to specify the foreign key as we would with the HasForeignKey method when mapping a one-to-many association. Since EF only supports one-to-one associations on primary keys, it will automatically create the relationship in the database on the primary keys.
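For contrast, this is a minimal sketch of the one-to-many case mentioned above, where the foreign key is named explicitly with HasForeignKey. The Blog and Post classes and the BlogId property are hypothetical and exist only for this illustration; they are not part of our domain model. It assumes the EntityFramework package is referenced:

```csharp
using System.Collections.Generic;
using System.Data.Entity;

// Hypothetical classes, used only to illustrate HasForeignKey.
public class Blog
{
    public int BlogId { get; set; }
    public virtual ICollection<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public int BlogId { get; set; }          // foreign key property
    public virtual Blog Blog { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // One-to-many: the "many" side (Post) is the dependent and carries
        // the foreign key, which we point out explicitly.
        modelBuilder.Entity<Post>()
                    .HasRequired(p => p.Blog)
                    .WithMany(b => b.Posts)
                    .HasForeignKey(p => p.BlogId);
    }
}
```

In the one-to-one mappings of this post there is no such call because the dependent's primary key doubles as the foreign key.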

Database Schema

The mapping result for our object model is as follows (note the Identity column on Users table):

Referential Integrity

In relational database design the referential integrity rule states that each non-null value of a foreign key must match the value of some primary key. But wait, how does it even apply here? All we have is three primary keys referencing each other! Which is the primary key and which is the foreign key? The best way to answer this question is to take a look at the properties of the relationships in the database that Code First has created:

As you can see, Code First adds a foreign key constraint which links the primary key of the Addresses table to the primary key of the Users table and adds another foreign key constraint that links the primary key of the Shipments table to the primary key of the Addresses table. The foreign key constraint means that a user has to exist for a particular address but not the other way around. In other words, the database guarantees that an Addresses row’s primary key references a valid Users primary key and a Shipments row’s primary key references a valid Addresses primary key.

How Does Code First Determine the Principal and Dependent Ends in an Association?

Code First has rules to determine the principal and dependent ends of an association. For one-to-many relationships the many end is always the dependent, but it gets a little tricky in one-to-one associations. In one-to-one associations Code First decides based on our object model, and possible data annotations or fluent API code that we may have. For example in this case, we used the following fluent API code to configure the User-Address association:

modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress).WithRequired();

This reads as "User entity has an optional association with one Address object but this association is required for Address entity". For Code First this is good enough to make the decision: it marks User as the principal end and Address as the dependent end in the association. Since we have similar fluent API code for the second association between Address and Shipment (required on the Shipment side, optional on the Address side), it marks Address as the principal end and Shipment as the dependent end in that association as well.

This decision has some consequences. In fact, the referential integrity that we saw is the first result of Code First's principal/dependent decision.

Second Result of Code First's Principal/Dependent Decision: Database Identity

If you take a closer look at the above DB schema, you'll notice that only UserId has a regular identifier generator (aka Identity or Sequence) and AddressId and ShipmentId do not. This is a very important consequence of the principal/dependent decision for one-to-one associations: the dependent primary key becomes non-Identity by default. This makes sense because the tables share their primary key values, so only one of them can be auto-generated and we need to take care of providing valid keys for the rest.

What about Cascade Deletes?

As we saw, each Address always belongs to one User and each Shipment is always delivered to one single Address. We want to make sure that when we delete a User, the possible dependent rows in Address and Shipment also get deleted in the database. In fact, this is one of the Referential Integrity Refactorings, called Introduce Cascading Delete. The primary reason we would apply "Introduce Cascading Delete" is to preserve the referential integrity of our data by ensuring that related rows are appropriately deleted when a parent row is deleted. By default, Code First does not enable cascade delete when it creates a one-to-one relationship in the database. As always, we can override this convention with the fluent API:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress)
                               .WithRequired()
                               .WillCascadeOnDelete();
    
    modelBuilder.Entity<Shipment>().HasRequired(s => s.DeliveryAddress)
                                   .WithOptional()
                                   .WillCascadeOnDelete();
}

What Are the Additional Methods Like WithRequiredDependent For?

The HasRequired method returns an object of type RequiredNavigationPropertyConfiguration, which defines two special methods called WithRequiredDependent and WithRequiredPrincipal in addition to the typical WithMany and WithOptional methods that we usually use. We saw that the only reason Code First could figure out the principal and dependent ends in our associations was that our fluent API code clearly specified one end as required and the other as optional. But what if both ends are required, or both are optional? For example, consider a scenario where a User always has one Address and an Address always has one User (required on both ends). Now Code First cannot pick the principal and dependent ends on its own, and that's exactly where methods like WithRequiredDependent come into play. In other words, this scenario ultimately needs to be configured with the fluent API, and the fluent API is designed to force you to explicitly specify which end is the dependent and which is the principal in a required-required or optional-optional association.

For example, this fluent API code shows how we can configure the User-Address association where both ends are required:

modelBuilder.Entity<User>().HasRequired(u => u.BillingAddress).WithRequiredDependent();

Taking a closer look at the RequiredNavigationPropertyConfiguration type also shows the idea:

public class RequiredNavigationPropertyConfiguration<TEntityType, TTargetEntityType>
{
    public DependentNavigationPropertyConfiguration<TEntityType, TTargetEntityType> WithMany();
    public CascadableNavigationPropertyConfiguration WithOptional();
    public CascadableNavigationPropertyConfiguration WithRequiredDependent();
    public CascadableNavigationPropertyConfiguration WithRequiredPrincipal();
}

As you can see, if you want to follow HasRequired with another required end, you have to call either WithRequiredDependent or WithRequiredPrincipal, since there is no WithRequired method defined on the RequiredNavigationPropertyConfiguration class.
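Which of the two methods to call depends on which entity you are configuring. As I understand the API, WithRequiredPrincipal makes the entity being configured the principal, and WithRequiredDependent makes it the dependent. Here is a sketch of a required-required User-Address association that keeps User as the principal, as in the rest of this post. It assumes the EntityFramework package is referenced, and the context name is made up for the example:

```csharp
using System.Data.Entity;

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public virtual Address BillingAddress { get; set; }
}

public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
}

// Hypothetical context name, used only for this sketch.
public class RequiredRequiredContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // User is the entity being configured, so WithRequiredPrincipal
        // makes User the principal end and Address the dependent end.
        modelBuilder.Entity<User>()
                    .HasRequired(u => u.BillingAddress)
                    .WithRequiredPrincipal();
    }
}
```

Swapping in WithRequiredDependent would reverse the roles, making User the dependent whose primary key references Address.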

Working with the Model

Here is an example for adding a new user along with its billing address. EF is smart enough to use the newly generated UserId for the AddressId as well:

using (var context = new Context())
{
    Address billingAddress = new Address()
    {
        Street = "Main St.",
        City   = "Seattle"
    };
                
    User user = new User()
    {
        Name = "Morteza",                    
        BillingAddress = billingAddress
    };
 
    context.Users.Add(user);
    context.SaveChanges();
}

The following code is an example of adding a new Address and Shipment for an existing User (assuming that we have a User with UserId = 1 in the database):

using (var context = new Context())
{
    Address deliveryAddress = new Address()
    {
        AddressId = 1,
        Street = "Main St.",                    
    };
 
    Shipment shipment = new Shipment()
    {
        ShipmentId = 1,
        State = "Shipped",                    
        DeliveryAddress = deliveryAddress
    };
 
    context.Shipments.Add(shipment);
    context.SaveChanges();
}

Limitations of This Mapping

There are two important limitations to associations mapped as shared primary key:

Difficulty in Saving Related Objects
Multiple Addresses for User is Not Possible

Summary

In this post we learned about one-to-one associations, of which a shared primary key is just one implementation. Shared primary key associations aren’t uncommon but are relatively rare. In many schemas, a one-to-one association is represented with a foreign key field and a unique constraint. In the next posts we will revisit the same domain model and learn about other ways to map one-to-one associations that do not have the limitations of the shared primary key association mapping.

Associations in EF Code First: Part 2 – Complex Types

Monday, March 28, 2011

C# Code First Entity Framework 4.1

This is the second post in a series that explains entity association mappings with EF Code First. This series includes:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

Introducing the Model

First, let's review the model that we are going to use in order to create a Complex Type with EF Code First. It's a simple object model which consists of two classes: User and Address. Each user could have one billing address (or nothing at all–note the multiplicities on the class diagram). The Address information of a User is modeled as a separate class as you can see in the class diagram below:

In object-modeling terms, this association is a kind of aggregation—a part-of relationship. Aggregation is a strong form of association; it has some additional semantics with regard to the lifecycle of objects. In this case, we have an even stronger form, composition, where the lifecycle of the part is fully dependent upon the lifecycle of the whole.

Fine-grained Domain Models

The motivation behind this design was to achieve Fine-grained domain models. In crude terms, fine-grained means more classes than tables. For example, a user may have both a billing address and a home address. In the database, you may have a single Users table with the columns BillingStreet, BillingCity, and BillingZipCode along with HomeStreet, HomeCity, and HomeZipCode. There are good reasons to use this somewhat denormalized relational model (performance, for one). In our object model, we can use the same approach, representing the two addresses as six string-valued properties of the User class. But it’s much better to model this using an Address class, where User has the BillingAddress and HomeAddress properties. This object model achieves improved cohesion and greater code reuse and is more understandable.

Complex Types are Objects with No Identity

When it comes to the actual C# implementation, there is no difference between this composition and other weaker styles of association, but in the context of ORM there is a big difference: a composed class is often a candidate complex type (aka Value Object). C# has no concept of composition, so a class or property can’t be marked as a composition. The only difference is the object identifier: a complex type has no individual identity (e.g. there is no AddressId defined on the Address class), which makes sense because in the database everything is saved into one single table.

Complex Type Discovery

Code First has a concept of Complex Type Discovery that works based on a set of Conventions. The convention is that if Code First discovers a class where a primary key cannot be inferred, and no primary key is registered through Data Annotations or the fluent API, then the type will be automatically registered as a complex type. Complex type detection also requires that the type does not have properties that reference entity types (i.e. all the properties must be scalar types) and is not referenced from a collection property on another type.

How to Implement a Complex Type with EF Code First

The following shows the implementation of the introduced model in Code First:

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
    
    public Address Address { get; set; }
}
 
public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}
 
public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
}

With Code First, this is all of the code we need to write to create a complex type. We do not need to configure any additional database schema mapping information through Data Annotations or the fluent API.

Complex Types: Splitting a Table Across Multiple Types

The mapping result for this object model is as follows (Note how Code First prefixes the complex type's column names with the name of the complex type):

Complex Types are Required

As a limitation of EF in general, complex types are always considered required. To see this limitation in action, let's try to add a record to the Users table:

using (var context = new Context())
{
    User user = new User()
    {
        Name = "Morteza"
    };
 
    context.Users.Add(user);
    context.SaveChanges();
}

Surprisingly, this code throws a System.Data.UpdateException at runtime with this message:

Null value for non-nullable member. Member: 'Address'.

If we initialize the Address object, the exception goes away and the user is successfully saved into the database.
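A minimal sketch of that fix is shown here. It assumes the EntityFramework package is referenced and that the context can reach a database through its default connection. The Program class is only scaffolding for the example:

```csharp
using System.Data.Entity;

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using (var context = new Context())
        {
            User user = new User()
            {
                Name = "Morteza",
                // An empty but non-null complex object is enough:
                // its properties are saved as NULL column values.
                Address = new Address()
            };

            context.Users.Add(user);
            context.SaveChanges();
        }
    }
}
```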

Now if we read back the inserted record from the database, EF will return an Address object with Null values on all of its properties (Street, City and ZipCode). This means that even when you store a complex type object with all null property values, EF still returns an initialized complex type when the owning entity (e.g. User) is retrieved from the database.

Explicitly Register a Type as Complex

You saw that in our model, we did not use any data annotation or fluent API code to designate the Address as a complex type, yet Code First detects it as a complex type based on Complex Type Discovery. But what if our domain model requires a new property like "Id" on Address class? This new Id property is just another scalar non-primary key property that represents let's say another piece of information about Address. Now Code First can (and will) infer a key and therefore marks Address as an entity that has its own mapping table unless we specify otherwise. This is where explicit complex type registration comes into play. There are two ways to register a type as complex:

Using Data Annotations

EF 4.1 introduces a new attribute in System.ComponentModel.DataAnnotations namespace called ComplexTypeAttribute. All we need to do is to place this attribute on our Address class:

[ComplexType]
public class Address
{
    public string Id { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

This will keep Address as a complex type in our model despite its Id property.

Using Fluent API

Alternatively, we can use ComplexType generic method defined on DbModelBuilder class to register our Address type as complex:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.ComplexType<Address>();
}

Best Practices When Working with Complex Types

Always Initialize the Complex Type
Add a Read Only Property (HasValue) to the Complex Type for Null Value Checking
Consider Always Explicitly Registering a Complex Type

Therefore, our final object model will be:

public class User
{
    public User()
    {
        Address = new Address();
    }
 
    public int UserId { get; set; }        
    public string Name { get; set; }
        
    public Address Address { get; set; }
}
 
[ComplexType]
public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
 
    public bool HasValue
    {
        get
        {
            return (Street != null || ZipCode != null || City != null);
        }
    }
}

The interesting point is that we did not have to explicitly exclude the HasValue property from the mapping above. Since HasValue is defined as a read-only property (i.e. there is no setter), EF Code First ignores it by convention. This makes sense, since a read-only property most probably represents a computed value and does not need to be persisted in the database.
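Because HasValue is plain C#, its behavior can be checked without EF at all. The following sketch repeats the Address class without the mapping attribute (so it compiles on its own) and shows how calling code might use the property; the HasValueDemo class is only scaffolding for the example:

```csharp
using System;

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }

    // Computed, read-only: true as soon as any part of the address is set.
    public bool HasValue
    {
        get
        {
            return (Street != null || ZipCode != null || City != null);
        }
    }
}

public static class HasValueDemo
{
    public static void Main()
    {
        // What EF hands back when all Address columns are NULL.
        Address empty = new Address();

        Address filled = new Address() { City = "Seattle" };

        Console.WriteLine(empty.HasValue);   // prints "False"
        Console.WriteLine(filled.HasValue);  // prints "True"
    }
}
```

This lets application code write user.Address.HasValue where it would otherwise have checked the reference for null.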

Customize Complex Type's Property Mappings at Entity Level

We can customize the individual property mappings of the complex type. For example, the Users table now contains, among others, the columns Address_Street, Address_ZipCode, and Address_City. We can rename these with ColumnAttribute:

public class Address
{
    [Column("Street")]
    public string Street { get; set; }
    public string City { get; set; }        
    public string ZipCode { get; set; }
}

Fluent API can give us the same result as well:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.ComplexType<Address>()
                .Property(a => a.Street)
                .HasColumnName("Street");        
}

Any other entity table that contains complex type fields (say, a Customer class that also has an Address) uses the same column options. Sometimes we’ll want to override the settings we made inside the complex type from outside for a particular entity. This is often the case when we try to derive an object model from a legacy database. For example, here is how we can rename the Address columns for Customer class:

public class User
{
    public int UserId { get; set; }
    public string Name { get; set; }
 
    public Address Address { get; set; }
}
    
public class Customer
{
    public int CustomerId { get; set; }
    public string PhoneNumber { get; set; }
            
    public Address Address { get; set; }
}
 
[ComplexType]
public class Address
{
    [Column("Street")]
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}
    
public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Customer> Customers { get; set; }
 
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Customer>()
                    .Property(c => c.Address.Street)
                    .HasColumnName("Customer_Street");
    }
}

Complex Types and the New Change Tracking API

As part of the new DbContext API, EF 4.1 came with a new change tracking API that enables us to access the original and current values of our entities. The original values are the values the entity had when it was queried from the database. The current values are the values the entity has now. This feature also fully supports complex types.

The entry point for accessing the new change tracking API is DbContext's Entry method which returns an object of type DbEntityEntry. DbEntityEntry contains a ComplexProperty method that returns a DbComplexPropertyEntry object where we can access the original and current values:

using (var context = new Context())
{
    var user = context.Users.Find(1);
 
    Address originalValues = context.Entry(user)
                                    .ComplexProperty(u => u.Address)
                                    .OriginalValue;    
    
    Address currentValues = context.Entry(user)
                                   .ComplexProperty(u => u.Address)
                                   .CurrentValue;
}

Also we can drill down into the complex object and read or set properties of it using chained calls:

string city = context.Entry(user)
                     .ComplexProperty(u => u.Address)
                     .Property(a => a.City)
                     .CurrentValue;
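The example above only reads a value. Setting one goes through the same chain, since CurrentValue is writable. The sketch below assumes the EntityFramework package is referenced and that a User with UserId = 1 exists in the database. The "Redmond" value and the Program class are made up for the example:

```csharp
using System.Data.Entity;

public class User
{
    public User()
    {
        Address = new Address();
    }

    public int UserId { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public class Context : DbContext
{
    public DbSet<User> Users { get; set; }
}

public static class Program
{
    public static void Main()
    {
        using (var context = new Context())
        {
            var user = context.Users.Find(1);
            if (user == null)
            {
                return; // assumption violated: no such user in the database
            }

            var cityEntry = context.Entry(user)
                                   .ComplexProperty(u => u.Address)
                                   .Property(a => a.City);

            string cityAsLoaded = cityEntry.OriginalValue; // value read from the database
            cityEntry.CurrentValue = "Redmond";            // also updates user.Address.City

            context.SaveChanges();
        }
    }
}
```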

We can even get the nested properties using a single lambda expression:

string city = context.Entry(user)
                     .Property(u => u.Address.City)
                     .CurrentValue;

Limitations of This Mapping

There are three important limitations to classes mapped as Complex Types:

Shared References Are Not Possible:

The Address Complex Type doesn’t have its own database identity (primary key) and so can’t be referred to by any object other than the containing instance of User (e.g. a Shipping class that also needs to reference the same User Address, cannot do so).

No Elegant Way to Represent a Null Reference:

As we saw, there is no elegant way to represent a null reference to an Address. When reading from the database, EF Code First always initializes the Address object even if the values in all mapped columns of the complex type are null.

Lazy Loading of Complex Types is Not Possible:

Note that EF always initializes the property values of a complex type right away, when the entity instance that holds the complex object is loaded. EF does not support lazy loading for complex types (the same limitation exists if you want lazy loading for scalar properties of an entity). This is inconvenient when we have to deal with potentially large values (for example, a property of type byte[] on the Address complex type that is mapped to a VARBINARY column on the Users table and holds an image of the location described by the Address).

Summary

In this post we learned about fine-grained domain models, of which a complex type is just one example. Fine-grained modeling is fully supported by EF Code First and is known as the most important requirement for a rich domain model. A complex type is usually the simplest way to represent one-to-one relationships, and because the lifecycle is almost always dependent in such a case, it’s either an aggregation or a composition in UML. In the next posts we will revisit the same domain model and learn about other ways to map a one-to-one association that do not have the limitations of complex types.

Associations in EF Code First: Part 1 – Introduction and Basic Concepts

Sunday, March 27, 2011

C# Code First Entity Framework 4.1

Earlier this month the data team shipped the Release Candidate of EF 4.1. The most exciting feature of EF 4.1 is Code First, a new development pattern for EF which provides a really elegant and powerful code-centric way to work with data as well as an alternative to the existing Database First and Model First patterns. Code First is designed around the Convention over Configuration paradigm and focuses on defining your model using C#/VB.NET classes; these classes can then be mapped to an existing database or be used to generate a database schema. Additional configuration can be supplied using Data Annotations or via a fluent API.

I’m a big fan of the EF Code First approach, and wrote several blog posts about it based on its CTP5 build:

Compared to CTP5, the EF 4.1 release is more about bug fixing and bringing it to go-live quality than anything else. Pretty much all of the API that was introduced in CTP5 is still exactly the same (except for a few changes, including the renaming of the DbDatabase and ModelBuilder classes as well as consolidation of the IsIndependent fluent API method). Therefore, the above blog posts are still usable and can (hopefully) help you in your Code First development. Having said that, I decided to complete my Code First articles by starting a whole new series instead of doing post maintenance on the current CTP5 ones.

A Note For Those Who are New to EF and Code-First

If you choose to learn EF you've chosen well. If you choose to learn EF with Code First you've done even better. To get started, you can find an EF 4.1 Code First walkthrough by the ADO.NET team here. In this series, I assume you have already set up your machine for Code First development and that you are familiar with Code First fundamentals and basic concepts.

Code First And Associations

I will start my EF 4.1 Code First articles with a series on entity association mappings. You will see that when it comes to associations, Code First brings ultimate power and flexibility. This series will come in several parts, including:

Part 1 – Introduction and Basic Concepts
Part 2 – Complex Types
Part 3 – Shared Primary Key Associations
Part 4 – Table Splitting
Part 5 – One-to-One Foreign Key Associations
Part 6 – Many-valued Associations

Why Start with Association Mappings?

From my experience with the EF user community, I know that the first thing many developers try to do when they begin using EF (especially with a Code First approach) is to map a parent/children relationship. This is usually the first time you encounter collections. It’s also the first time you have to think about the differences between entities and value types, or the type of relationships between your entities. Managing the associations between classes and the relationships between tables is at the heart of ORM. Most of the difficult problems involved in implementing an ORM solution relate to association management.

In order to build a solid foundation for our discussion, we will start by learning about some of the core concepts around the relationship mapping and will leave the discussion for each type of entity associations to the next posts in this series.

What is Mapping?

Mapping is the act of determining how objects and their relationships are persisted in permanent data storage, in our case, relational databases.

What is Relationship Mapping?

A mapping that describes how to persist a relationship (association, aggregation, or composition) between two or more objects.

Types of Relationships

There are two categories of object relationships that we need to be concerned with when mapping associations. The first category is based on multiplicity and it includes three types:

One-to-one relationships: This is a relationship where the maximum of each of its multiplicities is one.
One-to-many relationships: Also known as a many-to-one relationship, this occurs when the maximum of one multiplicity is one and the other is greater than one.
Many-to-many relationships: This is a relationship where the maximum of both multiplicities is greater than one.

The second category is based on directionality and it contains two types:

Uni-directional relationships: When an object knows about the object(s) it is related to but the other object(s) do not know of the original object. To put this in EF terminology, a navigation property exists on only one of the association ends and not on both.
Bi-directional relationships: When the objects on both ends of the relationship know of each other (i.e. a navigation property is defined on both ends).

How Are Object Relationships Implemented in POCO Object Models?

When the multiplicity is one (e.g. 0..1 or 1) the relationship is implemented by defining a navigation property that references the other object (e.g. an Address property on the User class). When the multiplicity is many (e.g. 0..*, 1..*) the relationship is implemented via an ICollection of the other object's type.
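As a minimal sketch of both cases, the classes below implement a to-one end as a single reference property and a to-many end as an ICollection. The Order class and the Orders and Customer properties are hypothetical names added only for illustration; they are not part of the models used in this series.

```csharp
using System.Collections.Generic;

public class Address
{
    public int AddressId { get; set; }
    public string City { get; set; }
}

// Hypothetical class, used only to illustrate a "many" end.
public class Order
{
    public int OrderId { get; set; }

    // Multiplicity one: a single reference back to the owner.
    // Together with User.Orders this makes the association bi-directional.
    public User Customer { get; set; }
}

public class User
{
    public User()
    {
        Orders = new List<Order>();
    }

    public int UserId { get; set; }

    // Multiplicity 0..1 or 1: a single navigation property.
    // Address has no property pointing back, so this association is uni-directional.
    public Address Address { get; set; }

    // Multiplicity many (0..* or 1..*): a collection navigation property.
    public ICollection<Order> Orders { get; set; }
}
```

Note that nothing in plain C# keeps the two ends of the bi-directional association in sync: adding an Order to user.Orders does not set order.Customer, so in a plain object graph we would set both sides ourselves.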

How Are Relationships Implemented in Relational Databases?

Relationships in relational databases are maintained through the use of foreign keys. A foreign key is one or more data attributes that appear in one table and must be the primary key or another candidate key in another table. With a one-to-one relationship the foreign key needs to be implemented by one of the tables. To implement a one-to-many relationship, the “many” table holds a foreign key that references the “one” table. We could also choose to implement a one-to-many relationship via an associative table (aka join table), effectively making it a many-to-many relationship.


Associations in EF Code First CTP5: Part 3 – One-to-One Foreign Key Associations

Sunday, January 23, 2011

.NET C# Code First CTP5 Entity Framework

11 Comments

This is the third post in a series that explains entity association mappings with EF Code First. I've described these association types so far:

In the previous blog post we saw the limitations of shared primary key association and argued that this type of association is relatively rare and in many schemas, a one-to-one association is represented with a foreign key field and a unique constraint. Today we are going to discuss how this is done by learning about one-to-one foreign key associations.

Introducing the Revised Model

In this revised version, each User always has two addresses: one billing address and another one for delivery. The following figure shows the class diagram for this domain model:

One-to-One Foreign Key Association

Instead of sharing a primary key, two rows can have a foreign key relationship. One table has a foreign key column that references the primary key of the associated table. (The source and target of this foreign key constraint can even be the same table: this is called a self-referencing relationship.) An additional constraint enforces this relationship as a real one-to-one. For example, by making the BillingAddressId column unique, we declare that a particular address can be referenced by at most one user as a billing address. This isn’t as strong as the guarantee from a shared primary key association, which allows a particular address to be referenced by at most one user, period. With several foreign key columns (which is the case in our domain model since we also have a foreign key for DeliveryAddress), we can reference the same address target row several times. But in any case, two users can’t share the same address for the same purpose.

The Object Model

Let's start by creating an object model for our domain:

public class User
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }        
    public int BillingAddressId { get; set; }
    public int DeliveryAddressId { get; set; }
 
    public Address BillingAddress { get; set; }
    public Address DeliveryAddress { get; set; }
}
 
public class Address
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }   
}
 
public class EntityMappingContext : DbContext
{
    public DbSet<User> Users { get; set; }
    public DbSet<Address> Addresses { get; set; }
}

Configuring Foreign Keys With Fluent API

BillingAddressId and DeliveryAddressId are foreign key scalar properties that represent the actual foreign key values the relationships are established on. However, Code First will not recognize them as the foreign keys for the associations, since their names do not match the conventions it uses to infer foreign keys. Therefore, we need to use the fluent API (or Data Annotations) to tell Code First about the foreign keys. Here is the fluent API code to identify the foreign key properties:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>()
                .HasRequired(a => a.BillingAddress)
                .WithMany()
                .HasForeignKey(u => u.BillingAddressId);
 
    modelBuilder.Entity<User>()
                .HasRequired(a => a.DeliveryAddress)
                .WithMany()
                .HasForeignKey(u => u.DeliveryAddressId);
}

Alternatively, we can use Data Annotations to achieve this. CTP5 introduced a new attribute in System.ComponentModel.DataAnnotations namespace which is called ForeignKeyAttribute and we can place it on a navigation property to specify the property that represents the foreign key of the relationship:

public class User
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }        
    public int BillingAddressId { get; set; }
    public int DeliveryAddressId { get; set; }       
    
    [ForeignKey("BillingAddressId")]
    public Address BillingAddress { get; set; }
    
    [ForeignKey("DeliveryAddressId")]
    public Address DeliveryAddress { get; set; }
}

However, we will not use this Data Annotation and will stick with our fluent API code for a reason that you'll see soon.

Creating a SQL Server Schema

The object model seems to be ready to give us the desired SQL schema. However, if we try to create a SQL Server database from it, we will get an InvalidOperationException with this message:

"The database creation succeeded, but the creation of the database objects did not. See InnerException for details."

The inner exception is a System.Data.SqlClient.SqlException containing this message:

"Introducing FOREIGN KEY constraint 'User_DeliveryAddress' on table 'Users' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. Could not create constraint. See previous errors."

SQL Server and Multiple Cascade Paths

A multiple cascade path happens when a cascade path goes from column col1 in table A to table B and also from column col2 in table A to table B. So it seems that Code First tried to turn on cascade delete for both the BillingAddressId and DeliveryAddressId columns in the Users table. In fact, Code First was trying to use Declarative Referential Integrity (DRI) to enforce cascade deletes, and the problem is that SQL Server is not fully ANSI SQL-92 compliant when it comes to cascading actions. In SQL Server, DRI forbids cascading updates or deletes in a multiple cascade path scenario.

A KB article also explains why we received this error: "In SQL Server, a table cannot appear more than one time in a list of all the cascading referential actions that are started by either a DELETE or an UPDATE statement. For example, the tree of cascading referential actions must only have one path to a particular table on the cascading referential actions tree." In our case, the Users table appeared twice in the list of cascading referential actions started by a DELETE. Basically, SQL Server does simple counting of cascade paths and, rather than trying to work out whether any cycles actually exist, it assumes the worst and refuses to create the referential actions (cascades).

Therefore, depending on our database engine, we may or may not get this exception. (For example, both Oracle and MySQL let us create cascades in this scenario.)
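The "simple counting of cascade paths" described above can be sketched in plain C#. This is only an illustration of the idea, not SQL Server's actual algorithm: the ForeignKey and CascadePathCheck types are hypothetical, and the sketch assumes the cascading foreign keys contain no cycle.

```csharp
using System.Collections.Generic;
using System.Linq;

// One foreign key: rows in Dependent reference rows in Principal.
public class ForeignKey
{
    public string Principal { get; set; }
    public string Dependent { get; set; }
    public bool CascadeOnDelete { get; set; }
}

public static class CascadePathCheck
{
    // Counts how many cascading paths lead from the table where a DELETE starts
    // to each table reachable through cascading foreign keys.
    // Assumes there is no cycle among the cascading keys; otherwise the recursion would not end.
    public static Dictionary<string, int> CountPaths(string startTable, IEnumerable<ForeignKey> keys)
    {
        var counts = new Dictionary<string, int>();
        Visit(startTable, keys.Where(k => k.CascadeOnDelete).ToList(), counts);
        return counts;
    }

    private static void Visit(string table, List<ForeignKey> cascading, Dictionary<string, int> counts)
    {
        foreach (var key in cascading.Where(k => k.Principal == table))
        {
            int seen;
            counts.TryGetValue(key.Dependent, out seen); // seen stays 0 when the table is new
            counts[key.Dependent] = seen + 1;
            Visit(key.Dependent, cascading, counts);
        }
    }

    // True when some table is reached more than once: the situation described above.
    public static bool HasMultiplePaths(string startTable, IEnumerable<ForeignKey> keys)
    {
        return CountPaths(startTable, keys).Values.Any(c => c > 1);
    }
}
```

With two cascading foreign keys from Users to Addresses (billing and delivery), a delete that starts at Addresses reaches Users twice, so HasMultiplePaths returns true. With CascadeOnDelete set to false on one of them, Users is reached only once and the check passes.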

Overriding Code First Convention To Resolve the Problem

As you saw, Code First automatically turns on cascade deletes on required one-to-many associations based on its conventions. To resolve the exception that we got from SQL Server, we have no choice other than overriding this convention and switching cascade deletes off on at least one of the associations. As of CTP5, the only way to accomplish this is by using the fluent API. Let's switch it off on the DeliveryAddress association:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<User>()
                .HasRequired(a => a.BillingAddress)
                .WithMany()
                .HasForeignKey(u => u.BillingAddressId);
 
    modelBuilder.Entity<User>()
                .HasRequired(a => a.DeliveryAddress)
                .WithMany()
                .HasForeignKey(u => u.DeliveryAddressId).WillCascadeOnDelete(false);
}

One-to-One Foreign Key Associations in EF Code First

As you may have noticed, both associations in the fluent API code have been configured as many-to-one, not one-to-one as you might have expected. The reason is simple: Code First (and EF in general) does not natively support one-to-one foreign key associations. In fact, EF does not support any association scenario that involves unique constraints at all. Fortunately, in this case we don’t care what’s on the target side of the association, so we can treat it like a to-one association without the many part. All we want is to express “This entity (User) has a property that is a reference to an instance of another entity (Address)” and use a foreign key field to represent that relationship. Basically, EF still thinks that the relationship is many-to-one. This is a workaround for the current EF limitation, and it comes with two consequences. First, EF won't create any additional constraint to enforce this relationship as a one-to-one; we need to create it manually ourselves. The second consequence is more important: one-to-one foreign key associations cannot be bidirectional (i.e. we cannot define a User property on the Address class).

Create a Unique Constraint To Enforce the Relationship as a Real One to One

We can manually create unique constraints on the foreign keys in the database after Code First creates it for us, but if you are like me and prefer to create your database in one shot, there is a way in CTP5 to have Code First create the constraints as part of its database creation process. For that we can take advantage of CTP5’s new SqlCommand method on the DbDatabase class, which allows raw SQL commands to be executed against the database. The best place to invoke the SqlCommand method for this purpose is inside a Seed method that has been overridden in a custom initializer class:

protected override void Seed(EntityMappingContext context)
{
    context.Database.SqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Billing UNIQUE(BillingAddressId)");
    context.Database.SqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Delivery UNIQUE(DeliveryAddressId)");
}

This code adds unique constraints to the BillingAddressId and DeliveryAddressId columns in the DDL generated by Code First.

SQL Schema

The object model is ready now and Code First will create the following database schema for us:

It is worth mentioning that we can still enforce cascade deletes for the DeliveryAddress relationship. SQL Server allows enforcing referential integrity in two different ways. DRI, which we just saw, is the most basic yet least flexible way. The other way is to use triggers. We can write a delete trigger on the primary table that either deletes the rows in the dependent table(s) or sets all corresponding foreign keys to NULL (in our case the foreign keys are non-nullable, so it has to delete the dependent rows).

Download

Click here to download and run the one-to-one foreign key association sample that we have built in this blog post.

Summary

In this blog post we learned about one-to-one foreign key associations as a better way to represent one-to-one relationships. However, we saw some limitations, such as the need to create unique constraints manually and the fact that this type of association cannot be bidirectional, all due to the lack of unique constraint support in EF. Support for unique constraints is going to require changes to the whole EF stack, and it won't happen in the RTM targeted for this year, as that RTM will be layered on top of the current .NET 4.0 functionality. That said, the EF team has this feature on their list for the future, so hopefully it will be supported in a later release of EF. Until then, the workaround that I showed here is going to be the way to implement one-to-one foreign key associations in EF Code First.


Inheritance with EF Code First: Part 3 – Table per Concrete Type (TPC)

Monday, January 3, 2011

.NET C# Code First CTP5 Entity Framework

30 Comments

This is the third (and last) post in a series that explains different approaches to map an inheritance hierarchy with EF Code First. I've described these strategies in previous posts:

In today’s blog post I am going to discuss Table per Concrete Type (TPC) which completes the inheritance mapping strategies supported by EF Code First. At the end of this post I will provide some guidelines to choose an inheritance strategy mainly based on what we've learned in this series.

TPC and Entity Framework in the Past

Table per Concrete Type is in some ways the simplest approach suggested, yet using TPC with EF is one of those concepts that has not been covered very well so far, and I've seen some resources even discourage it. The reason is simply that the Entity Data Model Designer in VS2010 doesn't support TPC (even though the EF runtime does). That basically means that if you are following EF's Database-First or Model-First approaches, configuring TPC requires manually writing XML in the EDMX file, which is not considered to be a fun practice. Well, no more. You'll see that with Code First, creating TPC is perfectly possible with the fluent API just like the other strategies, and you don't need to avoid TPC due to the lack of designer support as you would probably do in other EF approaches.

Table per Concrete Type (TPC)

In Table per Concrete type (aka Table per Concrete class) we use exactly one table for each (nonabstract) class. All properties of a class, including inherited properties, can be mapped to columns of this table, as shown in the following figure:

As you can see, the SQL schema is not aware of the inheritance; effectively, we’ve mapped two unrelated tables to a more expressive class structure. If the base class were concrete, an additional table would be needed to hold instances of that class. I have to emphasize that there is no relationship between the database tables, except for the fact that they share some similar columns.

TPC Implementation in Code First

Just like the TPT implementation, we need to specify a separate table for each of the subclasses. We also need to tell Code First that we want all of the inherited properties to be mapped as part of this table. In CTP5, there is a new helper method on the EntityMappingConfiguration class called MapInheritedProperties that does exactly this for us. Here is the complete object model as well as the fluent API to create a TPC mapping:

public abstract class BillingDetail
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string Number { get; set; }
}
        
public class BankAccount : BillingDetail
{
    public string BankName { get; set; }
    public string Swift { get; set; }
}
        
public class CreditCard : BillingDetail
{
    public int CardType { get; set; }
    public string ExpiryMonth { get; set; }
    public string ExpiryYear { get; set; }
}
    
public class InheritanceMappingContext : DbContext
{
    public DbSet<BillingDetail> BillingDetails { get; set; }
        
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<BankAccount>().Map(m =>
        {
            m.MapInheritedProperties();
            m.ToTable("BankAccounts");
        });
 
        modelBuilder.Entity<CreditCard>().Map(m =>
        {
            m.MapInheritedProperties();
            m.ToTable("CreditCards");
        });            
    }
}

The Importance of EntityMappingConfiguration Class

As a side note, it is worth mentioning that the EntityMappingConfiguration class turns out to be a key type for inheritance mapping in Code First. Here is a snapshot of this class:

namespace System.Data.Entity.ModelConfiguration.Configuration.Mapping
{
    public class EntityMappingConfiguration<TEntityType> where TEntityType : class
    {
        public ValueConditionConfiguration Requires(string discriminator);
        public void ToTable(string tableName);
        public void MapInheritedProperties();
    }
}

As you have seen so far, we used its Requires method to customize TPH. We also used its ToTable method to create a TPT and now we are using its MapInheritedProperties along with ToTable method to create our TPC mapping.

TPC Configuration is Not Done Yet!

We are not quite done with our TPC configuration, and there is more to this story even though the fluent API we saw created a TPC mapping for us in the database. To see why, let's start working with our object model. For example, the following code creates two new objects of the BankAccount and CreditCard types and tries to add them to the database:

using (var context = new InheritanceMappingContext())
{
    BankAccount bankAccount = new BankAccount();
    CreditCard creditCard = new CreditCard() { CardType = 1 };
                
    context.BillingDetails.Add(bankAccount);
    context.BillingDetails.Add(creditCard);
 
    context.SaveChanges();
}

Running this code throws an InvalidOperationException with this message:

The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.

We got this exception because DbContext.SaveChanges() internally invokes the SaveChanges method of its internal ObjectContext. ObjectContext's SaveChanges method in turn calls AcceptAllChanges by default after it has performed the database modifications. The AcceptAllChanges method merely iterates over all entries in the ObjectStateManager and invokes AcceptChanges on each of them. Since the entities are in the Added state, AcceptChanges replaces their temporary EntityKey with a regular EntityKey based on the primary key values (i.e. BillingDetailId) that come back from the database. That's where the problem occurs: the database has assigned the same primary key value to both entities (BillingDetailId = 1 on both), and the ObjectStateManager cannot track two objects of the same type (i.e. BillingDetail) with the same EntityKey value, hence it throws. If you take a closer look at the TPC SQL schema above, you'll see why the database generated the same values for the primary keys: the BillingDetailId column in both the BankAccounts and CreditCards tables has been marked as identity.
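To make the failure easier to picture, here is a small in-memory simulation of it. It does not use EF at all: IdentityTable and SimpleStateManager are hypothetical stand-ins for an identity column and for the ObjectStateManager, and the only behaviour they share with the real thing is the one described above.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a table whose key column is an identity that starts at 1.
public class IdentityTable
{
    private int next = 1;

    // Returns the key the "database" generates for a newly inserted row.
    public int Insert()
    {
        return next++;
    }
}

// Very rough stand-in for the state manager: one entry per (base type, key) pair.
public class SimpleStateManager
{
    private readonly HashSet<string> tracked = new HashSet<string>();

    public void AcceptChanges(string baseTypeName, int key)
    {
        if (!tracked.Add(baseTypeName + ":" + key))
        {
            throw new InvalidOperationException(
                "Key " + key + " is already tracked for " + baseTypeName + ".");
        }
    }
}

public static class IdentityConflictDemo
{
    // Returns true when the second entity collides with the first one.
    public static bool SecondInsertConflicts()
    {
        var bankAccounts = new IdentityTable();
        var creditCards = new IdentityTable();
        var stateManager = new SimpleStateManager();

        // Both tables hand out 1 as their first key.
        stateManager.AcceptChanges("BillingDetail", bankAccounts.Insert());
        try
        {
            stateManager.AcceptChanges("BillingDetail", creditCards.Insert());
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Because each table counts from 1 independently, the two rows come back with the same key, and the second AcceptChanges call is rejected.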

How to Solve The Identity Problem in TPC

As you saw, SQL Server’s int identity columns don't work very well together with TPC, since there will be duplicate entity keys when inserting into subclass tables that all have the same identity seed. To solve this, either a spread seed (where each table has its own initial seed value) is needed, or a mechanism other than SQL Server’s int identity should be used. Some other RDBMSes have mechanisms that allow a sequence (identity) to be shared by multiple tables, and something similar can be achieved with GUID keys in SQL Server. Using GUID keys, or int identity keys with different starting seeds, will solve the problem, but yet another solution is to completely switch off identity on the primary key property. As a result, we need to take responsibility for providing unique keys when inserting records into the database. We will go with this solution since it works regardless of which database engine is used.

Switching Off Identity in Code First

We can switch off identity simply by placing the DatabaseGenerated attribute on the primary key property and passing DatabaseGenerationOption.None to its constructor. The DatabaseGenerated attribute is a new data annotation which has been added to the System.ComponentModel.DataAnnotations namespace in CTP5:

public abstract class BillingDetail
{
    [DatabaseGenerated(DatabaseGenerationOption.None)]
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string Number { get; set; }
}

As always, we can achieve the same result by using fluent API, if you prefer that:

modelBuilder.Entity<BillingDetail>()
            .Property(p => p.BillingDetailId)
            .HasDatabaseGenerationOption(DatabaseGenerationOption.None);

Working With The Object Model

Our TPC mapping is ready and we can try adding new records to the database. But, like I said, now we need to take care of providing unique keys when creating new objects:

using (var context = new InheritanceMappingContext())
{
    BankAccount bankAccount = new BankAccount() 
    { 
        BillingDetailId = 1                     
    };
    CreditCard creditCard = new CreditCard() 
    { 
        BillingDetailId = 2,
        CardType = 1
    };
                
    context.BillingDetails.Add(bankAccount);
    context.BillingDetails.Add(creditCard);
 
    context.SaveChanges();
}
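Hard-coding 1 and 2 is fine for a demo, but in practice we would want one place that hands out keys. The following is a minimal sketch of such a helper; BillingDetailKeys is a hypothetical name, and the counter is only unique within a single running process. A real application would have to start from the highest key already stored, or use GUID keys instead.

```csharp
using System.Threading;

// Hypothetical helper that hands out keys shared by all BillingDetail subclasses.
public static class BillingDetailKeys
{
    private static int last = 0;

    // Returns the next unused key. Thread-safe, but the counter restarts
    // at 1 every time the process starts.
    public static int Next()
    {
        return Interlocked.Increment(ref last);
    }
}
```

With this in place, the object initializers above could use BillingDetailId = BillingDetailKeys.Next() for both the BankAccount and the CreditCard, so the two objects never receive the same key.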

Polymorphic Associations with TPC are Problematic

The main problem with this approach is that it doesn’t support polymorphic associations very well. After all, in the database, associations are represented as foreign key relationships, and in TPC the subclasses are all mapped to different tables, so a polymorphic association to their base class (abstract BillingDetail in our example) cannot be represented as a simple foreign key relationship. For example, consider the domain model we introduced here where User has a polymorphic association with BillingDetail. This would be problematic in our TPC schema, because if User has a many-to-one relationship with BillingDetail, the Users table would need a single foreign key column, which would have to refer to both concrete subclass tables. This isn’t possible with regular foreign key constraints.

Schema Evolution with TPC is Complex

A further conceptual problem with this mapping strategy is that several different columns, of different tables, share exactly the same semantics. This makes schema evolution more complex. For example, a change to a base class property results in changes to multiple columns. It also makes it much more difficult to implement database integrity constraints that apply to all subclasses.

Generated SQL

Let's examine the SQL output for polymorphic queries in a TPC mapping. For example, consider this polymorphic query for all BillingDetails and the resulting SQL statements that are executed in the database:

var query = from b in context.BillingDetails select b;

Just like in the SQL query generated by the TPT mapping, the CASE statements that you see at the beginning of the query are merely there to ensure that columns that are irrelevant for a particular row have NULL values in the returned flattened table (e.g. BankName for a row that represents a CreditCard type).

TPC's SQL Queries are Union Based

As you can see in the above screenshot, the first SELECT uses a FROM-clause subquery (marked with a red rectangle) to retrieve all instances of BillingDetails from all concrete class tables. The tables are combined with a UNION operator, and a literal (in this case, 0 and 1) is inserted into the intermediate result (look at the lines highlighted in yellow). EF reads this to instantiate the correct class given the data from a particular row. A union requires that the queries being combined project over the same columns; hence, EF has to pad and fill up nonexistent columns with NULL. This query should perform well, since here we can let the database optimizer find the best execution plan to combine rows from several tables. There are also no joins involved, so it performs better than the SQL queries generated by TPT, where a join is required between the base and subclass tables.
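The shape of that union can be mimicked with LINQ to Objects. This is only an analogy for the generated SQL, not what EF runs: the row classes and TpcUnionSketch are hypothetical, and which literal (0 or 1) goes with which table is an assumption made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for rows of the two concrete tables.
public class BankAccountsRow
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string BankName { get; set; }
}

public class CreditCardsRow
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public int CardType { get; set; }
}

// The single shape that every branch of the union has to project to.
public class FlattenedRow
{
    public int Discriminator { get; set; }  // literal added by each branch
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string BankName { get; set; }    // null for credit cards
    public int? CardType { get; set; }      // null for bank accounts
}

public static class TpcUnionSketch
{
    public static List<FlattenedRow> SelectAll(
        IEnumerable<BankAccountsRow> bankAccounts,
        IEnumerable<CreditCardsRow> creditCards)
    {
        // Branch 1: bank accounts, padded with a null CardType.
        var fromBankAccounts = bankAccounts.Select(b => new FlattenedRow
        {
            Discriminator = 0,
            BillingDetailId = b.BillingDetailId,
            Owner = b.Owner,
            BankName = b.BankName,
            CardType = null
        });

        // Branch 2: credit cards, padded with a null BankName.
        var fromCreditCards = creditCards.Select(c => new FlattenedRow
        {
            Discriminator = 1,
            BillingDetailId = c.BillingDetailId,
            Owner = c.Owner,
            BankName = null,
            CardType = c.CardType
        });

        // Concat stands in for the union of the two branches.
        return fromBankAccounts.Concat(fromCreditCards).ToList();
    }
}
```

A reader of the combined rows only needs the Discriminator value to decide which class to instantiate, which is the role the literal plays in the SQL above.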

Choosing Strategy Guidelines

Before we get into this discussion, I want to emphasize that there is no single strategy that fits all scenarios. As you saw, each of the approaches has its own advantages and drawbacks. Here are some rules of thumb to identify the best strategy in a particular scenario:

If you don’t require polymorphic associations or queries, lean toward TPC—in other words, if you never or rarely query for BillingDetails and you have no class that has an association to BillingDetail base class. I recommend TPC (only) for the top level of your class hierarchy, where polymorphism isn’t usually required, and when modification of the base class in the future is unlikely.
If you do require polymorphic associations or queries, and subclasses declare relatively few properties (particularly if the main difference between subclasses is in their behavior), lean toward TPH. Your goal is to minimize the number of nullable columns and to convince yourself (and your DBA) that a denormalized schema won’t create problems in the long run.
If you do require polymorphic associations or queries, and subclasses declare many properties (subclasses differ mainly by the data they hold), lean toward TPT. Or, depending on the width and depth of your inheritance hierarchy and the possible cost of joins versus unions, use TPC.
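The three rules of thumb above can be condensed into a tiny helper. Treat it as a memory aid under simplifying assumptions rather than a decision procedure: the type and parameter names are hypothetical, and it deliberately ignores the joins-versus-unions trade-off mentioned in the last rule.

```csharp
public enum InheritanceStrategy
{
    TablePerHierarchy,
    TablePerType,
    TablePerConcreteType
}

public static class StrategyGuidelines
{
    // needsPolymorphism: polymorphic associations or queries are required.
    // subclassesDeclareManyProperties: subclasses differ mainly by the data they hold.
    public static InheritanceStrategy Suggest(bool needsPolymorphism, bool subclassesDeclareManyProperties)
    {
        if (!needsPolymorphism)
        {
            return InheritanceStrategy.TablePerConcreteType;
        }

        return subclassesDeclareManyProperties
            ? InheritanceStrategy.TablePerType
            : InheritanceStrategy.TablePerHierarchy;
    }
}
```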

By default, choose TPH only for simple problems. For more complex cases (or when you’re overruled by a data modeler insisting on the importance of nullability constraints and normalization), you should consider the TPT strategy. But at that point, ask yourself whether it may not be better to remodel inheritance as delegation in the object model (delegation is a way of making composition as powerful for reuse as inheritance). Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. EF acts as a buffer between the domain and relational models, but that doesn’t mean you can ignore persistence concerns when designing your classes.

Summary

In this series, we focused on one of the main structural aspects of the object/relational paradigm mismatch, which is inheritance, and discussed how EF solves this problem as an ORM solution. We learned about the three well-known inheritance mapping strategies and their implementations in EF Code First. Hopefully it gives you better insight into the mapping of inheritance hierarchies as well as into choosing the best strategy for your particular scenario.

Happy New Year and Happy Code-Firsting!


Inheritance with EF Code First: Part 2 – Table per Type (TPT)

Tuesday, December 28, 2010

.NET C# Code First CTP5 Entity Framework

29 Comments

In the previous blog post you saw that there are three different approaches to representing an inheritance hierarchy and I explained Table per Hierarchy (TPH) as the default mapping strategy in EF Code First. We argued that the disadvantages of TPH may be too serious for our design since it results in denormalized schemas that can become a major burden in the long run. In today’s blog post we are going to learn about Table per Type (TPT) as another inheritance mapping strategy and we'll see that TPT doesn’t expose us to this problem.

Table per Type (TPT)

Table per Type is about representing inheritance relationships as relational foreign key associations. Every class/subclass that declares persistent properties—including abstract classes—has its own table. The table for subclasses contains columns only for each noninherited property (each property declared by the subclass itself) along with a primary key that is also a foreign key of the base class table. This approach is shown in the following figure:

For example, if an instance of the CreditCard subclass is made persistent, the values of properties declared by the BillingDetail base class are persisted to a new row of the BillingDetails table. Only the values of properties declared by the subclass (i.e. CreditCard) are persisted to a new row of the CreditCards table. The two rows are linked together by their shared primary key value. Later, the subclass instance may be retrieved from the database by joining the subclass table with the base class table.
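The retrieval step can be pictured with LINQ to Objects: the subclass rows are joined to the base rows on their shared primary key. This is an in-memory analogy only, and the row classes and TptJoinSketch are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Row of the base table: holds the properties declared by BillingDetail.
public class BillingDetailsRow
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string Number { get; set; }
}

// Row of the subclass table: holds only what CreditCard itself declares,
// plus the primary key it shares with the base row.
public class CreditCardsRow
{
    public int BillingDetailId { get; set; }
    public int CardType { get; set; }
}

// The complete object assembled from both rows.
public class CreditCardRecord
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string Number { get; set; }
    public int CardType { get; set; }
}

public static class TptJoinSketch
{
    public static List<CreditCardRecord> LoadCreditCards(
        IEnumerable<BillingDetailsRow> billingDetails,
        IEnumerable<CreditCardsRow> creditCards)
    {
        // Join the subclass table with the base table on the shared key.
        return creditCards
            .Join(billingDetails,
                  c => c.BillingDetailId,
                  b => b.BillingDetailId,
                  (c, b) => new CreditCardRecord
                  {
                      BillingDetailId = b.BillingDetailId,
                      Owner = b.Owner,
                      Number = b.Number,
                      CardType = c.CardType
                  })
            .ToList();
    }
}
```

Base rows that belong to another subclass (a BankAccount, for example) have no matching row in the CreditCards table, so they simply drop out of the join.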

TPT Advantages

The primary advantage of this strategy is that the SQL schema is normalized. In addition, schema evolution is straightforward (modifying the base class or adding a new subclass is just a matter of modifying or adding one table). Integrity constraint definitions are also straightforward (note how CardType in the CreditCards table is now a non-nullable column).

Implement TPT in EF Code First

We can create a TPT mapping simply by placing Table attribute on the subclasses to specify the mapped table name (Table attribute is a new data annotation and has been added to System.ComponentModel.DataAnnotations namespace in CTP5):

public abstract class BillingDetail
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }
    public string Number { get; set; }
}
 
[Table("BankAccounts")]
public class BankAccount : BillingDetail
{
    public string BankName { get; set; }
    public string Swift { get; set; }
}
 
[Table("CreditCards")]
public class CreditCard : BillingDetail
{
    public int CardType { get; set; }
    public string ExpiryMonth { get; set; }
    public string ExpiryYear { get; set; }
}
 
public class InheritanceMappingContext : DbContext
{
    public DbSet<BillingDetail> BillingDetails { get; set; }
}

If you prefer fluent API, then you can create a TPT mapping by using ToTable() method:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<BankAccount>().ToTable("BankAccounts");
    modelBuilder.Entity<CreditCard>().ToTable("CreditCards");
}

Polymorphic Associations

A polymorphic association is an association to a base class, hence to all classes in the hierarchy with dynamic resolution of the concrete class at runtime. For example, consider the BillingInfo property of User in the following domain model. It references one particular BillingDetail object, which at runtime can be any concrete instance of that class.

In fact, because BillingDetail is abstract, the association must refer to an instance of one of its subclasses only—CreditCard or BankAccount—at runtime.

Implement Polymorphic Associations with EF Code First

We don’t have to do anything special to enable polymorphic associations in EF Code First. The user needs a unidirectional association to some BillingDetail, which can be a CreditCard or a BankAccount, so we just create this association and it is naturally polymorphic:

public class User
{
    public int UserId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int BillingDetailId { get; set; }
 
    public virtual BillingDetail BillingInfo { get; set; }
}

In other words, as you can see above, a polymorphic association is an association that may refer to instances of a subclass of the class that was explicitly specified as the type of the navigation property (e.g. User.BillingInfo).

The following code demonstrates the creation of an association to an instance of the CreditCard subclass:

using (var context = new InheritanceMappingContext())
{
    CreditCard creditCard = new CreditCard()
    {                    
        Number   = "987654321",
        CardType = 1
    };                
    User user = new User()
    {
        UserId      = 1,    
        BillingInfo = creditCard
    }; 
    context.Users.Add(user);
    context.SaveChanges();
}

Now, if we navigate the association in a second context, EF Code First automatically retrieves the CreditCard instance:

using (var context = new InheritanceMappingContext())
{
    User user = context.Users.Find(1);
    Debug.Assert(user.BillingInfo is CreditCard);
}

Polymorphic Associations with TPT

Another important advantage of TPT is the ability to handle polymorphic associations. In the database, a polymorphic association to a particular base class is represented as a foreign key referencing the table of that base class (e.g. the Users table has a foreign key that references the BillingDetails table).

Generated SQL For Queries

Let’s take an example of a simple non-polymorphic query that returns a list of all the BankAccounts:

var query = from b in context.BillingDetails.OfType<BankAccount>() select b;

Executing this query (by invoking the ToList() method) results in the following SQL statement being sent to the database (at the bottom, you can also see the result of executing the generated query in SQL Server Management Studio):

Now, let’s take an example of a very simple polymorphic query that requests all the BillingDetails which includes both BankAccount and CreditCard types:

var query = from b in context.BillingDetails select b;

This LINQ query looks even simpler than the previous one, but the resulting SQL query is not as simple as you might expect:

As you can see, EF Code First relies on an INNER JOIN to detect the existence (or absence) of rows in the subclass tables CreditCards and BankAccounts, so it can determine the concrete subclass for a particular row of the BillingDetails table. The SQL CASE expressions that you see at the beginning of the query are there only to ensure that columns that are irrelevant for a particular row have NULL values in the returned flattened table (e.g. BankName for a row that represents a CreditCard).
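The effect of that query can be pictured without a database. The following is only a conceptual sketch in plain C#, not what EF executes: the row classes (BillingDetailRow, BankAccountRow, CreditCardRow, FlattenedRow) and the TptFlattenSketch helper are hypothetical stand-ins for the three TPT tables and the flattened result. For each base row it looks for a subclass row with the same key, uses that to decide the concrete type, and leaves the columns of the other subclass null.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the three TPT tables.
public class BillingDetailRow { public int BillingDetailId; public string Owner; }
public class BankAccountRow   { public int BillingDetailId; public string BankName; }
public class CreditCardRow    { public int BillingDetailId; public int CardType; }

// One row of the flattened result: irrelevant columns stay null.
public class FlattenedRow
{
    public int BillingDetailId;
    public string Owner;
    public string ConcreteType;
    public string BankName;   // null unless the row is a BankAccount
    public int? CardType;     // null unless the row is a CreditCard
}

public static class TptFlattenSketch
{
    public static List<FlattenedRow> Flatten(
        IEnumerable<BillingDetailRow> billingDetails,
        IEnumerable<BankAccountRow> bankAccounts,
        IEnumerable<CreditCardRow> creditCards)
    {
        // Index the subclass tables by their shared primary key.
        var bankById = new Dictionary<int, BankAccountRow>();
        foreach (var row in bankAccounts) bankById[row.BillingDetailId] = row;

        var cardById = new Dictionary<int, CreditCardRow>();
        foreach (var row in creditCards) cardById[row.BillingDetailId] = row;

        var result = new List<FlattenedRow>();
        foreach (var detail in billingDetails)
        {
            BankAccountRow bank;
            CreditCardRow card;
            // The presence of a matching subclass row identifies the concrete type.
            bool isBank = bankById.TryGetValue(detail.BillingDetailId, out bank);
            bool isCard = cardById.TryGetValue(detail.BillingDetailId, out card);

            result.Add(new FlattenedRow
            {
                BillingDetailId = detail.BillingDetailId,
                Owner = detail.Owner,
                ConcreteType = isBank ? "BankAccount" : isCard ? "CreditCard" : null,
                BankName = isBank ? bank.BankName : null,
                CardType = isCard ? (int?)card.CardType : null
            });
        }
        return result;
    }
}
```

The per-row lookups make the cost visible: every additional subclass in the hierarchy means one more table to consult for each polymorphic query.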

TPT Considerations

Even though this mapping strategy is deceptively simple, experience shows that performance can be unacceptable for complex class hierarchies because queries always require a join across many tables. In addition, this mapping strategy is more difficult to implement by hand; even ad hoc reporting is more complex. This is an important consideration if you plan to use handwritten SQL in your application. (For ad hoc reporting, database views provide a way to offset the complexity of the TPT strategy. A view may be used to transform the table-per-type model into the much simpler table-per-hierarchy model.)

Summary

In this post we learned about Table per Type as the second inheritance mapping strategy in our series. So far, the strategies we’ve discussed require extra consideration with regard to the SQL schema (e.g. in TPT, foreign keys are needed). This situation changes with Table per Concrete Type (TPC), which we will discuss in the next post.


Inheritance with EF Code First: Part 1 – Table per Hierarchy (TPH)

Friday, December 24, 2010

.NET C# Code First CTP5 Entity Framework


A simple strategy for mapping classes to database tables might be “one table for every entity persistent class.” This approach sounds simple enough and, indeed, works well until we encounter inheritance. Inheritance is such a visible structural mismatch between the object-oriented and relational worlds because object-oriented systems model both “is a” and “has a” relationships. SQL-based models provide only "has a" relationships between entities; SQL database management systems don’t support type inheritance—and even when it’s available, it’s usually proprietary or incomplete.

There are three different approaches to representing an inheritance hierarchy:

Table per Hierarchy (TPH): Enable polymorphism by denormalizing the SQL schema, and utilize a type discriminator column that holds type information.
Table per Type (TPT): Represent "is a" (inheritance) relationships as "has a" (foreign key) relationships.
Table per Concrete class (TPC): Discard polymorphism and inheritance relationships completely from the SQL schema.

I will explain each of these strategies in a series of posts, and this one is dedicated to TPH. In this series we'll dig deeply into each of these strategies and learn "why" to choose them as well as "how" to implement them. Hopefully it will give you a better idea of which strategy to choose in a particular scenario.

Inheritance Mapping with Entity Framework Code First

All of the inheritance mapping strategies that we discuss in this series will be implemented with EF Code First CTP5. The CTP5 build of the new EF Code First library was released by the ADO.NET team earlier this month. EF Code First enables a pretty powerful code-centric development workflow for working with data. I’m a big fan of the EF Code First approach, and I’m pretty excited about the productivity and power that it brings. When it comes to inheritance mapping, Code First not only fully supports all the strategies but also gives you a great deal of flexibility to work with domain models that involve inheritance. The fluent API for inheritance mapping has been improved a lot in CTP5 and is now more intuitive and concise compared to CTP4.

A Note For Those Who Follow Other Entity Framework Approaches

If you are following EF's "Database First" or "Model First" approaches, I still recommend reading this series: although the implementation is Code First specific, the explanations of each strategy apply equally to all approaches, be it Code First or others.

A Note For Those Who are New to Entity Framework and Code-First

If you choose to learn EF you've chosen well. If you choose to learn EF with Code First you've done even better. To get started, you can find a great walkthrough by Scott Guthrie here and another one by the ADO.NET team here. In this post, I assume you have already set up your machine to do Code First development and that you are familiar with Code First fundamentals and basic concepts. You might also want to check out my other posts on EF Code First like Complex Types and Shared Primary Key Associations.

A Top Down Development Scenario

These posts take a top-down approach; they assume that you’re starting with a domain model and trying to derive a new SQL schema. Therefore, we start with an existing domain model, implement it in C# and then let Code First create the database schema for us. However, the mapping strategies described are just as relevant if you’re working bottom up, starting with existing database tables. I’ll show some tricks along the way that help you deal with imperfect table layouts.

The Domain Model

In our domain model, we have a BillingDetail base class which is abstract (note the italic font on the UML class diagram below). We allow various billing types and represent them as subclasses of the BillingDetail class. For now, we support CreditCard and BankAccount:

Implement the Object Model with Code First

As always, we start with the POCO classes. Note that in our DbContext, I define only one DbSet, for the base class BillingDetail. Code First will find the other classes in the hierarchy based on its reachability convention.

public abstract class BillingDetail 
{
    public int BillingDetailId { get; set; }
    public string Owner { get; set; }        
    public string Number { get; set; }
}
 
public class BankAccount : BillingDetail
{
    public string BankName { get; set; }
    public string Swift { get; set; }
}
 
public class CreditCard : BillingDetail
{
    public int CardType { get; set; }                
    public string ExpiryMonth { get; set; }
    public string ExpiryYear { get; set; }
}
 
public class InheritanceMappingContext : DbContext
{
    public DbSet<BillingDetail> BillingDetails { get; set; }
}

This object model is all that is needed to enable inheritance with Code First. If you put this in your application you would be able to immediately start working with the database and do CRUD operations. Before going into details about how EF Code First maps this object model to the database, we need to learn about one of the core concepts of inheritance mapping: polymorphic and non-polymorphic queries.

Polymorphic Queries

LINQ to Entities and EntitySQL, as object-oriented query languages, both support polymorphic queries, that is, queries for instances of a class and all instances of its subclasses. For example, consider the following query:

IQueryable<BillingDetail> linqQuery = from b in context.BillingDetails select b;
List<BillingDetail> billingDetails = linqQuery.ToList();

Or the same query in EntitySQL:

string eSqlQuery = @"SELECT VALUE b FROM BillingDetails AS b";
ObjectContext objectContext = ((IObjectContextAdapter)context).ObjectContext;
ObjectQuery<BillingDetail> objectQuery = objectContext.CreateQuery<BillingDetail>(eSqlQuery);
List<BillingDetail> billingDetails = objectQuery.ToList();

linqQuery and eSqlQuery are both polymorphic and return a list of objects typed as BillingDetail, which is an abstract class. The actual concrete objects in the list are instances of the subtypes of BillingDetail: CreditCard and BankAccount.

Non-polymorphic Queries

In LINQ to Entities and EntitySQL, queries are polymorphic by default: they return not only instances of the specific entity class they refer to, but instances of all its subclasses as well. Non-polymorphic queries, on the other hand, are queries whose polymorphism is restricted so that they return only instances of a particular subclass. In LINQ to Entities, this is specified with the OfType<T>() method. For example, the following query returns only instances of BankAccount:

IQueryable<BankAccount> query = from b in context.BillingDetails.OfType<BankAccount>() 
                                select b;

EntitySQL has an OFTYPE operator that does the same thing:

string eSqlQuery = @"SELECT VALUE b FROM OFTYPE(BillingDetails, Model.BankAccount) AS b";

In fact, the above query with the OFTYPE operator is a short form of the following query expression, which uses the TREAT and IS OF operators:

string eSqlQuery = @"SELECT VALUE TREAT(b AS Model.BankAccount) 
                     FROM BillingDetails AS b 
                     WHERE b IS OF(Model.BankAccount)";

(Note that in the above query, Model.BankAccount is the fully qualified name of the BankAccount class. You need to replace "Model" with your own namespace name.)
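To see the difference between the two kinds of queries without a database, the same distinction can be reproduced with LINQ to Objects. This is only an illustrative sketch: the classes below are simplified in-memory stand-ins for the entities, and QueryKindsDemo with its methods is a hypothetical helper, not part of EF.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified in-memory stand-ins for the entity classes (no EF involved).
public abstract class BillingDetail { public string Owner { get; set; } }
public class BankAccount : BillingDetail { public string BankName { get; set; } }
public class CreditCard : BillingDetail { public int CardType { get; set; } }

public static class QueryKindsDemo
{
    // A small mixed list playing the role of the BillingDetails set.
    public static List<BillingDetail> Sample()
    {
        return new List<BillingDetail>
        {
            new BankAccount { Owner = "A", BankName = "Bank1" },
            new CreditCard  { Owner = "B", CardType = 1 },
            new BankAccount { Owner = "C", BankName = "Bank2" }
        };
    }

    // Polymorphic: every element is returned, whatever its concrete type.
    public static List<BillingDetail> Polymorphic(IEnumerable<BillingDetail> source)
    {
        return (from b in source select b).ToList();
    }

    // Non-polymorphic: OfType<T>() keeps only BankAccount instances
    // and the result is statically typed as BankAccount.
    public static List<BankAccount> OnlyBankAccounts(IEnumerable<BillingDetail> source)
    {
        return (from b in source.OfType<BankAccount>() select b).ToList();
    }
}
```

With the sample list, Polymorphic returns all three objects while OnlyBankAccounts returns the two BankAccount instances, and only the latter lets you read BankName without a cast.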

Table per Hierarchy (TPH)

An entire class hierarchy can be mapped to a single table. This table includes columns for all properties of all classes in the hierarchy. The concrete subclass represented by a particular row is identified by the value of a type discriminator column. You don’t have to do anything special in Code First to enable TPH. It's the default inheritance mapping strategy:

This mapping strategy is a winner in terms of both performance and simplicity. It’s the best-performing way to represent polymorphism—both polymorphic and nonpolymorphic queries perform well—and it’s even easy to implement by hand. Ad-hoc reporting is possible without complex joins or unions. Schema evolution is straightforward.

Discriminator Column

As you can see in the DB schema above, Code First has to add a special column to distinguish between persistent classes: the discriminator. This isn’t a property of the persistent class in our object model; it’s used internally by EF Code First. By default, the column name is "Discriminator", and its type is string. The values default to the persistent class names, in this case “BankAccount” or “CreditCard”. EF Code First automatically sets and retrieves the discriminator values.
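The role of the discriminator can be illustrated with a small conceptual sketch in plain C#. It is not how EF implements materialization internally; DiscriminatorSketch, Materialize and DiscriminatorFor are hypothetical names, and the classes are reduced to the minimum needed here. The sketch only shows the two directions: choosing a concrete class from a stored discriminator value, and deriving the default value (the class name) when saving.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-ins for the entity classes (no EF involved).
public abstract class BillingDetail { public int BillingDetailId { get; set; } }
public class BankAccount : BillingDetail { }
public class CreditCard : BillingDetail { }

public static class DiscriminatorSketch
{
    // Hypothetical lookup: discriminator value -> factory for the concrete class.
    private static readonly Dictionary<string, Func<BillingDetail>> Factories =
        new Dictionary<string, Func<BillingDetail>>
        {
            { "BankAccount", () => new BankAccount() },
            { "CreditCard",  () => new CreditCard() }
        };

    // Reading: the discriminator value of a row decides which class to instantiate.
    public static BillingDetail Materialize(string discriminator, int billingDetailId)
    {
        Func<BillingDetail> factory;
        if (!Factories.TryGetValue(discriminator, out factory))
            throw new InvalidOperationException("Unknown discriminator value: " + discriminator);

        BillingDetail entity = factory();
        entity.BillingDetailId = billingDetailId;
        return entity;
    }

    // Writing: by default the discriminator value is the class name.
    public static string DiscriminatorFor(BillingDetail entity)
    {
        return entity.GetType().Name;
    }
}
```

A row whose discriminator is "CreditCard" therefore comes back as a CreditCard instance, and saving that instance would store "CreditCard" in the column again.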

TPH Requires Properties in SubClasses to be Nullable in the Database

TPH has one major problem: columns for properties declared by subclasses will be nullable in the database. For example, Code First created an (INT, NULL) column to map the CardType property of the CreditCard class. In a typical mapping scenario, Code First always creates an (INT, NOT NULL) column in the database for an int property of a persistent class. In this case, however, a BankAccount instance has no CardType property, so the CardType field must be NULL for that row, and Code First creates an (INT, NULL) column instead. If your subclasses each define several non-nullable properties, the loss of NOT NULL constraints may be a serious problem from the point of view of data integrity.

TPH Violates the Third Normal Form

Another important issue is normalization. We’ve created functional dependencies between nonkey columns, violating the third normal form. Basically, the value of Discriminator column determines the corresponding values of the columns that belong to the subclasses (e.g. BankName) but Discriminator is not part of the primary key for the table. As always, denormalization for performance can be misleading, because it sacrifices long-term stability, maintainability, and the integrity of data for immediate gains that may be also achieved by proper optimization of the SQL execution plans (in other words, ask your DBA).

Generated SQL Query

Let's take a look at the SQL statements that EF Code First sends to the database when we write queries in LINQ to Entities or EntitySQL. For example, the polymorphic query for BillingDetails that you saw earlier generates the following SQL statement:

SELECT 
[Extent1].[Discriminator] AS [Discriminator], 
[Extent1].[BillingDetailId] AS [BillingDetailId], 
[Extent1].[Owner] AS [Owner], 
[Extent1].[Number] AS [Number], 
[Extent1].[BankName] AS [BankName], 
[Extent1].[Swift] AS [Swift], 
[Extent1].[CardType] AS [CardType], 
[Extent1].[ExpiryMonth] AS [ExpiryMonth], 
[Extent1].[ExpiryYear] AS [ExpiryYear]
FROM [dbo].[BillingDetails] AS [Extent1]
WHERE [Extent1].[Discriminator] IN ('BankAccount','CreditCard')

Or the non-polymorphic query for the BankAccount subclass generates this SQL statement:

SELECT 
[Extent1].[BillingDetailId] AS [BillingDetailId], 
[Extent1].[Owner] AS [Owner], 
[Extent1].[Number] AS [Number], 
[Extent1].[BankName] AS [BankName], 
[Extent1].[Swift] AS [Swift]
FROM [dbo].[BillingDetails] AS [Extent1]
WHERE [Extent1].[Discriminator] = 'BankAccount'

Note how Code First adds a restriction on the discriminator column and how it selects only those columns that belong to the BankAccount entity.

Change Discriminator Column Data Type and Values With Fluent API

Sometimes, especially in legacy schemas, you need to override the conventions for the discriminator column so that Code First can work with the schema. The following fluent API code will change the discriminator column name to "BillingDetailType" and the values to "BA" and "CC" for BankAccount and CreditCard respectively:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<BillingDetail>()
                .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue("BA"))
                .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue("CC"));
}

Changing the data type of the discriminator column is also interesting. In the above code, we passed strings to the HasValue method, but this method is defined to accept an argument of type object:

public void HasValue(object value);

Therefore, if we pass it a value of type int, for example, Code First not only uses our desired values (i.e. 1 and 2) in the discriminator column but also changes the column type to (INT, NOT NULL):

modelBuilder.Entity<BillingDetail>()
            .Map<BankAccount>(m => m.Requires("BillingDetailType").HasValue(1))
            .Map<CreditCard>(m => m.Requires("BillingDetailType").HasValue(2));

Summary

In this post we learned about Table per Hierarchy as the default mapping strategy in Code First. The disadvantages of the TPH strategy may be too serious for your design; after all, denormalized schemas can become a major burden in the long run. Your DBA may not like it at all. In the next post, we will learn about the Table per Type (TPT) strategy, which doesn’t expose you to this problem.

References