Archives

Archives / 2011 / April
  • Associations in EF Code First: Part 4 – Table Splitting

    This is the fourth post in a series that explains entity association mappings with EF Code First. This series includes:
    In the second part of this series we saw how to map a special kind of one-to-one association—a composition with complex types. We argued that this is usually the simplest way to represent one-to-one relationships which comes with some limitations. We addressed the first limitation (shared references) by introducing shared primary key associations in the previous blog post. In today’s blog post we are going to address the third limitation of the complex types by learning about Table Splitting as yet another way to map a one-to-one association.

    The Motivation Behind this Mapping: A Complex Type That Can be Lazy Loaded

    A shared primary key association does not expose us to the third limitation of the complex types regarding Lazy Loading, we can of course lazy/defer load the Address information of a given user but at the same time, it does not give us the same SQL schema as the complex type mapping. After all, it adds a new table for the Address entity to the schema while mapping the Address with a complex type stores the address information in the Users table. So the question still remains there: How can we keep everything (e.g. User and Address) in one single table yet be able to lazy load the complex type part (Address) after reading the principal entity (User)? In other words, how can we have lazy loading with a complex type?

    Splitting a Single Table into Multiple Entities

    Table splitting (a.k.a. horizontal splitting) enables us to map a single table to multiple entities. This is particularly useful for scenarios that we have a table with many columns where some of those columns might not be needed as frequently as others or some of the columns are expensive to load (e.g. a column with a binary data type).

    An Example From the Northwind Database

    Unlike the other parts of this series, where we start with an object model and then derive a SQL schema afterwards, in this post we are going to do the reverse, for a reason that you'll see, we will start with an existing schema and will try to create an object model that matches the schema. For that we are going to use the Employees table from the Northwind database. You can download and install Northwind database from here If you don't have it already installed on your SQL Server. The following shows the Employees table from the Northwind database that we are going to use:
    As you can see, this table has a Photo column of image type which makes it a good candidate to be lazy loaded each time we read an Employee from this table.

    The Object Model

    As the following object model shows, I created two entities: Employee and EmployeePhoto. I also created a unidirectional association between these two by defining a navigation property on the Employee class called EmployeePhoto:
    public class Employee
    {
        public int EmployeeID { getset; }
        public string LastName { getset; }
        public string FirstName { getset; }
        public string Title { getset; }
        public string TitleOfCourtesy { getset; }
        public DateTime? BirthDate { getset; }
        public DateTime? HireDate { getset; }
        public string Address { getset; }
        public string City { getset; }
        public string Region { getset; }
        public string PostalCode { getset; }
        public string Country { getset; }
        public string HomePhone { getset; }
        public string Extension { getset; }        
        public string Notes { getset; }
        public int? ReportsTo { getset; }        
     
        public virtual EmployeePhoto EmployeePhoto { getset; }
    }
     
    public class EmployeePhoto
    {
        [Key]
        public int EmployeeID { getset; }
        public byte[] Photo { getset; }
        public string PhotoPath { getset; }
    }
     
    public class NorthwindContext : DbContext
    {        
        public DbSet<Employee> Employees { getset; }
        public DbSet<EmployeePhoto> EmployeePhoto { getset; }     
    }

    How to Create a Table Splitting with Fluent API?

    As also mentioned in the previous post, by convention, Code First always takes a unidirectional association as one-to-many unless we specify otherwise with fluent API. However, the fluent API codes that we have seen so far in this series won't let us create a table splitting. If we mark EmployeePhoto class as a complex type, we wouldn't be able to lazy load it anymore or if we create a shared primary key association then it will look for a separate table for the EmployeePhoto entity which we don't have in the Northwind database. The trick is to create a shared primary key association between Employee and EmployeePhoto entities but then instruct Code First to map them both to the same table. The following code shows how:
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Employee>()
                    .HasRequired(e => e.EmployeePhoto)
                    .WithRequiredPrincipal();
            
        modelBuilder.Entity<Employee>().ToTable("Employees");
        modelBuilder.Entity<EmployeePhoto>().ToTable("Employees");
    }
    Note how we made both ends of the association required by using HasRequired and WithRequiredPrincipal methods, even though both the Photo and PhotoPath columns has been defined to allow NULLs.

    See the Lazy Loading of the Dependent Entity in Action

    Now it's time to write a test to make sure that EF does not select the Photo column each time we query for an employee:
    using (var context = new NorthwindContext())
    {
        Employee employee = context.Employees.First();
        byte[] photo = employee.EmployeePhoto.Photo;
    }
    The following screen shot from the SQL Profiler shows the query that has been submitted to SQL Server as the result of reading the first employee object:
    Accessing the EmployeePhoto navigation property of the employee object on the next line causes EF to submit a second query to the SQL Server to lazy (implicit) load the EmployeePhoto (By default, EF fetches associated objects and collections lazily whenever you access them):

    Where to Use this Mapping?

    I recommend using Table Splitting only for mapping of the legacy databases, actually that's the reason we start this post from an existing database like Northwind. For green-field development scenarios consider using shared primary key association instead. There are several reasons why you may want to split the Employee table to two tables when designing a new physical data model for your application. In fact, it is very common for most applications to require a core collection of data attributes of any given entity, and then a specific subset of the noncore data attributes. For example, the core columns of the Employee table would include the columns required to store their name, address, and phone numbers; whereas noncore columns would include the Photo column. Because Employee.Photo is large, and required only by a few applications, you would want to consider splitting it off into its own table. This would help to improve retrieval access times for applications that select all columns from the Employee table yet do not require the photo. This also works pretty well for EF since it doesn't support lazy loading at the scalar property or complex type level.

    Summary

    In this post we learned about mapping a one-to-one association with table splitting. It enabled us to have lazy loading for the EmployeePhoto entity, something that we would have missed, had we mapped it with a complex type. We saw that on the database side it looks like a complex type mapping but on the object model it is not a complex type since we mapped EmployeePhoto as an Entity with an object identifier (EmployeeID). In fact, it's a special kind of a shared primary key association where both the principal and dependent entities are mapped to one single table. This somehow exotic one-to-one association mapping should be reserved only for the mapping of existing legacy databases.


  • Associations in EF Code First: Part 3 – Shared Primary Key Associations

    This is the third post in a series that explains entity association mappings with EF Code First. This series includes:
    In the previous blog post I demonstrated how to map a special kind of one-to-one association—a composition with complex types. We argued that the relationship between User and Address is best represented with a complex type mapping and we saw that this is usually the simplest way to represent one-to-one relationships but comes with some limitations.

    In today’s blog post I’m going to discuss how we can address those limitations by changing our mapping strategy. This is particularly useful for scenarios that we want a dedicated table for Address, so that we can map both User and Address as entities. One benefit of this model is the possibility for shared references— another entity class (let’s say Shipment) can also have a reference to a particular Address instance. If a User has a reference to this instance, as her BillingAddress, the Address instance has to support shared references and needs its own identity. In this case, User and Address classes have a true one-to-one association.

    Introducing the Revised Model

    In this revised version, each User could have one BillingAddress (Billing Association). Also a Shipment always needs a destination address for delivery (Delivery Association). The following shows the class diagram for this domain model (note the multiplicities on association lines):
    In this model we assumed that the billing address of the user is the same as her delivery address. Now let’s create the association mappings for this domain model. There are several choices, the first being a One-to-One Primary Key Association.

    Shared Primary Key Associations

    Also know as One-to-One Primary Key Associations, means two related tables share the same primary key values. The primary key of one table is also a foreign key of the other. Let’s see how we can create a primary key association mapping with Code First.

    How to Implement a One-to-One Primary Key Association with Code First

    First, we start with the POCO classes. As you can see, we've defined BillingAddress as a navigation property on User class and another one on Shipment class named DeliveryAddress. Both associations are unidirectional since we didn't define related navigation properties on Address class as for User and Shipment.
    public class User
    {
        public int UserId { getset; }
        public string Name { getset; }
     
        public virtual Address BillingAddress { getset; }
    }
     
    public class Address
    {
        public int AddressId { getset; }
        public string Street { getset; }
        public string City { getset; }
        public string ZipCode { getset; }
    }
            
    public class Shipment
    {
        public int ShipmentId { getset; }     
        public string State { getset; }
     
        public virtual Address DeliveryAddress { getset; }
    }
     
    public class Context : DbContext
    {
        public DbSet<User> Users { getset; }
        public DbSet<Address> Addresses { getset; }
        public DbSet<Shipment> Shipments { getset; }
    }

    How Code First Sees the Associations in our Object Model: One-to-Many

    Code First reads the model and tries to figure out the multiplicity of the associations. Since the associations are unidirectional, Code First takes this as if one Address has many Users and Many Shipments and will create a one-to-many association for each of them. In other words, a unidirectional association is always inferred as One-to-Many by Code First. So, what we were hoping for —a one-to-one association, is not inline with the Code First conventions.

    How to Change the Multiplicity of the Associations to One-to-One by Using the Conventions

    Obviously, one way to turn our associations to one-to-one is by making them bidirectional. That is, adding a new navigation property to Address class of type User and another one of type Shipment. By doing that we simply signal Code First that we are looking to have one-to-one associations since for example User has an Address and also Address has a User. Therefore, Code First will change the multiplicity to one-to-one and this will solve the problem.

    Should We Make the Associations Bidirectional?

    As always, the decision is up to us and depends on whether we need to navigate through our objects in that direction in the application code. In this case, we’d probably conclude that the bidirectional association doesn’t make much sense. If we call anAddress.User, we are saying “give me the user who has this address”, not a very reasonable request. So this is not a good option. Instead we'll keep our object model as it is and will explicitly ask Code First to make our associations one-to-one.

    How to Change the Multiplicity to One-to-One with Fluent API

    The following code is all that is needed to make the associations to be one-to-one. Note how the multiplicities in the UML class diagram (e.g. 1 on User and 0..1 on address) has been translated to the fluent API code by using HasRequired and HasOptional methods:
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress)
                                   .WithRequired();
        
        modelBuilder.Entity<Shipment>().HasRequired(u => u.DeliveryAddress)
                                       .WithOptional();
    }
    Also it worth noting that when we are mapping a one-to-one association with fluent API, we don't need to specify the foreign key as we would do when mapping a one-to-many association with HasForeignKey method. Since EF only supports one-to-one associations on primary keys, it will automatically create the relationship in the database on the primary keys.

    Database Schema

    The mapping result for our object model is as follows (note the Identity column on Users table):

    Referential Integrity

    In relational database design the referential integrity rule states that each non-null value of a foreign key must match the value of some primary key. But wait, how does it even applies here? All we have is just three primary keys referencing each other! Who is the primary key and who is the foreign key? The best way to find the answer of this question is to take a look at the properties of the relationships in the database that has been created by Code First:
    As you can see, Code First adds a foreign key constraint which links the primary key of the Addresses table to the primary key of the Users table and adds another foreign key constraint that links the primary key of the Shipments table to the primary key of the Addresses table. The foreign key constraint means that a user has to exist for a particular address but not the other way around. In other words, the database guarantees that an Addresses row’s primary key references a valid Users primary key and a Shipments row’s primary key references a valid Addresses primary key.

    How Code First Determines the Principal and Dependent Ends in an Association?

    Code First has rules to determine the principal and dependent ends of an association. For one-to-many relationships the many end is always the dependent, but it gets a little tricky in one-to-one associations. In one-to-one associations Code First decides based on our object model, and possible data annotations or fluent API code that we may have. For example in this case, we used the following fluent API code to configure the User-Address association:
    modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress).WithRequired();
    This reads as "User entity has an optional association with one Address object but this association is required for Address entity". For Code First this is good enough to make the decision: It marked User as the principal end and Address as the dependent end in the association. Since we have the same fluent API code for the second association between Address and Shipment, it marks Address as the principal end and Shipment as the dependent end in this association as well.

    This decision has some consequences. In fact, the referential integrity that we saw, is the first result of this Code First's principal/dependent decision.

    Second Result of Code First's Principal/Dependent Decision: Database Identity

    If you take a closer look at the above DB schema, you'll notice that only UserId has a regular identifier generator (aka Identity or Sequence) and AddressId and ShipmentId does not. This is a very important consequence of the principal/dependent decision for one-to-one associations: the dependent primary key will become non-Identity by default. This make sense because they share their primary key values and only one of them can be auto generated and we need to take care of providing valid keys for the rest.

    What about Cascade Deletes?

    As we saw, each Address always belongs to one User and each Shipment always delivered to one single Address. We want to make sure that when we delete a User the possible dependent rows on Address and Shipment also get deleted in the database. In fact, this is one of the Referential Integrity Refactorings which called Introduce Cascading Delete. The primary reason we would apply "Introduce Cascading Delete" is to preserve the referential integrity of our data by ensuring that related rows are appropriately deleted when a parent row is deleted. By default, Code First does not enable cascade delete when it creates a one-to-one relationship in the database. As always we can override this convention by fluent API:
    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<User>().HasOptional(u => u.BillingAddress)
                                   .WithRequired()
                                   .WillCascadeOnDelete();
        
        modelBuilder.Entity<Shipment>().HasRequired(u => u.DeliveryAddress)
                                       .WithOptional()
                                       .WillCascadeOnDelete();
    }

    What the Additional Methods Like WithRequiredDependent are for?

    The HasRequired method returns an object of type RequiredNavigationPropertyConfiguration which defines two special methods called WithRequiredDependent and WithRequiredPrincipal in addition to the typical WithMany and WithOptional methods that we usually use. We saw that the only reason Code First could figure out principal and dependent in our associations was because our fluent API code clearly specified one end as Required and the other as Optional. But what if both endpoints are required or both are optional in the association? For example consider a scenario that a User always has one Address and Address always has one User (required on both end). Now Code First cannot pick up the principal and dependent ends on its own and that's exactly where methods like WithRequiredDependent come into play. In other words, this scenario ultimately need to be configured by fluent API and fluent API is designed in a way that will force you to explicitly specify who is dependent and who is principal in a required-required or optional-optional association scenario.

    For example, this fluent API code shows how we can configure the User-Address association where both ends are required:
    modelBuilder.Entity<User>().HasRequired(u => u.BillingAddress).WithRequiredDependent();
    
    Taking a closer look at the RequiredNavigationPropertyConfiguration type also shows the idea:
    public class RequiredNavigationPropertyConfiguration<TEntityType, TTargetEntityType>
    {
        public DependentNavigationPropertyConfiguration<TEntityType, TTargetEntityType> WithMany();
        public CascadableNavigationPropertyConfiguration WithOptional();
        public CascadableNavigationPropertyConfiguration WithRequiredDependent();
        public CascadableNavigationPropertyConfiguration WithRequiredPrincipal();
    }
    As you can see, if you want to go another Required after HasRequired method, you have to either call WithRequiredDependent or WithRequiredPrincipal since there is no WithRequired method defined on RequiredNavigationPropertyConfiguration class.

    Working with the Model

    Here is an example for adding a new user along with its billing address. EF is smart enough to use the newly generated UserId for the AddressId as well:
    using (var context = new Context())
    {
        Address billingAddress = new Address()
        {
            Street = "Main St.",
            City   = "Seattle"
        };
                    
        User user = new User()
        {
            Name = "Morteza",                    
            BillingAddress = billingAddress
        };
     
        context.Users.Add(user);
        context.SaveChanges();
    }
    The following code is an example of adding a new Address and Shipment for an existing User (assuming that we have a User with UserId = 1 in the database):
    using (var context = new Context())
    {
        Address deliveryAddress = new Address()
        {
            AddressId = 1,
            Street = "Main St.",                    
        };
     
        Shipment shipment = new Shipment()
        {
            ShipmentId = 1,
            State = "Shipped",                    
            DeliveryAddress = deliveryAddress
        };
     
        context.Shipments.Add(shipment);
        context.SaveChanges();
    }

    Limitations of This Mapping

    There are two important limitations to associations mapped as shared primary key:
    • Difficulty in Saving Related Objects
    • The main difficulty with this approach is ensuring that associated instances are assigned the same primary key value when the objects are saved. For example, when adding a new Address object, it's our responsibility to provide a unique AddressId that is also valid (a User can be found with such a value as UserId.)
    • Multiple Addresses for User is Not Possible
    • With this mapping we cannot have more than one Address for User. At the beginning of this post, when we introduce our model, we assumed that the user has the same address for billing and delivery. But what if that's not the case? What if we also want to add a Home address to User for the deliveries? In the current setup, each row in the User table has at most one corresponding row in the Address table. Two addresses would require an additional address table, and this mapping style therefore wouldn’t be adequate.

    Summary

    In this post we learned about one-to-one associations which shared primary key is just one way to implement it. Shared primary key associations aren’t uncommon but are relatively rare. In many schemas, a one-to-one association is represented with a foreign key field and a unique constraint. In the next posts we will revisit the same domain model and will learn about other ways to map one-to-one associations that does not have the limitations of the shared primary key association mapping.

    References