Archives

Archives / 2011 / January
  • Associations in EF Code First CTP5: Part 3 – One-to-One Foreign Key Associations

    This is the third post in a series that explains entity association mappings with EF Code First. I've described these association types so far: In the previous blog post we saw the limitations of shared primary key association and argued that this type of association is relatively rare and in many schemas, a one-to-one association is represented with a foreign key field and a unique constraint. Today we are going to discuss how this is done by learning about one-to-one foreign key associations.

    Introducing the Revised Model

    In this revised version, each User always have two addresses: one billing address and another one for delivery. The following figure shows the class diagram for this domain model:

    One-to-One Foreign Key Association

    Instead of sharing a primary key, two rows can have a foreign key relationship. One table has a foreign key column that references the primary key of the associated table (The source and target of this foreign key constraint can even be the same table: This is called a self-referencing relationship.). An additional constraint enforces this relationship as a real one to one. For example, by making the BillingAddressId column unique, we declare that a particular address can be referenced by at most one user, as a billing address. This isn’t as strong as the guarantee from a shared primary key association, which allows a particular address to be referenced by at most one user, period. With several foreign key columns (which is the case in our domain model since we also have a foreign key for DeliveryAddress), we can reference the same address target row several times. But in any case, two users can’t share the same address for the same purpose.

    The Object Model

    Let's start by creating an object model for our domain:
    public class User
    {
        public int UserId { getset; }
        public string FirstName { getset; }
        public string LastName { getset; }        
        public int BillingAddressId { getset; }
        public int DeliveryAddressId { getset; }
     
        public Address BillingAddress { getset; }
        public Address DeliveryAddress { getset; }
    }
     
    public class Address
    {
        public int AddressId { getset; }
        public string Street { getset; }
        public string City { getset; }
        public string PostalCode { getset; }   
    }
     
    public class EntityMappingContext : DbContext
    {
        public DbSet<User> Users { getset; }
        public DbSet<Address> Addresses { getset; }
    }
    As you can see, User class has introduced two new scalar properties as BillingAddressId and DeliveryAddressId as well as their related navigation properties (BillingAddress and DeliveryAddress).

    Configuring Foreign Keys With Fluent API

    BillingAddressId and DeliveryAddressId are foreign key scalar properties and representing the actual foreign key values that the relationships are established on. However, Code First will not recognize them as the foreign keys for the associations since their names are not aligned with the conventions that it has to infer foreign keys. Therefore, we need to use fluent API (or Data Annotations) to tell Code First about the foreign keys. Here is the fluent API code to identify the foreign key properties:
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<User>()
                    .HasRequired(a => a.BillingAddress)
                    .WithMany()
                    .HasForeignKey(u => u.BillingAddressId);
     
        modelBuilder.Entity<User>()
                    .HasRequired(a => a.DeliveryAddress)
                    .WithMany()
                    .HasForeignKey(u => u.DeliveryAddressId);
    }
    Alternatively, we can use Data Annotations to achieve this. CTP5 introduced a new attribute in System.ComponentModel.DataAnnotations namespace which is called ForeignKeyAttribute and we can place it on a navigation property to specify the property that represents the foreign key of the relationship:
    public class User
    {
        public int UserId { getset; }
        public string FirstName { getset; }
        public string LastName { getset; }        
        public int BillingAddressId { getset; }
        public int DeliveryAddressId { getset; }       
        
        [ForeignKey("BillingAddressId")]
        public Address BillingAddress { getset; }
        
        [ForeignKey("DeliveryAddressId")]
        public Address DeliveryAddress { getset; }
    }
    However, we will not use this Data Annotation and will stick with our fluent API code for a reason that you'll see soon.

    Creating a SQL Server Schema

    The object model seems to be ready to give us the desired SQL schema, however, if we try to create a SQL Server database from it, we will get an InvalidOperationException with this message:
    "The database creation succeeded, but the creation of the database objects did not. See InnerException for details."
    The inner exception is a System.Data.SqlClient.SqlException containing this message:
    "Introducing FOREIGN KEY constraint 'User_DeliveryAddress' on table 'Users' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. Could not create constraint. See previous errors."
    As you can tell from the type of the inner exception (SqlException), it has nothing to do with EF or Code First; it has been generated purely by SQL Server when Code First was trying to create a database based on our object model.

    SQL Server and Multiple Cascade Paths

    A Multiple cascade path happens when a cascade path goes from column col1 in table A to table B and also from column col2 in table A to table B. So it seems that Code First tried to turn on Cascade Delete for both BillingAddressId and DeliveryAddressId columns in Users table. In fact, Code First was trying to use Declarative Referential Integrity (DRI) to enforce cascade deletes and the problem is that SQL Server is not fully ANSI SQL-92 compliant when it comes to the cascading actions. In SQL Server, DRI forbids cascading updates or deletes in a multiple cascade path scenario.

    A KB article also explains why we received this error: "In SQL Server, a table cannot appear more than one time in a list of all the cascading referential actions that are started by either a DELETE or an UPDATE statement. For example, the tree of cascading referential actions must only have one path to a particular table on the cascading referential actions tree". (i.e. the User table appeared twice in a list of cascading referential actions started by a DELETE). Basically, SQL Server does simple counting of cascade paths and, rather than trying to work out whether any cycles actually exist, it assumes the worst and refuses to create the referential actions (Cascades).

    Therefore, depend on our database engine, we may or may not get this exception (For example, both Oracle and MySQL let us create Cascades in this scenario.).

    Overriding Code First Convention To Resolve the Problem

    As you saw, Code First automatically turns on Cascade Deletes on required one-to-many associations based on the conventions. However, in order to resolve the exception that we got from SQL Server, we have no choice other than overriding this convention and switching cascade deletes off on at least one of the associations and as of CTP5, the only way to accomplish this is by using fluent API. Let's switch it off on DeliveryAddress Association:
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<User>()
                    .HasRequired(a => a.BillingAddress)
                    .WithMany()
                    .HasForeignKey(u => u.BillingAddressId);
     
        modelBuilder.Entity<User>()
                    .HasRequired(a => a.DeliveryAddress)
                    .WithMany()
                    .HasForeignKey(u => u.DeliveryAddressId).WillCascadeOnDelete(false);
    }

    One-to-One Foreign Key Associations in EF Code First

    As you may have noticed, both associations in the fluent API code has been configured as a many-to-one—not one-to-one, as you might have expected. The reason is simple: Code First (and EF in general) does not natively support one-to-one foreign key associations. In fact, EF does not support any association scenario that involves unique constraints at all. Fortunately, in this case we don’t care what’s on the target side of the association, so we can treat it like a to-one association without the many part. All we want is to express “This entity (User) has a property that is a reference to an instance of another entity (Address)” and use a foreign key field to represent that relationship. Basically EF still thinks that the relationship is many-to-one. This is a workaround for the current EF limitation which comes with two consequences: First, EF won't create any additional constraint for us to enforces this relationship as a one to one, we need to manually create it ourselves. The second limitation that this lack of support impose to us is more important: one to one foreign key associations cannot be bidirectional (i.e. we cannot define a User property on the Address class).

    Create a Unique Constraint To Enforce the Relationship as a Real One to One

    We can manually create unique constraints on the foreign keys in the database after Code First creates it for us but if you are like me and prefer to create your database in one shot then there is a way in CTP5 to have Code First create the constraints as part of its database creation process. For that we can take advantage of the new CTP5’s SqlCommand method on DbDatabase class which allows raw SQL commands to be executed against the database. The best place to invoke SqlCommand method for this purpose is inside a Seed method that has been overridden in a custom Initializer class:
    protected override void Seed(EntityMappingContext context)
    {
        context.Database.SqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Billing UNIQUE(BillingAddressId)");
        context.Database.SqlCommand("ALTER TABLE Users ADD CONSTRAINT uc_Delivery UNIQUE(DeliveryAddressId)");
    }
    This code adds unique constraints to the BillingAddressId and DeliveryAddressId columns in the DDL generated by Code First.

    SQL Schema

    The object model is ready now and Code First will create the following database schema for us:
    It is worth mentioning that we can still enforce cascade deletes for DeliveryAddress relationship. SQL Server allows enforcing Referential Integrity in two different ways. DRI that we just saw is the most basic yet least flexible way. The other way is to use Triggers. We can write a Delete Triggers on the primary table that either deletes the rows in the dependent table(s) or sets all corresponding foreign keys to NULL (In our case the foreign keys are Non-Nullable so it has to delete the dependent rows).

    Download

    Click here to download and run the one-to-one foreign key association sample that we have built in this blog post.

    Summary

    In this blog post we learned about one-to-one foreign key associations as a better way to represent one to one relationships. However, we saw some limitations such as the need for manual creation of unique constraints and also the fact that these type of associations cannot be bidirectional, all due to the lack of unique constraint support in EF. Support for unique constraints is going to require changes to the whole EF stack and it won't happen in the RTM targeted for this year as that RTM will be layered on top of the current .NET 4.0 functionality. That said, EF team has this feature on their list for the future, so hopefully it will be supported in a later release of EF and until then the workaround that I showed here is going to be the way to implement one-to-one foreign key associations in EF Code First.

    References

  • Inheritance with EF Code First: Part 3 – Table per Concrete Type (TPC)

    This is the third (and last) post in a series that explains different approaches to map an inheritance hierarchy with EF Code First. I've described these strategies in previous posts: In today’s blog post I am going to discuss Table per Concrete Type (TPC) which completes the inheritance mapping strategies supported by EF Code First. At the end of this post I will provide some guidelines to choose an inheritance strategy mainly based on what we've learned in this series.

    TPC and Entity Framework in the Past

    Table per Concrete type is somehow the simplest approach suggested, yet using TPC with EF is one of those concepts that has not been covered very well so far and I've seen in some resources that it was even discouraged. The reason for that is just because Entity Data Model Designer in VS2010 doesn't support TPC (even though the EF runtime does). That basically means if you are following EF's Database-First or Model-First approaches then configuring TPC requires manually writing XML in the EDMX file which is not considered to be a fun practice. Well, no more. You'll see that with Code First, creating TPC is perfectly possible with fluent API just like other strategies and you don't need to avoid TPC due to the lack of designer support as you would probably do in other EF approaches.

    Table per Concrete Type (TPC)

    In Table per Concrete type (aka Table per Concrete class) we use exactly one table for each (nonabstract) class. All properties of a class, including inherited properties, can be mapped to columns of this table, as shown in the following figure:
    As you can see, the SQL schema is not aware of the inheritance; effectively, we’ve mapped two unrelated tables to a more expressive class structure. If the base class was concrete, then an additional table would be needed to hold instances of that class. I have to emphasize that there is no relationship between the database tables, except for the fact that they share some similar columns.

    TPC Implementation in Code First

    Just like the TPT implementation, we need to specify a separate table for each of the subclasses. We also need to tell Code First that we want all of the inherited properties to be mapped as part of this table. In CTP5, there is a new helper method on EntityMappingConfiguration class called MapInheritedProperties that exactly does this for us. Here is the complete object model as well as the fluent API to create a TPC mapping:
    public abstract class BillingDetail
    {
        public int BillingDetailId { getset; }
        public string Owner { getset; }
        public string Number { getset; }
    }
            
    public class BankAccount : BillingDetail
    {
        public string BankName { getset; }
        public string Swift { getset; }
    }
            
    public class CreditCard : BillingDetail
    {
        public int CardType { getset; }
        public string ExpiryMonth { getset; }
        public string ExpiryYear { getset; }
    }
        
    public class InheritanceMappingContext : DbContext
    {
        public DbSet<BillingDetail> BillingDetails { getset; }
            
        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<BankAccount>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("BankAccounts");
            });
     
            modelBuilder.Entity<CreditCard>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("CreditCards");
            });            
        }
    }
    The Importance of EntityMappingConfiguration Class
    As a side note, it worth mentioning that EntityMappingConfiguration class turns out to be a key type for inheritance mapping in Code First. Here is an snapshot of this class:
    namespace System.Data.Entity.ModelConfiguration.Configuration.Mapping
    {
        public class EntityMappingConfiguration<TEntityType> where TEntityType : class
        {
            public ValueConditionConfiguration Requires(string discriminator);
            public void ToTable(string tableName);
            public void MapInheritedProperties();
        }
    }
    As you have seen so far, we used its Requires method to customize TPH. We also used its ToTable method to create a TPT and now we are using its MapInheritedProperties along with ToTable method to create our TPC mapping.

    TPC Configuration is Not Done Yet!

    We are not quite done with our TPC configuration and there is more into this story even though the fluent API we saw perfectly created a TPC mapping for us in the database. To see why, let's start working with our object model. For example, the following code creates two new objects of BankAccount and CreditCard types and tries to add them to the database:
    using (var context = new InheritanceMappingContext())
    {
        BankAccount bankAccount = new BankAccount();
        CreditCard creditCard = new CreditCard() { CardType = 1 };
                    
        context.BillingDetails.Add(bankAccount);
        context.BillingDetails.Add(creditCard);
     
        context.SaveChanges();
    }
    Running this code throws an InvalidOperationException with this message:
    The changes to the database were committed successfully, but an error occurred while updating the object context. The ObjectContext might be in an inconsistent state. Inner exception message: AcceptChanges cannot continue because the object's key values conflict with another object in the ObjectStateManager. Make sure that the key values are unique before calling AcceptChanges.
    The reason we got this exception is because DbContext.SaveChanges() internally invokes SaveChanges method of its internal ObjectContext. ObjectContext's SaveChanges method on its turn by default calls AcceptAllChanges after it has performed the database modifications. AcceptAllChanges method merely iterates over all entries in ObjectStateManager and invokes AcceptChanges on each of them. Since the entities are in Added state, AcceptChanges method replaces their temporary EntityKey with a regular EntityKey based on the primary key values (i.e. BillingDetailId) that come back from the database and that's where the problem occurs since both the entities have been assigned the same value for their primary key by the database (i.e. on both BillingDetailId = 1) and the problem is that ObjectStateManager cannot track objects of the same type (i.e. BillingDetail) with the same EntityKey value hence it throws. If you take a closer look at the TPC's SQL schema above, you'll see why the database generated the same values for the primary keys: the BillingDetailId column in both BankAccounts and CreditCards table has been marked as identity.

    How to Solve The Identity Problem in TPC

    As you saw, using SQL Server’s int identity columns doesn't work very well together with TPC since there will be duplicate entity keys when inserting in subclasses tables with all having the same identity seed. Therefore, to solve this, either a spread seed (where each table has its own initial seed value) will be needed, or a mechanism other than SQL Server’s int identity should be used. Some other RDBMSes have other mechanisms allowing a sequence (identity) to be shared by multiple tables, and something similar can be achieved with GUID keys in SQL Server. While using GUID keys, or int identity keys with different starting seeds will solve the problem but yet another solution would be to completely switch off identity on the primary key property. As a result, we need to take the responsibility of providing unique keys when inserting records to the database. We will go with this solution since it works regardless of which database engine is used.

    Switching Off Identity in Code First

    We can switch off identity simply by placing DatabaseGenerated attribute on the primary key property and pass DatabaseGenerationOption.None to its constructor. DatabaseGenerated attribute is a new data annotation which has been added to System.ComponentModel.DataAnnotations namespace in CTP5:
    public abstract class BillingDetail
    {
        [DatabaseGenerated(DatabaseGenerationOption.None)]
        public int BillingDetailId { getset; }
        public string Owner { getset; }
        public string Number { getset; }
    }
    As always, we can achieve the same result by using fluent API, if you prefer that:
    modelBuilder.Entity<BillingDetail>()
                .Property(p => p.BillingDetailId)
                .HasDatabaseGenerationOption(DatabaseGenerationOption.None);

    Working With The Object Model

    Our TPC mapping is ready and we can try adding new records to the database. But, like I said, now we need to take care of providing unique keys when creating new objects:
    using (var context = new InheritanceMappingContext())
    {
        BankAccount bankAccount = new BankAccount() 
        { 
            BillingDetailId = 1                     
        };
        CreditCard creditCard = new CreditCard() 
        { 
            BillingDetailId = 2,
            CardType = 1
        };
                    
        context.BillingDetails.Add(bankAccount);
        context.BillingDetails.Add(creditCard);
     
        context.SaveChanges();
    }

    Polymorphic Associations with TPC is Problematic

    The main problem with this approach is that it doesn’t support Polymorphic Associations very well. After all, in the database, associations are represented as foreign key relationships and in TPC, the subclasses are all mapped to different tables so a polymorphic association to their base class (abstract BillingDetail in our example) cannot be represented as a simple foreign key relationship. For example, consider the domain model we introduced here where User has a polymorphic association with BillingDetail. This would be problematic in our TPC Schema, because if User has a many-to-one relationship with BillingDetail, the Users table would need a single foreign key column, which would have to refer both concrete subclass tables. This isn’t possible with regular foreign key constraints.

    Schema Evolution with TPC is Complex

    A further conceptual problem with this mapping strategy is that several different columns, of different tables, share exactly the same semantics. This makes schema evolution more complex. For example, a change to a base class property results in changes to multiple columns. It also makes it much more difficult to implement database integrity constraints that apply to all subclasses.

    Generated SQL

    Let's examine SQL output for polymorphic queries in TPC mapping. For example, consider this polymorphic query for all BillingDetails and the resulting SQL statements that being executed in the database:
    var query = from b in context.BillingDetails select b;
    Just like the SQL query generated by TPT mapping, the CASE statements that you see in the beginning of the query is merely to ensure columns that are irrelevant for a particular row have NULL values in the returning flattened table. (e.g. BankName for a row that represents a CreditCard type).

    TPC's SQL Queries are Union Based

    As you can see in the above screenshot, the first SELECT uses a FROM-clause subquery (which is selected with a red rectangle) to retrieve all instances of BillingDetails from all concrete class tables. The tables are combined with a UNION operator, and a literal (in this case, 0 and 1) is inserted into the intermediate result; (look at the lines highlighted in yellow.) EF reads this to instantiate the correct class given the data from a particular row. A union requires that the queries that are combined, project over the same columns; hence, EF has to pad and fill up nonexistent columns with NULL. This query will really perform well since here we can let the database optimizer find the best execution plan to combine rows from several tables. There is also no Joins involved so it has a better performance than the SQL queries generated by TPT where a Join is required between the base and subclasses tables.

    Choosing Strategy Guidelines

    Before we get into this discussion, I want to emphasize that there is no one single "best strategy fits all scenarios" exists. As you saw, each of the approaches have their own advantages and drawbacks. Here are some rules of thumb to identify the best strategy in a particular scenario:
    • If you don’t require polymorphic associations or queries, lean toward TPC—in other words, if you never or rarely query for BillingDetails and you have no class that has an association to BillingDetail base class. I recommend TPC (only) for the top level of your class hierarchy, where polymorphism isn’t usually required, and when modification of the base class in the future is unlikely.
    • If you do require polymorphic associations or queries, and subclasses declare relatively few properties (particularly if the main difference between subclasses is in their behavior), lean toward TPH. Your goal is to minimize the number of nullable columns and to convince yourself (and your DBA) that a denormalized schema won’t create problems in the long run.
    • If you do require polymorphic associations or queries, and subclasses declare many properties (subclasses differ mainly by the data they hold), lean toward TPT. Or, depending on the width and depth of your inheritance hierarchy and the possible cost of joins versus unions, use TPC.
    By default, choose TPH only for simple problems. For more complex cases (or when you’re overruled by a data modeler insisting on the importance of nullability constraints and normalization), you should consider the TPT strategy. But at that point, ask yourself whether it may not be better to remodel inheritance as delegation in the object model (delegation is a way of making composition as powerful for reuse as inheritance). Complex inheritance is often best avoided for all sorts of reasons unrelated to persistence or ORM. EF acts as a buffer between the domain and relational models, but that doesn’t mean you can ignore persistence concerns when designing your classes.

    Summary

    In this series, we focused on one of the main structural aspect of the object/relational paradigm mismatch which is inheritance and discussed how EF solve this problem as an ORM solution. We learned about the three well-known inheritance mapping strategies and their implementations in EF Code First. Hopefully it gives you a better insight about the mapping of inheritance hierarchies as well as choosing the best strategy for your particular scenario.

    Happy New Year and Happy Code-Firsting!

    References