Table Inheritance with EF Core

Introduction

EF Core 7 finally delivered a long-sought desire: all of the three table inheritance patterns are now implemented. These are:

  • Table Per Hierarchy (TPH) / Single Table Inheritance: all columns for the properties of base and all derived classes are stored on the same, single, table, one for each base class (implemented in EF Core 1);
  • Table Per Type (TPT) / Class Table Inheritance: columns for the properties of each class (abstract or otherwise) are stored on an individual table, and linked to tables that contain the columns for the properties for each base class through foreign keys (EF Core 5);
  • Table Per Concrete Type (TPC) / Concrete Type Inheritance: a table exists for each concrete class which contains columns for each property of that class (EF Core 7).

All of these patterns were discussed at length on Martin Fowler’s book, Patterns of Enterprise Application Architecture, one which I strongly recommend.

Why all of this, I hear you say? Well, there may be lots of reasons. As you know, class models and database models don’t exactly match (the object-relational impedance mismatch), so we need to figure out the best way for us to store our class model to suit our needs. And they can be various:

  • We want things to be normalised;
  • We want to minimise the number of JOINs when performing queries;
  • We want to exactly match one table per concrete class;
  • Etc, etc.

Now, I won’t go into all of the details, just explain the pros and cons of each approach and how to get them working quickly. Let’s see some of the issues with each of these patterns, with a simple domain model to guide us:

Domain Model

What we have is:

  • A base class, Animal, which is abstract;
  • Another abstract class derived from Animal: Pet;
  • A concrete class derived from Animal: Snake;
  • Two concrete classes that inherit from Pet: Dog and Cat.

Here is the code, do not pay too much attention to its properties, it’s just meant to be an example:

public abstract class Animal
{
     public int Id { get; set; }
     public string Name { get; set; }
     public DateTime Birthday { get; set; }
}
public abstract class Pet : Animal
{
     public string Owner { get; set; }
}
public class Dog : Pet
{
     public bool LikesCats {  get; set; }
}
public class Cat : Pet
{
     public bool Purrs { get; set; }
}
public class Snake : Animal
{
     public int Length { get; set; }
}

So, let’s see how we can persist these classes to a relational database in the three different possible ways, starting with the Table Per Hierarchy.

Table Per Hierarchy

This is the default pattern, or strategy, and the first to be implemented in EF Core. With this pattern, all data – columns to hold the properties’ values - for all classes along the same inheritance root are stored on the same table. Here is the configuration code for the DbContext class, notice the call to UseTphMappingStrategy on the root class of the hierarchy:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
     modelBuilder.Entity<Animal>().UseTphMappingStrategy();
     modelBuilder.Entity<Pet>();
     modelBuilder.Entity<Cat>();
     modelBuilder.Entity<Dog>();
     modelBuilder.Entity<Snake>();
     base.OnModelCreating(modelBuilder);
}

Nothing much to see here, we just mark the root class of the inheritance as using the TPH inheritance strategy as well as identifying all the classes that are to be known by EF Core and we’re good.

If we were to let EF Core generate the tables for us, as we should, this is what we’d get, a single table called Animal:

Animal

Using TPH forces no restriction whatsoever, so, for example, you can use whatever id generation strategy you want (more on this later).

So, say we want to query from the Animal base class, we could do:

var animals = ctx.Set<Animal>().ToList();

This would produce the following SQL, notice all the columns for all properties of all possible classes are returned:

SELECT [a].[Id], [a].[Birthday], [a].[Discriminator], [a].[Name], [a].[Owner], [a].[Purrs], [a].[LikesCats], [a].[Length]
FROM [Animal] AS [a]

This, of course, only makes sense should we want to list all the animals in the database, which probably doesn’t happen that often. But we may want to be able to find what a specific animal is, from its id:

var animal = ctx.Set<Animal>().Find(1);

We would then get this:

SELECT [a].[Id], [a].[Birthday], [a].[Discriminator], [a].[Name], [a].[Owner], [a].[Purrs], [a].[LikesCats], [a].[Length]
FROM [Animal] AS [a]
WHERE [a].[Id] = @__p_0

Now we need to figure out what is the type of the animal returned (Cat, Dog or Snake), or null, if no Animal could be found with that id.

What if we, instead of going for the whole Animal hierarchy, we just go for Pet, like this:

var pets = ctx.Set<Pet>().ToList();

Then the resulting SQL would be this, notice no column for the Snake class:

SELECT [a].[Id], [a].[Birthday], [a].[Discriminator], [a].[Name], [a].[Owner], [a].[Purrs], [a].[LikesCats]
FROM [Animal] AS [a]
WHERE [a].[Discriminator] IN (N'Cat', N'Dog')

Finally, if we were to query for a specific, concrete class, like Cat:

var cats = ctx.Set<Cat>().ToList();

This is what we’d get, again, notice only the columns that are relevant for the Cat class are retrieved:

SELECT [a].[Id], [a].[Birthday], [a].[Discriminator], [a].[Name], [a].[Owner], [a].[Purrs]
FROM [Animal] AS [a]
WHERE [a].[Discriminator] = N'Cat'

OK, let us now proceed to Table Per Type!

Table Per Type

The TPT strategy was introduced on EF Core 5, so was the second to come. With it, each class, either concrete or abstract, in an inheritance root, gets it own table for storing the properties it defines. If no properties are defined in a class, then no table is generated. Tables are linked together by means of foreign keys: the root class maps to a table that contains the primary key and possibly other columns for the properties of the root class, and the other tables have foreign keys that link to this primary key; the foreign keys and the primary keys are, of course, the same, but can have different column names on the different tables.

First, the configuration code, notice just a small method change - UseTptMappingStrategy:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
     modelBuilder.Entity<Animal>().UseTptMappingStrategy();
     modelBuilder.Entity<Pet>();
     modelBuilder.Entity<Cat>();
     modelBuilder.Entity<Dog>();
     modelBuilder.Entity<Snake>();

     base.OnModelCreating(modelBuilder);
}

When we let EF Core generate the model for us, we get:

TPT - Tables

Notice that each class gets its own table and that they are all linked to the base Animal table, although Dogs and Cats are linked through the Pets table.

Querying for Animal, as we did before, we get:

SELECT [a].[Id], [a].[Birthday], [a].[Name], [p].[Owner], [s].[Length], [c].[Purrs], [d].[LikesCats], CASE
     WHEN [d].[Id] IS NOT NULL THEN N'Dog'
     WHEN [c].[Id] IS NOT NULL THEN N'Cat'
     WHEN [s].[Id] IS NOT NULL THEN N'Snake'
     END AS [Discriminator]
     FROM [Animal] AS [a]
     LEFT JOIN [Pet] AS [p] ON [a].[Id] = [p].[Id]
     LEFT JOIN [Snake] AS [s] ON [a].[Id] = [s].[Id]
     LEFT JOIN [Cats] AS [c] ON [a].[Id] = [c].[Id]
     LEFT JOIN [Dogs] AS [d] ON [a].[Id] = [d].[Id]

Do notice the much higher complexity, which is expected, as we now have a lot of tables to query, and we have to fetch from all of them.

Restricting our query to just Pet, we get:

SELECT [a].[Id], [a].[Birthday], [a].[Name], [p].[Owner], [c].[Purrs], [d].[LikesCats], CASE
     WHEN [d].[Id] IS NOT NULL THEN N'Dog'
     WHEN [c].[Id] IS NOT NULL THEN N'Cat'
     END AS [Discriminator]
FROM [Animal] AS [a]
INNER JOIN [Pet] AS [p] ON [a].[Id] = [p].[Id]
LEFT JOIN [Cats] AS [c] ON [a].[Id] = [c].[Id]
LEFT JOIN [Dogs] AS [d] ON [a].[Id] = [d].[Id]

Notice the absence of the Snake table as it was not requested – not part of the Pet hierarchy.

Now, for the Cat class:

SELECT [a].[Id], [a].[Birthday], [a].[Name], [p].[Owner], [c].[Purrs]
FROM [Animal] AS [a]
INNER JOIN [Pet] AS [p] ON [a].[Id] = [p].[Id]
INNER JOIN [Cats] AS [c] ON [a].[Id] = [c].[Id]

Here there is no need to JOIN additional tables, just the ones that contain the information that is pertinent to the Cat class.

Let’s end with the Table Per Concrete Type pattern.

Table Per Concrete Type

This pattern was the last to be implemented, only in EF Core 7, and it has a few gotchas to it. First, let’s see how we can configure its usage:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
     modelBuilder.Entity<Animal>().UseTpcMappingStrategy();
     modelBuilder.Entity<Pet>();
     modelBuilder.Entity<Cat>();
     modelBuilder.Entity<Dog>();
     modelBuilder.Entity<Snake>();

     base.OnModelCreating(modelBuilder);
}

We just need to switch to using the UseTpcMappingStrategy and that’s almost all!

One thing we know about it is that each of the concrete classes gets its own table with all of the columns for all of the properties in the class’ hierarchy. There are no tables for abstract classes, which means there are no foreign keys too and no table for the base class.

TPC - Tables

But what would happen if we were to retrieve a base class, like Animal, from its id, and there were two or more tables, say, Cats, Dogs, or Snake, that had records with the same id? You guessed it, something would break very badly!

What this means is that, unfortunately, we cannot use IDENTITY/AUTO_INCREMENT/SERIAL/etc as the identifier generation pattern, because it won’t guarantee us that no two records will exist with the same id. We need to turn to something different, and fortunately EF Core offers a few:

  • Manually assigned (you don’t want to go there!);
  • GUID;
  • Hi-Lo;
  • Sequences.

The Hi-Lo pattern, in EF Core, is actually implemented by using sequences and is the recommended way to go, so I’ll be using it in this example, as I do in real life.

We need to modify slightly our code:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
     modelBuilder.HasSequence<int>("AnimalNumber");
     modelBuilder
         .Entity<Animal>()
         .Property(x => x.Id)
         .UseHiLo("AnimalNumber");

     modelBuilder.Entity<Animal>().UseTpcMappingStrategy();
     modelBuilder.Entity<Pet>();
     modelBuilder.Entity<Cat>();
     modelBuilder.Entity<Dog>();
     modelBuilder.Entity<Snake>();

     base.OnModelCreating(modelBuilder);
}

What I did here was, I added a declaration for a sequece (AnimalNumber) which is to be used by the Id property of the Animal class, essentially meaning, all of the concrete classes and their related tables. EF Core will generate it for us together with all the tables.

Now, getting all occurrences of Animal yields this SQL:

SELECT [s].[Id], [s].[Birthday], [s].[Name], NULL AS [Owner], [s].[Length], NULL AS [Purrs], NULL AS [LikesCats], N'Snake' AS [Discriminator]
FROM [Snake] AS [s]
UNION ALL
SELECT [c].[Id], [c].[Birthday], [c].[Name], [c].[Owner], NULL AS [Length], [c].[Purrs], NULL AS [LikesCats], N'Cat' AS [Discriminator]
FROM [Cats] AS [c]
UNION ALL
SELECT [d].[Id], [d].[Birthday], [d].[Name], [d].[Owner], NULL AS [Length], NULL AS [Purrs], [d].[LikesCats], N'Dog' AS [Discriminator]
FROM [Dogs] AS [d]

We are now using UNIONs and not JOINs, unsurprisingly, which results in much better performance.

Now, querying on Pet:

SELECT [c].[Id], [c].[Birthday], [c].[Name], [c].[Owner], [c].[Purrs], NULL AS [LikesCats], N'Cat' AS [Discriminator]
FROM [Cats] AS [c]
UNION ALL
SELECT [d].[Id], [d].[Birthday], [d].[Name], [d].[Owner], NULL AS [Purrs], [d].[LikesCats], N'Dog' AS [Discriminator]
FROM [Dogs] AS [d]

And finally, just Cat:

SELECT [c].[Id], [c].[Birthday], [c].[Name], [c].[Owner], [c].[Purrs]
FROM [Cats] AS [c]

A few closing remarks are in order.

Conclusion

So, where does this leave us? What is the best strategy for mapping an inheritance? I don’t think there is an obvious answer, but I can leave a few hints.

TPH

Pros:

  • Because everything is stored on the same table, querying on a base class is very fast;
  • Performance for INSERTs, UPDATEs and DELETEs is also very good.

Cons:

  • Lots of NULL values, because the table has columns for all the classes in the hierarchy, and only some of them have values;
  • If you need to look at the database directly, the table can have lots of columns, which may make it hard to understand.

TPT

Pros:

  • Very normalised model, uses less storage than TPH and TPC.

Cons:

  • Querying on a base class is relatively slow because of the number of JOINs. Actually, it always uses JOINs, so it always has some performance penalty;
  • INSERTs, UPDATEs and DELETEs are also slow because they affect many tables.

TPC

Pros:

  • Querying on a base class is relatively fast, although not as much as with TPH;
  • Performance for INSERTs, UPDATEs and DELETEs is very good.

Cons:

  • Requires the usage of a specific identifier generation strategy, such as Hi-Lo.

This being said, some general guidance:

  • Stick to TPC if you're mostly querying a concrete type;
  • TPH is also OK, especially if you need to query on some base types;
  • Only go with TPT if you absolutely need to, e.g., you have other external requirements.

                             

No Comments

Add a Comment

As it will appear on the website

Not displayed

Your website