Identity and your Domain Model

Tuesday, April 25, 2006

I’ve been struggling with a concept today that I wanted to flesh out. I may ramble on but I think there’s a point to be had deep down here (somewhere).

How often do you see a class begin its life like this:

    public class Customer
    {
        private int id;

        public Customer()
        {
        }

        public int Id
        {
            get { return id; }
            set { id = value; }
        }
    }

Looks innocent enough, but many times that Id value is there because:

- an object of this class eventually has to persist into a database, and someone thought it would be easy to store it here
- that database uses an identity column, and thus the value in your business entity has to be an integer to maintain a reference to it
- someone wants to use it in a UI layer so they can retrieve details about the item (DisplayCustomer.aspx?Id=3), or someone wants to show a “nice” number to a user

An identity column (more of a SQL Server term; Oracle can pull it off but it’s a little more involved) is a column that provides a counter for you. In its simplest form, it generates an incrementing numeric sequence, one new value per inserted row.

More often than not though, it gets tied (directly or indirectly) to a class design. This is where the fun begins.

What happens when I want to test this class? When I want to write a test checking that two objects have a unique identity I might write some tests that look like this:

    using NUnit.Framework;

    [TestFixture]
    public class CustomerFixture
    {
        [Test]
        public void TwoCustomersAreUnique()
        {
            // Two freshly created customers should not share an identity.
            Customer firstCustomer = new Customer();
            Customer secondCustomer = new Customer();
            Assert.IsFalse(firstCustomer.Id == secondCustomer.Id);
        }
    }

With the above code, my test fails because I haven’t initialized Id to anything so they’re the same. However, in order to initialize them to something unique (each time) I need something to do this. Since Id was put there because someone knew this object was eventually going to be stored in a database it’s easy. Create the customer and when it’s saved (and loaded back into my object) a new Id is created. Voila. Test passes.

This is great but it means I’m inherently tied to my data source layer (in order to get an identity) to create my business entity. That’s no good for testing.

Maybe with a mock customer I can fix this, but again I would have to create some kind of mock system that generated id numbers on the fly. Not as easy as it sounds (especially when they have to be unique). In any case, it doesn’t model my business domain and, at the end of the day, why do I need some number floating around that tells me what record # my object is in some database somewhere? That has nothing to do with the problem at hand.

I’m not saying an object couldn’t/shouldn’t/wouldn’t have identity, but a domain object’s identity is not its ordinal position in a database.

Eric Evans makes a great statement about this:

“When an object is distinguished by its identity, rather than its attributes, make this primary to its definition in the model.”

I completely believe this and try to follow it as best as possible. Given an object (say a bank transaction) where each transaction has to be unique, identity is an important thing. However, imagine if you tied bank transactions’ identities to an endless numbering system in SQL Server. How can I guarantee uniqueness when I have multiple data stores (say an active and a passive one)? Or a data warehouse? Or an internationally distributed system where I have to generate unique transaction numbers on opposite sides of the planet at the same time? What if someone resets/restarts the identity counter?

Okay, maybe I’m getting carried away here but eventually, IMHO, the identity approach falls short and you need something better.

Relying on infrastructure for your domain objects is a bit of a cheat, and even something like a GUID isn’t perfect: it still needs infrastructure, since GUIDs are generated from things like hardware addresses, the clock, or a random number generator. Still, a GUID is unique for all practical purposes, no matter where it was created. Even creating one in Java and one in .NET, on the same machine, at the same time will get you two different identifiers (although I’m not sure a dual-core system could never generate the same GUID twice; I’ll leave that for the weary traveller to test out).
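To make the multiple-data-store problem concrete, here is a minimal sketch. IdentityCounter is a hypothetical stand-in for an identity column (it is not a real SQL Server API), and the two "stores" are just two counters that know nothing about each other.

```csharp
using System;

// Hypothetical stand-in for a database identity column:
// a counter that hands out 1, 2, 3, ... in order.
public class IdentityCounter
{
    private int last;

    public int Next()
    {
        last++;
        return last;
    }
}

public static class IdentityCollisionDemo
{
    public static void Main()
    {
        // Two independent stores, each with its own counter.
        IdentityCounter activeStore = new IdentityCounter();
        IdentityCounter passiveStore = new IdentityCounter();

        int firstId = activeStore.Next();
        int secondId = passiveStore.Next();

        // Both counters start from the same place, so the ids collide.
        Console.WriteLine(firstId == secondId); // True

        // GUIDs need no shared counter; a collision between these two
        // is possible in theory but astronomically unlikely.
        Guid firstGuid = Guid.NewGuid();
        Guid secondGuid = Guid.NewGuid();
        Console.WriteLine(firstGuid == secondGuid);
    }
}
```

The integer ids only stay unique while exactly one counter is in charge of them, which is the assumption that breaks down once a second store shows up.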

So if we change our Customer class to use GUIDs for identity we get something like this:

    using System;

    public class Customer
    {
        // Each new Customer gets its own identity, no database required.
        private Guid id = Guid.NewGuid();

        public Customer()
        {
        }

        public Guid Id
        {
            get { return id; }
            set { id = value; }
        }
    }

Now the test we wrote before passes because each object gets its own unique identity, no database required. Much better.

So all I’m saying is (to quote Jimmy Nilsson) “Get rid of those nasty IDENTITY/sequences…” and “let the model set the values, for example by calling a simple service at the right place/time”.
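One way the “simple service” idea could look is sketched below. The names IIdentityGenerator and GuidIdentityGenerator are made up for illustration and do not come from the quoted text. The sketch also assumes the Id is read-only and that a second constructor exists so an object loaded from storage keeps the identity it was saved with.

```csharp
using System;

// Hypothetical service the model calls when it needs a new identity.
public interface IIdentityGenerator
{
    Guid NextId();
}

// One possible implementation: plain GUIDs, no database involved.
public class GuidIdentityGenerator : IIdentityGenerator
{
    public Guid NextId()
    {
        return Guid.NewGuid();
    }
}

public class Customer
{
    private readonly Guid id;

    // Brand new customer: the model asks the service for an identity.
    public Customer(IIdentityGenerator generator)
    {
        if (generator == null)
        {
            throw new ArgumentNullException("generator");
        }
        id = generator.NextId();
    }

    // Customer being loaded from storage: reuse the identity it already has.
    public Customer(Guid existingId)
    {
        id = existingId;
    }

    // No setter, so the identity cannot change after construction.
    public Guid Id
    {
        get { return id; }
    }
}

public static class IdentityServiceDemo
{
    public static void Main()
    {
        IIdentityGenerator generator = new GuidIdentityGenerator();

        Customer created = new Customer(generator);
        Customer reloaded = new Customer(created.Id);

        // The reloaded object carries the same identity as the original.
        Console.WriteLine(created.Id == reloaded.Id); // True
    }
}
```

Swapping in a different IIdentityGenerator (a sequence service, say) would change where ids come from without touching Customer.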

Just something to consider when you’re building out your classes. Sure, a system isn’t much use if it never stores anything, but that doesn’t mean you have to pollute your model with extra numbers just to keep track of a row in a database somewhere. Identity in a database is just that, and not something you should rely on in your domain (especially if you’re doing TDD and don’t have a database yet).

Try using GUIDs (or some other method if you prefer, like a service) to keep your domain model limited to what it needs to operate, and leave the non-business stuff like tracking numbers to the infrastructure layer.

Note: if you’re still hung up on using identity and SQL to generate ids for your business objects, check out Don Schlichting’s article here on getting the right identity.

11 Comments

GUIDs are also important when you want to move your data from one domain to another. So when you want to backload your old or acquired customer data into your new customer database... you'll be glad you were using GUIDs.

Guy Murphy - Tuesday, April 25, 2006 2:31:00 AM

Are you suggesting not using IDs in the database, or, if you use them there, not reading them from the database but instead creating a new GUID each time you read a record?

Ninni - Tuesday, April 25, 2006 5:57:00 AM

Then you are replacing one primitive type (int) with another (GUID, ok, not that primitive, but very infrastructure-like). If identity is important, then you probably should encapsulate this fact and present it as a custom type. I believe Code Complete (2nd ed.) discussed this.

anonymous coward - Tuesday, April 25, 2006 7:14:00 AM

@Ninni: I'm just saying don't stuff ids into your domain objects just because you need to retrieve something from a database (eventually). You can just as easily find a row in a database using a "SELECT * FROM Customers WHERE Guid = 'XXXX-XXXX-XXXX-XXXX'" type thing (putting your select into, say, a DAO where the method would be GetCustomerById(Guid)).

Bil Simser - Tuesday, April 25, 2006 7:31:00 AM

@AC: True that you're swapping one type with another, but the point here is that your identity isn't tied to a database implementation. If I put the identity into my object and use something like a GUID, every single Domain Object created will always have a unique identity that I can use for filtering, finding, etc.

The custom type works as well and is a good approach but I would look at keeping things simple unless you need more than what say a GUID would offer.

Bil Simser - Tuesday, April 25, 2006 7:35:00 AM

Let's not forget, nulls exist for a reason: no value assigned yet.

The "same" check is more complicated than == but that's fine by me.

Jim - Tuesday, April 25, 2006 7:57:00 AM

I think I disagree here, but am willing to be converted :)

1. if you've got a persistence framework in place (which you usually should), it shouldn't be a problem to drop in a mock implementation which simply assigns unique values to what are essentially output parameters when the unit of work persists the entities. Yes it will mean you have to wrap your test in something like:

    using (IUnitOfWork uow = new MockUnitOfWork())
    {
        uow.Commit();

        Assert here...
    }

2. It is usually preferable to use some type of synthetic key on an entity. Yes, an attribute like 'Name' may be unique and look like a candidate key, but it's often necessary to allow something like Name to be modified... so it's out as a primary key. I agree a guid is valid, but it's not really that much of a problem to turn off identity increment when batch importing data.

3. Performance... of course the round trip cost to the db takes up the bulk of simple queries, but I would guess it's faster searching for an int vs a guid or a string...

4. You do often want to show the user a nice Id for something in a url... surely it's preferable to show an int instead of a guid. What happens when you're passing multiple parameters... urgh.

Aaron

Aaron Robson - Tuesday, April 25, 2006 9:25:00 AM

@Aaron: All good points. Not sure if I'm going to convert anyone here.

1. I agree it's a snap to use a mock implementation and a persistence framework like Spring or NHibernate, but I tend to start out my domain design based on business entities and worry about silly things like UI and persistence later when I need them.

2. I forgot to mention that some kind of key like this should be immutable (so probably remove the setter and only create it in a constructor or automatically).

3. Not sure about performance as I haven't done tests of SELECT of an int vs a string. I would think it's about the same.

4. Again I don't want to be concerned with UI elements when I'm trying to solve a business problem. Nice urls are for weenies and hackers. I would rather stuff values (if I have to use them) into a session variable and retrieve it that way so my urls are clean and unhackable.

Bil Simser - Tuesday, April 25, 2006 9:39:00 AM

The performance between the 2 is going to be much different. You are talking about building an index on a column that is 16 bytes wide versus 4 bytes wide. They will not perform "about the same".

I am not saying guids are bad, there are certainly some cases where they make sense over ints, but I think it is a fairly small percentage. You can still do some sort of sequence generation like Nilsson suggests with the service call.

One thing we have played with at work is having a Sequence table which has a row for each table in the database and 1 column with an int value. That int value represents the last sequence that was given out for the particular table.

We still make PKs int, but we remove the identity stuff. Now if you need multiple machine synchronization, this becomes a bit more difficult. For true offline sequence generation, guids definitely have the upper hand.

Jeff Gonzalez - Tuesday, April 25, 2006 10:24:00 AM

@Jeff: Thanks for the info. I still don't think there's a huge performance hit between 16 bytes and 4 so I might run some numbers to test it out and convert. The separate sequence table is good and you could implement it via a remote service like Jimmy mentioned so you could access it from anywhere. I think Oracle (beneath the covers) does identity this way anyways.

Bil Simser - Tuesday, April 25, 2006 11:36:00 AM

SQL Server side note: Store both an Int and a UniqueIdentifier in the db, and use the Int as the clustered index and the UniqueIdentifier as a non-clustered index.

Yves Reynhout - Tuesday, April 25, 2006 4:18:00 PM
