November 2007 - Posts

Gospel of the GUID - Answers to your Burning Questions, Comments and Insults.

First and foremost, I was blown away with the number of nice responses and questions to my Gospel of the GUID post. For the most part, I got a lot of “wow, I never thought of that” and “cool, we do that too, glad to see someone else doing it” type of responses.  I enjoyed the slew of “I am a convert now” type of responses also. It was nice to see people play along with the motif of the post.  I had expected a few comments, but not the flood I received. Instead of just “approving” the monster load of comments, I wanted to condense it all down into a single readable response post. So many of the questions were flavors of the same question that I would hate for people to read 100+ comments where most of it was just duplication.

Before I go any further, I should state, and should have stated in my Gospel of the GUID post, that every rule in my shop can have exceptions. My GUID rule is no exception to this. (Sorry… couldn’t resist the pun) However, to break my rules, you must have valid reason for doing so. Many people have pointed out the more common exceptions; data warehousing, apps that need high speed multi-million inserts, etc.  These would all be allowable exceptions to me, however, these are not the majority of my applications and they are certainly not the norm of most business applications. After all, someone has to actually build the apps that generate the data for the warehouseJ!   

Another interesting thing that I did not expect, were the insults. I certainly didn’t expect the anti-Microsoft slurs but I guess if you are unemployed, sitting at home in your whitey-tighties, taking a break from leveling your level 70 Night Elf, what else are you going to do? Luckily after many years of playing MMORPGs, I am quite used to bravado that seems to be induced on the weak minded and immature by the anonymity of the net.  I just didn’t expect so much of it from the readers of ASP.NET … but what do I know? J Truth be known, it was the most entertaining part of it all for me.  For those of you that took the time to go out of your way to insult me (and Bill Gates), you will sleep better knowing that I, my team, and my friends had a good time laughing at you… not with you… but at you.

For the most part, I received a lot of emails and many comments asking “what about x?” I will answer the most common questions and comments below. I am so sorry it has taken so long to get these answers back to you.

1)      Can you explain number one a little better? I still don’t quite get it. Why don’t you just return the unique identifier from a stored proc?
The beauty of the GUID approach is that I can generate the identifier that I need in my object model WITHOUT going to the database.  This allows me to create an “Order” object, and all of its “Line Item” objects without making any calls to the database until I am ready to persist them, thereby saving trips to the database. I don’t care how small you can make your index, no trips to the database is faster than 1.
Note: If I need to generate an identifier at the database level, I can do that also. If I need to “return” it from a stored proc, I would need to use an output parameter.

2)      Don’t you know that using GUID’s as clustered indexes is bad?
At the risk of sounding flippant, yes, I know in certain cases it is bad.

Yes, I know that if you have a GUID as a clustered index AND you are doing high volume inserts, this really slows down your inserts. For 99% of the applications I write, I truly don’t care about how long the inserts take as long as they perform adequately.  

Yes, I know that over the long haul, the more records that get inserted the more fragmentation I could possibly have. For the most part I deal with this by using the appropriate Fill Factor or through database maintenance.  If it is really an issue after that, I don’t use a clustered index on the GUID here either.  MOST of my tables aren’t million+ rows, and most don’t need to be able to handle high volume inserts, therefore MOST of my tables do have the GUID as the clustered index.

3)      Why not use the new NewSequentialId()?
There are
several reasons really. The only one that really matters is that I can’t do it in my object model, and therefore it is of no use to me.

4)      What happens if we run out of GUIDs?  What if I don’t trust that they can’t be duplicated?
Simple, your “unique index” will guarantee you don’t get that duplicated bugger into your database. Unless, of course, you don’t trust SQL to make sure it doesn’t allow duplicates

Now you have a huge problem though, you need to rush out and buy a lottery ticket but be careful, because you are going to get hit by lightning on the way because you have defied the odds on a scale that is nearly unimaginable. After all, the odds of duplicating a GUID is 1 in 2122 (or 5.3×1036) Good luck with that. By the way, anyone know how long that would be if I could line up 5.3×1036 dollar bills one after the other?

5)      What about referential integrity regarding #3?
I knowing sacrifice referential integrity for the flexibility I get with my object model. I rarely physically delete data anyway. Most of the time, we do logical deletes so referential integrity doesn’t even come into play.

6)      What if you need to know which parent a row belongs to in #3?
If you absolutely must know at the row level of the child record, then you will either need to add a “type” identifier column, or you will need to do a left join against your appropriate parent tables to see who it belongs to.  It is extremely rare for me to need to go “up the chain” in my object graph from an object that is stored this way. I almost always only go “down the chain.” For example, I would start with my customer and get the notes for that customer. Not start with some note and try to figure out, “Gee what the heck is this note for?”

7)      What about Customers who don’t want to see a GUID for an Order Number?
GUIDs are NEVER for human consumption. (I don’t consider devs and dba’s human…they are superhumans
)  People like to see things like “Order ID 1000” which is totally fine.  Obviously you should make sure those are unique also, but that has nothing to do with using GUIDs.

Besides, this breaks another one of my rules, which is never make something that normal people feel is unique, your identifier. Some day humans will change this little piece of data, and when they do, your app won’t work.  I can’t tell you how many times I have seen an order “number” go from numeric to alpha numeric, or an identifier that will “always” be 5 characters, be changed to 6 characters years later. Here me now, and believe me later, don’t let your PKs be anything that means something to your customer.  Your PK should only be something that means something to your database… and that something should be what row am I getting… even if it is an INT

Additional Thoughts:

1)      In my world, which admittedly is different from many of yours, what I care about the most is:  

a.       How my app behaves from the user’s perspective.  I truly don’t care if a DBA thinks that speedy inserts of millions of records is important. DBA’s don’t dictate design in my apps. DBA’s don’t “own” the database in my world. They are an important part of the design team of our app, but they don’t get to make unilateral design decisions based on whether we can do millions of inserts quickly, if that’s not what my users do. If my users are ACTUALLY doing millions of inserts, I care very much and certainly wouldn’t do anything that caused that task to perform poorly, including making a clustered index on a GUID.

b.       Code maintainability. This is a topic for a future post of mine, but the short version is that I will take code that performs acceptably and is easier to maintain over hyper optimized code that is harder to maintain. The Gospel of the GUID is based on this philosophy, and makes my application framework very easy to maintain. When you know that EVERY row is identifiable by a GUID, your framework can easily deal with any row of data, without having to “figure it out.” For example, in our UI framework, we have an abstract base class for all of our pages that deal with singletons to inherit. In the base page, we have helper functions that do things like get the singleton etc. While not a huge amount of code in and of itself, across several hundred pages, it’s quite a nice savings.

2)      COMBS -  Now here is something that I must take a closer look at. Trading some of the uniqueness for better performing inserts merits some investigation on my part.

Off the top of my head, it would be pretty easy to modify my framework to use a COMB instead of GUID and of course it would be trivial to create a UDF in SQL for the same thing if I needed to generate some IDs at the DB level.  I will have to do some testing and see if this is worth doing. Thanks for the heads up on this dani.

Posted by wwright | 5 comment(s)

I have been bombarded with questions, comments and a significant amount of flat out insults. I am currently in meetings at work that have made all my days 15 hour work days so I have not had a chance to go through the moderation of the comments, but I will soon. I will attempt to answer all the questions and address some of the comments.

Posted by wwright | with no comments
The Gospel of the GUID

There is only one commandment in the Gospel of the GUID:

I. Thou shalt always use a GUID to uniquely identify a row in a table.

Whenever I hire a new member for my team, this is one of the first rules that I teach them. At first, they almost always look at me like a puppy with their ears up and their head tilted to the side as if to say "huh?"


So I show them that every single table has field that is a unique identifier, and most of the time, that field is the primary key and usually the clustered index at that. For example, if we have an Employee table, it has a field named EmployeeGUID that is a unique identifier, and it is the primary clustered key. It's at this point that I usually start get the "uh oh, my boss is an idiot but I don't want him to know I think that look." Next the questions and the "but I do x" comments start to come:

  1. I just use an int.
  2. I don't do that because they are so big.
  3. Don't you know how hard it is to go look up something by that, an int is so much easier.
  4. Surely you don't do that for many-to-many bridge table?
  5. What happens if they run out of GUIDs?
  6. I don't think I trust that they won't be duplicated.
  7. I have never heard of this, therefore it is probably a bad idea.
  8. Don't you think there is a performance penalty for doing that?

So then begins my quest to make them a convert. I have a high conversion rate among "thinking" people, and therefore I challenge you to read this and see if you too will become a believer!

Reason 1: I can make less trips to the database, now THAT is a performance enhancement!

There are many reasons to use a GUID as the primary key for your table. My primary reason is directly related to how I build my object models. I prefer to create a "new" instance of my object without round tripping the database. So if I am going to create a new Order object, I don't have to round trip the database to get the OrderID (OrderGUID in my world) like I would if I was using an int as the "ID" of the order. While not all that impressive at this level, consider this. I can create a new Order object with an OrderGUID, and ten OrderLineItem objects with their own OrderLineItemGUID without EVER round tripping the database unlike the INT solution which needs to make 11 calls to the DB.

Reason 2: Data Merging is so easy, Mac using developers can do it!

My second reason for always using a GUID is data merging. I can't count how many times that I have had to merge data. Back before I saw the light, I too would use INTs or some other thing to make my row unique but when I had to merge data, I always had to write a conversion.

Consider the following:

Customer Database 1:
Order Table

Customer Database 2:
Order Table

OrderID: 1

OrderID: 1

CustomerID: 1

CustomerID: 1

If customer 1 acquires customer 2 and I need to merge their data, I have to convert someone's OrderID 1 and CustomerID 1 to some INT that is not used, and update a lot of rows of data, and possibly monkey around with seed values. Multiply that out by dozens of tables, millions of rows of data, and I have a REALLY big task ahead of me, that will require a lot of testing after I write the SQL and/or code.

However, if I had followed the Gospel of the GUID

Customer Database 1: Order Table

Customer Database 2: Order Table

OrderID: {C290A815-BAAA-4455-82EA-F18F5D0CEF2E}

OrderID: {79E9F560-FD70-4807-BEED-50A87AA911B1}

CustomerID: {52335ABA-2D8B-4892-A8B7-86B817AAC607}

CustomerID: {60FA3045-BAE8-4526-87A2-DF22ED5F093B}


Now all I have to do is insert all the rows from one customer's db into the other's db. No conversion, no testing of the conversion necessary, just nice clean and efficient.  I recently needed to merge two of my client's data in just such a manner: AT&T and Cingular.  My "conversion" took all of 45 minutes.

Another similar example: imagine if you had a need for people to work disconnected and you need to merge in their changes once they get back online. Now it is child's play... if you are a believer in the Gospel of the GUID. You can easily push changes made in either DB to the other.

Reason 3: Type/Table Ignorance

The third reason I believe in the Gospel of the GUID is what I call "Type Ignorance" which is a play on "Type Inference" in .NET 3.5 as it is kind of the opposite meaning: I don't care what the type is at the row level in my table. Let's assume that I have an application that has been in production for a while. It has the following tables (among others) Customer, Order, OrderLineItem, Product, and finally Vendor. Now I determine that I want to be able to add "notes" to all of these items. I can simply create a new table and object that looks like:




Unique Identifier


Unique Identifier




I can insert all my notes into a single table, using the GUID of the parent object in the ParentGUID field. If I want to get to any object's notes all I need is a simple join. For example, to get to a Vendor's notes, all I need is a simple join of Note.ParentGUID to Vendor.VendorGUID. I don't need a "type" indicator so I know which table to join, nor do I need a myriad of join tables to know which note goes with which type.

You would be surprised how often this little technique comes in handy. We recently added "auditing" to our application where we needed to know exactly who added, updated, and deleted what. We simply added a handful of lines of code to the SubmitChanges() method of the datacontext (We use LINQ exclusively in  our new app)  and write out the appropriate record to our audit table. Any time we add a new object type/table to our app, it will automatically get the auditing ability without us having to change a single line of code, or make any modifications to our "auditing table."


Reason 4: The hits just keep on coming!

There are many less than obvious reasons for using GUIDs to identify things, but here is one that I did not plan for and was incredibly grateful for and ultimately it saved my client millions of dollars... yes I did say MILLIONS!

I was developing an AP Feed ability for a large client. They were going to be able to pay millions of dollars of their bills at the push of a button. Basically, push a button in our system, we would generate a feed file, ship it to their AP system, and it would generate the checks... and out the money would go. Of course, I used GUIDs to identify every single thing. So when the AP system generated a reconciliation file, I could easily match it up to my end.

At our site, this client had a production database and a test system which was a copy of their production system, just somewhat stale. A couple of months stale to be exact. During our testing, someone on their end saw one of our test feed files and mistakenly assumed that it was a production feed which should be "paid" so they dropped it over into their production AP system. You guessed it... they just paid a lot of legit vendors twice (once from the production system, and once from the test system) and some not so legit vendors got checks they shouldn't have because they were no longer the landlord (remember, the test db was a couple of months stale.)  Through no fault of my own, I now had a horrible mess of data... or at least my client thought so. However, since all of my payment records had GUIDs for identifiers, I could easily join the two db's against AP System which had my GUID for each payment, to determine which payments were legit and therefore should be paid, and which should get a good ol "stop payment."  Imagine if I had used INTs. I would have no way of knowing if PaymentID 1000 was from the good data or the test data.

So how did this save my client millions? Easy... thousands of payments multiplied by "stop payment" fees adds up in a hurry. Especially when they make this horrible mistake 3 times! J


So are there any downsides to the Gospel of the GUID?

In short, yes there are, however, they are so miniscule that they don't even come close to changing my mind. The most "in your face" downside is when you have to hand write SQL to find something.


is a lot easier than typing

SELECT * FROM ORDER WHERE ORDERGUID = ‘{45F57B42-38A4-46ce-A180-6DE0E7051178}'

A second downside is that there is a small performance hit for joins that are based on GUIDs if measured against the same data using INTs instead. In my experience, even when using multimillion row tables, this has NEVER been an issue. A few milliseconds are such a minor price to pay for all of the goodness.

I encourage you to try out this technique in your next small project, especially if you are still a skeptic. I think you will find it more useful than you ever dreamed.

More Posts