Guids – Unique or not?

A GUID, (which may be pronounced goo-id or gwid), is a globally unique identifier.

The guid is a sequence of hexademical digits and looks something like this:

4F3104E0-3F89-12D3-7A0C-0405A82D3402

I’ve heard developers say that it is unique world-wide, yep, globally. But is it? I decided to do some research and find out based on other’s experience.

Wikipedia says “While each generated GUID is not guaranteed to be unique, the total number of unique keys (2128 or 3.4×1038) is so large that the probability of the same number being generated twice is very small.”

Hmm. Okay. But how about real-world experience? I am still skeptical. I find this sentence about GUIDs in an article by Scott Mitchell. A reader of Scott’s, Rusty Alderson, shares:

"In creating very large databases, GUIDs (as generated by MS's NEWID() function) WILL have collisions, or duplicates. In a previous project where I was building a large Customer ODS (Operational Data Store), I typically found about 4 collisions per million rows. Code has to be developed to handle this. This was in SQL Server 2000. Perhaps SQL Server 2005 has remedied this."

So there you have it. GUIDs are not globally unique. So now that we are in SQL Server 2008, what’s the verdict? Anyone working with millions of rows having collisions?

11 Comments

  • I'd argue in the case of the collisions that some database corruption might just as well be the case. Certainly with millions of records such is a possibility. I am a string believer of the GUID as a unique identifier.

    Regards,
    Meile

  • GUIDs are like religion... they're a matter of faith, not science. Nobody has successfully offered hard proof of their uniqueness by collecting all the GUIDs ever created in one place and comparing them to each other.

    Seriously, the SQL Server 2000 issue sounds like a bug.

  • As your database grows, does it get "slower" to test uniqueness of a GUID?

    If so, it might be hard to objectively test this theory.

    Seems like automatically incrementing is a bit easier to cope with (if you can use it.)

  • Since GUID is based on time (and the network card Mac address), it might (?) be a collision in time on a multi CPU server (pooled within the same tick).. On the SQL Server one can instead use NEWSEQUENTIALID(). And btw, in theory the GUID should be endlessly big to support _always_ being unique.

  • 5,316,911,983,139,663,491,615,228,241,121,400,000 possible combinations.

    The SQL Server issue DOES sound like a bug. I'd say my chances for getting the same number twice are about as good as me getting struck by lightning 1000 times, in the same spot, all while winning the lottery 1000 times.

  • Heck - if you are still scared, make your id based on two GUIDs :)

  • I agree the SQL 2000 issue sounds like a bug. I would be surprised if SQL 2005 had the same issue. Guid's are used too often by too many people for any major issues like this to be a worry.

  • They can be duplicated, but its not very probable if you do not try to.

  • i hate the fact that people don't understand probability.

  • You aint neva gunna get two the same, I guarantee it or your money back ;)

  • There simply is no way that you will have a collision in GUIDs.  The randomness is created through a combination of your computers MAC address, system time and random character generation.  I have used GUID's for years - never an issue.  It's tried and true approach.  The biggest consideration when using GUIDs is its effect on index/insert performance which is slower than using say an integer.  This is generally only an issue for large-scale transaction systems however.

Comments have been disabled for this content.