Login

I've worked on a number of database systems in the past where moving entries between databases would have been made a lot easier if all the database keys had been [GUID / UUID][1] values. I've considered going down this path a few times, but there's always a bit of uncertainty, especially around performance and un-read-out-over-the-phone-able URLs.

Has anyone worked extensively with GUIDs in a database? What advantages would I get by going that way, and what are the likely pitfalls?

[1]:

[To see links please register here]

GUIDs may cause you a lot of trouble in the future if they are used as "uniqifiers", letting duplicated data get into your tables. If you want to use GUIDs, please consider still maintaining UNIQUE-constraints on other column(s).

Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs the performance will go through the floor, been there :(

@Matt Sheppard:

Say you have a table of customers. Surely you don't want a customer to exist in the table more than once, or lots of confusion will happen throughout your sales and logistics departments (especially if the multiple rows about the customer contain different information).

So you have a customer identifier which uniquely identifies the customer and you make sure that the identifier is known by the customer (in invoices), so that the customer and the customer service people have a common reference in case they need to communicate. To guarantee no duplicated customer records, you add a uniqueness-constraint to the table, either through a primary key on the customer identifier or via a NOT NULL + UNIQUE constraint on the customer identifier column.

Next, for some reason (which I can't think of), you are asked to add a GUID column to the customer table and make that the primary key. If the customer identifier column is now left without a uniqueness-guarantee, you are asking for future trouble throughout the organization because the GUIDs will always be unique.

Some "architect" might tell you that "oh, but we handle the _real_ customer uniqueness constraint in our app tier!". Right. Fashion regarding that general purpose programming languages and (especially) middle tier frameworks changes all the time, and will generally never out-live your database. And there is a very good chance that you will at some point need to access the database without going through the present application. == Trouble. (But fortunately, you and the "architect" are long gone, so you will not be there to clean up the mess.) In other words: Do maintain obvious constraints in the database (and in other tiers, as well, if you have the time).

In other words: There may be good reasons to add GUID columns to tables, but please don't fall for the temptation to make that lower your ambitions for consistency within the _real_ (==non-GUID) information.

One other small issue to consider with using GUIDS as primary keys if you are also using that column as a clustered index (a relatively common practice). You are going to take a hit on insert because of the nature of a guid not begin sequential in anyway, thus their will be page splits, etc when you insert. Just something to consider if the system is going to have high IO...

The main advantages are that you can create unique id's without connecting to the database. And id's are globally unique so you can easilly combine data from different databases. These seem like small advantages but have saved me a lot of work in the past.

The main disadvantages are a bit more storage needed (not a problem on modern systems) and the id's are not really human readable. This can be a problem when debugging.

There are some performance problems like index fragmentation. But those are easilly solvable (comb guids by jimmy nillson:

[To see links please register here]

)

*Edit* merged my two answers to this question

@Matt Sheppard I think he means that you can duplicate rows with different GUIDs as primary keys. This is an issue with any kind of surrogate key, not just GUIDs. And like he said it is easilly solved by adding meaningfull unique constraints to non-key columns. The alternative is to use a natural key and those have real problems..

There is one thing that is not really addressed, namely using **random** (UUIDv4) IDs as primary keys will harm the performance of the *primary key index*. It will happen whether or not your table is clustered around the key.

RDBMs usually ensure the uniqueness of the primary keys, and ensure the lookups by a key, in a structure called BTree, which is a search tree with a large branching factor (a binary search tree has branching factor of 2). Now, a sequential integer ID would cause the inserts to occur just *one* side of the tree, leaving most of the leaf nodes untouched. Adding random UUIDs will cause the insertions to split leaf nodes all over the index.

Likewise if the data stored is mostly temporal, it is often the case that the most recent data needs to be accessed and joined against the most. With random UUIDs the patterns will not benefit from this, and will hit more index rows, thereby needing more of the index pages in memory. With sequential IDs if the most-recent data is needed the most, the hot index pages would require less RAM.

Advantages:

- UUID values are unique between tables and databases. Thats why it can be merge rows between two databases or distributed databases.
- UUID is more safer to pass through url than integer type data.
If one pass UUID through url, attackers can't guess the next id.But if we pass Integer type such as 10, then attackers can guess the next id is 11 then 12 etc.
- UUID can generate offline.

[primary-keys-ids-versus-guids](

[To see links please register here]

)

[The Cost of GUIDs as Primary Keys][1] (SQL Server 2000)

[Myths, GUID vs. Autoincrement][2] (MySQL 5)

This is realy what you want.

__UUID Pros__

* Unique across every table, every database, every server
* Allows easy merging of records from different databases
* Allows easy distribution of databases across multiple servers
* You can generate IDs anywhere, instead of having to roundtrip to the database
* Most replication scenarios require GUID columns anyway

__GUID Cons__

* It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
* Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
* The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

[1]:

[To see links please register here]

[2]:

[To see links please register here]

One thing not mentioned so far: UUIDs make it much harder to profile data

For web apps at least, it's common to access a resource with the id in the url, like `stackoverflow.com/questions/45399`. If the id is an integer, this both
- provides information about the number of questions (ie September 5th, 2008, the 45,399th question was asked)
- provides a leverage point to iterate through questions (what happens when I increment that by 1? I open the next asked question)

From the first point, I can combine the timestamp from the question and the number to profile how frequently questions are asked and how that changes over time. this matters less on a site like Stack Overflow, with publicly available information, but, depending on context, this may expose sensitive information.

For example, I am a company that offers customers a permissions gated portal. the address is `portal.com/profile/{customerId}`. If the id is an integer, you could profile the number of customers regardless of being able to see their information by querying for `lastKnownCustomerCount + 1` regularly, and checking if the result is `404 - NotFound` (customer does not exist) or `403 - Forbidden` (customer does exist, but you do not have access to view).

UUIDs non-sequential nature mitigate these issues. This isn't a garunted to prevent profiling, but it's a start.

Drimpassion0

initiation623

aldolase809712

cothurned792600

nifle587269

arletaaozuoud

nonlethal939613

danseuse235

Proapogee408

opalinapwdu