SharpDeveloper
Guid Or Int Primary Key?
When designing a database, and creating the tables and schema, we have to choose carefully what we want our primary key to be. There are many different aspects to this. One aspect is do we want to use a natural key, or a surrogate key? An example of a natural key would be like a SIN or SSN number to represent a person. An example of a surrogate key would be an autonumber column in a database that starts with the value 1 and increases. Both of them have different pros and cons. The natural key if not chosen wisely can have duplicates and cause problems, and may also require you to join on multiple columns. For example, if your last name is your primary key, you will definitely run into cases where you will need a second column, say First Name, to differentiate between different records. In this case, you may even need to use a third one in the case of two people having the same first and last name. Surrogate keys do not suffer from this problem, but are meaningless in themselves (i.e. Record 4151 is much less meaningful than say Employee S Alibhai)
If we have decided to go with a surrogate key, we have a couple of choices, at the least. This article will discuss the benefits of using a GUID datatype over an Integer for your primary key.
First of all, what is a GUID? “A Globally Unique Identifier or GUID is a special type of identifier used in software applications in order to provide a reference number which is unique in the context for which it is used, for example, in defining the internal reference for a type of access point in a software application, or for creating unique keys in a database. While each generated GUID is not guaranteed to be unique, the total number of unique keys is so large that the probability of the same number being generated twice is very small.” (Wikipedia) This idea is so awesome, you can just keep generating GUIDs and never worry about a conflict. It’s not like phone numbers that we run out of them so quickly (especially since cell phones became so popular) that we have to add new area codes, you can just use a GUID and not worry about it!
GUIDs ensure global uniqueness. Lets say you are one patent office out of 15,000 and you are issuing patents. If you issue a patent using Guids, you can be guaranteed that there is no other patent office that issued a patent with the same GUID as you!
GUIDs can be moved across databases nicely (no need to renumber existing keys). This means that even if you have a distributed database holding different sets of records from one table, you are garaunteed that when you merge them, you will not have any duplicate keys. For example, if you had one database for each sales office, and each sale had a unique key that was a GUID, you will never have two sales with the same GUID.
GUIDs reduce the number of joins required. If you have a hierarchal relationship in a database, say you have a county, that contains school, and a school contains a class, and a class contains students, here is two ways we can do this.
Select * from students where classID=5 and schoolID=3 and countyID=1
This is because each county can have a school numbered 1, and each school can have a class numbered 5.. So to ensure uniqueness, we have to add more to our where clause.
However, if we were doing this with GUID, we can just do the following instead:
Select * from students where classID=’D59B58D1-4A46-4ED1-888F-00450556DA9A’
In this case, there is no possible way (well, maybe one in a quintillion) that there are two classes with the ID ’D59B58D1-4A46-4ED1-888F-00450556DA9A’
However, GUID has a larger data size, and you might want to take that into consideration.
In conclusion, I feel that the benefits outweigh the costs. In one case, I have been able to pass a GUID and have much shorter queries rather than passing multiple sets of information that made my life much much easier. If I had a choice, I would go for GUID
Reference: Programming Microsoft ADO.NET 2.0 Applications Advanced Topics, Glenn Johnson (Free Chapter)
Related Reading:
Other Interesting Posts
5 Responses to Guid Or Int Primary Key?
Leave a Reply Cancel reply
-
Articles
- January 2011
- April 2010
- March 2010
- February 2010
- January 2010
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- February 2009
- December 2008
- November 2008
- October 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
-
Meta








Good article, the author thanks!
your section “GUIDs reduce the number of joins required.” is completely wrong in every way. This sentence: “Select * from students where classID=5 and schoolID=3 and countyID=1
This is because each county can have a school numbered 1, and each school can have a class numbered 5.. So to ensure uniqueness, we have to add more to our where clause. ” Is just WRONG … period. a PRIMARY KEY is unique in the whole database … no 2 classes have the same ID, so the real answer is: “Select * from students where classID=5″
The other 2 points are valid …. with “GUIDs can be moved across databases nicely (no need to renumber existing keys).” being the ONLY real world reason I’ve ever seen for actually using GUID keys (running multiple seperately run databases that sync up with each other often).
You are right, I can’t remember what I was thinking when I wrote this. I concocted that school example on the spot but I did have another real example which I cannot recall right now as it was almost 2 years ago
The school example _could_ be like this:
select * from students
where classGrade = 5
and schoolNumber = 3
and countyNumber = 1
In this case you don’t know the primary keys, but the county numbers and the school numbers within each county are known figures, and you want all 5th grade students on that particular school. It isn’t a very good query, but it’s possible to find data this way if you don’t have a list of GUIDs. I’ve used this principle often when writing ad hoc queries to find specific data.
I’ve also read that GUIDs are neither global nor unique. One developer experienced weekly collisions. This means you have to check for collisions anyway, you simply cannot rely 100% on the GUID uniqueness. For some applications a few collisions might be acceptable, while totally unacceptable to others.
Another issue is for MySQL, where the primary key is included in every index on the table. Using GUIDs will increase both data and index size, in some cases substantially. This ought to have an impact on performance. Of course, it might or might not be a problem for various applications. But as a rule of thumb, I’d go for an unsigned int as the primary key if I need a surrogate key. GUID seems more of a last resort or special case solution to me.
I do agree with all of the ideas you have presented in your post. They are really convincing and will certainly work. Still, the posts are too brief for beginners. May just you please extend them a bit from subsequent time? Thank you for the post.