Is string or int preferred for foreign keys?

XMen · Jan 27, 2011

I have a user table with userid and username columns, and both are unique.

Between userid and username, which would be better to use as a foreign key and why?
My boss wants to use a string. Is that OK?

Answer

StuartLC · Jan 27, 2011

It looks like you have both a surrogate key (int userId) and a natural key (char or varchar username). Either column can be used as a Primary key for the table, and either way, you will still be able to enforce uniqueness of the other key.

There are many existing discussions on the trade-offs between Natural and Surrogate Keys - you will need to decide on what works for you, and what the 'standard' is within your organisation.

Here are some considerations when choosing one way or the other:

The case for using Surrogate Keys (e.g. UserId INT AUTO_INCREMENT)

If you use a surrogate (e.g. UserId INT AUTO_INCREMENT) as the Primary Key, then all tables referencing table MyUsers should use UserId as the Foreign Key.

You can, however, still enforce uniqueness of the username column with an additional unique index, e.g.:

CREATE TABLE `MyUsers` (
  `userId` int NOT NULL AUTO_INCREMENT,
  `username` varchar(100) NOT NULL,
  -- ... other columns
  PRIMARY KEY (`userId`),
  UNIQUE KEY `UQ_UserName` (`username`)
);

As per @Dagon, using a narrow primary key (like an int) has performance and storage benefits over using a wider, variable-length value like varchar. This benefit also extends to the tables which reference MyUsers, as the foreign key to userId will be narrower.
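As a rough sketch of what a referencing table could look like under the surrogate design: `MyOrders`, its columns, and the constraint name are hypothetical and not part of the original question, and InnoDB is assumed because it is the MySQL engine that enforces foreign keys.

```sql
-- Hypothetical child table; MyOrders and its columns are illustrative only.
CREATE TABLE `MyOrders` (
  `orderId` int NOT NULL AUTO_INCREMENT,
  `userId` int NOT NULL,  -- narrow 4-byte reference to MyUsers.userId
  `placedAt` datetime NOT NULL,
  PRIMARY KEY (`orderId`),
  CONSTRAINT `FK_MyOrders_MyUsers`
    FOREIGN KEY (`userId`) REFERENCES `MyUsers` (`userId`)
) ENGINE=InnoDB;
```

Each row in the child table carries only the integer, so the foreign key column and its index stay small no matter how long usernames are.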

Another benefit of the surrogate integer key is that the username can be changed easily without affecting tables referencing MyUsers. If username were used as a natural key, and other tables were coupled to MyUsers via username, changing a username becomes more inconvenient, since the change would otherwise violate the Foreign Key relationship. If usernames had to be updated in such a design, a technique like ON UPDATE CASCADE would need to be employed to retain data integrity.
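To illustrate the ON UPDATE CASCADE approach, here is a minimal sketch assuming InnoDB and a natural-key variant of the schema. The tables `MyUsersNatural` and `MyPosts`, their columns, and the sample values are hypothetical.

```sql
-- Natural-key variant: username is the primary key of the user table.
CREATE TABLE `MyUsersNatural` (
  `username` varchar(100) NOT NULL,
  PRIMARY KEY (`username`)
) ENGINE=InnoDB;

-- Hypothetical child table that references the user by name.
CREATE TABLE `MyPosts` (
  `postId` int NOT NULL AUTO_INCREMENT,
  `username` varchar(100) NOT NULL,
  `title` varchar(200) NOT NULL,
  PRIMARY KEY (`postId`),
  CONSTRAINT `FK_MyPosts_MyUsersNatural`
    FOREIGN KEY (`username`) REFERENCES `MyUsersNatural` (`username`)
    ON UPDATE CASCADE
) ENGINE=InnoDB;

INSERT INTO `MyUsersNatural` (`username`) VALUES ('alice');
INSERT INTO `MyPosts` (`username`, `title`) VALUES ('alice', 'First post');

-- Renaming the parent row also rewrites the matching rows in MyPosts.
UPDATE `MyUsersNatural` SET `username` = 'alice2' WHERE `username` = 'alice';
```

Without the ON UPDATE CASCADE clause, the UPDATE would be rejected while rows in `MyPosts` still reference the old name. With it, every referencing row is rewritten, which can be costly when many rows point at the renamed user.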

The case for using Natural Keys (i.e. username)

On the downside of using Surrogate Keys, other tables which reference MyUsers via a surrogate key will always require a join back to the MyUsers table to retrieve the username. One of the potential benefits of Natural Keys is that if a query requires only the username column from a table referencing MyUsers, it need not join back to MyUsers to retrieve the user name, which saves some overhead.
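The difference shows up in the queries themselves. The following sketch assumes two hypothetical referencing tables that are not in the original question: `MyOrders`, which stores `userId`, and `MyOrdersNatural`, which stores `username` directly.

```sql
-- Surrogate key design: the child table holds userId,
-- so fetching the name requires a join back to MyUsers.
SELECT o.`orderId`, u.`username`
FROM `MyOrders` AS o
JOIN `MyUsers` AS u ON u.`userId` = o.`userId`;

-- Natural key design: the child table already holds username,
-- so the same information comes from a single table.
SELECT o.`orderId`, o.`username`
FROM `MyOrdersNatural` AS o;
```

The saving only applies when username is the sole column needed from MyUsers. As soon as a query needs any other user column, the join comes back in either design.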

Further references on the natural vs surrogate debate and trade-offs here and here