ID Best Practices for Databases

Dragontamer5788 picture Dragontamer5788 · Dec 3, 2010 · Viewed 12.7k times · Source

I was wondering what the best practices were for building and storing IDs. A few years ago, a professor told me about the dangers of a poorly constructed ID system, using the Social Security Number as an example. In particular, because SSNs do not have any error detection... it is impossible to tell the difference between a 9-digit string and a valid SSN. And now government agencies need things like Last Name + SSN or Birthday + SSN to keep track of your data and ensure its verification. Plus, your Social Security number is somewhat predictable based on where you were born.

Now I'm building a User database... and based off of this advice "userid mediumint auto_increment" would be unacceptable. Especially if I plan to use this ID as the primary identification for the user. (for example, if I allow the users to change their username, then the username would be more difficult to keep track than the numerical userid... requiring cascading foreign keys and whatnot.) Emails change, usernames can change, passwords change... but a userid should remain constant forever.

Clearly, auto_increment is only designed for surrogate_keys. That is, its a useful shortcut only when you already have a primary identification mechanism, but it shouldn't be used as an "innate identifier" for the data. Creating random UUID looks interesting, but the randomness turns me off.

And so I ask: whats the best practices for creating a "primary key" identification number?

Answer

Bill picture Bill · Dec 4, 2010

You are confusing internal database functionality with external search criteria.

Auto-increment surrogate keys are useful for internal application use. Never pass those on to the user. Identifying business objects, whether it is a user or an invoice, are done with unique information about the object, like SSN, CCN or DOB. Use as much info as necessary to uniquely identify the object.

I highly recommend that if you must supply some newly invented ID value to each customer, that it NOT be the field you link all the customer data tables on.