Cassandra: Generate a unique ID?

user2090879 · Apr 18, 2013 · Viewed 36k times

I'm working on a distributed database. I'm trying to generate a unique ID that will serve as a column family primary key in Cassandra.

I read some articles about doing this in Java using UUIDs, but it seems there is a probability of collision (even if it's very low).

Is there a way to generate a unique ID based on time, perhaps?

Answer

Richard · Apr 18, 2013

You can use the TimeUUID type in Cassandra, which holds a Type 1 (time-based) UUID. A Type 1 UUID combines the current time, the creator's MAC address, and a sequence number. If the TimeUUID is generated correctly, there will be no collisions: you can use the CQL now() function or insert your own value, and the Java SDKs provide some thread-safe implementations. The main advantage of TimeUUIDs is that the IDs can be ordered by time. See http://wiki.apache.org/cassandra/TimeBaseUUIDNotes for more info.
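To make the Type 1 layout concrete, here is a minimal Java sketch that packs a timestamp, a clock sequence and a node value into a version 1 UUID and reads them back with java.util.UUID. The class name TimeUuidSketch, its two helper methods, and the sample sequence and node values are made up for illustration. It does nothing to guarantee uniqueness across threads or hosts, so treat it as a picture of the bit layout rather than a generator to use in production.

```java
import java.util.UUID;

// Illustrative only: shows how a version 1 (time-based) UUID is laid out.
final class TimeUuidSketch {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Convert Unix milliseconds to a 60-bit UUID timestamp (100-ns units).
    static long toUuidTimestamp(long unixMillis) {
        return unixMillis * 10_000L + UUID_EPOCH_OFFSET;
    }

    // Pack timestamp, 14-bit clock sequence and 48-bit node into a version 1 UUID.
    static UUID fromParts(long uuidTimestamp, int clockSeq, long node) {
        long timeLow = uuidTimestamp & 0xFFFFFFFFL;
        long timeMid = (uuidTimestamp >>> 32) & 0xFFFFL;
        long timeHi = (uuidTimestamp >>> 48) & 0x0FFFL;
        // 0x1000 sets the version nibble to 1.
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        // The top two bits "10" mark the standard (RFC 4122) variant.
        long lsb = 0x8000000000000000L
                | ((long) (clockSeq & 0x3FFF) << 48)
                | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        long ts = toUuidTimestamp(System.currentTimeMillis());
        UUID id = fromParts(ts, 42, 0x001122334455L); // hypothetical clock sequence and node
        System.out.println(id + " version=" + id.version());
        System.out.println("timestamp round-trips: " + (id.timestamp() == ts));
    }
}
```

Because the timestamp sits in the UUID itself, two IDs created at different times can be compared by id.timestamp(), which is what makes time ordering possible.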

However, the time ordering is unlikely to be useful for row primary keys: with a hash partitioner, rows are placed by the hash of the key, so the ordering is lost, although it is preserved when the TimeUUID is used as a clustering key. The complexity of generating a unique ID could also be a source of bugs if you roll your own.

Cassandra also supports Type 4 UUIDs through the UUID type. These are just random bits. There is a collision probability, but assuming uncorrelated random number sources (which they will be if you generate the UUIDs in Java) it is extremely low: if you created 1 billion UUIDs a second for roughly 86 years, the probability of at least one collision would be about 50%. (See http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates for more details.)
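For a sense of scale, the collision odds can be estimated with the usual birthday approximation, p ~ 1 - exp(-n^2 / (2 * 2^122)), where n is the number of UUIDs generated and 122 is the number of random bits in a version 4 UUID. The Java sketch below generates one random UUID and evaluates that formula for a few durations at 1 billion UUIDs per second. The class name RandomUuidOdds and its helper methods are hypothetical, and the result is an approximation that assumes a perfectly uniform random source.

```java
import java.util.UUID;

// Illustrative only: birthday-bound estimate for random (version 4) UUID collisions.
final class RandomUuidOdds {
    // A version 4 UUID has 122 random bits; the other 6 encode version and variant.
    static final double KEYSPACE = Math.pow(2, 122);

    // Birthday approximation: p ~ 1 - exp(-n^2 / (2 * keyspace)).
    // expm1 keeps precision when the probability is tiny.
    static double collisionProbability(double n) {
        return -Math.expm1(-(n * n) / (2.0 * KEYSPACE));
    }

    // Total UUIDs produced at a constant rate over a number of (365.25-day) years.
    static double idsGenerated(double perSecond, double years) {
        return perSecond * years * 365.25 * 24 * 3600;
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id + " version=" + id.version());
        for (double years : new double[] {1, 10, 86, 100}) {
            double p = collisionProbability(idsGenerated(1e9, years));
            System.out.printf("%.0f years at 1e9/s: p = %.4f%n", years, p);
        }
    }
}
```

Under this approximation the 50% mark is reached after about 86 years at that rate, and any realistic workload sits many orders of magnitude below it.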