Stata: using egen group() to create unique identifiers

AsianYayaToure · Mar 26, 2014

I have a dataset where each row is a firm-year pair, with a firmid that is a string.

If I do

duplicates drop firmid year, force

it doesn't delete anything since there are no duplicates (I originally created the dataset after running duplicates drop firmid year, force).

So far so good. I want to create a panel which requires a firmid that is numeric. So I run

egen newid = group(firmid)
xtset newid year

But the 'repeated time values within panel' error pops up. Moreover,

duplicates list newid year

lists a whole bunch of duplicates.

It seems as though egen, group() isn't generating unique groups. My question is: why, and how do I create unique groups in a robust way?

Answer

jphaller · Nov 12, 2014

This is an old thread, but I have recently experienced the same symptoms, so I wanted to share my solution. Of course, so long as the questioner does not give further details, we will not know whether the causes are the same for me and him.

The problem turned out to be one of precision. As explained here in section 4.4, integers stored as floats are exact only up to 16,777,216 (that is, 2^24). So, if you have more than 16,777,216 firms in your sample, rounding will result in the same ID being assigned to multiple firms. This is straightforwardly dealt with by storing the ID variable as a long:

egen long newid = group(firmid)
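
To see the rounding for yourself and to check that the fix worked, something along these lines may help. It is only a sketch: the three-firm toy dataset and its firmid values are made up for illustration, and it assumes the duplicates really do come from float precision rather than from some other cause.

```stata
* Integers above 2^24 = 16,777,216 are not all representable as float:
* 16777217 rounds to 16777216 when stored with float precision.
display %12.0f float(16777217)

* Toy panel (hypothetical data): three firms, two years each.
clear
input str6 firmid int year
"A001" 2010
"A001" 2011
"B002" 2010
"B002" 2011
"C003" 2010
"C003" 2011
end

* Request long storage explicitly so large group numbers stay exact.
egen long newid = group(firmid)

* Within each newid there should be exactly one firmid.
bysort newid (firmid): assert firmid[1] == firmid[_N]

* newid and year must jointly identify observations before xtset.
isid newid year
xtset newid year
```

If the by-group assert or isid fails on your real data, then distinct firms are still sharing an ID (or firm-year pairs are genuinely repeated), and the cause is something other than the storage type.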