I have a dataset in which each row is a firm-year pair, and firmid is a string variable.
If I run
duplicates drop firmid year, force
nothing is deleted, since there are no duplicates (I created the dataset after running duplicates drop firmid year, force in the first place).
So far so good. I want to declare the data as a panel, which requires a numeric panel identifier, so I run
egen newid = group(firmid)
xtset newid year
But Stata stops with the error "repeated time values within panel". Moreover,
duplicates list newid year
lists a whole bunch of duplicates.
It seems as though egen, group() isn't generating unique groups. My question is: why, and how do I create unique groups in a robust way?
This is an old thread, but I recently ran into the same symptoms, so I want to share my solution. Of course, unless the questioner gives further details, we cannot know whether his problem has the same cause as mine.
The problem turned out to be one of precision. As explained here in section 4.4, a float (a 4-byte floating-point type) can store integers exactly only up to 16,777,216 (that is, 2^24). So if your sample contains more than 16,777,216 firms, the group numbers beyond that limit are rounded, and the same ID ends up assigned to more than one firm. This is straightforward to fix by storing the ID variable as a long, which holds integers exactly up to about 2.1 billion:
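To see the rounding directly, here is a small sketch that stores the integer 16,777,217 (one past 2^24) once as a float and once as a long. The variable names f and l are hypothetical, and because it starts with clear it should be run in a fresh session, not on your real data.

```stata
* Sketch: the same integer stored as float and as long
clear
set obs 1
generate float f = 16777217   // rounded on storage to the nearest float
generate long  l = 16777217   // stored exactly
* The float comes back as 16777216, so two adjacent group ids would coincide
display %12.0f f[1]
display %12.0f l[1]
assert f[1] == 16777216
assert l[1] == 16777217
```

This is exactly what happens to group numbers past the limit when the ID variable is a float: several consecutive IDs collapse onto the same stored value.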
egen long newid = group(firmid)
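A somewhat more defensive version of the whole workflow might look like the sketch below. It assumes the variables firmid (string) and year from the question; the checks are my own addition, not something egen requires. The bysort line confirms that each numeric ID corresponds to a single firmid, and isid fails with an error if any firm-year pair is still duplicated, so xtset is reached only when the panel identifier is sound.

```stata
* Sketch: numeric panel id from string firmid, stored as a 4-byte integer
egen long newid = group(firmid)
* Each newid must map to exactly one firmid (firmid sorted within newid)
bysort newid (firmid): assert firmid[1] == firmid[_N]
* Firm-year pairs must uniquely identify the observations
isid newid year
xtset newid year
```

If a float ID had been used with more than 16,777,216 firms, the bysort assertion would fail, which points at the precision problem rather than at genuine duplicates in the data.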