Is there a reason why there are two different commands to generate a new variable?
Is there a simple way to remember when to use gen and when to use egen?
They both create a new variable, but work with different sets of functions. You will typically use gen when you have simple transformations of other variables in your dataset, like
gen newvar = oldvar1^2 * oldvar2
In my workflow, egen usually appears when I need functions that work across all observations, like in
egen max_var = max(var)
or for more complex instructions, such as
egen newvar = rowmax(oldvar1 oldvar2)
to calculate, for each observation, the maximum of oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
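
As a rough sketch of the difference, here are the three commands above run on a small made-up dataset. The three observations and the variable name row_max are my own, chosen only for illustration, and I apply max() to oldvar1 rather than a separate variable.

```stata
* Toy data: values are invented purely for illustration
clear
input oldvar1 oldvar2
1 4
3 2
5 6
end

* gen: the expression is evaluated separately for each observation
gen newvar = oldvar1^2 * oldvar2

* egen with max(): looks across all observations of oldvar1,
* so every row receives the same value
egen max_var = max(oldvar1)

* egen with rowmax(): compares the listed variables within each observation
egen row_max = rowmax(oldvar1 oldvar2)

list
```

With this data, max_var is 5 in every row, while row_max is 4, 3 and 6, which shows the "down the column" versus "across the row" behaviour of the two egen functions.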