Should year variable be factor or numeric in panel data in R?

user3571389 · Oct 27, 2014

I have a panel dataset in which hospitals are followed from 2004 to 2010, with observations every two years. The data is in Stata format, but I am working with it in R. Initially the variables year (2004, 2006, 2008, 2010) and t (1 = 2004, 2 = 2006, and so on) are integers, but I later convert them to factors as follows:

data$year <- factor(data$year)

and similarly for the time variable t.

I am unsure whether year or t should be kept as an integer or numeric variable or converted to a factor for panel data analysis. Is the command above the right way to convert a variable to a factor?

Answer

Love-R · Oct 27, 2014

Treating year as a categorical variable estimates a separate effect for each individual year, i.e. how much the target variable differed on average in a given year (by default, relative to the first year). Including t as a numeric variable instead estimates a single slope: the average change in the target variable from one period to the next, i.e. every two years. Given that there are just 4 time periods, the first approach seems more reasonable, but it really depends on the goal of your analysis.
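To make the difference concrete, here is a minimal sketch on simulated data. The data frame `panel`, the outcome `y`, and the `hospital` id are hypothetical names invented for illustration, and plain `lm()` stands in for whatever panel estimator you actually use.

```r
# Hypothetical toy panel: 5 hospitals observed in 2004, 2006, 2008, 2010
set.seed(1)
years <- c(2004, 2006, 2008, 2010)
panel <- expand.grid(hospital = 1:5, year = years)
panel$t <- match(panel$year, years)                 # 1 = 2004, 2 = 2006, ...
panel$y <- 10 + 0.5 * panel$t + rnorm(nrow(panel))  # made-up outcome

# year as a factor: one coefficient per year, relative to the baseline (2004)
fit_factor <- lm(y ~ factor(year), data = panel)

# t as numeric: a single slope, the average change per two-year period
fit_linear <- lm(y ~ t, data = panel)

names(coef(fit_factor))  # intercept plus one dummy for each of 2006, 2008, 2010
names(coef(fit_linear))  # intercept plus a single slope for t
```

The factor version spends three parameters on time and imposes no shape on the trend; the numeric version spends one and assumes the change per period is constant.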

Your factor() command works as written. An equivalent form is

data$year <- as.factor(data$year)
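As a quick check of the conversion itself, the sketch below compares `factor()` and `as.factor()` on a small hypothetical vector and shows how to get the original years back. The vector `year` here is made up for illustration and is not the asker's data.

```r
# Hypothetical integer year column
year <- c(2004L, 2006L, 2008L, 2010L, 2004L)

f1 <- factor(year)
f2 <- as.factor(year)

# Both conversions produce the same levels and the same underlying codes
same_levels <- identical(levels(f1), levels(f2))
same_codes  <- identical(as.integer(f1), as.integer(f2))

# Pitfall: as.numeric() on a factor returns the level codes, not the years
codes <- as.numeric(f1)                     # 1 2 3 4 1

# To recover the original values, go through character first
years_back <- as.numeric(as.character(f1))  # 2004 2006 2008 2010 2004
```

The last two lines matter if you later want year as a number again: converting a factor directly with `as.numeric()` silently gives you something that looks like t rather than year.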

Also, make sure that you include only one of year or t. They encode the same information, so including both makes the model collinear and muddles the interpretation.
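A small sketch of what goes wrong when both are included, again on simulated data with hypothetical names (`panel`, `y`, `hospital`) and plain `lm()` as a stand-in estimator: because t is an exact linear function of the year dummies, its coefficient cannot be estimated.

```r
# Hypothetical toy panel, same layout as in the question
set.seed(1)
years <- c(2004, 2006, 2008, 2010)
panel <- expand.grid(hospital = 1:5, year = years)
panel$t <- match(panel$year, years)
panel$y <- 10 + 0.5 * panel$t + rnorm(nrow(panel))  # made-up outcome

# Including both the year factor and numeric t
fit_both <- lm(y ~ factor(year) + t, data = panel)

# t is perfectly collinear with the year dummies, so lm() reports NA for it
t_coef <- coef(fit_both)["t"]
is.na(t_coef)
```

`lm()` handles the redundancy by dropping the aliased term, but other estimators may raise an error or drop a different column, so it is safer to choose one time variable explicitly.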