I have a panel dataset in which hospitals are followed from 2004 to 2010, with one observation every two years. The data are stored in Stata format, but I analyse them in R. Initially the variables year (2004, 2006, 2008, 2010) and t (1 = 2004, 2 = 2006, and so on) are integers, but I later convert them to factors as follows:
data$year <- factor(data$year)
and similarly for the time variable t.
I am unsure whether year or t should be kept as an integer or numeric variable or converted to a factor for panel data, and whether the command above is the right way to convert a variable to a factor.
Treating year as a categorical variable estimates a separate effect for each individual year, i.e. how much the target variable differed on average in that year relative to a reference year. Including t as a numeric variable instead imposes a linear trend: its coefficient tells you how much the target variable changes on average from one wave to the next, i.e. every two years. Given that there are only 4 time periods, the first approach seems more reasonable, but it really depends on the goal of your analysis.
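To make the difference concrete, here is a minimal sketch on simulated data. The outcome y, the number of hospitals and the use of plain lm() rather than a dedicated panel estimator are all assumptions made for illustration; they are not taken from the question.

```r
set.seed(1)

# Hypothetical panel: 50 hospitals observed in 4 waves (made-up data)
waves <- c(2004L, 2006L, 2008L, 2010L)
data <- expand.grid(hospital = 1:50, year = waves)
data$t <- match(data$year, waves)                 # 1 = 2004, 2 = 2006, ...
data$y <- 10 + 0.5 * data$t + rnorm(nrow(data))   # invented outcome variable

# year as a factor: one coefficient per year except the reference year (2004)
fit_factor <- lm(y ~ factor(year), data = data)

# t as numeric: a single slope, the average change from one wave to the next
fit_trend <- lm(y ~ t, data = data)

coef(fit_factor)  # intercept plus three year effects
coef(fit_trend)   # intercept plus one trend coefficient
```

The factor version spends three coefficients on time and makes no assumption about the shape of the time path, while the numeric version spends one and assumes the change is the same between every pair of consecutive waves.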
Your command works as it is. An equivalent alternative for an integer variable is
data$year <- as.factor(data$year)
.
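As a quick check, the sketch below compares factor() and as.factor() on a small integer vector standing in for data$year (the vector itself is made up). It also shows a common pitfall when converting a factor back to numbers.

```r
# Stand-in for data$year; values are invented for illustration
year <- c(2004L, 2006L, 2008L, 2010L, 2004L)

f1 <- factor(year)
f2 <- as.factor(year)

levels(f1)   # the distinct years, stored as character labels
levels(f2)

# Converting back: go through the labels, not the internal codes
back_labels <- as.integer(as.character(f1))  # recovers the original years
back_codes  <- as.integer(f1)                # gives the level codes 1, 2, 3, ...
```

For a plain integer column the two calls should give the same levels in the same order. If the column arrives from Stata with value labels attached, its class may differ and it is worth inspecting with str() before converting.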
Also, make sure that you include only one of year or t: they carry the same information, so including both makes the time effects collinear and the coefficients impossible to interpret separately.
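The following sketch, again on simulated data with an invented outcome y, illustrates what happens in lm() when both variables enter the same model. Because t is an exact linear function of the year dummies, one coefficient cannot be estimated.

```r
set.seed(2)

# Hypothetical panel: 50 hospitals, 4 waves (made-up data)
waves <- c(2004L, 2006L, 2008L, 2010L)
data <- expand.grid(hospital = 1:50, year = waves)
data$t <- match(data$year, waves)
data$y <- 10 + 0.5 * data$t + rnorm(nrow(data))   # invented outcome variable

# Both year (as factor) and t (as numeric) in one model
fit_both <- lm(y ~ factor(year) + t, data = data)

coef(fit_both)              # the coefficient on t is reported as NA
alias(fit_both)$Complete    # shows t as a combination of the other terms
```

lm() handles the redundancy by dropping the aliased term; other estimators may instead stop with an error or drop a different term, so it is safer to choose one representation of time up front.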