Given data of the following form:
myDat = structure(list(Score = c(1.84, 2.24, 3.8, 2.3, 3.8, 4.55, 1.13,
2.49, 3.74, 2.84, 3.3, 4.82, 1.74, 2.89, 3.39, 2.08, 3.99, 4.07,
1.93, 2.39, 3.63, 2.55, 3.09, 4.76), Subject = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L,
7L, 7L, 8L, 8L, 8L), Condition = c(0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L,
1L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Score",
"Subject", "Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
I would like to model Score as a function of Subject, Condition and Time. Each (human) Subject's score was measured three times, indicated by the variable Time, so I have repeated measures.
How can I fit a random effects model in R with Subject fitted as a random effect?
ADDENDUM: It's been asked how I generated these data. You guessed it, the data are fake as the day is long. Score is Time plus random noise, and being in Condition 1 adds a point to Score. It's instructive as a typical Psych setup: you have a task where people's scores get better with practice (Time) and a drug (Condition == 1) that enhances scores.
Here are some more realistic data for the purposes of this discussion. Now simulated participants have a random "skill" level that is added to their scores. Also, the factors are now strings.
myDat = structure(list(Score = c(1.62, 2.18, 2.3, 3.46, 3.85, 4.7, 1.41,
2.21, 3.32, 2.73, 3.34, 3.27, 2.14, 2.73, 2.74, 3.39, 3.59, 4.01,
1.81, 1.83, 3.22, 3.64, 3.51, 4.26), Subject = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L,
6L, 7L, 7L, 7L, 8L, 8L, 8L), .Label = c("A", "B", "C", "D", "E",
"F", "G", "H"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1PM",
"2PM", "3PM"), class = "factor")), .Names = c("Score", "Subject",
"Condition", "Time"), class = "data.frame", row.names = c(NA,
-24L))
See it:
library(ggplot2)
# One line per subject, coloured by condition
ggplot(myDat, aes(Time, Score, group = Subject, colour = factor(Condition))) + geom_line()
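For reference, the two generating processes described above (Time plus noise plus a one-point Condition bump, then the same with a per-subject "skill" offset) could be sketched as a single base-R function. This is only a sketch: the function name `simulate_scores`, the noise level `noise_sd`, and the skill spread `skill_sd` are assumptions, since the actual values used to make `myDat` are not given.

```r
# Hypothetical helper; noise_sd and skill_sd are assumed values, not the
# ones used to generate myDat (those are not stated in the question).
simulate_scores <- function(n_subjects = 8, n_times = 3,
                            noise_sd = 0.3, skill_sd = 0) {
  subject   <- rep(seq_len(n_subjects), each = n_times)
  time      <- rep(seq_len(n_times), times = n_subjects)
  # Even-numbered subjects get the drug, matching the layout of myDat
  condition <- as.integer(subject %% 2 == 0)
  # One skill offset per subject; skill_sd = 0 reproduces the first data set's process
  skill     <- rnorm(n_subjects, mean = 0, sd = skill_sd)
  noise     <- rnorm(length(subject), mean = 0, sd = noise_sd)

  data.frame(
    Score     = time + condition + skill[subject] + noise,
    Subject   = factor(LETTERS[subject]),          # assumes n_subjects <= 26
    Condition = factor(c("No", "Yes")[condition + 1]),
    Time      = factor(paste0(time, "PM"))
  )
}

set.seed(1)  # arbitrary seed, only for reproducibility
simDat <- simulate_scores(skill_sd = 0.5)
```

Setting `skill_sd` above 0 gives each subject their own baseline, which is the situation a random intercept is meant to describe.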
using the nlme library...
Answering your stated question, you can create a random intercept mixed-effects model using the following code:
> library(nlme)
> m1 <- lme(Score ~ Condition + Time + Condition*Time,
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat
       AIC      BIC    logLik
  31.69207 37.66646 -9.846036

Random effects:
 Formula: ~1 | Subject
         (Intercept)  Residual
StdDev: 5.214638e-06 0.3151035

Fixed effects: Score ~ Condition + Time + Condition * Time
                   Value Std.Error DF  t-value p-value
(Intercept)    0.6208333 0.2406643 14 2.579666  0.0218
Condition      0.7841667 0.3403507  6 2.303996  0.0608
Time           0.9900000 0.1114059 14 8.886423  0.0000
Condition:Time 0.0637500 0.1575517 14 0.404629  0.6919
 Correlation:
               (Intr) Condtn Time
Condition      -0.707
Time           -0.926  0.655
Condition:Time  0.655 -0.926 -0.707

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-1.5748794 -0.6704147  0.2069426  0.7467785  1.5153752

Number of Observations: 24
Number of Groups: 8
The estimated random-intercept standard deviation is essentially 0, meaning the model finds no between-subject variation in baseline score and therefore no within-subject correlation, so it is not capturing the relationship between time points well. A random intercept model is rarely the type of model you want for a repeated measures design, because it assumes that the correlations between all pairs of time points are equal, i.e. the correlation between time 1 and time 2 is the same as between time 1 and time 3. Under normal circumstances (perhaps not those generating your fake data) we would expect the latter to be less than the former. An autoregressive structure is usually a better way to go.
> m2 <- gls(Score ~ Condition + Time + Condition*Time,
+ data = myDat, correlation = corAR1(form = ~ Time | Subject))
> summary(m2)
Generalized least squares fit by REML
  Model: Score ~ Condition + Time + Condition * Time
  Data: myDat
       AIC      BIC    logLik
  25.45446 31.42886 -6.727232

Correlation Structure: AR(1)
 Formula: ~Time | Subject
 Parameter estimate(s):
       Phi
-0.5957973

Coefficients:
                   Value Std.Error   t-value p-value
(Intercept)    0.6045402 0.1762743  3.429543  0.0027
Condition      0.8058448 0.2492895  3.232566  0.0042
Time           0.9900000 0.0845312 11.711652  0.0000
Condition:Time 0.0637500 0.1195452  0.533271  0.5997

 Correlation:
               (Intr) Condtn Time
Condition      -0.707
Time           -0.959  0.678
Condition:Time  0.678 -0.959 -0.707

Standardized residuals:
       Min         Q1        Med         Q3        Max
-1.6850557 -0.6730898  0.2373639  0.8269703  1.5858942

Residual standard error: 0.2976964
Degrees of freedom: 24 total; 20 residual
Your data show a correlation of -0.596 between adjacent time points (the AR(1) parameter Phi), which seems odd. Normally there should, at a minimum, be a positive correlation between time points. How were these data generated?
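To make the "seems odd" point concrete, here is a minimal base-R sketch of the within-subject correlation matrix that an AR(1) structure implies for three equally spaced time points, using the Phi reported above. The comparison value `rho_cs` for an equal-correlation structure (what a random intercept implies) is a made-up number, chosen only for illustration.

```r
phi   <- -0.5957973   # Phi copied from the gls output above
times <- 1:3

# AR(1): correlation between two time points is phi^(number of steps apart)
lag      <- abs(outer(times, times, "-"))
ar1_corr <- phi^lag

# Equal-correlation (compound symmetry) structure for comparison.
# rho_cs = 0.4 is a hypothetical value, not an estimate from these data.
rho_cs  <- 0.4
cs_corr <- matrix(rho_cs, nrow = length(times), ncol = length(times))
diag(cs_corr) <- 1

round(ar1_corr, 3)
round(cs_corr, 3)
```

With a negative Phi the implied correlation alternates in sign: adjacent time points are negatively correlated (about -0.60), while points two steps apart are positively correlated (Phi squared, about 0.35). The equal-correlation matrix, by contrast, has the same off-diagonal value everywhere.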
Addendum:
With your new data we know that the data generating process is equivalent to a random intercept model (though that is not the most realistic for a longitudinal study). The visualization shows that the effect of time seems to be fairly linear, so we should feel comfortable treating it as a numeric variable. Here as.numeric(Time) converts the factor to its integer level codes, so 1PM, 2PM and 3PM become 1, 2 and 3.
> library(nlme)
> m1 <- lme(Score ~ Condition + as.numeric(Time) + Condition*as.numeric(Time),
+ data = myDat, random = ~ 1 | Subject)
> summary(m1)
Linear mixed-effects model fit by REML
 Data: myDat
       AIC      BIC    logLik
  38.15055 44.12494 -13.07527

Random effects:
 Formula: ~1 | Subject
        (Intercept)  Residual
StdDev:   0.2457355 0.3173421

Fixed effects: Score ~ Condition + as.numeric(Time) + Condition * as.numeric(Time)
                                  Value Std.Error DF   t-value p-value
(Intercept)                    1.142500 0.2717382 14  4.204415  0.0009
ConditionYes                   1.748333 0.3842958  6  4.549447  0.0039
as.numeric(Time)               0.575000 0.1121974 14  5.124898  0.0002
ConditionYes:as.numeric(Time) -0.197500 0.1586710 14 -1.244714  0.2337
 Correlation:
                              (Intr) CndtnY as.(T)
ConditionYes                  -0.707
as.numeric(Time)              -0.826  0.584
ConditionYes:as.numeric(Time)  0.584 -0.826 -0.707

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-1.44560367 -0.65018585  0.01864079  0.52930925  1.40824838

Number of Observations: 24
Number of Groups: 8
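We see a significant Condition effect, indicating that the 'Yes' condition tends to have higher scores. Because the model includes the interaction, the ConditionYes coefficient (about 1.7) is the estimated difference at as.numeric(Time) = 0; over the observed time points the estimated difference runs from about 1.55 at 1PM to about 1.16 at 3PM. There is also a significant time effect (about 0.58 per time step in the 'No' group), indicating that scores go up over time. Supporting the plot, we find no significant differential effect of time between the two groups (the interaction), i.e. no evidence that the slopes differ.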
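As a rough check on this reading, the printed estimates can be plugged into two small calculations: the intraclass correlation implied by the random intercept (the share of variance due to stable subject differences, which is also the correlation the model assumes between any two time points of the same subject), and the estimated Yes-minus-No difference at each time point. This sketch only reuses numbers copied from the output above; the variable names are my own.

```r
# Estimates copied from the summary(m1) output above
sd_intercept  <- 0.2457355   # random intercept StdDev
sd_residual   <- 0.3173421   # residual StdDev
b_condition   <- 1.748333    # ConditionYes
b_interaction <- -0.197500   # ConditionYes:as.numeric(Time)

# Intraclass correlation: between-subject variance over total variance
icc <- sd_intercept^2 / (sd_intercept^2 + sd_residual^2)

# Estimated Yes - No difference at each observed time code
time_codes    <- c("1PM" = 1, "2PM" = 2, "3PM" = 3)
condition_gap <- b_condition + b_interaction * time_codes

round(icc, 3)
round(condition_gap, 2)
```

The gap shrinks slightly over time because the interaction estimate is negative, but that estimate is not distinguishable from zero here, so the shrinkage should not be over-interpreted.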