R: In anova.lm(g) : ANOVA F-tests on an essentially perfect fit are unreliable

Travis · Dec 18, 2011

I am pairing up online guides with an old text to learn R (page 182 - http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf). When I use data from an R package (as in the tutorial examples), there is no problem. However, when I use data from my text, I always end up with no F values and the warning in the title.

Take a look:

Put the data into a data.frame:

car.noise <- data.frame( speed = c("idle", "0-60mph", "over 60"), chrysler = c(41,65,76), 
bmw = c(45,67,72), ford = c(44,66,76), chevy = c(45,66,77), subaru = c(46,76,64))

Check the data.frame:

> car.noise
    speed chrysler bmw ford chevy subaru
1    idle       41  45   44    45     46
2 0-60mph       65  67   66    66     76
3 over 60       76  72   76    77     64

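Melt the data.frame into long format:

library(reshape2)  # provides melt(); the older reshape package also has one
mcar.noise <- melt(car.noise, id.vars = "speed")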

Check the melted data.frame:

> mcar.noise
     speed variable value
1     idle chrysler    41
2  0-60mph chrysler    65
3  over 60 chrysler    76
4     idle      bmw    45
5  0-60mph      bmw    67
6  over 60      bmw    72
7     idle     ford    44
8  0-60mph     ford    66
9  over 60     ford    76
10    idle    chevy    45
11 0-60mph    chevy    66
12 over 60    chevy    77
13    idle   subaru    46
14 0-60mph   subaru    76
15 over 60   subaru    64
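If reshape2 is not installed, a long table with the same three columns can be built in base R. This is a minimal sketch, not part of the original post; `cars` is an illustrative helper name, and the `speed` levels are set in table order rather than whatever melt() produces, which should not change the sums of squares for this balanced layout.

```r
# Base R only: rebuild the long ("melted") table without reshape2.
car.noise <- data.frame(
  speed = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76), bmw = c(45, 67, 72), ford = c(44, 66, 76),
  chevy = c(45, 66, 77), subaru = c(46, 76, 64)
)

cars <- names(car.noise)[-1]  # every column except speed (hypothetical helper)

mcar.noise <- data.frame(
  # speed cycles fastest, matching the column-by-column stacking shown above
  speed = factor(rep(car.noise$speed, times = length(cars)),
                 levels = car.noise$speed),
  # each car name is repeated once per speed level
  variable = factor(rep(cars, each = nrow(car.noise)), levels = cars),
  # unlist() concatenates the car columns in order
  value = unlist(car.noise[cars], use.names = FALSE)
)

mcar.noise
```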

Perform the ANOVA and get the warning:

> anova(lm(value ~ variable * speed, mcar.noise))
Analysis of Variance Table

Response: value 
               Df  Sum Sq Mean Sq F value Pr(>F)
variable        4    6.93    1.73               
speed           2 2368.13 1184.07               
variable:speed  8  205.87   25.73               
Residuals       0    0.00                       
Warning message:
In anova.lm(lm(value ~ variable * speed, mcar.noise)) :
  ANOVA F-tests on an essentially perfect fit are unreliable
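One way to see the "essentially perfect fit" the warning refers to is to inspect the fitted model directly. A minimal sketch, assuming the long table is rebuilt in base R rather than with melt() (the helper name `cars` is illustrative):

```r
car.noise <- data.frame(
  speed = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76), bmw = c(45, 67, 72), ford = c(44, 66, 76),
  chevy = c(45, 66, 77), subaru = c(46, 76, 64)
)
cars <- names(car.noise)[-1]
mcar.noise <- data.frame(
  speed = factor(rep(car.noise$speed, times = length(cars))),
  variable = factor(rep(cars, each = nrow(car.noise)), levels = cars),
  value = unlist(car.noise[cars], use.names = FALSE)
)

fit <- lm(value ~ variable * speed, data = mcar.noise)

df.residual(fit)          # 0: nothing is left over to estimate error
max(abs(residuals(fit)))  # every residual is zero up to rounding error
# the fitted values simply reproduce the observed data
all.equal(unname(fitted(fit)), mcar.noise$value)
```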

The only two explanations I can come up with:

1: I am coding incorrectly.
2: The text's examples are too "perfect" a fit, since they are trying to show a clear example.

Answer

Dason · Dec 18, 2011

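You are trying to fit a model that gives a separate mean to every combination of variable and speed. With your data, each combination occurs exactly once, so you don't have any replication at all: the model estimates 1 + 4 + 2 + 8 = 15 coefficients (intercept, variable, speed, interaction) from 15 observations. It would be like trying to compare two groups when you only have a single value from each group.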
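The lack of replication can be checked by counting observations per cell and comparing the number of model coefficients with the number of observations. A sketch under the same base-R data setup as the question (the names `cars`, `counts`, and `X` are illustrative):

```r
car.noise <- data.frame(
  speed = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76), bmw = c(45, 67, 72), ford = c(44, 66, 76),
  chevy = c(45, 66, 77), subaru = c(46, 76, 64)
)
cars <- names(car.noise)[-1]
mcar.noise <- data.frame(
  speed = factor(rep(car.noise$speed, times = length(cars))),
  variable = factor(rep(cars, each = nrow(car.noise)), levels = cars),
  value = unlist(car.noise[cars], use.names = FALSE)
)

# How many observations fall in each speed x car cell?
counts <- table(mcar.noise$speed, mcar.noise$variable)
counts
all(counts == 1)  # every cell holds a single value: no replication

# One design-matrix column per coefficient of the interaction model
X <- model.matrix(~ variable * speed, data = mcar.noise)
c(coefficients = ncol(X), observations = nrow(X))
```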

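If you look at the "Residuals" line in your ANOVA table, you should notice that it has no degrees of freedom and its sum of squares is 0 as well. Each F value is a term's mean square divided by the residual mean square, so with nothing left over to estimate the error variance, no F-tests can be computed. You could try fitting a model without the interaction (value ~ variable + speed) if you feel it is appropriate, but you don't have enough data to fit a model with an interaction.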
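If an additive model is judged appropriate, fitting it frees degrees of freedom for the error term. A minimal sketch, assuming there is no interaction between car and speed; that is an assumption about the data, not something this table can test, and `fit.add` is an illustrative name:

```r
car.noise <- data.frame(
  speed = c("idle", "0-60mph", "over 60"),
  chrysler = c(41, 65, 76), bmw = c(45, 67, 72), ford = c(44, 66, 76),
  chevy = c(45, 66, 77), subaru = c(46, 76, 64)
)
cars <- names(car.noise)[-1]
mcar.noise <- data.frame(
  speed = factor(rep(car.noise$speed, times = length(cars))),
  variable = factor(rep(cars, each = nrow(car.noise)), levels = cars),
  value = unlist(car.noise[cars], use.names = FALSE)
)

# Additive model: the 8 df the interaction used are now available for error
fit.add <- lm(value ~ variable + speed, data = mcar.noise)
anova(fit.add)
```

Because every cell holds exactly one observation and the layout is balanced, the Residuals line of this table should carry the interaction sum of squares from the question's table (205.87 on 8 df), so F values now appear.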