What does the R formula y~1 mean?

Antony · Nov 13, 2012 · Viewed 26.2k times

I was reading the documentation on R formulas and trying to figure out how to work with depmix (from the depmixS4 package).

Now, in the documentation of depmixS4, the sample formulas tend to be something like y ~ 1. For a simple case like y ~ x, the formula defines a relationship between input x and output y, so I get that it is similar to y = a * x + b, where a is the slope and b is the intercept.

If we go back to y ~ 1, the formula is throwing me off. Is it equivalent to y = 1 (a horizontal line at y = 1)?

To add a bit of context, the depmixS4 documentation includes the following example:

depmix(list(rt~1,corr~1),data=speed,nstates=2,family=list(gaussian(),multinomial()))

In general, formulas that end with ~ 1 confuse me. Can anyone explain what ~ 1 or y ~ 1 means? Thanks a bunch!

Answer

MattBagg · Nov 13, 2012

Many of the operators used in model formulae in R (asterisk, plus, caret) have a model-specific meaning, and this is one of them: the symbol 1 indicates an intercept.

In other words, it is the value the dependent variable is expected to have when the independent variables are zero or have no influence. (To use the ordinary arithmetic meaning of an operator inside a model term, wrap the expression in I().) Intercepts are usually assumed, so the intercept term most often appears explicitly when specifying a model without one.
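As a minimal sketch of what I() changes, the snippet below compares the design-matrix columns R builds with and without it. The data frame d is a hypothetical toy example, not something from the question.

```r
# Toy data (hypothetical values, for illustration only)
d <- data.frame(x = c(1, 2, 3, 4))

# Inside a formula, ^ is a crossing operator, so x^2 collapses to plain x
m_plain <- model.matrix(~ x^2, data = d)
colnames(m_plain)

# I() protects the expression, so ^ means ordinary exponentiation
m_protected <- model.matrix(~ I(x^2), data = d)
colnames(m_protected)
m_protected[, "I(x^2)"]
```

Both matrices also contain an "(Intercept)" column, because the intercept is included unless it is removed explicitly.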

Here are two ways of specifying the same linear regression of y on x. The first has an implicit intercept term, and the second an explicit one:

y ~ x
y ~ 1 + x
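A quick way to check that the two forms are equivalent is to fit both and compare the coefficients. This sketch uses the built-in cars data set (dist and speed) as a stand-in for y and x; the choice of data is an assumption made only for illustration.

```r
# Same model written with an implicit and an explicit intercept
fit_implicit <- lm(dist ~ speed, data = cars)
fit_explicit <- lm(dist ~ 1 + speed, data = cars)

coef(fit_implicit)
coef(fit_explicit)

# TRUE if both fits produced the same intercept and slope
all.equal(coef(fit_implicit), coef(fit_explicit))
```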

Here are ways to give a linear regression of y on x through the origin (that is, without an intercept term):

y ~ 0 + x
y ~ -1 + x
y ~ x - 1
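The three spellings can be checked against each other in the same way. Again, the built-in cars data set is only an illustrative stand-in for y and x.

```r
# Three equivalent ways to drop the intercept
fits <- list(
  lm(dist ~ 0 + speed, data = cars),
  lm(dist ~ -1 + speed, data = cars),
  lm(dist ~ speed - 1, data = cars)
)

# Each fit has a single coefficient (the slope) and no "(Intercept)" entry
slopes <- sapply(fits, function(f) unname(coef(f)))
slopes

# The terms object records whether an intercept is present (0 means absent)
has_intercept <- sapply(fits, function(f) attr(terms(f), "intercept"))
has_intercept
```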

In the specific case you mention (y ~ 1), y is being predicted by no other variable, so the natural prediction is the mean of y, as Paul Hiemstra stated:

> library(boot)  # the city data set ships with the boot package
> data(city)
> r <- lm(x~1, data=city)
> r

Call:
lm(formula = x ~ 1, data = city)

Coefficients:
(Intercept)  
       97.3  

> mean(city$x)
[1] 97.3

And removing the intercept with a -1 leaves you with nothing:

> r <- lm(x ~ -1, data=city)
> r

Call:
lm(formula = x ~ -1, data = city)

No coefficients
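One way to see why both results come out as they do is to look at the design matrix that the formula produces. The sketch below uses a small hypothetical vector y rather than the city data: with ~ 1 the matrix is a single column of ones, and least squares on that column yields the mean; with ~ -1 the matrix has no columns, so there is nothing to estimate.

```r
# Hypothetical response values, for illustration only
y <- c(3, 5, 8, 12)

# Intercept-only model: one column, all ones
X_intercept <- model.matrix(y ~ 1)
X_intercept

# Intercept removed and no predictors: zero columns
X_empty <- model.matrix(y ~ -1)
dim(X_empty)

# Ordinary least squares by hand: solve (X'X) beta = X'y
beta <- as.numeric(solve(crossprod(X_intercept), crossprod(X_intercept, y)))
beta
mean(y)
```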

formula() is a function for extracting formulae from objects, and its help file isn't the best place to read about specifying model formulae in R. I suggest you look at this explanation or Chapter 11 of An Introduction to R.