Asterisk (*) vs. colon (:) in R formulas

Leo Ohyama · Nov 12, 2016

I always thought that * and : meant the same thing when adding interaction terms in R formulas. For example:

  • amount_of_gas ~ temperature*gas_type
  • amount_of_gas ~ temperature:gas_type

However, now that I've started using generalized linear models (glm() in R), I see that the two formulas give different fit statistics, different coefficient estimates, etc. when I switch between them. Can someone explain why this happens? Is it a problem with the stats package in R?
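Here is a minimal reproducible sketch of the difference. It uses simulated data: the data frame d, the Poisson family and the coefficient values are made-up assumptions, and only the variable names come from the formulas above. It fits both formulas with glm() and lists the coefficients each one estimates.

```r
# Hypothetical simulated data; only the variable names follow the question.
set.seed(1)
d <- data.frame(
  temperature = runif(100, 0, 30),
  gas_type    = factor(sample(c("A", "B"), 100, replace = TRUE))
)
# Made-up Poisson response so that glm() has something to fit.
d$amount_of_gas <- rpois(100, lambda = exp(0.5 + 0.03 * d$temperature +
                                             0.4 * (d$gas_type == "B")))

fit_star  <- glm(amount_of_gas ~ temperature * gas_type, family = poisson, data = d)
fit_colon <- glm(amount_of_gas ~ temperature:gas_type,   family = poisson, data = d)

# The two fits estimate different sets of coefficients.
names(coef(fit_star))
names(coef(fit_colon))

# Fit statistics differ as well, because the models themselves differ.
c(star = AIC(fit_star), colon = AIC(fit_colon))
```

The `*` fit contains a separate "temperature" main-effect coefficient. The `:` fit has only the intercept and one temperature slope per gas type.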

Answer

Dirk Eddelbuettel · Nov 12, 2016

From help(formula):

 In addition to ‘+’ and ‘:’, a number of other operators are useful
 in model formulae.  The ‘*’ operator denotes factor crossing:
 ‘a*b’ interpreted as ‘a+b+a:b’.
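You can check this expansion directly. terms() parses a formula without any data, and its "term.labels" attribute lists the model terms R will build. The sketch below uses the variable names from the question. labels_of() is just a small helper defined here for brevity, not part of stats.

```r
# Helper defined for this example: list the terms R builds from a formula.
labels_of <- function(f) attr(terms(f), "term.labels")

f_star     <- amount_of_gas ~ temperature * gas_type
f_colon    <- amount_of_gas ~ temperature:gas_type
f_expanded <- amount_of_gas ~ temperature + gas_type + temperature:gas_type

labels_of(f_star)    # "temperature" "gas_type" "temperature:gas_type"
labels_of(f_colon)   # "temperature:gas_type"

# a*b is exactly a + b + a:b
identical(labels_of(f_star), labels_of(f_expanded))  # TRUE
```

So temperature*gas_type fits both main effects plus their interaction. temperature:gas_type fits only the interaction term (plus the intercept). These are different models, so glm() returning different estimates is expected and not a bug in stats.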