Getting invalid model formula in ExtractVars when using rpart function in R

dgene54 · Jan 14, 2015 · Viewed 37.9k times

The dataset can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

Getting the following error:

formula(formula, data = data) : 
  invalid model formula in ExtractVars

Using the following code:

install.packages("rpart")
library("rpart")

# you'll need to change the following from windows to work on a linux box:
mydata <- read.csv(file="c:/Users/md7968/downloads/winequality-red.csv")

# grow tree 
fit <- rpart(YouSweetBoy ~ "residual sugar" + "citric acid", method = "class", data = mydata)

Mind you, I've changed the delimiters in the CSV file to commas.

Perhaps it's not reading the data correctly. Forgive me, I'm new to R and not a very good programmer.

Answer

MrFlick · Jan 14, 2015

Look at names(mydata). When read.csv() (a wrapper around read.table()) builds the data.frame, it turns "bad" column names into syntactically valid ones by default. You can't (well, shouldn't) have a space in a column name, so R changes spaces to periods. Plus, you should never put quoted strings in a formula; refer to the columns by their bare names. Try

fit <- rpart(quality ~ residual.sugar + citric.acid, method = "class", data = mydata)

(I have no idea what "YouSweetBoy" was supposed to be since it isn't in the dataset, so I changed it to "quality".)
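
To see the renaming for yourself without the downloaded file, here is a minimal sketch. The inline CSV is a tiny stand-in with illustrative values, not the real wine data, and it uses terms() from base R rather than rpart() so it runs without extra packages. It assumes the header uses the same space-separated names as the original file.

```r
# Tiny inline stand-in for winequality-red.csv (values are illustrative only)
csv_text <- 'fixed acidity,citric acid,residual sugar,quality
7.4,0.00,1.9,5
7.8,0.04,2.6,5
11.2,0.56,1.9,6'

# Default check.names = TRUE: spaces in the header become periods
mydata <- read.csv(text = csv_text)
print(names(mydata))

# make.names() is the base function that performs this conversion
print(make.names(c("residual sugar", "citric acid")))

# Quoted strings in a formula fail when the formula is parsed into terms
bad <- tryCatch(
  terms(quality ~ "residual sugar" + "citric acid", data = mydata),
  error = function(e) conditionMessage(e)
)
print(bad)

# Bare, syntactically valid names parse fine
good <- terms(quality ~ residual.sugar + citric.acid, data = mydata)

# Alternative: keep the original names and quote them with backticks
raw <- read.csv(text = csv_text, check.names = FALSE)
f <- quality ~ `residual sugar` + `citric acid`
print(all.vars(f))
```

If you would rather keep the original column names, check.names = FALSE plus backticks works, but the period-separated names are usually less hassle in formulas.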