NaiveBayes in R Cannot Predict - factor(0) Levels:

B.Mr.W. · Nov 13, 2013

I have a dataset that looks like this:

data.flu <- data.frame(
  chills    = c(1,1,1,0,0,0,0,1),
  runnyNose = c(0,1,0,1,0,1,1,1),
  headache  = c("M", "N", "S", "M", "N", "S", "S", "M"),
  fever     = c(1,0,1,1,0,1,0,1),
  flu       = c(0,1,1,1,0,1,0,1)
)
> data.flu
   chills runnyNose headache fever flu
1      1         0        M     1   0
2      1         1        N     0   1
3      1         0        S     1   1
4      0         1        M     1   1
5      0         0        N     0   0
6      0         1        S     1   1
7      0         1        S     0   0
8      1         1        M     1   1

> str(data.flu)
'data.frame':   8 obs. of  5 variables:
 $ chills   : num  1 1 1 0 0 0 0 1
 $ runnyNose: num  0 1 0 1 0 1 1 1
 $ headache : Factor w/ 3 levels "M","N","S": 1 2 3 1 2 3 3 1
 $ fever    : num  1 0 1 1 0 1 0 1
 $ flu      : num  0 1 1 1 0 1 0 1

Why does the predict function return nothing?

library(e1071)  # provides naiveBayes()
# I can see the model has been successfully created.
model <- naiveBayes(flu~., data=data.flu)
# I created a new observation
patient <- data.frame(chills = c(1), runnyNose = c(0), headache = c("M"), fever = c(1))
> predict(model, patient)
factor(0)
Levels:
# I tried with the training data, still won't work
> predict(model, data.flu[,-5])
factor(0)
Levels:

The examples in the naiveBayes help page work for me, so I am not sure what is wrong with my approach. Thanks a lot!

I suspect something is wrong with the data types going into the naiveBayes model. When I converted all the variables to factors with as.factor, prediction seemed to work. But I am still confused about the how and why behind the scenes.

Answer

Didzis Elferts · Nov 13, 2013

The problem isn't in the predict() function but in your model definition.

The help file of naiveBayes() says:

Computes the conditional a-posterior probabilities of a categorical class variable 
given independent predictor variables using the Bayes rule.

So the y values should be categorical, but in your case they are numeric.
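The difference between a numeric and a categorical response can be checked in base R before fitting anything. The following is a minimal sketch using only the flu column from the question; the variable names flu and flu_f are illustrative and not part of the original post.

```r
# The response exactly as defined in the question: a plain numeric vector.
flu <- c(0, 1, 1, 1, 0, 1, 0, 1)

is.factor(flu)   # FALSE: this is not a categorical variable
levels(flu)      # NULL: a numeric vector carries no class levels

# Converting to a factor gives the response explicit class levels.
flu_f <- as.factor(flu)

is.factor(flu_f) # TRUE
levels(flu_f)    # "0" "1"
table(flu_f)     # 3 observations of class 0, 5 of class 1
```

A numeric response has no levels for a classifier to choose between, which is consistent with the empty factor(0) result shown in the question.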

The solution is to convert flu to a factor.

model <- naiveBayes(as.factor(flu)~., data=data.flu)
predict(model, patient)
[1] 1
Levels: 0 1
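
An equivalent approach is to convert the column once in the data frame instead of inside the formula. The sketch below assumes naiveBayes() comes from the e1071 package (the help text quoted above matches that package's documentation). The names flu_data, fit, new_patient and headache_levels are illustrative. Giving headache the same explicit levels in both the training data and the new observation is a precaution added here, not something the original post does.

```r
library(e1071)  # assumed source of naiveBayes()

# Shared levels so training data and new data encode headache identically.
headache_levels <- c("M", "N", "S")

flu_data <- data.frame(
  chills    = c(1, 1, 1, 0, 0, 0, 0, 1),
  runnyNose = c(0, 1, 0, 1, 0, 1, 1, 1),
  headache  = factor(c("M", "N", "S", "M", "N", "S", "S", "M"),
                     levels = headache_levels),
  fever     = c(1, 0, 1, 1, 0, 1, 0, 1),
  # The response is stored as a factor up front, so the formula stays simple.
  flu       = factor(c(0, 1, 1, 1, 0, 1, 0, 1))
)

# Guard against the mistake from the question: a non-categorical response.
stopifnot(is.factor(flu_data$flu))

fit <- naiveBayes(flu ~ ., data = flu_data)

new_patient <- data.frame(
  chills    = 1,
  runnyNose = 0,
  headache  = factor("M", levels = headache_levels),
  fever     = 1
)

# Predicted class label (the default, type = "class").
pred_class <- predict(fit, new_patient)
print(pred_class)

# Posterior probability of each class for the same observation.
pred_raw <- predict(fit, new_patient, type = "raw")
print(pred_raw)
```

The class prediction should agree with the result shown above, and type = "raw" additionally shows how the posterior probability is split between the two classes.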