I'm trying to calculate the AUC using auc(roc(predictions, labels)), where labels is a numeric vector of 1 (x15) and 0 (x500), and predictions is a numeric vector of probabilities derived from a glm [binomial]. It should be very simple, but auc(roc(predictions, labels)) gives an error saying "Not enough distinct predictions to compute area under the ROC curve". I must be doing something silly, but I can't figure out what. Can you?
The code is
library(AUC)
# read the data, which come from an earlier species distribution modelling step
prob<-read.csv("prob.csv")
labels<-read.csv("labels.csv")
#prob is
#labels is
roc(prob,labels)
# Gives this error (which I'm NOT interested in)
Error in `[.data.frame`(predictions, pred.order) : undefined columns selected
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
# I convert both to numeric vectors
prob<-as.numeric(prob[,2])
labels<-as.numeric(labels[,2])
# Verify that each one is a numeric vector
class(prob)
[1] "numeric"
class(labels)
[1] "numeric"
# call the roc function
roc(prob,labels)
Error in roc(modbrapred, pbbra) : # THIS is the error I'm interested in
Not enough distinct predictions to compute area under the ROC curve.
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(e2) : is.na() applied to non-(list or vector) of type 'NULL'
Data is as follows
labels.csv
"","x"
"1",1
"2",1
"3",1
"4",1
"5",1
"6",1
...
"164",1
"165",1
"166",0
"167",0
"168",0
"169",0
"170",0
"171",0
"172",0
...
"665",0
prob.csv
"","x"
"1",0.977465874525236
"2",0.989692657762578
"3",0.989692657762578
"4",0.988038430564019
"5",0.443188602491041
"6",0.409732585195485
...
"164",0.988607910625475
"165",0.986296936078692
"166",7.13529696560611e-05
"167",0.000419255989134081
"168",0.00295825183558019
"169",0.00182941235784709
"170",4.85601026999172e-09
"171",0.000953106471289961
"172",1.70252014430306e-05
...
"665",8.13413358866349e-08
The problem was that my "labels" was a numeric vector, but roc needed a factor. So I converted it:
labels <- factor(labels)
and roc worked as it should.
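For anyone who wants to try the fix without the CSV files, here is a minimal sketch. The data are made up (set.seed, runif and the vector sizes are my assumptions, not the real prob.csv and labels.csv), and the names labels_num and labels_fac are only for illustration. The AUC calls are skipped if the package is not installed.

```r
# Hypothetical stand-in data; the real vectors come from prob.csv and labels.csv.
set.seed(1)
labels_num <- c(rep(1, 15), rep(0, 50))             # numeric 0/1 labels, as in the question
prob <- c(runif(15, 0.3, 1), runif(50, 0, 0.5))     # made-up predicted probabilities

# The fix: convert the numeric labels to a factor with levels 0 and 1.
labels_fac <- factor(labels_num, levels = c(0, 1))

# Only call the AUC package if it is available on this machine.
if (requireNamespace("AUC", quietly = TRUE)) {
  roc_obj <- AUC::roc(prob, labels_fac)   # predictions first, then the factor labels
  print(AUC::auc(roc_obj))                # area under the ROC curve
}
```

Setting levels = c(0, 1) explicitly is a choice I made so the factor has both levels in a known order even if a subset of the data happens to contain only one class.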
Thanks for the time you dedicated