I am performing logistic regression using this page. My code is as below.
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob
After running this code mydata dataframe has two columns - 'admit' and 'prob'. Shouldn't those two columns sufficient to get the ROC curve?
How can I get the ROC curve.
Secondly, by loooking at mydata, it seems that model is predicting probablity of admit=1
.
Is that correct?
How to find out which particular event the model is predicting?
Thanks
UPDATE: It seems that below three commands are very useful. They provide the cut-off which will have maximum accuracy and then help to get the ROC curve.
coords(g, "best")
mydata$prediction=ifelse(prob>=0.3126844,1,0)
confusionMatrix(mydata$prediction,mydata$admit
The ROC curve compares the rank of prediction and answer. Therefore, you could evaluate the ROC curve with package pROC
as follow:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
prob=predict(mylogit,type=c("response"))
mydata$prob=prob
library(pROC)
g <- roc(admit ~ prob, data = mydata)
plot(g)