Deciding threshold for glm logistic regression model in R

Question 1

Deciding threshold for glm logistic regression model in R

r glm predict logistic-regression

user2175594 · Apr 23, 2014 · Viewed 29.2k times · Source

Answer

Answer

The best threshold (or cutoff) point to be used in glm models is the point which maximises the specificity and the sensitivity. This threshold point might not give the highest prediction in your model, but it wouldn't be biased towards positives or negatives. The ROCR package contain functions that can help you do this. check the performance() function in this package. It is going to get you what you're looking for. Here's a picture of what you are expecting to get:

enter image description here

After finding the cutoff point, I normally write a function myself to find the number of datapoints that has their prediction value above the cutoff, and match it with the group they belong to.

Question 2

I have some data with predictors and a binary target. Eg:

df <- data.frame(a=sort(sample(1:100,30)), b= sort(sample(1:100,30)), 
                 target=c(rep(0,11),rep(1,4),rep(0,4),rep(1,11)))

I trained a logistic regresion model using glm()

model1 <- glm(formula= target ~ a + b, data=df, family=binomial)

Now I'm trying to predict the output (for the example, the same data should suffice)

predict(model1, newdata=df, type="response")

This generates a vector of probability numbers. But I want to predict the actual class. I could use round() on the probablity numbers, but this assumes that anything below 0.5 is class '0', and anything above is class '1'. Is this a correct assumption? Even when the population of each class may not be equal (or close to equal)? Or is there a way to estimate this threshold?

Deciding threshold for glm logistic regression model in R

Answer

Related questions