Plotting an ROC curve in glmnet

Dr. Beeblebrox · Aug 8, 2013 · Viewed 11.1k times

EDIT: As Dwin pointed out in the comments, the code below does not produce an ROC curve. An ROC curve must be indexed by variation in the threshold t, not by variation in lambda (as I do below). I will edit the code below when I get the chance.

Below is my attempt to create an ROC curve for glmnet predicting a binary outcome. In the code below I've simulated a matrix that approximates glmnet output. As some of you know, given an n x p matrix of inputs, glmnet outputs an n x 100 matrix of predicted probabilities [$\Pr(y_i = 1)$], one column for each of 100 values of lambda. The output has fewer than 100 columns if further changes in lambda stop adding predictive power. The simulated matrix of glmnet predicted probabilities below is 250 x 69.
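
For reference, a minimal sketch of how such a prediction matrix comes out of glmnet itself (the simulated x and y here are purely illustrative):

library(glmnet)

set.seed(1)
x <- matrix(rnorm(250 * 10), nrow = 250)                # n x p input matrix
y <- rbinom(250, 1, plogis(x[, 1] - x[, 2]))            # binary outcome
fit <- glmnet(x, y, family = "binomial")                # fits a path of lambda values
phat_real <- predict(fit, newx = x, type = "response")  # n x (number of lambdas) matrix of Pr(y = 1)
dim(phat_real)                                          # 250 x <= 100, depending on where the path stops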

First, is there an easier way to plot a glmnet ROC curve? Second, if not, does the below approach seem correct? Third, do I care about plotting (1) the probability of false/true positives OR (2) simply the observed rate of false/true positives?

set.seed(06511)

# Simulate predictions matrix
phat = as.matrix(rnorm(250,mean=0.35, sd = 0.12))
lambda_effect = as.matrix(seq(from = 1.01, to = 1.35, by = 0.005))
phat = phat %*% t(lambda_effect)


#Choose a cut-point
t = 0.5

#Define a predictions matrix
predictions = ifelse(phat >= t, 1, 0)

## Simulate observed outcomes y
y_phat = apply(phat, 1, mean) + rnorm(250,0.05,0.10)
y_obs = ifelse(y_phat >= 0.55, 1, 0)

# proportion of observations with y_obs == 1
p = length(which(y_obs==1))/length(y_obs)


# number of lambda values at which each observation is predicted 1
apply(predictions, 1, sum)

## Count false positives for each model
## False pos ==1, correct == 0, false neg == -1
error_mat = predictions - y_obs
## Define a matrix that isolates false positives
error_mat_fp = ifelse(error_mat ==1, 1, 0)
false_pos_rate = apply(error_mat_fp, 2,  sum)/length(y_obs)

# Count true positives for each model
## True pos == 2, mistakes == 1, true neg == 0
error_mat2 = predictions + y_obs
## Isolate true positives
error_mat_tp = ifelse(error_mat2 ==2, 1, 0)
true_pos_rate = apply(error_mat_tp, 2,  sum)/length(y_obs)


## Do I care about (1) this probability OR (2) simply the observed rate?
## (1)
# probability of a false positive
p_fp = false_pos_rate/(1-p)
# probability of a true positive
p_tp = true_pos_rate/p

# plot the ROC
plot(p_fp, p_tp)


## (2)
plot(false_pos_rate, true_pos_rate)
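
Per the EDIT above, here is a rough sketch of what indexing by the cut-point t (rather than by lambda) would look like for a single column of phat; it reuses phat and y_obs from the simulation above, and the names phat_j and ts are only illustrative:

# ROC for one lambda (one column of phat), sweeping the cut-point t
phat_j <- phat[, 35]                      # predictions for a single lambda value
ts <- seq(0, 1, by = 0.01)                # grid of thresholds
tpr <- sapply(ts, function(t) sum(phat_j >= t & y_obs == 1) / sum(y_obs == 1))
fpr <- sapply(ts, function(t) sum(phat_j >= t & y_obs == 0) / sum(y_obs == 0))
plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")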

There's one question about this on SO, but the answer was rough and not quite right: glmnet lasso ROC charts

Answer

NiuBiBang · Aug 4, 2014

An option that uses ROCR to calculate the AUC and plot the ROC curve:

library(ROCR)
library(glmnet)
library(caret)

df <- data.matrix(… ) # numeric matrix w/ a response variable & predictor variables
                      # col 1 = response var; cols 2:10 = predictor vars

# Create training subset for model development & testing set for model performance testing
inTrain <- createDataPartition(df[, 1], p = .75, list = FALSE)
Train <- df[ inTrain, ]
Test <- df[ -inTrain, ]

# Run model over training dataset
lasso.model <- cv.glmnet(x = Train[,2:10], y = Train[,1], 
                         family = 'binomial', type.measure = 'auc')

# Apply model to testing dataset
lasso.prob <- predict(lasso.model, type = "response",
                      newx = Test[, 2:10], s = 'lambda.min')
pred <- prediction(lasso.prob, Test[, 1])

# calculate TPR/FPR at each cutoff for the predictions
perf <- performance(pred,"tpr","fpr")
performance(pred,"auc") # shows calculated AUC for model
plot(perf,colorize=FALSE, col="black") # plot ROC curve
lines(c(0,1),c(0,1),col = "gray", lty = 4 )

For the lasso.prob predictions above, you could pass different values of s (lambda) to test the predictive power at each value.
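
For example, a rough sketch of looping over a few values of s and comparing test-set AUC (the specific s values here are arbitrary):

# Compare AUC at several lambda values on the test set
for (s in c(lasso.model$lambda.min, lasso.model$lambda.1se, 0.05)) {
  prob <- predict(lasso.model, newx = Test[, 2:10], s = s, type = "response")
  auc  <- performance(prediction(prob, Test[, 1]), "auc")@y.values[[1]]
  cat("lambda =", s, " AUC =", auc, "\n")
}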