Extracting coefficient variable names from glmnet into a data.frame

David Eborall · Jan 6, 2015

I would like to extract the model coefficients generated by glmnet and build a SQL query from them. The function coef(cv.glmnet.fit) yields a 'dgCMatrix' object. When I convert it to a matrix using as.matrix, the variable names are lost and only the coefficient values are left behind.

I know one can print the coefficients on the screen; however, is it possible to write the names to a data frame?

Can anybody help me extract these names?

Answer

Mehrad Mahmoudian · Jan 15, 2015

UPDATE: The first two comments on my answer are both right. I have kept the old answer at the bottom just for posterity.

The following answer is short, works, and does not need any other package:

tmp_coeffs <- coef(cv.glmnet.fit, s = "lambda.min")
data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)

The reason for the +1 is that the @i slot stores 0-based row indices of the non-zero entries (the intercept is row 0), whereas R indexes @Dimnames[[1]] starting at 1.
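To make the index shift concrete, here is a minimal sketch that runs without a fitted model. It assumes a small hand-built sparse column as a stand-in for the output of coef(); the variable names x1 to x3, the column label "s1" and the values are made up for illustration.

```r
library(Matrix)

# Hypothetical stand-in for coef(cv.glmnet.fit, s = "lambda.min"):
# a 4 x 1 sparse column in which only two entries are non-zero.
tmp_coeffs <- sparseMatrix(
  i = c(1, 3), j = c(1, 1), x = c(1.5, -0.3),
  dims = c(4, 1),
  dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1")
)

zero_based <- tmp_coeffs@i      # 0-based rows of the non-zero entries
one_based  <- tmp_coeffs@i + 1  # shifted so they can index an R vector

# Same construction as in the answer, stored in a variable
coef_df <- data.frame(
  name = tmp_coeffs@Dimnames[[1]][one_based],
  coefficient = tmp_coeffs@x
)
print(coef_df)
```

Only the non-zero coefficients end up in the data frame, because a sparse column stores nothing for the zeros.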


OLD ANSWER: (only kept for posterity) Try these lines:

The non zero coefficients:

coef(cv.glmnet.fit, s = "lambda.min")[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]

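The features that are selected (the names are taken from the coefficient matrix itself, because its first row is the intercept, which shifts every index by one relative to colnames(regression_data)):

rownames(coef(cv.glmnet.fit, s = "lambda.min"))[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]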

Then putting them together as a data frame is straightforward, but let me know if you want that part of the code as well.
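For completeness, one possible way to combine the two pieces into a data frame is sketched below. It again uses a hand-built sparse column as a hypothetical stand-in for coef(cv.glmnet.fit, s = "lambda.min"), and it assumes that as.matrix() on a 'dgCMatrix' carries the row names over, which is the behaviour I would expect from the Matrix package.

```r
library(Matrix)

# Hypothetical stand-in for the coefficient matrix returned by coef()
cf <- sparseMatrix(
  i = c(1, 3), j = c(1, 1), x = c(1.5, -0.3),
  dims = c(4, 1),
  dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1")
)

dense <- as.matrix(cf)    # dense 4 x 1 matrix; row names expected to survive
keep  <- dense[, 1] != 0  # TRUE for the non-zero coefficients

selected <- data.frame(
  name = rownames(dense)[keep],
  coefficient = unname(dense[keep, 1])  # unname() keeps default row names
)
print(selected)
```

Taking the names from the rows of the coefficient matrix, rather than from the columns of the input data, keeps the intercept aligned with its value.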