I am using glmnet to predict probabilities based on a set of 5 features using the following code. I need the actual formula because I need to use it in a different (non-R) program.
deg = 3
glmnet.fit <- cv.glmnet(poly(train.matrix,degree=deg),train.result,alpha=0.05,family='binomial')
The names of the resulting coefficients have five positions (I assume there is one for each feature), and each position is a number between 0 and 3 (I assume this is the degree of the polynomial for that feature). But I am still confused about how exactly to reconstruct the formula.
Take these for example:
> coef(glmnet.fit,s= best.lambda)
(Intercept) -2.25e-01
...
0.1.0.0.1 3.72e+02
1.1.0.0.1 9.22e+04
0.2.0.0.1 6.17e+02
...
Let's call the features A,B,C,D,E. Is this how the formula should be interpreted?
Y =
-2.25e-01 +
...
(3.72e+02 * B * E) +
(9.22e+04 * A * B * E) +
(6.17e+02 * B^2 * E) +
...
If that is not correct how should I interpret it?
I saw the following question and answer but it didn't address these types of coefficient names.
Thanks in advance for your help.
Usually, we use the predict function. In your case, you need the coefficients to use in another program. We can check the agreement between using predict and the result of multiplying the data by the coefficients.
# example data
library(ElemStatLearn)
library(glmnet)
data(prostate)
# training data
data.train <- prostate[prostate$train,]
y <- data.train$lpsa
# isolate predictors
data.train <- as.matrix(data.train[,-c(9,10)])
# test data
data.test <- prostate[!prostate$train,]
data.test <- as.matrix(data.test[,-c(9,10)])
# fit training model
myglmnet <- cv.glmnet(data.train, y)
# predictions by using predict function
yhat_enet <- predict(myglmnet, newx = data.test, s = "lambda.min")
# get predictions by using coefficients
beta <- as.vector(t(coef(myglmnet, s = "lambda.min")))
# Coefficients are returned on the scale of the original data.
# note we need to add column of 1s for intercept
testX <- cbind(1,data.test)
yhat2 <- testX %*% beta
# check by plotting predictions
plot(yhat2,yhat_enet)
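So each coefficient corresponds to a column of the matrix you trained on, with the first one being the intercept. In your case those columns are the polynomial terms produced by poly(), not the 5 original features, so the other program has to build the same terms before multiplying. In sum, you can extract the coefficients and multiply them by the test data to obtain the linear predictor. With family='binomial' that value is on the log-odds scale, so apply the inverse logit, 1/(1+exp(-x)), to turn it into a probability.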
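As a rough sketch of how the coefficient names could be turned into a formula: poly() called on a matrix names each column with one exponent per feature, so 0.2.0.0.1 would stand for B^2 * E. That reading only gives plain products when the terms are built with raw=TRUE. The default is orthogonal polynomials, where each position is the degree of an orthogonal basis function in that feature instead. The code below uses base R only, checks the naming convention on toy data, and then evaluates the three coefficients shown in the question. parse_exponents and predict_prob are hypothetical helper names, and the elided coefficients are left out, so the resulting number is only an illustration.

```r
# Toy check of the naming convention (base R only, no glmnet needed).
set.seed(1)
X <- matrix(runif(30), ncol = 3)        # 3 toy features, think of them as A, B, C
P <- poly(X, degree = 3, raw = TRUE)    # raw = TRUE -> plain monomials

# Hypothetical helper: "1.2.0" -> c(1, 2, 0), i.e. A^1 * B^2 * C^0
parse_exponents <- function(nms) {
  do.call(rbind, lapply(strsplit(nms, ".", fixed = TRUE), as.integer))
}
expo <- parse_exponents(colnames(P))

# Rebuild every column by hand and compare with what poly() produced
rebuilt <- apply(expo, 1, function(e) apply(X, 1, function(row) prod(row^e)))
max_diff <- max(abs(rebuilt - unclass(P)))  # should be numerically zero

# Hypothetical helper: probability for ONE observation x (one value per feature).
# Assumes the model was fit on raw = TRUE terms and family = 'binomial'.
predict_prob <- function(x, intercept, coefs) {
  terms <- apply(parse_exponents(names(coefs)), 1, function(e) prod(x^e))
  eta <- intercept + sum(coefs * terms)     # linear predictor (log-odds)
  1 / (1 + exp(-eta))                       # inverse logit -> probability
}

# Only the three coefficients shown in the question; the elided ones are omitted
coefs <- c("0.1.0.0.1" = 3.72e+02, "1.1.0.0.1" = 9.22e+04, "0.2.0.0.1" = 6.17e+02)
p0 <- predict_prob(c(0, 0, 0, 0, 0), intercept = -2.25e-01, coefs = coefs)
```

If the model was fit with the default orthogonal basis, one option would be to refit with raw=TRUE so that the other program only needs products of powers of the original features.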