I build my model for prediction with XGBoost:
library(data.table)
library(xgboost)
library(caret)

setDT(train)
setDT(test)
labels <- train$Goal
ts_label <- test$Goal
# One-hot encode the features, dropping the target column
new_tr <- model.matrix(~.+0, data = train[,-c("Goal"), with = F])
new_ts <- model.matrix(~.+0, data = test[,-c("Goal"), with = F])
# Convert the factor labels to 0/1
labels <- as.numeric(labels) - 1
ts_label <- as.numeric(ts_label) - 1
dtrain <- xgb.DMatrix(data = new_tr, label = labels)
dtest <- xgb.DMatrix(data = new_ts, label = ts_label)
params <- list(booster = "gbtree", objective = "binary:logistic", eta = 0.3, gamma = 0,
               max_depth = 6, min_child_weight = 1, subsample = 1, colsample_bytree = 1)
xgb1 <- xgb.train(params = params, data = dtrain, nrounds = 291,
                  watchlist = list(val = dtest, train = dtrain), print_every_n = 10,
                  early_stopping_rounds = 10, maximize = F, eval_metric = "error")
xgbpred <- predict(xgb1, dtest)
xgbpred <- ifelse(xgbpred > 0.5, 1, 0)
# confusionMatrix() expects factors
confusionMatrix(factor(xgbpred), factor(ts_label))
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1904   70
         1  191 2015

               Accuracy : 0.9376
                 95% CI : (0.9298, 0.9447)
    No Information Rate : 0.5012
    P-Value [Acc > NIR] : < 0.00000000000000022

                  Kappa : 0.8751
 Mcnemar's Test P-Value : 0.0000000000001104

            Sensitivity : 0.9088
            Specificity : 0.9664
         Pos Pred Value : 0.9645
         Neg Pred Value : 0.9134
             Prevalence : 0.5012
         Detection Rate : 0.4555
   Detection Prevalence : 0.4722
      Balanced Accuracy : 0.9376

       'Positive' Class : 0
This accuracy suits me, but I also want to evaluate the model with the AUC metric. I write:
# maximize = T here, since a higher AUC is better (unlike error, which is minimized)
xgb1 <- xgb.train(params = params, data = dtrain, nrounds = 291,
                  watchlist = list(val = dtest, train = dtrain), print_every_n = 10,
                  early_stopping_rounds = 10, maximize = T, eval_metric = "auc")
But after that I don't know how to make predictions and evaluate them with the AUC metric. I need your help, because this is my first experience with XGBoost. Thanks.
UPD: As far as I understand, even with the AUC metric I still need a threshold at which to cut off the classes. Right now I cut off at 0.5.
You can see the AUC value of the trained model on the training data set with the following:

> max(xgb1$evaluation_log$train_auc)
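
Given the watchlist names in your call (val and train), the validation AUC is logged in the same evaluation_log table, so you can also check where it peaks:

> head(xgb1$evaluation_log)                # columns: iter, val_auc, train_auc
> max(xgb1$evaluation_log$val_auc)         # best validation AUC seen during training
> which.max(xgb1$evaluation_log$val_auc)   # boosting round where it occurred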
You can also calculate it for your predictions on the test set with the pROC package, as follows:

> library(pROC)
> roc_test <- roc(test_label_vec, predictions_for_test, algorithm = 2)

For your code, note that xgbpred has already been cut to 0/1 classes; the ROC curve should be built from the raw predicted probabilities, so with your parameters it is:

> xgbpred_prob <- predict(xgb1, dtest)
> roc_test <- roc(ts_label, xgbpred_prob, algorithm = 2)
> plot(roc_test)
> auc(roc_test)
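
If you want the AUC value annotated on the plot itself, pROC's plot method supports that too:

> plot(roc_test, print.auc = TRUE)   # draws the ROC curve with the AUC printed on it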
If you want to calculate the AUC and plot the ROC curve for your training set, you can use the following (here train_output_vec is your labels vector and train_predictions is predict(xgb1, dtrain)):

> roc_training <- roc(train_output_vec, train_predictions, algorithm = 2)
> plot(roc_training)
> auc(roc_training)
The ROC curve and AUC do not depend on a cutoff point. The ROC curve is drawn, and the AUC calculated, by sorting the prediction scores and checking what percentage of target events is captured at each possible cutoff; in other words, it shows what share of target events you would find as you move the cutoff point. The choice of cutoff point is a separate decision, driven by the costs of each kind of error and by the application of the algorithm. You can search on cutoff selection to get more info on this.
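
Regarding your UPD: as a minimal sketch of such a search (assuming the roc_test object built from the test probabilities above), pROC can also suggest a cutoff for you, for example the one maximizing Youden's J statistic (sensitivity + specificity - 1):

> # returns the threshold together with the specificity and sensitivity it achieves
> coords(roc_test, x = "best", best.method = "youden")

The returned threshold could then replace the fixed 0.5 in your ifelse() step, but only if the costs of false positives and false negatives are roughly symmetric for your application.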