Understanding num_classes for xgboost in R

House · Mar 18, 2016 · Viewed 13.4k times

I'm having a lot of trouble figuring out how to correctly set the num_classes for xgboost.

I've got an example using the Iris data

df <- iris

y <- df$Species
num.class = length(levels(y))
levels(y) = 1:num.class
head(y)

df <- df[,1:4]

y <- as.matrix(y)
df <- as.matrix(df)

param <- list("objective" = "multi:softprob",    
          "num_class" = 3,    
          "eval_metric" = "mlogloss",    
          "nthread" = 8,   
          "max_depth" = 16,   
          "eta" = 0.3,    
          "gamma" = 0,    
          "subsample" = 1,   
          "colsample_bytree" = 1,  
          "min_child_weight" = 12)

model <- xgboost(param=param, data=df, label=y, nrounds=20)

This returns an error:

Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) : 
SoftmaxMultiClassObj: label must be in [0, num_class), num_class=3 but found 3 in label

If I change num_class to 2 I get the same error. If I increase num_class to 4 the model runs, but I get 600 predicted probabilities back, which makes sense for 4 classes (150 rows × 4 classes).

I'm not sure if I'm making an error or whether I'm failing to understand how xgboost works. Any help would be appreciated.

Answer

RustamA · Mar 18, 2016

The error message states the constraint: labels must lie in [0, num_class), i.e. they must be zero-based. Your labels are 1, 2, 3, so the label 3 falls outside [0, 3). (That is also why num_class = 4 "works": 1, 2, 3 all fit inside [0, 4), but it wastes a class.) In your script, add y <- y - 1 before the model <- ... line.
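For reference, here is a minimal sketch of the question's script with that one fix applied. It assumes the xgboost R package is installed; the hyperparameters are copied unchanged from the question.

```r
library(xgboost)

# Shift the factor codes 1:3 down to the zero-based labels 0:2
# that multi:softprob requires.
y <- as.integer(iris$Species) - 1
num.class <- length(unique(y))        # 3

X <- as.matrix(iris[, 1:4])

param <- list(objective        = "multi:softprob",
              num_class        = num.class,
              eval_metric      = "mlogloss",
              nthread          = 8,
              max_depth        = 16,
              eta              = 0.3,
              gamma            = 0,
              subsample        = 1,
              colsample_bytree = 1,
              min_child_weight = 12)

model <- xgboost(params = param, data = X, label = y, nrounds = 20)

# multi:softprob returns one probability per class per row,
# so for iris this is 150 rows * 3 classes = 450 values.
pred <- predict(model, X)
length(pred)
```

With zero-based labels, the prediction vector has length 450 rather than the 600 seen with num_class = 4; reshaping it with matrix(pred, ncol = num.class, byrow = TRUE) gives one row of class probabilities per observation.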