"Error: Must subset rows with a valid subscript vector" in preProcess() when using knnImpute

r na knn
Chris T. · May 29, 2020

I'm using Kaggle's Pokemon data to practice KNN imputation via preProcess(), but I got the following message after the predict() step. I am wondering whether I am using the wrong data format or whether some columns have an inappropriate class. Below is my code.

I then received this error message:

Error: Must subset rows with a valid subscript vector.
x Subscript `nn$nn.idx` must be a simple vector, not a matrix.
Run `rlang::last_error()` to see where the error occurred.

Answer

Dominik S. Meier · May 29, 2020

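The input to preProcess() needs to be a plain data.frame, not a tibble. As the error message indicates, the imputation step subsets rows with a matrix of neighbor indices (nn$nn.idx), which a tibble rejects. This works: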

pre_process_missing_data <- preProcess(as.data.frame(df), method="knnImpute")

classify_legendary <- predict(pre_process_missing_data, newdata = df)
classify_legendary 

> classify_legendary
# A tibble: 801 x 6
       hp weight_kg height_m sp_attack sp_defense capture_rate
    <dbl>     <dbl>    <dbl>     <dbl>      <dbl> <chr>       
 1 -0.902    -0.498  -0.429     -0.195     -0.212 45          
 2 -0.337    -0.442  -0.152      0.269      0.325 45          
 3  0.415     0.353   0.774      1.57       1.76  45          
 4 -1.13     -0.484  -0.522     -0.349     -0.748 45          
 5 -0.412    -0.388  -0.0591     0.269     -0.212 45          
 6  0.340     0.266   0.496      2.71       1.58  45          
 7 -0.939    -0.479  -0.615     -0.659     -0.247 45          
 8 -0.375    -0.356  -0.152     -0.195      0.325 45          
 9  0.378     0.221   0.404      1.97       1.58  45          
10 -0.902    -0.535  -0.800     -1.59      -1.82  255         
# ... with 791 more rows
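
For anyone who wants to try the fix without downloading the Kaggle file, here is a minimal sketch on a made-up tibble. The `toy` data and its values are hypothetical stand-ins for a few of the Pokemon columns, and the sketch assumes the caret, tibble, and RANN packages are installed (knnImpute relies on RANN for the neighbor search). It passes a plain data.frame to both preProcess() and predict().

```r
library(caret)   # preProcess(); method = "knnImpute" also needs RANN installed
library(tibble)

# Hypothetical toy data (not the Kaggle file): three numeric columns, two NAs
toy <- tibble(
  hp        = c(45, 60, 80, NA, 58, 78, 44, 59, 79, 45, 50, 65),
  weight_kg = c(6.9, 13, 100, 8.5, 19, 90.5, 9, 22.5, 85.5, 2.9, NA, 32),
  height_m  = c(0.7, 1, 2, 0.6, 1.1, 1.7, 0.5, 1, 1.6, 0.3, 0.7, 1.1)
)

# Confirm the object really is a tibble before converting it
inherits(toy, "tbl_df")

# Convert once and use the plain data.frame for both steps
toy_df <- as.data.frame(toy)

# Default k = 5, so at least 5 complete rows are needed (toy has 10)
pp      <- preProcess(toy_df, method = "knnImpute")
imputed <- predict(pp, newdata = toy_df)

# Check that no missing values remain
anyNA(imputed)
```

Two things to keep in mind when reading the result. First, knnImpute centers and scales the numeric columns, which is why the values in the output above look like z-scores rather than raw hp or weight. Second, preProcess() ignores non-numeric columns, so a character column such as capture_rate is passed through unchanged and any missing values in it are not imputed.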