I have read about other people's memory-use problems, but none of the suggested fixes help with the following issue. I am running 64-bit R with 16 GB of RAM.
First, I import a 229 MB .gz file (the unzipped version is 921 MB):
accepted_def <- read.csv(gzfile('accepted_2007_to_2017.csv.gz'),
                         na.strings = '')

library(data.table)
acc_dt <- as.data.table(accepted_def)
By this point, my RStudio R session's memory use has gone from about 100 MB to 3 GB.
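In case it helps with diagnosis, this is roughly how I'm sanity-checking object sizes from within R (base-R functions only; the session totals above come from outside R):

# size of the imported data.frame and of the data.table copy
format(object.size(accepted_def), units = "GB")
format(object.size(acc_dt), units = "GB")
gc()   # force garbage collection and report current memory use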
Next, I clean up the data and drop unnecessary features:
library(dplyr)
df.train <- select(acc_dt,
                   -1, -2, -10, -11, -16, -19, -21, -22, -23, -26, -46, -48, -49)
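For context, the frame going into the imputation is still large; something like this shows its dimensions and per-column missingness (the column indices above are specific to my dataset):

dim(df.train)               # rows x remaining columns
colSums(is.na(df.train))    # count of missing values per column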
Finally, I attempt to impute missing values with mice:
library(mice)
# complete() extracts the imputed data set from the mids object mice() returns
df.new <- complete(mice(df.train, m = 1, method = 'cart', printFlag = FALSE))
Now my R session's memory use jumps to over 12 GB and I get the following error:
Error: cannot allocate vector of size 11.6 Mb
Any ideas on what is going on would be appreciated!