How can I prevent rbind() from getting really slow as a data frame grows larger?

Mark · Feb 4, 2013 · Viewed 11.7k times

I have a data frame with only one row. I then add rows to it using rbind():

# df is my data frame with only one row
for (i in 1:20000) {
    df <- rbind(df, newrow)
}

This gets very slow as i grows. Why is that, and how can I make this type of code faster?

Answer

joran · Feb 4, 2013

You are in the 2nd circle of hell (see The R Inferno), namely failing to pre-allocate data structures.

Growing objects in this fashion is a Very Very Bad Thing in R. Either pre-allocate and insert:

df <- data.frame(x = rep(NA,20000),y = rep(NA,20000))

or restructure your code to avoid this sort of incremental addition of rows. As discussed at the link I cite, the reason for the slowness is that each time you add a row, R needs to find a new contiguous block of memory to fit the data frame in and copy the existing rows into it, so the total work grows roughly quadratically with the number of rows. Lots o' copying.
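To make the two options concrete, here is a minimal sketch. The question never shows how newrow is built, so make_row() below is a hypothetical stand-in, and the column names x and y are just the ones from the pre-allocation line above. Treat it as an illustration under those assumptions, not as the only way to do it.

```r
# Hypothetical stand-in for whatever produces `newrow` in the question.
make_row <- function(i) list(x = as.numeric(i), y = i^2)

n <- 20000

# Option 1: pre-allocate the data frame once, then insert values by index.
# NA_real_ keeps the columns numeric from the start.
df <- data.frame(x = rep(NA_real_, n), y = rep(NA_real_, n))
for (i in seq_len(n)) {
  row <- make_row(i)
  df$x[i] <- row$x
  df$y[i] <- row$y
}

# Option 2: restructure so the data frame is built exactly once.
# Collect the rows in a list, then assemble each column in a single pass.
rows <- lapply(seq_len(n), make_row)
df_list <- data.frame(
  x = vapply(rows, function(r) r$x, numeric(1)),
  y = vapply(rows, function(r) r$y, numeric(1))
)
```

Both versions end up with the same 20000-row data frame. If the per-row computation can be vectorized, building the columns directly as vectors and calling data.frame() once is usually faster than either loop.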