Plotting of very large data sets in R

Daniel Arndt picture Daniel Arndt · Dec 3, 2010 · Viewed 16k times · Source

How can I plot a very large data set in R?

I'd like to use a boxplot, or violin plot, or similar. All the data cannot be fit in memory. Can I incrementally read in and calculate the summaries needed to make these plots? If so how?

Answer

mbq picture mbq · Dec 3, 2010

In supplement to my comment to Dmitri answer, a function to calculate quantiles using ff big-data handling package:

ffquantile<-function(ffv,qs=c(0,0.25,0.5,0.75,1),...){
 stopifnot(all(qs<=1 & qs>=0))
 ffsort(ffv,...)->ffvs
 j<-(qs*(length(ffv)-1))+1
 jf<-floor(j);ceiling(j)->jc
 rowSums(matrix(ffvs[c(jf,jc)],length(qs),2))/2
}

This is an exact algorithm, so it uses sorting -- and thus may take a lot of time.