Downloading large files with R/RCurl efficiently

antonio · Jan 20, 2013 · Viewed 9.1k times

Many examples for downloading binary files with RCurl look like this:

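library("RCurl")
curl = getCurlHandle()
# getBinaryURL() returns the whole download as a raw vector held in memory
bfile = getBinaryURL(
        "http://www.example.com/bfile.zip",
        curl = curl,
        progressfunction = function(down, up) {print(down)}, noprogress = FALSE
)
# The data is written to disk only after the transfer has finished
writeBin(bfile, "bfile.zip")
rm(curl, bfile)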

If the download is very large, I suppose it would be better to write it to disk as it arrives, instead of holding the whole file in memory.

The RCurl documentation has some examples of fetching files in chunks and processing them as they are downloaded, but they all seem to deal with text chunks.

Can you give a working example for binary files?

UPDATE

A user suggests using R's native download.file() with the mode = 'wb' option for binary files.
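
For reference, the native approach mentioned in this update might look like the sketch below. The helper name fetch_native is hypothetical, and the URL and file name in the commented call are the placeholders from the example above. download.file() ships with base R (utils package) and returns 0 invisibly on success.

```r
# Minimal sketch of the native alternative; fetch_native is a hypothetical
# helper name, not an existing function.
fetch_native = function(url, destfile) {
    # mode = "wb" writes the bytes unchanged, which matters for binary
    # files (notably on Windows)
    status = download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
    # download.file() returns 0 (invisibly) on success
    status == 0
}

# Example call (placeholder URL, not run here):
# fetch_native("http://www.example.com/bfile.zip", "bfile.zip")
```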

In many cases the native function is a viable alternative, but there are a number of use cases where it does not fit (https, cookies, forms, etc.), which is why RCurl exists.

Answer

antonio · Mar 21, 2013

Here is a working example:

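library(RCurl)
# Open a C-level file handle for binary writing
f = CFILE("bfile.zip", mode="wb")
# writedata = f@ref makes libcurl write the response body straight to that file
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
# Close the handle so buffered data is flushed to disk
close(f)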

This downloads straight to the file. The return value is the status of the request (0 if no errors occur) rather than the downloaded data.

The RCurl manual's coverage of CFILE is a bit terse. Hopefully it will include more details and examples in the future.

For convenience, here is the same code packaged as a function (with a progress bar):

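bdown = function(url, file) {
    library('RCurl')
    f = CFILE(file, mode="wb")
    # Close the file handle on exit, even if the transfer fails
    on.exit(close(f))
    # noprogress = FALSE turns on libcurl's progress meter
    a = curlPerform(url = url, writedata = f@ref, noprogress=FALSE)
    return(a)
}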

## ...and now just give remote and local paths     
ret = bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")
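
A follow-up on the return value: as far as I can tell, RCurl raises an R error when the transfer itself fails, while an HTTP error page (a 404, say) is still written to the file unless the failonerror curl option is set. The sketch below is one possible way to handle both cases. bdown_safe is a hypothetical name, not part of RCurl, and the clean-up policy (deleting the partial file) is my own assumption about what is wanted.

```r
library(RCurl)

# Hypothetical wrapper (not part of RCurl): returns TRUE on success and
# FALSE on failure, removing any partial file left behind.
bdown_safe = function(url, file) {
    f = CFILE(file, mode = "wb")
    ok = tryCatch({
        # failonerror = TRUE asks libcurl to treat HTTP status >= 400 as a failure
        status = curlPerform(url = url, writedata = f@ref, failonerror = TRUE)
        all(status == 0)
    }, error = function(e) {
        message("Download failed: ", conditionMessage(e))
        FALSE
    }, finally = close(f))  # always close the file handle
    # Remove a partial or empty file after a failed transfer
    if (!ok && file.exists(file)) file.remove(file)
    ok
}
```

It is called the same way as bdown, for example bdown_safe("http://www.example.com/bfile.zip", "path/to/bfile.zip"), and the logical result can be checked before the file is used.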