Loading .RData files into Python

Stu · Jan 22, 2014

I have a bunch of .RData time-series files and would like to load them directly into Python without first converting the files to some other extension (such as .csv). Any ideas on the best way to accomplish this?

Answer

Otto Fajardo · Dec 27, 2018

As an alternative for those who would prefer not to install R just for this task (rpy2 requires it), there is a newer package, "pyreadr", which reads RData and Rds files directly into Python without needing an R installation.

It is a wrapper around the C library librdata, so it is very fast.

You can install it easily with pip:

pip install pyreadr

As an example you would do:

import pyreadr

result = pyreadr.read_r('/path/to/file.RData') # also works for Rds

# done! let's see what we got
# result is a dictionary where the keys are the names of the R objects and the
# values are pandas data frames (for an Rds file the single key is None)
print(result.keys()) # let's check what objects we got
df1 = result["df1"] # extract the pandas data frame for object df1
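Since the question mentions a whole bunch of .RData files, the same call can be looped over a folder. The following is only a minimal sketch: the helper name `load_rdata_dir`, its `reader` parameter, and the "file stem/object name" key scheme are illustrative assumptions and not part of pyreadr. Only `pyreadr.read_r` itself comes from the package.

```python
from pathlib import Path


def load_rdata_dir(directory, reader=None):
    """Load every *.RData file in `directory` into one dict of data frames.

    Hypothetical helper, not part of pyreadr. `reader` defaults to
    pyreadr.read_r; it is a parameter only so the function can be tried
    out without pyreadr installed.
    """
    if reader is None:
        import pyreadr  # imported lazily; needs `pip install pyreadr`
        reader = pyreadr.read_r

    frames = {}
    # sorted() gives a stable order; the pattern matches the ".RData"
    # spelling only (case-sensitive on most Linux file systems).
    for path in sorted(Path(directory).glob("*.RData")):
        result = reader(str(path))  # dict-like: object name -> data frame
        for name, df in result.items():
            # Prefix with the file stem so objects that share a name
            # across files do not overwrite each other.
            frames[f"{path.stem}/{name}"] = df
    return frames
```

Calling `load_rdata_dir('/path/to/folder')` would then give something like `frames["file/df1"]` for the object df1 stored in file.RData, assuming each file holds data frames that pyreadr can read.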

The repo is here: https://github.com/ofajardo/pyreadr

Disclaimer: I am the developer of this package.