How to deal with hdf5 files in R?

Sam · Apr 12, 2013 · Viewed 48k times

I have a file in HDF5 format. I know that it is supposed to contain a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find a simple, easy-to-follow tutorial. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?

UPDATE

I found the rhdf5 package, which is not on CRAN but is part of Bioconductor. The interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file. The matrix that I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open it and access it through R I got a segmentation fault. I figured out how to save the matrix from within Python as a TSV file, and now that problem is solved.
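For anyone taking the same TSV route, base R can read the exported matrix back in. This is a minimal sketch, assuming a headerless, tab-separated numeric file; the temporary file and its contents are made up for illustration and stand in for whatever Python wrote out.

```r
# Hypothetical stand-in for the TSV exported from Python: a 2 x 3 matrix
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("1\t2\t3", "4\t5\t6"), tsv_path)

# header = FALSE because the file is assumed to have no column names
m <- as.matrix(read.table(tsv_path, sep = "\t", header = FALSE))
dimnames(m) <- NULL   # drop the automatic V1, V2, ... column names

dim(m)   # number of rows and columns
str(m)   # storage type and a preview of the values
```

If the real export does include a header row or row labels, `header = TRUE` and `row.names = 1` would be the arguments to adjust.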

Answer

Mike T · Oct 21, 2013

The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:

# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rhdf5")

And to use it:

library(rhdf5)

List the objects within the file to find the data group you want to read:

h5ls("path/to/file.h5")

Read the HDF5 data:

mydata <- h5read("path/to/file.h5", "/mygroup/mydata")

And inspect the structure:

str(mydata)

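Note that multidimensional arrays may appear transposed: HDF5 stores data in row-major order while R is column-major, so rhdf5 reports the dimensions in reverse order compared with tools such as h5dump or Python's h5py. You can also read a whole group, which is returned as a named list in R.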
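To try these steps without an existing file, here is a rough, self-contained sketch that writes a small matrix into a temporary HDF5 file and reads it back, both as a single dataset and as a whole group. The file, the group name and the dataset name are hypothetical and created only for this example; it assumes rhdf5 is already installed.

```r
library(rhdf5)

# Hypothetical file, group, and dataset names, used only for illustration
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "mygroup")

m <- matrix(1:6, nrow = 2)                   # 2 x 3 in R
h5write(m, h5file, "mygroup/mydata")

h5ls(h5file)                                 # lists the group and the dataset

mydata <- h5read(h5file, "/mygroup/mydata")  # a single dataset
grp    <- h5read(h5file, "/mygroup")         # the whole group, as a named list

# For data written by a row-major tool (for example Python), transposing
# restores the orientation that tool showed: t() for a matrix,
# aperm() for higher-dimensional arrays
mydata_t <- t(mydata)

h5closeAll()                                 # release any open HDF5 handles
```

A matrix written and read within R round-trips unchanged, so the transpose is only needed when the file came from, or is going to, a tool that uses the opposite memory layout.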