I have a file in HDF5 format. I know that it is supposed to be a matrix, and I want to read that matrix into R so that I can study it. I see that there is an h5r package that is supposed to help with this, but I cannot find a simple, easy-to-follow tutorial. Is such a tutorial available online? Specifically, how do you read an HDF5 object with this package, and how do you actually extract the matrix?
UPDATE
I found the rhdf5 package, which is not on CRAN but is part of Bioconductor. Its interface is relatively easy to understand, and the documentation and example code are quite clear. I could use it without problems. My problem, it seems, was the input file: the matrix I wanted to read was actually stored in the HDF5 file as a Python pickle, so every time I tried to open and access it through R I got a segmentation fault. I figured out how to save the matrix from within Python as a TSV file, and now that problem is solved.
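For completeness, here is a minimal base-R sketch of reading such a TSV back as a matrix. It assumes the Python side wrote plain tab-separated numbers with no header row and no index column (for example, as NumPy's `numpy.savetxt` does with `delimiter="\t"`); the file below is a hypothetical stand-in created with `writeLines` so the example runs on its own.

```r
# Hypothetical stand-in for the TSV exported from Python:
# tab-separated numbers, no header row, no index column (assumption)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("1\t2\t3", "4\t5\t6"), tsv_path)

# Read the table, then convert the resulting data frame to a matrix
m <- as.matrix(read.delim(tsv_path, header = FALSE))
dimnames(m) <- NULL  # drop the V1, V2, ... column names added by read.delim
str(m)
```

If the exported file does have a header row or row labels, `header = TRUE` or `row.names = 1` in `read.delim` would be needed instead.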
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
```r
# As of 2020-09-08, these are the install instructions per
# https://bioconductor.org/install/
# Install BiocManager from CRAN if it is missing, then use it to install rhdf5
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rhdf5")
```
And to use it:
```r
library(rhdf5)
```
List the objects within the file to find the data group you want to read:
```r
h5ls("path/to/file.h5")
```
Read the HDF5 data:
```r
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
str(mydata)
```
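Note that multidimensional arrays may appear transposed: R arrays are column-major, while HDF5 files written from C or Python describe their dimensions in row-major order, so rhdf5 reports the dimensions in reverse. You can also read whole groups, which are returned as named lists in R.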
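A small self-contained round trip can make the last two points concrete. This is a sketch using rhdf5's `h5createFile`, `h5createGroup`, `h5write`, `h5ls` and `h5read` on a temporary file; the group and dataset names (`mygroup`, `mydata`, `weights`) are hypothetical. A matrix written from R reads back with the same orientation; the transposition typically shows up with data written by a row-major program such as NumPy, in which case `t()` restores the expected layout.

```r
library(rhdf5)

# Hypothetical file, group and dataset names for illustration only
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "mygroup")

m <- matrix(as.numeric(1:6), nrow = 3)   # 3 x 2 double matrix
h5write(m, h5file, "mygroup/mydata")
h5write(c(0.5, 1.5), h5file, "mygroup/weights")

# List the group and both datasets with their dimensions
h5ls(h5file)

# Reading a single dataset returns an R matrix/array
mydata <- h5read(h5file, "/mygroup/mydata")
str(mydata)

# Reading the whole group returns a named list, one element per dataset
grp <- h5read(h5file, "/mygroup")
names(grp)

# For a matrix written by a row-major program (e.g. NumPy) that looks
# transposed, t() restores the original orientation:
# mydata <- t(mydata)

h5closeAll()
```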