How to write a Pandas DataFrame into an HDF5 dataset

AleVis · Nov 7, 2017 · Viewed 16.2k times

I'm trying to write data from a Pandas DataFrame into a nested HDF5 file, with multiple groups and multiple datasets within each group. I'd like to keep it as a single file that will grow daily. I've had a go with the following code, which shows the structure I'd like to achieve:

import h5py
import numpy as np
import pandas as pd

file = h5py.File('database.h5','w')

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)

groups = ['A','B','C']

for m in groups:

    group = file.create_group(m)
    dataset = ['1','2','3']

    for n in dataset:

        data = df
        ds = group.create_dataset(m + n, data.shape)
        print("Dataset dataspace is", ds.shape)
        print("Dataset Numpy datatype is", ds.dtype)
        print("Dataset name is", ds.name)
        print("Dataset is a member of the group", ds.parent)
        print("Dataset was created in the file", ds.file)

        print("Writing data...")
        ds[...] = data

        print("Reading data back...")
        data_read = ds[...]

        print("Printing data...")
        print(data_read)

file.close()

This creates the nested structure, but the index and column labels are lost. I've also tried

df.to_hdf('database.h5', ds, table=True, mode='a')

but it didn't work; I get this error:

AttributeError: 'Dataset' object has no attribute 'split'

Can anyone shed some light on this, please? Many thanks.

Answer

MaxU · Nov 7, 2017

df.to_hdf() expects a string as its key parameter (the second parameter), whereas ds is an h5py Dataset object:

key : string

identifier for the group in the store

so try this:

df.to_hdf('database.h5', ds.name, table=True, mode='a')

where ds.name should return a string (the key name):

In [26]: ds.name
Out[26]: '/A1'
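
For reference, here is a minimal sketch of the same nested layout written with pandas alone, so that the index and column labels survive the round trip. It rests on a few assumptions that are not in the original post: PyTables is installed, the file is not held open by h5py at the same time, the path-like keys such as '/A/A1' are my own choice to mirror the group/dataset names in the question, and format='table' is used as the current spelling of the older table=True argument. The temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
})

# Assumption: a scratch location, used here instead of 'database.h5'
# in the working directory so the sketch does not touch existing files.
path = os.path.join(tempfile.mkdtemp(), 'database.h5')

for m in ['A', 'B', 'C']:
    for n in ['1', '2', '3']:
        # The key must be a string; a path-like key creates the nesting,
        # e.g. '/A/A1' is node 'A1' inside group 'A'.
        key = '/{}/{}{}'.format(m, m, n)
        df.to_hdf(path, key=key, format='table', mode='a')

# Read one frame back; index and columns are restored by pandas.
df_read = pd.read_hdf(path, key='/A/A1')
print(df_read)

# List every key stored in the file.
with pd.HDFStore(path, mode='r') as store:
    keys = sorted(store.keys())
print(keys)
```

Because mode='a' appends to an existing file, later runs can add new keys to the same file without rewriting what is already there, which seems to fit the daily-growth requirement in the question.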