Get column names (headers) from hdf file

Cenoc · Aug 26, 2014

I was wondering how to get the column names (which seem to be stored in the HDF header) of an HDF file. For example, one file might have columns [a,b,c,d], another [a,b,c], and yet another [b,e,r,z], and I would like to find out which file has which columns. Any help would be very much appreciated!

Answer

ssnobody · Aug 26, 2014

To do this outside of Python, you can use h5dump, e.g. h5dump --header my.hdf5, which prints only the header information (groups, datasets, types, shapes) and not the data itself.

In Python, you can use h5py.
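Rather than drilling down group by group, you can also walk the whole file in one call. The following is a minimal sketch, assuming h5py and NumPy are installed. It uses h5py's Group.visititems to record the path, shape, and dtype of every dataset, which is a subset of what h5dump --header reports. The summarize helper and the demo file layout are hypothetical, modeled loosely on the HDF-EOS5 structure in the session below.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize(path):
    """Map every dataset path in an HDF5 file to its (shape, dtype).

    Only metadata is accessed, so no array data is read from disk.
    """
    summary = {}

    def visit(name, obj):
        # visititems calls this for every group and dataset below the root;
        # groups are skipped, datasets are recorded under their full path.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, obj.dtype)
        # Returning None (implicitly) tells h5py to keep walking.

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Build a small throwaway file so the sketch runs on its own
# (hypothetical layout; create_group makes intermediate groups as needed).
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "demo.h5")
with h5py.File(demo_path, "w") as f:
    fields = f.create_group("HDFEOS/GRIDS/660nm_band/Data Fields")
    fields.create_dataset("DOLP", data=np.zeros((4, 4), dtype="f4"))
    fields.create_dataset("I.mask", data=np.zeros((4, 4), dtype="i4"))

summary = summarize(demo_path)
for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Because only shape and dtype are touched, this stays cheap even for large datasets like the 3072x3072 grids shown below.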

As an example, this is how I might access the field names in my HDF-EOS5 file:

>>> import h5py
>>> f = h5py.File('/tmp/temp.hdf','r')
>>> f.keys()
[u'HDFEOS', u'HDFEOS INFORMATION']
>>> f.values()
[<HDF5 group "/HDFEOS" (2 members)>, <HDF5 group "/HDFEOS INFORMATION" (2 members)>]
>>> grpname = f['/HDFEOS']
>>> grpname.keys()
[u'ADDITIONAL', u'GRIDS']
>>> grpname.values()
[<HDF5 group "/HDFEOS/ADDITIONAL" (1 members)>, <HDF5 group "/HDFEOS/GRIDS" (9 members)>]
>>> subgrpname = grpname['GRIDS']
>>> subgrpname.keys()
[u'355nm_band', u'380nm_band', u'445nm_band', u'470nm_band', u'555nm_band', u'660nm_band', u'865nm_band', u'935nm_band', u'Ancillary']
>>> group_660 = subgrpname['660nm_band']
>>> group_660.keys()
[u'Data Fields']
>>> group_660.values()
[<HDF5 group "/HDFEOS/GRIDS/660nm_band/Data Fields" (20 members)>]
>>> fields_660 = group_660['Data Fields']
>>> fields_660.keys()
[u'AOLP_meridian', u'AOLP_scatter', u'DOLP', u'Glint_angle', u'I', u'I.mask', u'IPOL', u'Q.mask', u'Q_meridian', u'Q_scatter', u'RDQI', u'Scattering_angle', u'Sun_azimuth', u'Sun_zenith', u'Time_in_seconds_from_epoch', u'U.mask', u'U_meridian', u'U_scatter', u'View_azimuth', u'View_zenith']
>>> fields_660.values()
[<HDF5 dataset "AOLP_meridian": shape (3072, 3072), type "<f4">, <HDF5 dataset "AOLP_scatter": shape (3072, 3072), type "<f4">, <HDF5 dataset "DOLP": shape (3072, 3072), type "<f4">, <HDF5 dataset "Glint_angle": shape (3072, 3072), type "<f4">, <HDF5 dataset "I": shape (3072, 3072), type "<f4">, <HDF5 dataset "I.mask": shape (3072, 3072), type "<i4">, <HDF5 dataset "IPOL": shape (3072, 3072), type "<f4">, <HDF5 dataset "Q.mask": shape (3072, 3072), type "<i4">, <HDF5 dataset "Q_meridian": shape (3072, 3072), type "<f4">, <HDF5 dataset "Q_scatter": shape (3072, 3072), type "<f4">, <HDF5 dataset "RDQI": shape (3072, 3072), type "<f4">, <HDF5 dataset "Scattering_angle": shape (3072, 3072), type "<f4">, <HDF5 dataset "Sun_azimuth": shape (3072, 3072), type "<f4">, <HDF5 dataset "Sun_zenith": shape (3072, 3072), type "<f4">, <HDF5 dataset "Time_in_seconds_from_epoch": shape (3072, 3072), type "<f8">, <HDF5 dataset "U.mask": shape (3072, 3072), type "<i4">, <HDF5 dataset "U_meridian": shape (3072, 3072), type "<f4">, <HDF5 dataset "U_scatter": shape (3072, 3072), type "<f4">, <HDF5 dataset "View_azimuth": shape (3072, 3072), type "<f4">, <HDF5 dataset "View_zenith": shape (3072, 3072), type "<f4">]
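To answer the original question of which files have which columns, one option is to treat each dataset directly inside a chosen group as a "column" (as with the Data Fields group above) and compare the resulting name sets. This is a sketch under that assumption, using h5py and NumPy; column_names, compare_columns, and the three demo files are hypothetical, built to mirror the [a,b,c,d], [a,b,c], [b,e,r,z] example from the question. For the HDF-EOS5 file above you would pass group='/HDFEOS/GRIDS/660nm_band/Data Fields'.

```python
import os
import tempfile

import h5py
import numpy as np


def column_names(path, group="/"):
    """Return the set of dataset names directly inside `group` of an HDF5 file."""
    with h5py.File(path, "r") as f:
        grp = f[group]
        # items() yields (name, object) pairs; keep datasets, skip subgroups.
        return {name for name, obj in grp.items() if isinstance(obj, h5py.Dataset)}


def compare_columns(paths, group="/"):
    """Map each file to its column set and find the columns common to all files."""
    per_file = {p: column_names(p, group) for p in paths}
    common = set.intersection(*per_file.values()) if per_file else set()
    return per_file, common


# Three throwaway files mirroring the question's example layouts.
tmpdir = tempfile.mkdtemp()
layouts = {"f1.h5": "abcd", "f2.h5": "abc", "f3.h5": "berz"}
paths = []
for fname, cols in layouts.items():
    p = os.path.join(tmpdir, fname)
    with h5py.File(p, "w") as f:
        for c in cols:
            f.create_dataset(c, data=np.zeros(5))
    paths.append(p)

per_file, common = compare_columns(paths)
for p, cols in per_file.items():
    print(os.path.basename(p), sorted(cols))
print("in every file:", sorted(common))
```

With the demo files this prints each file's sorted column list, followed by ['b'] as the only column present in all three. Set operations such as difference or symmetric_difference on the per-file sets would show which columns are unique to a given file.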