I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame
that looks like this:
length (m) width (m) thickness (cm)
0 1.2 3.4 5.6
1 7.8 9.0 1.2
2 3.4 5.6 7.8
Currently, the measurement units are encoded in column names. Downsides include:
df['width (m)']
vs. df['width']
If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?
There isn't any great way to do this right now, see github issue here for some discussion.
As a quick hack, could do something like this, maintaining a separate dict with the units.
In [3]: units = {}
In [5]: newcols = []
...: for col in df:
...: name, unit = col.split(' ')
...: units[name] = unit
...: newcols.append(name)
In [6]: df.columns = newcols
In [7]: df
Out[7]:
length width thickness
0 1.2 3.4 5.6
1 7.8 9.0 1.2
2 3.4 5.6 7.8
In [8]: units['length']
Out[8]: '(m)'