I was going through the documentation about the hierarchical indexing in Pandas. I tried testing the examples from it to create an empty dataframe with hierarchical indexing:
In [5]: df = pd.DataFrame()
In [6]: df.columns = pd.MultiIndex(levels = [['first', 'second'], ['a', 'b']], labels = [[0, 0, 1, 1], [0, 1, 0, 1]])
However, it throws an error:
ValueError Traceback (most recent call last)
<ipython-input-6-dd823f9b8d22> in <module>()
----> 1 df.columns = pd.MultiIndex(levels = [['first', 'second'], ['a', 'b']], labels = [[0, 0, 1, 1], [0, 1, 0, 1]])
/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in __setattr__(self, name, value)
2755 try:
2756 object.__getattribute__(self, name)
-> 2757 return object.__setattr__(self, name, value)
2758 except AttributeError:
2759 pass
pandas/src/properties.pyx in pandas.lib.AxisProperty.__set__ (pandas/lib.c:44873)()
/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
446
447 def _set_axis(self, axis, labels):
--> 448 self._data.set_axis(axis, labels)
449 self._clear_item_cache()
450
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in set_axis(self, axis, new_labels)
2800 raise ValueError('Length mismatch: Expected axis has %d elements, '
2801 'new values have %d elements' %
-> 2802 (old_len, new_len))
2803
2804 self.axes[axis] = new_labels
ValueError: Length mismatch: Expected axis has 0 elements, new values have 4 elements
I don't see any problem with my code. Any ideas what is happening?
The problem is that you have an empty data frame which has zero columns, and you are trying to assign a four columns multi-index to it; If you create an empty data frame of four columns initially, the error will be gone:
df = pd.DataFrame(pd.np.empty((0, 4)))
df.columns = pd.MultiIndex(levels = [['first', 'second'], ['a', 'b']], labels = [[0, 0, 1, 1], [0, 1, 0, 1]])
Or you can create empty data frame with the multi-index as follows:
multi_index = pd.MultiIndex(levels = [['first', 'second'], ['a', 'b']], labels = [[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame(columns=multi_index)
df
# first second
# a b a b