How to select and delete columns with duplicate names in a pandas DataFrame

user3107640 · Dec 16, 2013 · Viewed 18.4k times

I have a huge DataFrame in which some columns share the same name. When I try to select or delete a column whose name appears twice (e.g. del df['col name'] or df2 = df['col name']), I get an error. What can I do?

Answer

Roman Pekar · Dec 16, 2013

You can address columns by position instead of by name, using iloc:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4],[5,6]], columns=['a','a'])
>>> df
   a  a
0  1  2
1  3  4
2  5  6
>>> df.iloc[:,0]
0    1
1    3
2    5
Name: a, dtype: int64
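
The same positional idea also covers the deletion half of the question. The following is a minimal sketch, not part of the original answer: it drops one of the duplicate columns by position, and alternatively keeps only the first occurrence of each name using a boolean mask from `df.columns.duplicated()`. The variable names (`keep`, `dropped_by_position`, `first_only`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'a'])

# Drop the column at position 1 by selecting every other position.
keep = [i for i in range(df.shape[1]) if i != 1]
dropped_by_position = df.iloc[:, keep]

# Alternatively, keep only the first occurrence of each column name.
# duplicated() marks every repeat of a name as True, so ~ inverts it
# into a "keep" mask.
first_only = df.loc[:, ~df.columns.duplicated()]
```

Both results are new DataFrames; `df` itself is left unchanged, so assign the result back to `df` if the column should be removed for good.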

Or you can rename the columns so that every name is unique:

>>> df.columns = ['a','b']
>>> df
   a  b
0  1  2
1  3  4
2  5  6
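
For a huge DataFrame, typing out the full list of new names by hand may not be practical. As a rough sketch (the helper `make_unique` and its `_1`, `_2` suffix scheme are assumptions for illustration, not a pandas API), the duplicates can be renamed automatically, after which selecting and deleting by name works as usual:

```python
from collections import Counter

import pandas as pd


def make_unique(names):
    """Return names with _1, _2, ... appended to repeats (hypothetical helper).

    The first occurrence of each name is left unchanged.
    """
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return unique


df = pd.DataFrame([[1, 2, 7], [3, 4, 8], [5, 6, 9]], columns=['a', 'a', 'b'])
df.columns = make_unique(df.columns)
renamed = list(df.columns)  # ['a', 'a_1', 'b']

second_a = df['a_1']  # selecting by name now returns a single column
del df['a_1']         # and deleting by name removes only that column
```

Note that this simple scheme does not check whether a generated name such as `a_1` already exists among the original columns, so it would need extra care in that case.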