Add a row with means of columns to pandas DataFrame

willk picture willk · Mar 17, 2017 · Viewed 16.5k times · Source

I have a pandas DataFrame consisting of some sensor readings taken over time like this:

       diode1  diode2  diode3  diode4
Time
0.530       7       0      10      16
1.218      17       7      14      19
1.895      13       8      16      17
2.570       8       2      16      17
3.240      14       8      17      19
3.910      13       6      17      18
4.594      13       5      16      19
5.265       9       0      12      16
5.948      12       3      16      17
6.632      10       2      15      17

I have written code to add another row with the means of each column:

# List of the averages for the test. 
averages = [df[key].describe()['mean'] for key in df]
indexes = df.index.tolist()
indexes.append('mean')
df.reindex(indexes)
# Adding the mean row to the bottom of the DataFrame

i = 0
for key in df:
    df.set_value('mean', key, averages[i])
    i += 1

This gives me the result I want, which is a DataFrame like this:

       diode1  diode2  diode3  diode4
Time
0.53      7.0     0.0    10.0    16.0
1.218    17.0     7.0    14.0    19.0
1.895    13.0     8.0    16.0    17.0
2.57      8.0     2.0    16.0    17.0
3.24     14.0     8.0    17.0    19.0
3.91     13.0     6.0    17.0    18.0
4.594    13.0     5.0    16.0    19.0
5.265     9.0     0.0    12.0    16.0
5.948    12.0     3.0    16.0    17.0
6.632    10.0     2.0    15.0    17.0
mean     11.6     4.1    14.9    17.5

However, I am sure that this is not the most efficient way of adding the row. I have tried using append with the means saved as a pandas Series but ended up with something like this:

    diode1  diode2  diode3  diode4                     mean
0      7.0     0.0    10.0    14.0                      NaN
1      9.0     0.0    10.0    15.0                      NaN
2     10.0     5.0    14.0    20.0                      NaN
3      6.0     0.0     7.0    14.0                      NaN
4      7.0     0.0    10.0    15.0                      NaN
5      7.0     0.0     8.0    14.0                      NaN
6      7.0     0.0    11.0    14.0                      NaN
7      7.0     0.0     2.0    11.0                      NaN
8      2.0     0.0     4.0    12.0                      NaN
9      4.0     0.0     0.0     6.0                      NaN
10     NaN     NaN     NaN     NaN  [11.6, 4.1, 14.9, 17.5]

I was wondering if there was a more efficient means of adding a row with the index 'mean' and the averages of each column to the bottom of a pandas DataFrame.

Answer

root picture root · Mar 17, 2017

Use loc for setting with enlargement:

df.loc['mean'] = df.mean()

The resulting output:

       diode1  diode2  diode3  diode4
Time                                 
0.53      7.0     0.0    10.0    16.0
1.218    17.0     7.0    14.0    19.0
1.895    13.0     8.0    16.0    17.0
2.57      8.0     2.0    16.0    17.0
3.24     14.0     8.0    17.0    19.0
3.91     13.0     6.0    17.0    18.0
4.594    13.0     5.0    16.0    19.0
5.265     9.0     0.0    12.0    16.0
5.948    12.0     3.0    16.0    17.0
6.632    10.0     2.0    15.0    17.0
mean     11.6     4.1    14.9    17.5