Pandas standard deviation on one column for subset of rows

Thomas · Aug 16, 2017 · Viewed 13.3k times

I'm new to Python and pandas. I'm building a report that pulls data from a SQL database into a pandas DataFrame. Each row holds a server name and a sample date, followed by one column per sampled metric.

I have been able to filter by hostname using df[df['hostname'] == uniquehost], where df is the DataFrame and uniquehost holds one of the unique host names.

Next I want the standard deviation of the other columns for those rows, but I haven't been able to figure this part out. I tried df[df['hostname'] == uniquehost].std()

However, this wasn't correct.

Can anyone point me in the right direction? I suspect I'm barking up the wrong tree and there's an easy way to do this that I haven't come across yet.

Hostname | Sample Date | CPU Peak | Memory Peak
server1  | 08/08/17    | 67.32    | 34.83
server1  | 08/09/17    | 34       | 62

Answer

cs95 · Aug 16, 2017

IIUC, you'll want to first do df.groupby on Hostname and then find the standard deviation. Something like this:

In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]: 
           CPU Peak  Memory Peak
Hostname                        
server1   23.560798    19.212091
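
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the column names match the sample table in the question (Hostname, CPU Peak, Memory Peak) and builds the DataFrame by hand instead of reading from SQL. The names metric_cols, per_host_std and single_host_std are only illustrative. The second half shows the single-host filter from the question with the numeric columns selected explicitly before calling std(), which may be what the original attempt was missing.

```python
import pandas as pd

# Sample rows copied from the question; a real report would load these from SQL.
df = pd.DataFrame({
    "Hostname": ["server1", "server1"],
    "Sample Date": ["08/08/17", "08/09/17"],
    "CPU Peak": [67.32, 34.0],
    "Memory Peak": [34.83, 62.0],
})

# Illustrative list of the numeric columns to summarise.
metric_cols = ["CPU Peak", "Memory Peak"]

# One row of standard deviations per host.
# DataFrame/GroupBy std() uses the sample standard deviation (ddof=1) by default.
per_host_std = df.groupby("Hostname")[metric_cols].std()
print(per_host_std)

# Same figures for a single host, using the boolean filter from the question
# and restricting to the numeric columns before calling std().
uniquehost = "server1"
single_host_std = df.loc[df["Hostname"] == uniquehost, metric_cols].std()
print(single_host_std)
```

The groupby version avoids looping over host names: it returns every host at once, indexed by Hostname. If the population standard deviation is wanted instead, std(ddof=0) can be passed in either form.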