I'm new to working with Python and Pandas. Currently I'm attempting to create a report that extracts data from an SQL database and loads it into a pandas dataframe. Each row holds a server name and the sample date, followed by one column per sampled metric.
I have been able to filter by hostname using df[df['hostname'] == uniquehost], where df is the dataframe and uniquehost is a variable holding each unique host name.
What I am trying to do next is get the standard deviation of the other columns, but I haven't been able to figure this part out. I attempted to use df[df['hostname'] == uniquehost].std()
However, this wasn't correct.
Can anyone point me in the right direction to figure this out? I suspect I'm barking up the wrong tree and there's likely a very easy way to handle this that I haven't encountered yet.
Hostname | Sample Date | CPU Peak | Memory Peak
server1 | 08/08/17 | 67.32 | 34.83
server1 | 08/09/17 | 34 | 62
IIUC, you'll want to first do df.groupby on Hostname and then find the standard deviation. Something like this:
In [118]: df.groupby('Hostname')[['CPU Peak', 'Memory Peak']].std()
Out[118]:
          CPU Peak  Memory Peak
Hostname
server1  23.560798    19.212091
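For anyone who wants to reproduce this end to end, here is a minimal sketch built from the two sample rows in the question. The names metric_cols, per_host_std and single_host_std are my own, and the dataframe is typed in by hand rather than read from SQL. It also shows a single-host variant of the filter from the question, restricted to the numeric columns. My guess is that the original call stumbled on the non-numeric Hostname and Sample Date columns, but that depends on the pandas version, so treat it as an assumption.

```python
import pandas as pd

# Sample rows copied from the question; a real report would load these from SQL.
df = pd.DataFrame({
    "Hostname": ["server1", "server1"],
    "Sample Date": ["08/08/17", "08/09/17"],
    "CPU Peak": [67.32, 34.0],
    "Memory Peak": [34.83, 62.0],
})

# Hypothetical helper name: the columns that hold numeric samples.
metric_cols = ["CPU Peak", "Memory Peak"]

# One row per host, with the sample standard deviation (ddof=1) of each metric.
per_host_std = df.groupby("Hostname")[metric_cols].std()
print(per_host_std)

# Single-host variant: filter the rows first, keep only the numeric columns,
# then take the standard deviation. The result is a Series indexed by column.
uniquehost = "server1"
single_host_std = df.loc[df["Hostname"] == uniquehost, metric_cols].std()
print(single_host_std)
```

Note that std() uses ddof=1 by default, so a host with only one sample comes back as NaN rather than 0.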