Confidence Interval in Python dataframe

Question 1

python pandas confidence-interval

MasterShifu · Nov 28, 2018 · Viewed 17.4k times · Source

Answer

import pandas as pd
import numpy as np
import math

df=pd.DataFrame({'Class': ['A1','A1','A1','A2','A3','A3'], 
                 'Force': [50,150,100,120,140,160] },
                 columns=['Class', 'Force'])
print(df)
print('-'*30)

stats = df.groupby(['Class'])['Force'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)

ci95_hi = []
ci95_lo = []

for i in stats.index:
    m, c, s = stats.loc[i]
    # 1.96 is the two-sided 95% critical value of the standard normal distribution
    ci95_hi.append(m + 1.96*s/math.sqrt(c))
    ci95_lo.append(m - 1.96*s/math.sqrt(c))

stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)

The output is:

  Class  Force
0    A1     50
1    A1    150
2    A1    100
3    A2    120
4    A3    140
5    A3    160
------------------------------
       mean  count        std
Class                        
A1      100      3  50.000000
A2      120      1        NaN
A3      150      2  14.142136
------------------------------
       mean  count        std     ci95_hi     ci95_lo
Class                                                
A1      100      3  50.000000  156.580326   43.419674
A2      120      1        NaN         NaN         NaN
A3      150      2  14.142136  169.600000  130.400000
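
The per-group loop above can also be written as plain column arithmetic, which may be easier to read on a large dataset. This is a minimal sketch, not the only way to do it. It assumes the normal approximation with a critical value of 1.96, and the names `Z_95` and `sem` are introduced here purely for illustration. For groups as small as the ones in this sample, a Student's t critical value would give wider, more conservative intervals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]},
                  columns=['Class', 'Force'])

# One row per class with the mean, group size and sample standard deviation
stats = df.groupby('Class')['Force'].agg(['mean', 'count', 'std'])

# Assumption: normal approximation, two-sided 95% critical value
Z_95 = 1.96

# Standard error of the mean, computed for all groups at once
sem = stats['std'] / np.sqrt(stats['count'])

stats['ci95_lo'] = stats['mean'] - Z_95 * sem
stats['ci95_hi'] = stats['mean'] + Z_95 * sem
print(stats)
```

A group with a single observation (A2 here) has no sample standard deviation, so its interval bounds stay NaN.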

Question 2

I am trying to calculate the mean and the 95% confidence interval of the column "Force" in a large dataset. I need the result per "Class", using the groupby function.

When I calculate the mean and put it in a new dataframe, I get NaN values for all rows. I'm not sure if I'm going about this the right way. Is there an easier way to do this?

This is the sample dataframe:

df=pd.DataFrame({ 'Class': ['A1','A1','A1','A2','A3','A3'], 
                  'Force': [50,150,100,120,140,160] },
                   columns=['Class', 'Force'])

To calculate the confidence interval, my first step was to calculate the mean. This is what I used:

F1_Mean = df.groupby(['Class'])['Force'].mean()

This gave me NaN values for all rows.
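
The question does not show the assignment step, so the cause can only be guessed. One plausible explanation is index alignment: the result of `groupby(...).mean()` is indexed by class label (A1, A2, A3), while the original dataframe is indexed 0 to 5, so assigning one into the other matches no rows. The sketch below reproduces that assumed scenario and shows `transform('mean')` as one possible way around it. The names `misaligned`, `aligned` and `Force_mean` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A1', 'A1', 'A1', 'A2', 'A3', 'A3'],
                   'Force': [50, 150, 100, 120, 140, 160]},
                  columns=['Class', 'Force'])

# The grouped mean is a Series indexed by class label, not by row number
group_mean = df.groupby(['Class'])['Force'].mean()

# Assumed scenario: assigning it to a frame indexed 0..5 aligns on the index,
# finds no matching labels, and fills the new column with NaN
misaligned = df.copy()
misaligned['Force_mean'] = group_mean

# transform('mean') returns one value per original row, so the index matches
aligned = df.copy()
aligned['Force_mean'] = df.groupby(['Class'])['Force'].transform('mean')

print(misaligned)
print(aligned)
```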
