I'm trying to calculate the percentile rank of each value in a DataFrame column and add it to a new column called 'percentile'.
This is my attempt:
import pandas as pd
from scipy import stats
data = {'symbol':'FB','date':['2012-05-18','2012-05-21','2012-05-22','2012-05-23'],'close':[38.23,34.03,31.00,32.00]}
df = pd.DataFrame(data)
close = df['close']
for i in df:
    df['percentile'] = stats.percentileofscore(close,df['close'])
The new column isn't filled with percentiles; it just contains NaN. This should be fairly easy, but I'm not sure where I'm going wrong.
Thanks in advance for the help.
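You don't need the loop. Compute the percentile for each value individually (percentileofscore does not need sorted input):
df.close.apply(lambda x: stats.percentileofscore(df.close, x))
or, using pandas alone:
df.close.rank(pct=True)
Either expression returns a Series you can assign straight to the new column, e.g. df['percentile'] = df.close.rank(pct=True).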
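Output of df.close.rank(pct=True) (the percentileofscore version gives the same ranking on a 0 to 100 scale: 100.0, 75.0, 25.0, 50.0):
0    1.00
1    0.75
2    0.25
3    0.50
Name: close, dtype: float64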
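A minimal end-to-end sketch, assuming the sample data from the question. With their defaults (kind='rank' for percentileofscore, method='average' for rank), the two approaches agree up to a factor of 100, ties included. The apply version calls percentileofscore once per row, so it scans the column n times; the hypothetical helper rank_percentile below is a swapped-in NumPy variant (not part of scipy or pandas) that reproduces the kind='rank' formula with two np.searchsorted calls on a sorted copy. The column names pct_scipy, pct_rank and pct_fast are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    'symbol': 'FB',
    'date': ['2012-05-18', '2012-05-21', '2012-05-22', '2012-05-23'],
    'close': [38.23, 34.03, 31.00, 32.00],
})

# scipy: percentage (0-100) of the column at or below each close;
# kind='rank' (the default) averages ties
df['pct_scipy'] = df['close'].apply(lambda x: stats.percentileofscore(df['close'], x))

# pandas: average rank divided by the number of values (0-1 scale)
df['pct_rank'] = df['close'].rank(pct=True)

def rank_percentile(values):
    """Hypothetical helper: percentileofscore(values, v, kind='rank') for every v in values."""
    a = np.asarray(values, dtype=float)
    s = np.sort(a)
    left = np.searchsorted(s, a, side='left')    # how many values are strictly below
    right = np.searchsorted(s, a, side='right')  # how many values are at or below
    # kind='rank' adds 1 when right > left, which always holds because each value occurs in a
    return (left + right + 1) * 50.0 / len(a)

df['pct_fast'] = rank_percentile(df['close'])
print(df[['close', 'pct_scipy', 'pct_rank', 'pct_fast']])
```

Pick whichever scale you prefer for the 'percentile' column; rank(pct=True) is the simplest choice if you don't need scipy otherwise.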