I have an issue with subtracting a part of the a date column. My goal is to set up a cohort analysis table. In order to achieve this I would like to get the ordering month of any customer in my pandas dataframe.
OrderDate ConsumerId Orders
0 2017-09-01 5555555 4
1 2017-09-01 66666666666 2
2 2017-09-01 88888888888 4
3 2017-09-01 9999999999 3
4 2017-09-01 45858888 3
I used the code below to do this.
import pandas as pd
pd.set_option('max_columns', 50)
mpl.rcParams['lines.linewidth'] = 2
df = pd.read_csv("C:/Users/***/****/salesdata.csv")
df.head()
df['OrderPeriod'] = df.OrderDate.apply(lambda x: x.strftime('%Y-%m'))
Unfurtantely I get the following error when trying to subtract the ordering month.
df['OrderPeriod'] = df.OrderDate.apply(lambda x: x.strftime('%Y-%m'))
AttributeError: 'str' object has no attribute 'strftime'
My pandas dataframe looks like this:
OrderDate ConsumerId Orders
0 2017-09-01 5555555 4
1 2017-09-01 66666666666 2
2 2017-09-01 88888888888 4
3 2017-09-01 9999999999 3
4 2017-09-01 45858888 3
My goal is to add a new column with the month in which a consumer ordered. Like below.
OrderDate ConsumerId Orders OrderPeriod
0 2017-09-01 5555555 4 2017-09
1 2017-09-01 66666666666 2 2017-09
2 2017-09-01 88888888888 4 2017-09
3 2017-09-01 9999999999 3 2017-09
4 2017-09-01 45858888 3 2017-09
I was wondering why I get an AttributeError and how I could solve this issue.
Use vectorized dt.strftime
, but first convert column to datetime in read_csv
by parameter parse_dates
:
df = pd.read_csv("C:/Users/***/****/salesdata.csv", parse_dates=['OrderDate'])
df['OrderPeriod'] = df.OrderDate.dt.strftime('%Y-%m')
print (df)
OrderDate ConsumerId Orders OrderPeriod
0 2017-09-01 5555555 4 2017-09
1 2017-09-01 66666666666 2 2017-09
2 2017-09-01 88888888888 4 2017-09
3 2017-09-01 9999999999 3 2017-09
4 2017-09-01 45858888 3 2017-09