I have a dataframe called "dataframe" that contains sales information by date. Each date entry is in YYYY-MM-DD format, and the data ranges from 2012 to 2017. I would like to split this dataframe into 6 separate dataframes, one for each year. For example, the first split dataframe would have all the entries from 2012.
I think I was able to do this in the code below. I split the dataframe into one for each year and put them in the list "years". However, when I try to run auto_arima on each dataframe I get the error "Found input variables with inconsistent numbers of samples."
I think this is because I'm not splitting my original dataframe correctly. How do I properly split my dataframe based on year?
#Partition data into years
years = [g for n, g in dataframe.set_index('Date').groupby(pd.Grouper(freq='Y'))]
#Create a list that will hold all auto_arima results for every dataframe
stepwise_models = []
#Call auto_arima on every dataframe
for x in range(len(years)-1):
    currentDf = years[x]
    model = auto_arima(currentDf['price'], exogenous=xreg, start_p=1, start_q=1,
                       max_p=3, max_q=3, m=12,
                       start_P=0, seasonal=True,
                       d=1, D=1, trace=True,
                       error_action='ignore',
                       suppress_warnings=True,
                       stepwise=True)
    stepwise_models.append(model) #Store current auto_arima result in our stepwise_models[] list
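You can use the pandas datetime accessor (.dt) to filter the rows by year and create a new dataframe for each year. The Date column must have a datetime dtype for this to work, so convert it first if it holds strings:
import pandas as pd
dataframe['Date'] = pd.to_datetime(dataframe['Date'])
dataframe1 = dataframe[dataframe['Date'].dt.year == 2012]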
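To get all six dataframes in one pass instead of filtering year by year, the same .dt accessor can drive a groupby. This is a minimal sketch, assuming the column names Date and price from the question. The sample rows and the name frames_by_year are made up for illustration.

```python
import pandas as pd

# Made-up sample rows standing in for the question's `dataframe`
dataframe = pd.DataFrame({
    "Date": ["2012-01-15", "2012-06-30", "2013-03-01", "2017-12-31"],
    "price": [10.0, 12.5, 11.0, 20.0],
})

# The .dt accessor needs a datetime64 column, so parse the strings first
dataframe["Date"] = pd.to_datetime(dataframe["Date"])

# Build one DataFrame per calendar year, keyed by the year as a plain int
# (frames_by_year is a hypothetical name, not from the question)
frames_by_year = {
    int(year): group.reset_index(drop=True)
    for year, group in dataframe.groupby(dataframe["Date"].dt.year)
}

# Look up a single year's rows
df_2012 = frames_by_year[2012]
print(df_2012)
```

Note that the split in the question may already be fine. The "inconsistent numbers of samples" error could instead come from exogenous=xreg, if xreg still has one row per entry of the full dataframe while currentDf['price'] only covers one year. That is a guess, since xreg is not shown in the question. If so, xreg would need to be split the same way, so each call receives matching rows.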