How to split a dataframe based on years in Python?

amadzebra · Jun 28, 2018

I have a dataframe called "dataframe" that contains sales information for specific dates. Each date entry is in the format YYYY-MM-DD, and the data ranges from 2012 to 2017. I would like to split this dataframe into 6 separate dataframes, one for each year. For example, the first split dataframe will have all the entries from 2012.

I think I was able to do this in the code below. I split the dataframe into one for each year and put them in the list "years". However, when I try to run auto_arima on each dataframe I get the error "Found input variables with inconsistent numbers of samples."

I think this is because I'm not splitting my original dataframe correctly. How do I properly split my dataframe based on year?

#Partition data into years
years = [g for n, g in dataframe.set_index('Date').groupby(pd.Grouper(freq='Y'))]

#Create a list that will hold all auto_arima results for every dataframe
stepwise_models = []

#Call auto_arima on every dataframe
for x in range(len(years)-1):
    currentDf = years[x]
    model = auto_arima(currentDf['price'], exogenous=xreg, start_p=1, start_q=1,
        max_p=3, max_q=3, m=12,
        start_P=0, seasonal=True,
        d=1, D=1, trace=True,
        error_action='ignore',  
        suppress_warnings=True, 
        stepwise=True)
    stepwise_models.append(model) #Store current auto_arima result in our stepwise_models[] list

Answer

min2bro · Jun 28, 2018

You can use the pandas datetime accessor (.dt) to filter the rows by year and create a new dataframe for each year:

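import pandas as pd
# .dt works only on datetime64 columns, so convert the YYYY-MM-DD values first
dataframe['Date'] = pd.to_datetime(dataframe['Date'])
dataframe1 = dataframe[dataframe['Date'].dt.year == 2012]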
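Filtering one year at a time can be generalized to all years with a groupby on the year. The sketch below uses made-up sample data and a hypothetical exogenous frame `xreg` with a made-up `promo` column; the `xreg` in the question is never shown, so its shape here is an assumption. One possible cause of the "inconsistent numbers of samples" error, not confirmed by the question, is that `xreg` covers every year while each `price` series covers only one. Slicing `xreg` to each year's index keeps the two the same length.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's "dataframe"
dataframe = pd.DataFrame({
    "Date": ["2012-01-15", "2012-06-30", "2013-03-01", "2014-07-04", "2014-12-31"],
    "price": [10.0, 12.5, 11.0, 12.8, 13.2],
})

# The .dt accessor requires a datetime64 column, so convert first
dataframe["Date"] = pd.to_datetime(dataframe["Date"])

# One sub-dataframe per calendar year, keyed by the year as a plain int
frames_by_year = {
    int(year): group
    for year, group in dataframe.groupby(dataframe["Date"].dt.year)
}

# Hypothetical exogenous regressors spanning the full date range (assumption)
xreg = pd.DataFrame({"promo": [0, 1, 0, 1, 0]}, index=dataframe.index)

for year, frame in frames_by_year.items():
    # Restrict the regressors to the rows that belong to this year
    xreg_year = xreg.loc[frame.index]
    # The target series and the regressors now have matching lengths
    assert len(xreg_year) == len(frame)
    print(year, len(frame), len(xreg_year))
```

Separately, the loop in the question iterates over `range(len(years)-1)`, which skips the last year; iterating over the list or dict directly covers all of them.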