ValueError: You must specify a freq or x must be a pandas object with a timeseries index

Rocketq · Sep 1, 2016

I have two NumPy arrays, light_points and time_points, and would like to apply time series analysis methods to this data.

I tried this:

import statsmodels.api as sm
import pandas as pd
tdf = pd.DataFrame({'time':time_points[:]})
rdf =  pd.DataFrame({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])

This runs, but it does not do the right thing: the measurements are not evenly spaced in time. If I instead use time_points as the index of my frame, I get an error:

rdf.index = pd.DatetimeIndex(tdf['time'])

decomp = sm.tsa.seasonal_decompose(rdf)

elif freq is None:
    raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")

ValueError: You must specify a freq or x must be a pandas object with a timeseries index

I don't know how to correct this. Also, it seems that pandas' TimeSeries is deprecated.

I tried this:

rdf = pd.Series({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])

But it gives me a length mismatch:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements

However, I don't understand where this comes from, since rdf['light'] and tdf['time'] have the same length.
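
A likely explanation for the mismatch, sketched here with made-up data in place of the real light_points: when pd.Series receives a dict, each key becomes one element, so the whole array ends up as a single value under the label 'light'. The random values and the name `flat` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real measurements (122 values, as in the error message).
light_points = np.random.default_rng(0).normal(size=122)

# A dict maps each key to ONE element, so the whole array becomes a single value.
wrapped = pd.Series({'light': light_points})
print(len(wrapped))   # 1

# Passing the array directly gives one element per measurement.
flat = pd.Series(light_points, name='light')
print(len(flat))      # 122
```

Assigning a 122-element index to `wrapped` would then fail, because the Series itself has only one row.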

Finally, I tried defining rdf as a pandas Series:

rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))

And I get this:

ValueError: You must specify a freq or x must be a pandas object with a timeseries index
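
The error message suggests that seasonal_decompose needs a frequency on the index. A small check, using hypothetical timestamps rather than the real time_points, shows what pandas reports for evenly and unevenly spaced indexes:

```python
import pandas as pd

# Hypothetical timestamps: the first index is evenly spaced, the second is not.
regular = pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04'])
irregular = pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-11'])

# freq is not set automatically, but pandas can infer it for the regular index.
print(regular.freq, regular.inferred_freq)
# For unevenly spaced timestamps there is nothing to infer.
print(irregular.freq, irregular.inferred_freq)
```

With unevenly spaced timestamps both attributes are None, which would be consistent with the ValueError above.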

Then I tried replacing the index with:

 pd.TimeSeries(time_points[:])

This gives an error on the seasonal_decompose line:

AttributeError: 'Float64Index' object has no attribute 'inferred_freq'

How can I work with unevenly spaced data? I was thinking about building an approximately evenly spaced time array by inserting unknown values between the existing ones and using interpolation to estimate those points, but I suspect there is a cleaner and easier solution.

Answer
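
One possible approach, offered as a minimal sketch rather than a definitive fix, is the interpolation idea from the question: put the data on a regular grid with resample and fill the gaps with interpolate. The timestamps, the values, and the daily grid below are assumptions; the right spacing depends on the real data.

```python
import numpy as np
import pandas as pd

# Made-up, unevenly spaced observations standing in for time_points / light_points.
time_points = pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-05',
                              '2016-01-06', '2016-01-10', '2016-01-15'])
light_points = np.array([1.0, 2.0, 5.0, 6.0, 10.0, 15.0])

series = pd.Series(light_points, index=time_points)

# Put the data on a regular daily grid; days without a measurement become NaN.
regular = series.resample('D').mean()

# Fill the gaps by interpolating linearly in time.
regular = regular.interpolate(method='time')

print(regular.index.freqstr)        # the index now carries a daily frequency
print(int(regular.isna().sum()))    # number of gaps left after interpolation
```

The resulting series has an evenly spaced DatetimeIndex, so it should be accepted by sm.tsa.seasonal_decompose. If the seasonal period cannot be derived from the index frequency, it can be passed explicitly (the argument is named period in recent statsmodels releases and freq in older ones). Note that interpolated points are estimates, so heavy gap filling can distort the decomposition.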