I have two numpy arrays, light_points and time_points, and would like to apply some time series analysis methods to this data.
I tried this:
import statsmodels.api as sm
import pandas as pd
tdf = pd.DataFrame({'time':time_points[:]})
rdf = pd.DataFrame({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(freq='w',start=0,periods=len(rdf.light))
#rdf.index = pd.DatetimeIndex(tdf['time'])
This runs, but it does not do the correct thing: the measurements are not evenly spaced in time, so a fixed weekly index misrepresents them. If I instead use the time_points values as the index of my frame, I get an error:
rdf.index = pd.DatetimeIndex(tdf['time'])
decomp = sm.tsa.seasonal_decompose(rdf)
elif freq is None:
raise ValueError("You must specify a freq or x must be a pandas object with a timeseries index")
ValueError: You must specify a freq or x must be a pandas object with a timeseries index
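The missing frequency can be checked directly on the index. Here is a minimal sketch with made-up timestamps (not my real data) that compares an evenly spaced index with an uneven one through the `inferred_freq` attribute:

```python
import pandas as pd

# Made-up timestamps for illustration only:
# the first index is evenly spaced (daily), the second has a gap.
even = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"])
uneven = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-05", "2000-01-06"])

print(even.inferred_freq)    # a daily frequency can be inferred
print(uneven.inferred_freq)  # None: no single frequency fits the gaps
```

With uneven spacing there is no frequency to infer, which seems to be what the error message is complaining about.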
I don't know how to correct this.
Also, it seems that pandas' TimeSeries is deprecated.
I tried this:
rdf = pd.Series({'light':light_points[:]})
rdf.index = pd.DatetimeIndex(tdf['time'])
But it gives me a length mismatch:
ValueError: Length mismatch: Expected axis has 1 elements, new values have 122 elements
However, I don't understand where it comes from, as rdf['light'] and tdf['time'] have the same length...
Next, I tried defining rdf as a pandas Series:
rdf = pd.Series(light_points[:],index=pd.DatetimeIndex(time_points[:]))
And I get this:
ValueError: You must specify a freq or x must be a pandas object with a timeseries index
Then I tried replacing the index with
pd.TimeSeries(time_points[:])
This gives an error on the line that calls seasonal_decompose:
AttributeError: 'Float64Index' object has no attribute 'inferred_freq'
How can I work with unevenly spaced data? I was thinking about building an approximately evenly spaced time array by inserting many unknown values between the existing ones and using interpolation to estimate those points, but I suspect there is a cleaner and easier solution.
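To make the interpolation idea concrete, here is a rough sketch of what I have in mind. The arrays are synthetic stand-ins for my data, and treating time_points as days counted from an arbitrary start date, as well as the daily grid, are assumptions made only for this example:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real arrays (assumption: times are in days).
rng = np.random.default_rng(0)
time_points = np.sort(rng.uniform(0, 100, size=122))
light_points = np.sin(2 * np.pi * time_points / 10) + 0.1 * rng.standard_normal(122)

# Build an unevenly spaced series; the start date is arbitrary.
index = pd.Timestamp("2000-01-01") + pd.to_timedelta(time_points, unit="D")
irregular = pd.Series(light_points, index=index)

# Put the data on a regular daily grid: average the points that fall in the
# same day, leave empty days as NaN, then fill them by time-weighted
# linear interpolation.
regular = irregular.resample("D").mean().interpolate(method="time")

print(regular.index.freq)          # the regular index now carries a frequency
print(int(regular.isna().sum()))   # number of values still missing
```

The resulting series has a fixed frequency on its index, which is what seasonal_decompose appears to require, but the interpolated points are estimates rather than measurements, so I am not sure how much this distorts the decomposition.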