I know that train_test_split splits the data randomly, but I need to know how to split it based on time.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# this splits the data randomly: 67% train and 33% test
How can I split the same dataset by time instead, with the earliest 67% of the rows as training data and the most recent 33% as test data? The dataset has a TimeStamp column.
I searched for similar questions but was not sure about the right approach.
Can someone explain briefly?
One easy way to do it:
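First, sort the data by time (converting TimeStamp to datetimes in case it holds strings):
import pandas as pd
data["TimeStamp"] = pd.to_datetime(data["TimeStamp"])
data = data.sort_values("TimeStamp")
Second, split the sorted rows at the 67% mark:
split_idx = int(0.67 * len(data))
# positional slicing; gives the same split as np.split(data, [split_idx])
train_set, test_set = data.iloc[:split_idx], data.iloc[split_idx:]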
This puts the earliest 67% of the rows in train_set and the remaining, most recent 33% in test_set.
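Below is a small self-contained sketch of the same sort-then-slice approach, run on a made-up DataFrame. The feature and target columns and the time_split helper are hypothetical, chosen only for illustration. The sketch also shows an alternative that is not part of the answer above: after sorting, scikit-learn's train_test_split with shuffle=False keeps rows in their existing order, so it produces the same chronological split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 9 rows whose timestamps are deliberately out of order.
data = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2021-01-05", "2021-01-01", "2021-01-09", "2021-01-03", "2021-01-07",
        "2021-01-02", "2021-01-08", "2021-01-04", "2021-01-06",
    ]),
    "feature": np.arange(9),
})
data["target"] = data["feature"] * 2


def time_split(df, time_col="TimeStamp", train_frac=0.67):
    """Hypothetical helper: earliest train_frac of rows -> train, the rest -> test."""
    ordered = df.sort_values(time_col)
    split_idx = int(train_frac * len(ordered))
    return ordered.iloc[:split_idx], ordered.iloc[split_idx:]


train_set, test_set = time_split(data)
print(len(train_set), len(test_set))  # 6 3
# Every training timestamp precedes every test timestamp.
print(train_set["TimeStamp"].max() < test_set["TimeStamp"].min())  # True

# Alternative: sort first, then let train_test_split cut without shuffling.
ordered = data.sort_values("TimeStamp")
X_train, X_test, y_train, y_test = train_test_split(
    ordered[["feature"]], ordered["target"], test_size=0.33, shuffle=False
)
print(len(X_train), len(X_test))  # 6 3
```

With shuffle=False, train_test_split takes the first rows for training and the last rows for testing. It rounds the test size up, so for other row counts its boundary may differ by one row from int(0.67 * len(data)).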