How do I create test and train samples from one dataframe with pandas?

tooty44 · Jun 10, 2014 · Viewed 435.9k times

I have a fairly large dataset in the form of a dataframe and I was wondering how I would be able to split the dataframe into two random samples (80% and 20%) for training and testing.

Thanks!

Answer

gobrewers14 · Jun 11, 2014

scikit-learn's train_test_split is a good option. It shuffles the rows by default, and when you pass it a DataFrame you get two DataFrames back.

from sklearn.model_selection import train_test_split

# df is your existing DataFrame; test_size=0.2 holds out 20% of the rows for testing
train, test = train_test_split(df, test_size=0.2)
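
If you would rather not depend on scikit-learn, here is a rough pandas-only sketch of the same 80/20 split. The small df built here is a made-up stand-in for your data, the seed 42 is arbitrary, and the approach assumes the index has no duplicate labels (otherwise drop would remove more rows than intended).

```python
import numpy as np
import pandas as pd

# Hypothetical example frame standing in for the real dataset
df = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) % 2})

# Randomly draw 80% of the rows for training.
# random_state is an arbitrary seed so the split is repeatable.
train = df.sample(frac=0.8, random_state=42)

# The rows that were not drawn become the test set.
# This relies on df having a unique index.
test = df.drop(train.index)

print(len(train), len(test))
```

train_test_split accepts a random_state argument as well, so either approach can be made reproducible.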