How can I split a Dataset from a .csv file for Training and Testing?

Midi · Apr 29, 2017

I'm using Python and I need to split my imported .csv data into two parts, a training set and a test set, e.g. 70% training and 30% test.

I keep getting various errors, such as 'list' object is not callable and so on.

Is there any easy way of doing this?

Thanks

EDIT:

The code is basic, I'm just looking to split the dataset.

from csv import reader
with open('C:/Dataset.csv', 'r') as f:
    data = list(reader(f)) #Imports the CSV
    data[0:1] ( data )

TypeError: 'list' object is not callable

Answer

zipa · Apr 29, 2017

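You can use pandas with a random boolean mask. Each row goes to the training set with probability 0.7, so the split is approximately, not exactly, 70/30: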

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Dataset.csv')

# Boolean mask: True (training) for roughly 70% of the rows
msk = np.random.rand(len(df)) <= 0.7

train = df[msk]   # rows where the mask is True
test = df[~msk]   # the remaining rows
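
If pandas isn't available, the same idea can be sketched with the standard library only. This also shows what went wrong in the question: `data[0:1] ( data )` takes a slice (which is a list) and then tries to call it like a function, hence `'list' object is not callable`. Slicing needs square brackets only. The helper names `split_rows` and `load_and_split` are made up for this example, and the code assumes the first row of the CSV is a header; treat it as a rough sketch rather than the definitive way to do it.

```python
import csv
import random


def split_rows(rows, train_fraction=0.7, seed=None):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)                  # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seed makes the shuffle reproducible
    cut = round(len(shuffled) * train_fraction)  # index where the test set starts
    # Slices use square brackets only; writing (...) after a slice would
    # try to call the list and raise "'list' object is not callable".
    return shuffled[:cut], shuffled[cut:]


def load_and_split(path, train_fraction=0.7, seed=None):
    """Read a CSV file and split its data rows (hypothetical helper)."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]       # assumption: first row is a header
    train, test = split_rows(data, train_fraction, seed)
    return header, train, test
```

Usage would look like `header, train, test = load_and_split('C:/Dataset.csv', seed=42)`. Unlike the random mask above, this gives an exact 70/30 split (rounded to the nearest row), because the rows are shuffled first and then cut at a fixed index.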