In most of the scikit-learn tutorials, the data is loaded as a Bunch object. Many of the examples populate the Bunch with load_files() or similar functions. These functions expect the data to be in a particular format, but my data is stored differently: a CSV file with a string in each field.
How do I parse this file and load the data into a Bunch object?
You can build the Bunch directly from a list of strings and a NumPy array of labels:
import numpy as np
# Bunch is importable from sklearn.utils; the old sklearn.datasets.base
# module has been removed from recent scikit-learn releases.
from sklearn.utils import Bunch

# One text sample per example
examples = ['some text', 'another example text', 'example 3']

# One integer class label per example
target = np.array([0, 1, 0], dtype=np.int64)

dataset = Bunch(data=examples, target=target)
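To fill the lists from the CSV file itself, one option is Python's standard csv module, which also copes with quoted fields that contain commas. The sketch below is only one way to do it and assumes a header row with two hypothetical columns, "text" and "label". The helper name load_csv_as_bunch is also hypothetical. String labels are mapped to integers in sorted order, an arbitrary but reproducible choice. An in-memory io.StringIO stands in for the real file so the snippet runs as-is.

```python
import csv
import io

import numpy as np
from sklearn.utils import Bunch

# Stand-in for open("data.csv", newline="", encoding="utf-8").
# The column names "text" and "label" are hypothetical.
CSV_DATA = """text,label
some text,spam
another example text,ham
example 3,spam
"""


def load_csv_as_bunch(f):
    """Hypothetical helper: read a CSV with 'text' and 'label' columns into a Bunch."""
    reader = csv.DictReader(f)  # uses the header row as dict keys
    texts, labels = [], []
    for row in reader:
        texts.append(row["text"])
        labels.append(row["label"])

    # Give each distinct label string an integer id (sorted for reproducibility)
    target_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(target_names)}
    target = np.array([index[name] for name in labels], dtype=np.int64)

    return Bunch(data=texts, target=target, target_names=target_names)


dataset = load_csv_as_bunch(io.StringIO(CSV_DATA))
print(dataset.data)          # the raw strings
print(dataset.target)        # integer labels
print(dataset.target_names)  # label string for each integer id
```

With a real file, pass open(path, newline="", encoding="utf-8") instead of the StringIO object. Note that scikit-learn estimators and vectorizers accept plain lists and arrays, so the Bunch is only a convenient container. You can always pass dataset.data and dataset.target directly.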