"Got 1 columns instead of ..." error in numpy

user3466132 · Apr 29, 2014 · Viewed 49.2k times

I'm working on the following code, which performs random forest classification on train and test sets:

from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt, savetxt

def main():
    dataset = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')   
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('filepath','r'), delimiter=' ', dtype='f8')

    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(train, target)
    predicted_probs = [[index + 1, x[1]] for index, x in enumerate(rf.predict_proba(test))]

    savetxt('filepath', predicted_probs, delimiter=',', fmt='%d,%f', 
            header='Id,PredictedProbability', comments = '')

if __name__=="__main__":
    main()

However, I get the following error on execution:

---->      dataset = genfromtxt(open('C:/Users/user/Desktop/pgm/Cora/a_train.csv','r'), delimiter='', dtype='f8')

ValueError: Some errors were detected !
    Line #88 (got 1435 columns instead of 1434)
    Line #93 (got 1435 columns instead of 1434)
    Line #164 (got 1435 columns instead of 1434)
    Line #169 (got 1435 columns instead of 1434)
    Line #524 (got 1435 columns instead of 1434)
...
...
...

Any suggestions on how to avoid it? Thanks.

Answer

atomh33ls · Apr 29, 2014

genfromtxt raises this error when the rows of the file do not all have the same number of columns. Here most rows have 1434 fields, but the listed lines have 1435.
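A minimal reproduction may make this concrete. The sketch below assumes a small in-memory file (the `ragged` string is made up for illustration) in which one row has an extra field:

```python
import io
import numpy as np

# Hypothetical data: the second row has 4 fields, the others have 3.
ragged = "1,2,3\n4,5,6,7\n8,9,10\n"

try:
    np.genfromtxt(io.StringIO(ragged), delimiter=",")
    message = None
except ValueError as err:
    # genfromtxt collects every offending line into a single ValueError.
    message = str(err)

print(message)
```

The expected column count is taken from the first data row, so any later row with a different count is reported.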

I can think of 3 ways around it:

1. Use the usecols parameter

np.genfromtxt('yourfile.txt',delimiter=',',usecols=np.arange(0,1434))

However, this may mean that you lose some data (wherever rows are longer than 1434 columns). Whether that matters is up to you.
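As a small self-contained illustration of the usecols approach, assuming made-up three-column data where one row has a stray fourth field:

```python
import io
import numpy as np

# Hypothetical data: the second row carries one extra trailing field.
ragged = "1,2,3\n4,5,6,7\n8,9,10\n"

# Keep only the first three columns of every row; the extra field is dropped.
data = np.genfromtxt(io.StringIO(ragged), delimiter=",", usecols=np.arange(0, 3))

print(data.shape)
```

Note that this only helps with rows that are too long. A row with fewer fields than the requested columns would still be rejected.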

2. Adjust your input data file so that it has an equal number of columns.
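Before editing the file, it can help to find out which lines are off. The helper below is a rough sketch, not part of NumPy: `column_counts` is a hypothetical name, and it splits naively on the delimiter, so it assumes the file has no quoted fields.

```python
import io
from collections import Counter

def column_counts(lines, delimiter=","):
    """Return {line_number: field_count} for rows that differ from the most common count."""
    counts = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line:  # ignore blank lines
            counts[number] = len(line.split(delimiter))
    if not counts:
        return {}
    # Treat the most frequent field count as the expected one.
    expected = Counter(counts.values()).most_common(1)[0][0]
    return {n: c for n, c in counts.items() if c != expected}

# Hypothetical sample; with a real file, pass an open file handle instead.
sample = io.StringIO("1,2,3\n4,5,6,7\n8,9,10\n")
odd_rows = column_counts(sample)
print(odd_rows)
```

For a whitespace-delimited file, passing `delimiter=None` makes `str.split` split on runs of whitespace.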

3. Use something other than genfromtxt:

.............like this
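One possible alternative, not necessarily the one the answer had in mind, is to parse the file with the standard csv module and pad or trim each row to a fixed width yourself. In the sketch below, `load_ragged`, `n_cols` and `fill` are hypothetical names, and the sample data is made up:

```python
import csv
import io
import numpy as np

def load_ragged(handle, n_cols, delimiter=",", fill=np.nan):
    """Read delimited text into a float array with exactly n_cols columns."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        if not fields:  # skip blank lines
            continue
        # Trim long rows to n_cols and map empty fields to the fill value.
        values = [float(f) if f.strip() else fill for f in fields[:n_cols]]
        # Pad short rows with the fill value.
        values += [fill] * (n_cols - len(values))
        rows.append(values)
    return np.array(rows, dtype="f8")

# Hypothetical sample: one row too long, one row too short.
sample = io.StringIO("1,2,3\n4,5,6,7\n8,9\n")
data = load_ragged(sample, n_cols=3)
print(data)
```

Unlike genfromtxt, this never raises on a ragged row, so it silently drops or invents values. Whether that is acceptable depends on the data.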