genfromtxt error - Got n columns instead of m

Tom Kurushingal · Mar 2, 2015

I am trying to import data using numpy's genfromtxt with header names and non-homogeneous data types. Every time I run the program I get the error:

Traceback (most recent call last):
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #8 (got 6 columns instead of 1)
    Line #9 (got 6 columns instead of 1)
    Line #10 (got 6 columns instead of 1)
    Line #11 (got 6 columns instead of 1)
    Line #12 (got 6 columns instead of 1)

I have already gone through this question, but it didn't solve my problem. It is a very simple problem, but I can't figure out what is wrong. The code and data are included below:

Code

import numpy as np
data = np.genfromtxt('Data.dat', comments='#', delimiter='\t', names=True, dtype=None).transpose()
print(data)

Tab-separated data

# -----
# -----
# -----
# -----
# -----
# -----
# -----
column_1    column_2    column_3    column_4    column_5    column_6
1   2   3   A   1   F
4   3   2   B   2   G
1   4   3   C   3   H
5   6   4   D   4   I

Update

In short, what I need is a way to make genfromtxt, when called with names=True, take the field names from the first uncommented line rather than from the first line after skip_header.

Answer

Warren Weckesser · Mar 2, 2015

When names=True, genfromtxt expects the first line (after skip_header lines) to contain the field names, even if that line is a comment. Apparently it is pretty common for field names to be specified in a comment. If you have a variable number of comments before your uncommented field names, you'll have to work around this quirk of genfromtxt. The following shows one way you could do this.
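A quick check of the behaviour described above, as a sketch rather than a definitive test. The in-memory files commented_header and leading_comments are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical in-memory file whose header line is itself a comment.
commented_header = "# a b c\n1 2 3\n4 5 6\n"
data = np.genfromtxt(io.StringIO(commented_header), names=True)
print(data.dtype.names)  # field names come from the commented first line

# Hypothetical file shaped like the one in the question: comment lines first,
# then an uncommented header. The first comment line is taken as the names
# line (a single column), so the rows that follow no longer match.
leading_comments = "# -----\n# -----\nx y z\n1 2 3\n4 5 6\n"
try:
    np.genfromtxt(io.StringIO(leading_comments), names=True)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # reports rows with 3 columns instead of 1
```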

Here's my test file. (The file is space-delimited. For a tab-delimited file, add delimiter='\t' in the call to genfromtxt.)

In [12]: cat with_comments.dat
# Some
# comments
# here
foo bar baz
1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0

Open the file, and read lines until the line is not a comment:

In [13]: f = open("with_comments.dat", "r")

In [14]: line = f.readline()

In [15]: while line.startswith('#'):
   ....:     line = f.readline()
   ....: 

line now holds the line of field names:

In [16]: line
Out[16]: 'foo bar baz\n'

Convert it to a list of names:

In [17]: names = line.split()

Give those names to genfromtxt, and read the rest of the file:

In [18]: data = genfromtxt(f, names=names)

In [19]: data
Out[19]: 
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)], 
      dtype=[('foo', '<f8'), ('bar', '<f8'), ('baz', '<f8')])

Don't forget to close the file (or better, use with open("with_comments.dat", "r") as f: instead):

In [20]: f.close()
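The steps above can be collected into one helper. This is a minimal sketch, not part of numpy: the function name read_with_leading_comments and its parameters are hypothetical, and the sample text reproduces the tab-separated data from the question in memory instead of reading Data.dat. It also assumes a numpy version whose genfromtxt accepts the encoding argument, so that the string columns come back as text.

```python
import io
import numpy as np

def read_with_leading_comments(fileobj, comment="#", delimiter=None, **kwargs):
    """Hypothetical helper: take field names from the first uncommented line."""
    # Skip leading comment lines; the loop stops at the first other line (or EOF).
    line = fileobj.readline()
    while line.startswith(comment):
        line = fileobj.readline()
    # That line holds the field names; delimiter=None splits on any whitespace.
    names = line.rstrip("\r\n").split(delimiter)
    # Parse the rest of the already-open file using the explicit names.
    return np.genfromtxt(fileobj, names=names, delimiter=delimiter,
                         comments=comment, **kwargs)

# In-memory copy of the question's data (tab-separated, seven comment lines).
sample = (
    "# -----\n" * 7
    + "column_1\tcolumn_2\tcolumn_3\tcolumn_4\tcolumn_5\tcolumn_6\n"
    + "1\t2\t3\tA\t1\tF\n"
    + "4\t3\t2\tB\t2\tG\n"
    + "1\t4\t3\tC\t3\tH\n"
    + "5\t6\t4\tD\t4\tI\n"
)

data = read_with_leading_comments(io.StringIO(sample), delimiter="\t",
                                  dtype=None, encoding="utf-8")
print(data.dtype.names)
print(data["column_4"])
```

For a file on disk, the same call would sit inside with open("Data.dat", "r") as f:, passing f in place of the StringIO object. The sketch does not handle a file that contains only comments, or blank lines before the header.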