I am trying to import data using numpy's genfromtxt with header names and non-homogeneous data types. Every time I run the program I get the error:
Traceback (most recent call last):
raise ValueError(errmsg)
ValueError: Some errors were detected !
Line #8 (got 6 columns instead of 1)
Line #9 (got 6 columns instead of 1)
Line #10 (got 6 columns instead of 1)
Line #11 (got 6 columns instead of 1)
Line #12 (got 6 columns instead of 1)
I have already gone through this question, but it didn't solve my problem. It is a very simple problem, but I can't figure out what is wrong. The code and data are included below:
Code
import numpy as np
data = np.genfromtxt('Data.dat', comments='#', delimiter='\t', names=True, dtype=None).transpose()
print(data)
Tab-separated data
# -----
# -----
# -----
# -----
# -----
# -----
# -----
column_1 column_2 column_3 column_4 column_5 column_6
1 2 3 A 1 F
4 3 2 B 2 G
1 4 3 C 3 H
5 6 4 D 4 I
Update
In short, what I need is a way to make genfromtxt, when called with names=True, take the field names from the first uncommented line rather than from the first line after skip_header.
When names=True, genfromtxt expects the first line (after skip_header lines) to contain the field names, even if that line is a comment. Apparently it is pretty common for field names to be specified in a comment. In the question's file, that first line is the "# -----" comment, which yields a single field name, so every later line with six columns is reported as an error. If you have a variable number of comments before your uncommented field names, you'll have to work around this quirk of genfromtxt. The following shows one way you could do this.
Here's my test file. (The file is space-delimited. Add delimiter='\t' in the call to genfromtxt for a tab-delimited file.)
In [12]: cat with_comments.dat
# Some
# comments
# here
foo bar baz
1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0
Open the file, and read lines until the line is not a comment:
In [13]: f = open("with_comments.dat", "r")
In [14]: line = f.readline()
In [15]: while line.startswith('#'):
   ....:     line = f.readline()
   ....:
line now holds the line of field names:
In [16]: line
Out[16]: 'foo bar baz\n'
Convert it to a list of names:
In [17]: names = line.split()
Give those names to genfromtxt, and read the rest of the file:
In [18]: data = genfromtxt(f, names=names)
In [19]: data
Out[19]:
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)],
      dtype=[('foo', '<f8'), ('bar', '<f8'), ('baz', '<f8')])
Don't forget to close the file (or better, use with open("with_comments.dat", "r") as f: instead):
In [20]: f.close()
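The steps above can be collected into one small helper. This is only a sketch: the function name read_with_leading_comments and its parameters are made up for illustration, and an in-memory io.StringIO stands in for with_comments.dat so the snippet runs on its own. It assumes the comment marker is the first character of each comment line.

```python
import io

import numpy as np


def read_with_leading_comments(source, delimiter=None, comment="#", **kwargs):
    """Skip leading comment lines, use the next line as field names,
    then let genfromtxt parse whatever is left in `source`.

    `source` is an open text file (or any file-like object).
    Hypothetical helper, not part of NumPy.
    """
    line = source.readline()
    while line.startswith(comment):
        line = source.readline()
    # delimiter=None splits on any whitespace, like genfromtxt's default
    names = line.strip().split(delimiter)
    return np.genfromtxt(source, names=names, delimiter=delimiter,
                         comments=comment, **kwargs)


# Stand-in for the test file shown above
sample = io.StringIO(
    "# Some\n"
    "# comments\n"
    "# here\n"
    "foo bar baz\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
    "7.0 8.0 9.0\n"
)

data = read_with_leading_comments(sample)
print(data.dtype.names)
print(data["bar"])
```

With a real file, the call would sit inside a with open(...) as f: block so the file is closed automatically. For the tab-separated file in the question, passing delimiter='\t' and dtype=None through the same helper should let genfromtxt infer the mixed column types, though I have not run it against that exact file.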