I have a simple list of elements and I'm trying to make a structured array out of it.
This naive approach fails:
y = np.array([1,2,3], dtype=[('y', float)])
TypeError: expected an object with a buffer interface
Putting each element in a tuple works:
# Manual way
y = np.array([(1,), (2,), (3,)], dtype=[('y', float)])
# Comprehension
y = np.array([(x,) for x in [1,2,3]], dtype=[('y', float)])
It also works if I create an array from the list first:
y = np.array(np.array([1,2,3]), dtype=[('y', float)])
I'm a bit puzzled. How come the latter works but numpy couldn't sort things out when provided a simple list?
What is the recommended way? Creating that intermediate array might not have a great performance impact, but isn't this suboptimal?
I'm also surprised that those won't work:
# All lists
y = np.array([[1,], [2,], [3,]], dtype=[('y', float)])
TypeError: expected an object with a buffer interface
# All tuples
y = np.array(((1,), (2,), (3,)), dtype=[('y', float)])
ValueError: size of tuple must match number of fields.
I'm new to structured arrays and I don't remember numpy being that picky about input types. There must be something I'm missing.
The details of how np.array handles various inputs are buried in compiled code. As the many questions about creating object dtype arrays show, it can be complicated and confusing. The basic model is to create a multidimensional numeric array from a nested list:
np.array([[1,2,3],[4,5,6]])
In implementing structured arrays, the developers adopted the tuple as a way of distinguishing a record from just another nested dimension. That is evident in the display of a structured array.
It is also a requirement when defining a structured array, though the list-of-tuples requirement is somewhat buried in the documentation. Here alist is the nested list [[1],[2],[3]]:
In [382]: dt=np.dtype([('y',int)])
In [383]: np.array(alist,dt)
TypeError: a bytes-like object is required, not 'int'
This is the error message from my version, 1.12.0. It appears to be different in yours.
As you note, a list comprehension can convert the nested list into a list of tuples.
In [384]: np.array([tuple(i) for i in alist],dt)
Out[384]:
array([(1,), (2,), (3,)],
      dtype=[('y', '<i4')])
In answering SO questions that's the approach I use most often. Either that or iteratively set fields of a preallocated array (usually there are a lot more records than fields, so that loop is not expensive).
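The preallocate-and-fill approach mentioned above can be sketched as follows. This is a minimal illustration for the single-field case from the question; the names values and arr are placeholders of mine, not anything from the original code.

```python
import numpy as np

values = [1, 2, 3]                 # the flat list from the question
dt = np.dtype([('y', float)])      # single-field structured dtype

# Allocate one record per element, then fill the field by name.
# The loop is over fields (here just one), not over records.
arr = np.zeros(len(values), dtype=dt)
arr['y'] = values

print(arr.shape)   # (3,)
print(arr['y'])
```

With several fields, the same pattern repeats once per field name, which is why the loop stays cheap when there are many more records than fields.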
It looks like wrapping the array in a structured array call is equivalent to an astype call:
In [385]: np.array(np.array(alist),dt)
Out[385]:
array([[(1,)],
       [(2,)],
       [(3,)]],
      dtype=[('y', '<i4')])
In [386]: np.array(alist).astype(dt)
Out[386]:
array([[(1,)],
       [(2,)],
       [(3,)]],
      dtype=[('y', '<i4')])
But note the change in the number of dimensions. The list of tuples created a (3,) array. The astype converted a (3,1) numeric array into a (3,1) structured array.
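A quick way to see the difference in dimensions is to compare the shapes directly. The sketch below assumes alist is the nested list [[1],[2],[3]], and it relies on astype accepting a structured dtype the way it does in the session above; the ravel step is my own suggestion for getting back to one dimension, not something from the original session.

```python
import numpy as np

dt = np.dtype([('y', int)])
alist = [[1], [2], [3]]            # assumed to match the nested list used above

# Tuples mark the record boundary, so the result is 1-d.
from_tuples = np.array([tuple(row) for row in alist], dtype=dt)

# astype keeps the shape of the numeric array it starts from.
from_astype = np.array(alist).astype(dt)

print(from_tuples.shape)   # (3,)
print(from_astype.shape)   # (3, 1)

# Flattening the numeric array first yields a 1-d structured array.
flat = np.array(alist).ravel().astype(dt)
print(flat.shape)          # (3,)
```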
Part of what the tuples tell np.array is where to put the division between array dimensions and records. It interprets
[(3,), (1,), (2,)]
as
[record, record, record]
whereas automatic translation of [[1],[2],[3]] might produce
[[record],[record],[record]]
When the dtype is numeric (non-structured), it ignores the distinction between list and tuple:
In [388]: np.array([tuple(i) for i in alist],int)
Out[388]:
array([[1],
       [2],
       [3]])
But when the dtype is compound, developers have chosen to use the tuple layer as significant information.
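To make the contrast concrete, here is a small side-by-side check. It is only a sketch; the variable names are mine.

```python
import numpy as np

# Numeric dtype: lists and tuples are both treated as nesting levels.
as_lists = np.array([[1], [2], [3]], dtype=int)
as_tuples = np.array([(1,), (2,), (3,)], dtype=int)
print(as_lists.shape, as_tuples.shape)       # (3, 1) (3, 1)
print(np.array_equal(as_lists, as_tuples))   # True

# Compound dtype: each tuple is consumed as one record.
records = np.array([(1,), (2,), (3,)], dtype=[('y', int)])
print(records.shape)                         # (3,)
```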
Consider a more complex structured dtype:
In [389]: dt1=np.dtype([('y',int,(2,))])
In [390]: np.ones((3,), dt1)
Out[390]:
array([([1, 1],), ([1, 1],), ([1, 1],)],
      dtype=[('y', '<i4', (2,))])
In [391]: np.array([([1,2],),([3,4],)])
Out[391]:
array([[[1, 2]],
       [[3, 4]]])
In [392]: np.array([([1,2],),([3,4],)], dtype=dt1)
Out[392]:
array([([1, 2],), ([3, 4],)],
      dtype=[('y', '<i4', (2,))])
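One way to see how the subarray field relates to the record dimension is to index the field by name. The following sketch rebuilds the same dt1 and inspects the shapes; the name arr is my own.

```python
import numpy as np

dt1 = np.dtype([('y', int, (2,))])
arr = np.array([([1, 2],), ([3, 4],)], dtype=dt1)

# Two records, each holding one field with two integers.
print(arr.shape)        # (2,)

# Selecting the field appends the field's own shape to the array shape.
print(arr['y'].shape)   # (2, 2)
print(arr['y'][1])      # [3 4]
```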
The display (and input) has lists within tuples within a list. And that's just the start:
In [393]: dt1=np.dtype([('x',dt,(2,))])
In [394]: dt1
Out[394]: dtype([('x', [('y', '<i4')], (2,))])
In [395]: np.ones((2,),dt1)
Out[395]:
array([([(1,), (1,)],), ([(1,), (1,)],)],
      dtype=[('x', [('y', '<i4')], (2,))])
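For the nested case, field access peels off one layer at a time. This is a rough sketch using the same two dtypes; I use np.zeros rather than np.ones so the assigned values stand out, and the name arr is mine.

```python
import numpy as np

dt = np.dtype([('y', int)])
dt1 = np.dtype([('x', dt, (2,))])
arr = np.zeros((2,), dtype=dt1)

# 'x' is a (2,) subarray of records, so selecting it gives a (2, 2) structured array.
inner = arr['x']
print(inner.shape, inner.dtype.names)   # (2, 2) ('y',)

# Selecting 'y' inside that gives a plain (2, 2) integer array.
print(arr['x']['y'].shape)              # (2, 2)

# Field selections are views, so assigning through them updates arr.
arr['x']['y'][0] = [10, 20]
print(arr['x']['y'].tolist())           # [[10, 20], [0, 0]]
```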