Creating a structured array from a list

Jérôme · Apr 24, 2017

I have a simple list of elements and I'm trying to make a structured array out of it.

This naive approach fails:

y = np.array([1,2,3], dtype=[('y', float)])
TypeError: expected an object with a buffer interface

Putting each element in a tuple works:

# Manual way
y = np.array([(1,), (2,), (3,)], dtype=[('y', float)])
# Comprehension
y = np.array([(x,) for x in [1,2,3]], dtype=[('y', float)])

It also works if I create an array from the list first:

y = np.array(np.array([1,2,3]), dtype=[('y', float)])

I'm a bit puzzled. How come the latter works but numpy couldn't sort things out when provided a simple list?

What is the recommended way? Creating that intermediate array might not have a great performance impact, but isn't this suboptimal?

I'm also surprised that those won't work:

# All lists
y = np.array([[1,], [2,], [3,]], dtype=[('y', float)])
TypeError: expected an object with a buffer interface
# All tuples
y = np.array(((1,), (2,), (3,)), dtype=[('y', float)])
ValueError: size of tuple must match number of fields.

I'm new to structured arrays and I don't remember numpy being that picky about input types. There must be something I'm missing.

Answer

hpaulj · Apr 24, 2017

Details of how np.array handles various inputs are buried in compiled code. As the many questions about creating object dtype arrays show, it can be complicated and confusing. The basic model is to create a multidimensional numeric array from a nested list.

np.array([[1,2,3],[4,5,6]])

In implementing structured arrays, developers adopted the tuple as a way of distinguishing a record from just another nested dimension. That is evident in the display of a structured array.

It is also a requirement when defining a structured array, though the list of tuples requirement is somewhat buried in the documentation.

In [381]: alist=[[1],[2],[3]]
In [382]: dt=np.dtype([('y',int)])
In [383]: np.array(alist,dt)

TypeError: a bytes-like object is required, not 'int'

This is the error message in my version, '1.12.0'. It appears to be different in yours.

As you note, a list comprehension can convert the nested list into a list of tuples.

In [384]: np.array([tuple(i) for i in alist],dt)
Out[384]: 
array([(1,), (2,), (3,)], 
      dtype=[('y', '<i4')])

In answering SO questions that's the approach I use most often. Either that or iteratively set fields of a preallocated array (usually there are a lot more records than fields, so that loop is not expensive).
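The preallocation alternative mentioned above could look roughly like this. It is only a minimal sketch: the names `values` and `out` are made up for the illustration, and `np.zeros` is just one convenient way to allocate the array before filling it.

```python
import numpy as np

values = [1, 2, 3]                 # the flat list from the question
dt = np.dtype([('y', float)])

# Allocate the structured array first, then fill it field by field.
out = np.zeros(len(values), dtype=dt)
out['y'] = values                  # assigns the whole 'y' column at once

print(out.shape)   # (3,)
print(out['y'])    # [1. 2. 3.]
```

With several fields, the same assignment is repeated once per field name, which is why the loop stays cheap when there are many more records than fields.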

It looks like wrapping the array in a structured array call is equivalent to an astype call:

In [385]: np.array(np.array(alist),dt)
Out[385]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])
In [386]: np.array(alist).astype(dt)
Out[386]: 
array([[(1,)],
       [(2,)],
       [(3,)]], 
      dtype=[('y', '<i4')])

But note the change in the number of dimensions. The list of tuples created a (3,) array. The astype converted a (3,1) numeric array into a (3,1) structured array.
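The difference in dimensions can be checked directly. The sketch below assumes `alist` is the nested list `[[1], [2], [3]]` used in this answer; the names `from_tuples`, `from_astype` and `flat` are hypothetical. Indexing away the trailing length-1 axis is one possible way to get back to a 1-D array, not necessarily the preferred one.

```python
import numpy as np

alist = [[1], [2], [3]]
dt = np.dtype([('y', int)])

# The tuple layer marks each inner item as one record.
from_tuples = np.array([tuple(row) for row in alist], dtype=dt)
# astype keeps the (3, 1) shape of the intermediate numeric array.
from_astype = np.array(alist).astype(dt)

print(from_tuples.shape)   # (3,)
print(from_astype.shape)   # (3, 1)

# Dropping the trailing length-1 axis recovers the 1-D layout.
flat = from_astype[:, 0]
print(flat.shape)          # (3,)
```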

Part of what the tuples tell np.array is: put the division between array dimensions and records 'here'. It interprets

[(3,), (1,), (2,)]

as

[record, record, record]

whereas automatic translation of [[1],[2],[3]] might produce

[[record],[record],[record]]

When the dtype is numeric (non-structured), it ignores the distinction between list and tuple:

In [388]: np.array([tuple(i) for i in alist],int)
Out[388]: 
array([[1],
       [2],
       [3]])

But when the dtype is compound, developers have chosen to use the tuple layer as significant information.
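A two-field case may make the role of the tuple layer easier to see. This is an illustrative sketch only; the field names `a` and `b` and the variable names are invented for the example.

```python
import numpy as np

rows = [(1, 2.0), (3, 4.0)]

# Numeric dtype: the tuples are treated as just another dimension.
as_numeric = np.array(rows, dtype=float)
print(as_numeric.shape)    # (2, 2)

# Compound dtype: each tuple becomes one record with two fields.
dt2 = np.dtype([('a', int), ('b', float)])
as_records = np.array(rows, dtype=dt2)
print(as_records.shape)    # (2,)
print(as_records['b'])     # [2. 4.]
```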


Consider a more complex structured dtype:

In [389]: dt1=np.dtype([('y',int,(2,))])
In [390]: np.ones((3,), dt1)
Out[390]: 
array([([1, 1],), ([1, 1],), ([1, 1],)], 
      dtype=[('y', '<i4', (2,))])
In [391]: np.array([([1,2],),([3,4],)])
Out[391]: 
array([[[1, 2]],

       [[3, 4]]])
In [392]: np.array([([1,2],),([3,4],)], dtype=dt1)
Out[392]: 
array([([1, 2],), ([3, 4],)], 
      dtype=[('y', '<i4', (2,))])

The display (and input) has lists within tuples within a list. And that's just the start:

In [393]: dt1=np.dtype([('x',dt,(2,))])
In [394]: dt1
Out[394]: dtype([('x', [('y', '<i4')], (2,))])
In [395]: np.ones((2,),dt1)
Out[395]: 
array([([(1,), (1,)],), ([(1,), (1,)],)], 
      dtype=[('x', [('y', '<i4')], (2,))])
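With nesting like this, filling by field name can be easier than writing out the nested literal. The following is a small sketch under that assumption, reusing the `dt` and `dt1` definitions from above; `arr` is a made-up name.

```python
import numpy as np

dt = np.dtype([('y', int)])
dt1 = np.dtype([('x', dt, (2,))])

arr = np.zeros((2,), dtype=dt1)

# Field 'x' adds the subarray shape (2,) to the array shape (2,).
print(arr['x'].shape)        # (2, 2)

# Chained field access returns views, so this writes into arr itself.
arr['x']['y'] = [[1, 2], [3, 4]]
print(arr['x']['y'][1])      # [3 4]
```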
