numpy: How to add a column to an existing structured array?

code base 5000 picture code base 5000 · Aug 21, 2014 · Viewed 8.9k times · Source

I have a starting array such as:

[(1, [-112.01268501699997, 40.64249414272372])
 (2, [-111.86145708699996, 40.4945008710162])]

The first column is an int and the second is a list of floats. I need to add a str column called 'USNG'.

I then create a structured numpy array, as such:

dtype = numpy.dtype([('USNG', '|S100')])
x = numpy.empty(array.shape, dtype=dtype)

I want to append the x numpy array to the existing array as a new column, so I can output some information to that column for each row.

When I do the following:

numpy.append(array, x, axis=1)

I get the following error:

'TypeError: invalid type promotion'

I've also tried vstack and hstack

Answer

Warren Weckesser picture Warren Weckesser · Aug 21, 2014

You have to create a new dtype that contains the new field.

For example, here's a:

In [86]: a
Out[86]: 
array([(1, [-112.01268501699997, 40.64249414272372]),
       (2, [-111.86145708699996, 40.4945008710162])], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,))])

a.dtype.descr is [('i', '<i8'), ('loc', '<f8', (2,))]; i.e. a list of field types. We'll create a new dtype by adding ('USNG', 'S100') to the end of that list:

In [87]: new_dt = np.dtype(a.dtype.descr + [('USNG', 'S100')])

Now create a new structured array, b. I used zeros here, so the string fields will start out with the value ''. You could also use empty. The strings will then contain garbage, but that won't matter if you immediately assign values to them.

In [88]: b = np.zeros(a.shape, dtype=new_dt)

Copy over the existing data from a to b:

In [89]: b['i'] = a['i']

In [90]: b['loc'] = a['loc']

Here's b now:

In [91]: b
Out[91]: 
array([(1, [-112.01268501699997, 40.64249414272372], ''),
       (2, [-111.86145708699996, 40.4945008710162], '')], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])

Fill in the new field with some data:

In [93]: b['USNG'] = ['FOO', 'BAR']

In [94]: b
Out[94]: 
array([(1, [-112.01268501699997, 40.64249414272372], 'FOO'),
       (2, [-111.86145708699996, 40.4945008710162], 'BAR')], 
      dtype=[('i', '<i8'), ('loc', '<f8', (2,)), ('USNG', 'S100')])