Selecting specific rows and columns from NumPy array

Mike C picture Mike C · Apr 8, 2014 · Viewed 164.8k times · Source

I've been going crazy trying to figure out what stupid thing I'm doing wrong here.

I'm using NumPy, and I have specific row indices and specific column indices that I want to select from. Here's the gist of my problem:

import numpy as np

a = np.arange(20).reshape((5,4))
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11],
#        [12, 13, 14, 15],
#        [16, 17, 18, 19]])

# If I select certain rows, it works
print a[[0, 1, 3], :]
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [12, 13, 14, 15]])

# If I select certain rows and a single column, it works
print a[[0, 1, 3], 2]
# array([ 2,  6, 14])

# But if I select certain rows AND certain columns, it fails
print a[[0,1,3], [0,2]]
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I'm expecting is:

a[[0,1,3], [0,2]] => [[0,  2],
                      [4,  6],
                      [12, 14]]

Answer

Praveen picture Praveen · Apr 8, 2014

As Toan suggests, a simple hack would be to just select the rows first, and then select the columns over that.

>>> a[[0,1,3], :]            # Returns the rows you want
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]]  # Selects the columns you want as well
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

[Edit] The built-in method: np.ix_

I recently discovered that numpy gives you an in-built one-liner to doing exactly what @Jaime suggested, but without having to use broadcasting syntax (which suffers from lack of readability). From the docs:

Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].

So you use it like this:

>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0,  2],
       [ 4,  6],
       [12, 14]])

And the way it works is that it takes care of aligning arrays the way Jaime suggested, so that broadcasting happens properly:

>>> np.ix_([0,1,3], [0,2])
(array([[0],
        [1],
        [3]]), array([[0, 2]]))

Also, as MikeC says in a comment, np.ix_ has the advantage of returning a view, which my first (pre-edit) answer did not. This means you can now assign to the indexed array:

>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a    
array([[-1,  1, -1,  3],
       [-1,  5, -1,  7],
       [ 8,  9, 10, 11],
       [-1, 13, -1, 15],
       [16, 17, 18, 19]])