How to invert a permutation array in numpy

Lauritz V. Thaulow picture Lauritz V. Thaulow · Jul 25, 2012 · Viewed 10.3k times · Source

Given a self-indexing (not sure if this is the correct term) numpy array, for example:

a = np.array([3, 2, 0, 1])

This represents this permutation (=> is an arrow):

0 => 3
1 => 2
2 => 0
3 => 1

I'm trying to make an array representing the inverse transformation without doing it "manually" in python, that is, I want a pure numpy solution. The result I want in the above case is:

array([2, 3, 1, 0])

Which is equivalent to

0 <= 3                0 => 2
1 <= 2       or       1 => 3
2 <= 0                2 => 1
3 <= 1                3 => 0

It seems so simple, but I just can't think of how to do it. I've tried googling, but haven't found anything relevant.

Answer

Ali picture Ali · Aug 27, 2014

Sorting is an overkill here. This is just a single-pass, linear time algorithm with constant memory requirement:

from __future__ import print_function
import numpy as np

p = np.array([3, 2, 0, 1])
s = np.empty(p.size, dtype=np.int32)
for i in np.arange(p.size):
    s[p[i]] = i

print('s =', s)

The above code prints

 s = [2 3 1 0]

as required.

The rest of the answer is concerned with the efficient vectorization of the above for loop. If you just want to know the solution, jump to the end of this answer.



(The original answer from Aug 27, 2014; the timings are valid for NumPy 1.8. An update with NumPy 1.11 follows later.)

A single-pass, linear time algorithm is expected to be faster than np.argsort; interestingly, the trivial vectorization (s[p] = xrange(p.size), see index arrays) of the above for loop is actually slightly slower than np.argsort as long as p.size < 700 000 (well, on my machine, your mileage will vary):

import numpy as np

def np_argsort(p):
    return np.argsort(p)

def np_fancy(p):
    s = np.zeros(p.size, p.dtype) # np.zeros is better than np.empty here, at least on Linux
    s[p] = xrange(p.size) 
    return s

def create_input(n):
    np.random.seed(31)
    indices = np.arange(n, dtype = np.int32)
    return np.random.permutation(indices)

From my IPython notebook:

p = create_input(700000)
%timeit np_argsort(p)
10 loops, best of 3: 72.7 ms per loop
%timeit np_fancy(p)
10 loops, best of 3: 70.2 ms per loop

Eventually the asymptotic complexity kicks in (O(n log n) for argsort vs. O(n) for the single-pass algorithm) and the single-pass algorithm will be consistently faster after a sufficiently large n = p.size (threshold is around 700k on my machine).

However, there is a less straightforward way to vectorize the above for loop with np.put:

def np_put(p):
    n = p.size
    s = np.zeros(n, dtype = np.int32)
    i = np.arange(n, dtype = np.int32)
    np.put(s, p, i) # s[p[i]] = i 
    return s

Which gives for n = 700 000 (the same size as above):

p = create_input(700000)
%timeit np_put(p)
100 loops, best of 3: 12.8 ms per loop

This is a nice 5.6x speed up for next to nothing!

To be fair, np.argsort still beats the np.put approach for smaller n (the tipping point is around n = 1210 on my machine):

p = create_input(1210)
%timeit np_argsort(p)
10000 loops, best of 3: 25.1 µs per loop
%timeit np_fancy(p)
10000 loops, best of 3: 118 µs per loop
%timeit np_put(p)
10000 loops, best of 3: 25 µs per loop

This is most likely because we allocate and fill in an extra array (at the np.arange() call) with the np_put approach.


Although you didn't ask for a Cython solution, just out of curiosity, I also timed the following Cython solution with typed memoryviews:

import numpy as np
cimport numpy as np

def in_cython(np.ndarray[np.int32_t] p):    
    cdef int i
    cdef int[:] pmv
    cdef int[:] smv 
    pmv = p
    s = np.empty(p.size, dtype=np.int32)
    smv = s
    for i in xrange(p.size):
        smv[pmv[i]] = i
    return s

Timings:

p = create_input(700000)
%timeit in_cython(p)
100 loops, best of 3: 2.59 ms per loop

So, the np.put solution is still not as fast as possible (ran 12.8 ms for this input size; argsort took 72.7 ms).


Update on Feb 3, 2017 with NumPy 1.11

Jamie, Andris and Paul pointed out in comments below that the performance issue with fancy indexing was resolved. Jamie says it was already resolved in NumPy 1.9. I tested it with Python 3.5 and NumPy 1.11 on the machine that I was using back in 2014.

def invert_permutation(p):
    s = np.empty(p.size, p.dtype)
    s[p] = np.arange(p.size)
    return s

Timings:

p = create_input(880)
%timeit np_argsort(p)
100000 loops, best of 3: 11.6 µs per loop
%timeit invert_permutation(p)
100000 loops, best of 3: 11.5 µs per loop

A significant improvement indeed!



Conclusion

All in all, I would go with the

def invert_permutation(p):
    '''The argument p is assumed to be some permutation of 0, 1, ..., len(p)-1. 
    Returns an array s, where s[i] gives the index of i in p.
    '''
    s = np.empty_like(p)
    s[p] = np.arange(p.size)
    return s

approach for code clarity. In my opinion, it is less obscure than argsort, and also faster for large input sizes. If speed becomes an issue, I would go with the Cython solution.