np.delete and np.s_. What's so special about np_s?

Question 1

python-2.7 numpy indexing slice delete-row

Greg Castaldi · Sep 20, 2015 · Viewed 9.3k times · Source

Answer

np.delete is not doing anything unique or special. It just returns a copy of the original array with some items missing. Most of the code just interprets the inputs in preparation to make this copy.

What you are asking about is the obj parameter:

obj : slice, int or array of ints

In simple terms, np.s_ lets you supply a slice using the familiar : syntax. The x:y notation cannot be passed directly as a function argument.
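
As a small sketch of that point (the variable names here are only for illustration), indexing np.s_ with colon notation hands back an ordinary slice object, and that object is what np.delete receives:

```python
import numpy as np

x = np.arange(10) * 2

# Indexing np.s_ is an indexing context, so the colon syntax is legal here.
s = np.s_[3:6]

# The result is a plain built-in slice, equal to one built by hand.
is_plain_slice = isinstance(s, slice)
same_as_builtin = (s == slice(3, 6))

# Both spellings give np.delete the same argument.
via_s = np.delete(x, s)
via_slice = np.delete(x, slice(3, 6))
```

So np.s_ adds no behaviour of its own to np.delete; it only offers a shorter way to write the slice.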

Let's try your alternatives (you allude to these in results and errors, but they are buried in the text):

In [213]: x=np.arange(10)*2   # some distinctive values

In [214]: np.delete(x, np.s_[3:6])
Out[214]: array([ 0,  2,  4, 12, 14, 16, 18])

So delete with s_ removes a range of values, namely 6 8 10, the items at indices 3 through 5 (the 4th through 6th).

In [215]: np.delete(x, [3:6])
  File "<ipython-input-215-0a5bf5cc05ba>", line 1
    np.delete(x, [3:6])
                   ^
SyntaxError: invalid syntax

Why the error? Because the 3:6 notation is only valid inside an indexing expression, and np.delete is a function call. Here [3:6] is parsed as a list literal, which cannot contain a colon. Even s_[[3:6]] has problems. np.delete(x, 3:6) is also bad, because Python only accepts the : syntax in an indexing context, where it automatically translates it into a slice object. Note that this is a syntax error, something that the interpreter catches before doing any calculations or function calls.
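
One way to see that the failure happens at parse time is to hand the source text to the built-in compile(), which parses without running anything. This is only an illustrative check; the helper name compiles is made up for this example:

```python
def compiles(source):
    """Return True if `source` parses as a Python expression.

    Nothing is executed, so undefined names such as np or x do not matter.
    """
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

list_with_colon = compiles("np.delete(x, [3:6])")      # colon inside a list literal
bare_colon_arg = compiles("np.delete(x, 3:6)")         # colon as a call argument
nested_in_s = compiles("np.delete(x, np.s_[[3:6]])")   # list literal inside the index
indexing_s = compiles("np.delete(x, np.s_[3:6])")      # colon directly inside [] indexing
```

Only the last form parses, since it is the only one where the colon sits directly inside an indexing expression.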

In [216]: np.delete(x, slice(3,6))
Out[216]: array([ 0,  2,  4, 12, 14, 16, 18])

A slice works instead of s_; in fact that is what s_ produces.

In [233]: np.delete(x, [3,4,5])
Out[233]: array([ 0,  2,  4, 12, 14, 16, 18])

A list also works, though it works in a different way (see below).

In [217]: np.delete(x, x[3:6])
Out[217]: array([ 0,  2,  4,  6,  8, 10, 14, 18])

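This works, but produces a different result, because x[3:6] is not the same as range(3,6). Also, np.delete does not work like a list's remove method. It deletes by index, not by matching value. Here x[3:6] is [6, 8, 10], so the items at indices 6 and 8 (the values 12 and 16) are removed. Index 10 is out of bounds and is ignored by this NumPy version; newer NumPy releases raise an IndexError instead.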

np.index_exp fails for the same reason that np.delete(x, (slice(3,6),)) does. 1, [1], (1,) are all valid and remove one item. Even '1', the string, works in the NumPy version used here. delete parses this argument and, at this level, expects something that can be turned into an integer via obj.astype(intp). (slice(None),) is not a slice; it is a 1-item tuple, so it is handled in a different spot in the delete code. The result is a TypeError produced by something that delete calls, very different from the SyntaxError. In theory delete could extract the slice from the tuple and proceed as in the s_ case, but the developers chose not to handle this variation.
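
If you do want to pass np.index_exp output to delete, the unwrapping described above can be done by hand. This is a minimal sketch; delete_unwrapped is a hypothetical helper, not part of NumPy:

```python
import numpy as np

def delete_unwrapped(arr, obj, axis=None):
    """Hypothetical wrapper: unpack a 1-item tuple holding a slice,
    then defer to np.delete for everything else."""
    if isinstance(obj, tuple) and len(obj) == 1 and isinstance(obj[0], slice):
        obj = obj[0]
    return np.delete(arr, obj, axis=axis)

x = np.arange(10) * 2

# np.index_exp[3:6] is (slice(3, 6, None),); the helper extracts the slice.
result = delete_unwrapped(x, np.index_exp[3:6])
```

In practice it is simpler to use np.s_ or slice() directly, which avoids the tuple altogether.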

A quick study of the code shows that np.delete uses 2 distinct copying methods: by slice and by boolean mask. If the obj is a slice, as in our example, it does (for a 1d array):

out = np.empty(7, dtype=x.dtype)   # 10 items minus the 3 being removed
out[0:3] = x[0:3]                  # items before the slice
out[3:7] = x[6:10]                 # items after the slice

But with [3,4,5] (instead of the slice) it does:

keep = np.ones((10,), dtype=bool)
keep[[3,4,5]] = False
return x[keep]

Same result, but with a different construction method. x[np.array([1,1,1,0,0,0,1,1,1,1],bool)] does the same thing.
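
The two constructions can be written out as standalone functions to check that they agree. This is a simplified sketch of the idea for 1d arrays only, not the actual np.delete source; the function names are made up here, and the slice version assumes a step of 1 with 0 <= start <= stop <= len(x):

```python
import numpy as np

def delete_by_slice(x, start, stop):
    """Copy the pieces before and after x[start:stop] into a new array."""
    out = np.empty(len(x) - (stop - start), dtype=x.dtype)
    out[:start] = x[:start]   # items before the removed range
    out[start:] = x[stop:]    # items after the removed range
    return out

def delete_by_mask(x, indices):
    """Build a boolean keep-mask, switch off the unwanted indices, then index."""
    keep = np.ones(len(x), dtype=bool)
    keep[indices] = False
    return x[keep]

x = np.arange(10) * 2
from_slice = delete_by_slice(x, 3, 6)
from_mask = delete_by_mask(x, [3, 4, 5])
```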

In fact boolean indexing or masking like this is more common than np.delete, and generally just as powerful.
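
Masking also covers the case np.delete does not handle, namely removing items by value rather than by position. A short sketch using np.isin (the variable names are just for illustration):

```python
import numpy as np

x = np.arange(10) * 2

# The values we want gone, rather than their positions.
unwanted = x[3:6]               # array([ 6,  8, 10])

# True where an element of x is NOT one of the unwanted values.
mask = ~np.isin(x, unwanted)
by_value = x[mask]

# For comparison: removing the same items by position.
by_index = np.delete(x, np.s_[3:6])
```

The two results match here only because the values in x are unique; with repeated values the mask would drop every occurrence.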

From the lib/index_tricks.py source file:

index_exp = IndexExpression(maketuple=True)
s_ = IndexExpression(maketuple=False)

They are slightly different versions of the same thing. And both are just convenience objects.

In [196]: np.s_[1:4]
Out[196]: slice(1, 4, None)
In [197]: np.index_exp[1:4]
Out[197]: (slice(1, 4, None),)
In [198]: np.s_[1:4, 5:10]
Out[198]: (slice(1, 4, None), slice(5, 10, None))
In [199]: np.index_exp[1:4, 5:10]
Out[199]: (slice(1, 4, None), slice(5, 10, None))

The maketuple business applies only when there is a single item, a slice or index.
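
The single-item case can be checked with a plain integer index as well as a slice. A brief sketch (variable names are illustrative):

```python
import numpy as np

x = np.arange(10) * 2

# With a single item, s_ returns it unchanged and index_exp wraps it in a tuple.
single_s = np.s_[2]
single_ie = np.index_exp[2]

# For ordinary indexing the wrapping makes no difference.
same_scalar = (x[single_s] == x[single_ie])
same_slice = np.array_equal(x[np.s_[1:4]], x[np.index_exp[1:4]])
```

The distinction only matters for a function like np.delete, which inspects the type of its obj argument instead of using it directly as an index.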

Question 2

I don't really understand why regular indexing can't be used for np.delete. What makes np.s_ so special?

For example, with this code, used to delete some of the rows of this array:

inlet_names = np.delete(inlet_names, np.s_[1:9], axis = 0)

Why can't I simply use regular indexing and do..

inlet_names = np.delete(inlet_names, [1:9], axis = 0)

or

inlet_names = np.delete(inlet_names, inlet_names[1:9], axis = 0)

From what I can gather, np.s_ is the same as np.index_exp except it doesn't return a tuple, but both can be used anywhere in Python code.

Then when I look into the np.delete function, it indicates that you can use something like [1,2,3] to delete those specific indexes along the entire array. So what's preventing me from using something similar to delete certain rows or columns from the array?

I'm simply assuming that this type of indexing is read as something else in np.delete so you need to use np.s_ in order to specify, but I can't get to the bottom of what exactly it would be reading it as because when I try the second piece of code it simply returns "invalid syntax". Which is weird because this code works...

inlet_names = np.delete(inlet_names, [1,2,3,4,5,6,7,8,9], axis = 0)

So I guess the answer could possibly be that np.delete only accepts a list of the indexes that you would like to delete, and that np.s_ returns a list of the indexes that you specify for the slice.

I could use some clarification, and corrections on anything I said about these functions that may be wrong. A lot of this is just my take, because the documentation doesn't explain everything I was trying to understand. I think I'm overthinking this, but I would like to actually understand it, if someone could explain it.
