Finding entries containing a substring in a numpy array?

SiOx picture SiOx · Aug 16, 2016 · Viewed 27.5k times · Source

I tried to find entries in an Array containing a substring with np.where and an in condition:

import numpy as np
foo = "aa"
bar = np.array(["aaa", "aab", "aca"])
np.where(foo in bar)

this only returns an empty Array.
Why is that so?
And is there a good alternative solution?

Answer

Divakar picture Divakar · Aug 16, 2016

We can use np.core.defchararray.find to find the position of foo string in each element of bar, which would return -1 if not found. Thus, it could be used to detect whether foo is present in each element or not by checking for -1 on the output from find. Finally, we would use np.flatnonzero to get the indices of matches. So, we would have an implementation, like so -

np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)

Sample run -

In [91]: bar
Out[91]: 
array(['aaa', 'aab', 'aca'], 
      dtype='|S3')

In [92]: foo
Out[92]: 'aa'

In [93]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[93]: array([0, 1])

In [94]: bar[2] = 'jaa'

In [95]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[95]: array([0, 1, 2])