Using lambda conditional and pandas str.contains to lump strings

hselbie · Feb 9, 2017

To learn pandas, I'm experimenting with the Global Shark Attack database on Kaggle, and I'm trying to find the best way to lump similar strings together using a lambda function and str.contains.

Basically, anywhere a string in the data['Activity'] column contains the phrase "skin diving" (e.g. 'skin diving for abalone'), I want to replace the activity with 'skin diving'. There are 92 variations of skin diving, which is why I'm trying to use a lambda function.

I can return a boolean Series using:

data['Activity'].str.contains('skin diving')

But I'm unsure how to change the value where this condition is true.

My attempt was data.apply(lambda x: 'free diving' if x.str.contains('free diving)) but I'm getting a syntax error. I'm not familiar enough with lambda functions and pandas to get it right, so any help would be appreciated.

Answer

cmaher · Feb 9, 2017

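Instead of using a Series.str method, you can use the in operator in your lambda to test for the substring. Note that a conditional expression in Python requires an else branch, so the lambda returns x unchanged when there is no match: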

data['Activity'] = data['Activity'].apply(lambda x: 'skin diving' if 'skin diving' in x else x)
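
If you would rather keep str.contains, the boolean Series from the question can be used as a mask with .loc to overwrite only the matching rows. This is a minimal sketch, not the only way to do it: the sample DataFrame is made up to stand in for the Kaggle data, and the case=False and na=False arguments are assumptions about what you want (ignore capitalization, treat missing activities as non-matches). The na=False part matters because a missing value in the column is typically a float NaN, and the in test inside the lambda above would raise a TypeError on it.

```python
import pandas as pd

# Hypothetical sample standing in for the Kaggle shark attack data.
data = pd.DataFrame({
    "Activity": [
        "skin diving for abalone",
        "Surfing",
        "Skin diving",
        None,
        "swimming",
    ]
})

# Boolean mask: True where the text contains the phrase.
# case=False ignores capitalization; na=False counts missing values as non-matches.
mask = data["Activity"].str.contains("skin diving", case=False, na=False)

# Overwrite only the matching rows; all other rows are left untouched.
data.loc[mask, "Activity"] = "skin diving"

print(data)
```

Because the mask is computed for the whole column at once, this avoids calling a Python function per row, and the same pattern can be repeated for other phrases such as 'free diving'.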