Trying to learn some stuff: I'm messing around with the Global Shark Attack database on Kaggle, and I'm trying to find the best way to lump strings together using a lambda function and str.contains.
Basically, anywhere a string in the data['Activity'] column contains the phrase 'skin diving' (e.g. 'skin diving for abalone'), I want to replace the activity with just 'skin diving'. There are 92 variations of skin diving, hence trying to use a lambda function.
I can return a boolean Series using
data['Activity'].str.contains('skin diving')
but I'm unsure how to change the value when this condition is true.
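To make that starting point concrete, here is a minimal sketch of the boolean mask on a small made-up sample (the values are hypothetical, not taken from the Kaggle file). Two details worth knowing: str.contains is case-sensitive by default, and missing values produce NaN in the mask unless you pass na=False.

```python
import pandas as pd

# Hypothetical sample; the real Activity column has far more variations.
data = pd.DataFrame({
    "Activity": [
        "Skin diving for abalone",    # capital S: no match when case-sensitive
        "skin diving for abalone",
        "Surfing",
        None,                         # missing value
        "skin diving, spearfishing",
    ]
})

# Boolean Series: True where the substring occurs; na=False maps missing values to False.
mask = data["Activity"].str.contains("skin diving", na=False)
print(mask.tolist())  # [False, True, False, False, True]
```

Passing case=False as well would make the first row match too.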
My lambda function:
data.apply(lambda x: 'free diving' if x.str.contains('free diving))
but I'm getting a syntax error, and I'm not familiar enough with lambda functions and pandas to get it right. Any help would be appreciated.
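Instead of using a Series.str method, you can use Python's in operator inside your lambda to test for the substring. Your original lambda fails because a conditional expression needs an else branch (and the 'free diving string is never closed). The isinstance check below skips missing values (NaN), which would otherwise raise a TypeError:
data['Activity'] = data['Activity'].apply(lambda x: 'skin diving' if isinstance(x, str) and 'skin diving' in x else x)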
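A vectorized alternative that skips the lambda entirely is to build the boolean mask with str.contains and assign through .loc. This is a sketch on hypothetical sample data, not the asker's dataset; case=False and na=False are choices made here so that capitalised variants match and missing values are left alone.

```python
import pandas as pd

# Hypothetical sample values for illustration only.
data = pd.DataFrame({
    "Activity": [
        "skin diving for abalone",
        "Skin Diving",
        "Surfing",
        None,
        "free diving",
    ]
})

# Case-insensitive match; missing values count as "no match".
mask = data["Activity"].str.contains("skin diving", case=False, na=False)

# Overwrite only the matching rows with the canonical label.
data.loc[mask, "Activity"] = "skin diving"

result = data["Activity"].tolist()
# Rows 0 and 1 become 'skin diving'; 'Surfing', the missing value and 'free diving' are unchanged.
```

The .loc version avoids calling a Python function per row, and the same pattern can be repeated for other groupings such as 'free diving'.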