I'm trying to clean up some Python code that vectorizes a set of features, and I'm wondering if there's a good way to pass multiple arguments to a function used with apply. Consider the following (current version):
def function_1(x):
    if "string" in x:
        return 1
    else:
        return 0
df['newFeature'] = df['oldFeature'].apply(function_1)
With this approach I have to write a separate function (function_1, function_2, etc.) for every substring "string"
that I want to find. Ideally I could combine all of these redundant functions into one and use something like this:
def function(x, string):
    if string in x:
        return 1
    else:
        return 0
df['newFeature'] = df['existingFeature'].apply(function("string"))
But that raises the error TypeError: function() takes exactly 2 arguments (1 given)
Is there another way to accomplish the same thing?
from functools import partial

def function(string, x):
    if string in x:
        return 1
    else:
        return 0

df['newFeature'] = df['oldFeature'].apply(partial(function, 'string'))
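A self-contained sketch of the same idea, assuming a small made-up DataFrame in place of the asker's data and hypothetical column names like has_string. Because partial produces a new one-argument function for each substring, a single loop replaces function_1, function_2, and so on:

```python
from functools import partial

import pandas as pd


def function(string, x):
    # Return 1 if the substring `string` occurs in the cell value `x`, else 0.
    if string in x:
        return 1
    else:
        return 0


# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({"oldFeature": ["a string here", "nothing", "another string"]})

# One new 0/1 column per substring; the column names are illustrative only.
for substring in ["string", "nothing"]:
    df["has_" + substring] = df["oldFeature"].apply(partial(function, substring))

print(df)
```

Each partial(function, substring) call fixes the first argument, so apply only has to supply the cell value x.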
I believe you want functools.partial. A demo:
>>> from functools import partial
>>> def mult(a, b):
...     return a * b
...
>>> doubler = partial(mult, 2)
>>> doubler(4)
8
In your case you need to swap the arguments of function, because partial fills in positional arguments from the left (so the bound "string" must come first), and then just:
df['existingFeature'].apply(partial(function, "string"))
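If you would rather keep the original argument order function(x, string), a possible alternative is to bind string by keyword instead of by position; x then stays free as the first positional argument. This is a minimal sketch using plain map over a list as a stand-in for apply, which likewise calls the function once per value; the names has_string and values are hypothetical:

```python
from functools import partial


def function(x, string):
    # Original order from the question: the value first, the substring second.
    if string in x:
        return 1
    else:
        return 0


# Bind `string` by keyword, so no argument swap is needed.
has_string = partial(function, string="string")

values = ["a string here", "nothing"]
flags = list(map(has_string, values))
print(flags)  # [1, 0]
```

With pandas this would read df['existingFeature'].apply(partial(function, string="string")). Series.apply also forwards an args tuple and extra keyword arguments to the function, so df['existingFeature'].apply(function, string="string") should behave the same way; check the documentation for your pandas version.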