I have the following ranges and a pandas DataFrame:
x >= 0 # success
-10 <= x < 0 # warning
x < -10 # danger
import pandas as pd
df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})
I'd like to categorize the values in the DataFrame based on where they fall within the defined ranges. So I'd like the final DF to look something like this:
   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
I've tried using the category datatype, but it doesn't appear that I can define a range anywhere.
for category_column, value_column in zip(['x_cat', 'y_cat', 'z_cat'], ['x', 'y', 'z']):
    df[category_column] = df[value_column].astype('category')
Can I use the category datatype? If not, what can I do here?
pandas.cut
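import numpy as np

# right=False closes each bin on the left: [-inf, -10), [-10, 0), [0, inf),
# so -10 maps to 'warning' and 0 maps to 'success', as the question's ranges require
c = pd.cut(
    df.stack(),
    [-np.inf, -10, 0, np.inf],
    labels=['danger', 'warning', 'success'],
    right=False
)
df.join(c.unstack().add_suffix('_cat'))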
   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
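pd.cut closes its bins on the right by default, so it is worth checking how the edge values -10 and 0 are labelled against the ranges in the question. Here is a minimal sketch of that check. The Series `edges` and the variable names are hypothetical, made up for illustration, and are not part of the original data:

```python
import numpy as np
import pandas as pd

# Hypothetical edge-case values, not from the question's DataFrame
edges = pd.Series([-11, -10, -1, 0, 1])
bins = [-np.inf, -10, 0, np.inf]
labels = ['danger', 'warning', 'success']

# Default right=True: bins are (-inf, -10], (-10, 0], (0, inf)
default_cut = pd.cut(edges, bins, labels=labels)

# right=False: bins are [-inf, -10), [-10, 0), [0, inf)
left_closed_cut = pd.cut(edges, bins, labels=labels, right=False)

print(pd.DataFrame({
    'value': edges,
    'right=True': default_cut,
    'right=False': left_closed_cut,
}))
```

Only the rows for -10 and 0 should differ between the two columns. The question defines warning as -10 <= x < 0 and success as x >= 0, so the left-closed bins (right=False) are the ones that match it.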
numpy
v = df.values
cats = np.array(['danger', 'warning', 'success'])
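# side='right' sends -10 to 'warning' and 0 to 'success', matching the ranges in the question
code = np.searchsorted([-10, 0], v.ravel(), side='right').reshape(v.shape)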
cdf = pd.DataFrame(cats[code], df.index, df.columns)
df.join(cdf.add_suffix('_cat'))
   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
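If you would rather write the inequalities out exactly as they appear in the question, np.select is another option. This is only a sketch of an alternative, not a replacement for the approaches above, and the names `conditions`, `choices` and `result` are my own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

v = df.to_numpy()

# Conditions are evaluated in order and the first match wins,
# so the second condition effectively means -10 <= x < 0
conditions = [v >= 0, v >= -10]
choices = ['success', 'warning']

# Anything matching neither condition (x < -10) falls through to the default
cats = np.select(conditions, choices, default='danger')

result = df.join(
    pd.DataFrame(cats, index=df.index, columns=df.columns).add_suffix('_cat')
)
print(result)
```

This trades some speed for readability: the boundary handling is visible in the comparison operators themselves rather than implied by a bin-edge convention.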