Apologies if this has been asked before, but I looked extensively without results.
import pandas as pd
import numpy as np
df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a'])
a
0 7
1 8
2 8
3 3
4 1
5 1
6 2
7 8
8 6
9 6
I'd like to create a new column b
that maps several values of a
according to some rule, say a=[1,2,3] is 1, a = [4,5,6,7] is 2, a = [8,9,10] is 3. one-to-one mapping is clear to me, but what if I want to map by a list of values or a range?
I tought along these lines...
df['b'] = df['a'].map({[1,2,3]:1,range(4,7):2,[8,9,10]:3})
There are a few alternatives.
pd.cut
/ NumPy via np.digitize
You can construct a list of boundaries, then use specialist library functions. This is described in @EdChum's solution, and also in this answer.
np.select
df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])
criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
values = [1, 2, 3]
df['b'] = np.select(criteria, values, 0)
The elements of criteria
are Boolean series, so for lists of values, you can use df['a'].isin([1, 3])
, etc.
range
d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}
df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))
print(df)
a b c
0 1 1 1
1 7 2 2
2 5 2 2
3 1 1 1
4 3 1 1
5 5 2 2
6 4 2 2
7 4 2 2
8 9 3 3
9 3 1 1