I have a dataframe along the lines of the below:
  Type Set
1    A   Z
2    B   Z
3    B   X
4    C   Y
I want to add another column to the dataframe (or generate a series) of the same length as the dataframe (equal number of records/rows) which sets a colour 'green' if Set == 'Z' and 'red' if Set equals anything else.
What's the best way to do this?
If you only have two choices to select from:
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
df['color'] = np.where(df['Set']=='Z', 'green', 'red')
print(df)
yields
  Type Set  color
0    A   Z  green
1    B   Z  green
2    B   X    red
3    C   Y    red
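The question also mentions generating a standalone series instead of a new column. Here is a minimal sketch of that variant, assuming the same df as above; the name colors is only illustrative. np.where returns a plain NumPy array, so it is wrapped in pd.Series with df's index to keep the row labels aligned:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# np.where returns a NumPy array; wrapping it in a Series with df's index
# keeps the result aligned with the dataframe's rows.
# `colors` is an illustrative name, not something from the original answer.
colors = pd.Series(np.where(df['Set'] == 'Z', 'green', 'red'),
                   index=df.index, name='color')
print(colors.tolist())
```

The series has one entry per row of df and can be assigned later with df['color'] = colors if needed.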
If you have more than two conditions, use np.select. For example, if you want color to be
yellow when (df['Set'] == 'Z') & (df['Type'] == 'A'),
blue when (df['Set'] == 'Z') & (df['Type'] == 'B'),
purple when (df['Type'] == 'B'),
and black otherwise,
then use
df = pd.DataFrame({'Type':list('ABBC'), 'Set':list('ZZXY')})
conditions = [
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
    (df['Type'] == 'B')]
choices = ['yellow', 'blue', 'purple']
df['color'] = np.select(conditions, choices, default='black')
print(df)
which yields
  Type Set   color
0    A   Z  yellow
1    B   Z    blue
2    B   X  purple
3    C   Y   black
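Note that the order of the conditions matters: when more than one condition is true for a row, np.select uses the choice belonging to the first one that matches. Row 1 (Set 'Z', Type 'B') satisfies both the second and the third condition above and ends up 'blue'. As a rough sketch of what changes when the broader condition is listed first (the names reordered and color_reordered are made up for this illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Type': list('ABBC'), 'Set': list('ZZXY')})

# Same three conditions as above, but with the broad Type == 'B' test first.
# `reordered` and `color_reordered` are hypothetical names for this sketch.
reordered = [
    df['Type'] == 'B',
    (df['Set'] == 'Z') & (df['Type'] == 'A'),
    (df['Set'] == 'Z') & (df['Type'] == 'B'),
]
choices = ['purple', 'yellow', 'blue']

# Every Type 'B' row now matches the first condition, so 'blue' is never used.
df['color_reordered'] = np.select(reordered, choices, default='black')
print(df['color_reordered'].tolist())
```

So list the most specific conditions first and the catch-all ones last.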