Can one perform a left join in pandas that selects only the first match on the right? Example:
left = pd.DataFrame()
left['age'] = [11, 12]
right = pd.DataFrame()
right['age'] = [10, 11, 11]
right['salary'] = [ 100, 150, 200 ]
left.merge( right, how='left', on='age' )
Returns
age salary
0 11 150
1 11 200
2 12 NaN
But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:
age salary
0 11 150
2 12 NaN
So I've been using
left.merge( right.drop_duplicates(['age']), how='left', on='age')
but I believe this makes a full copy of right. And it smells funny.
Is there a more elegant way?
Yes, you can use groupby to remove your duplicate lines. Do everything you've done to define left and right. Now, I define a new dataframe on your last line:
left2=left.merge( right, how='left', on='age' )
df= left2.groupby(['age'])['salary'].first().reset_index()
df
At first I used a .min(), which will give you the minimum salary at each age, as such:
df= left2.groupby(['age'])['salary'].min().reset_index()
But you were specifically asking about the first match. To do so you use the .first() option. Note: The .reset_index() at the end, just reformats the output of the groupby to be a dataframe again.