Anti-Join Pandas

Question 1

Anti-Join Pandas

python pandas dataframe merge anti-join

Ayelavan · Jul 22, 2016 · Viewed 46.8k times · Source

Answer

Answer

indicator = True in merge command will tell you which join was applied by creating new column _merge with three possible values:

left_only
right_only
both

Keep right_only and left_only. That is it.

outer_join = TableA.merge(TableB, how = 'outer', indicator = True)

anti_join = outer_join[~(outer_join._merge == 'both')].drop('_merge', axis = 1)

easy!

Here is a comparison with a solution from piRSquared:

1) When run on this example matching based on one column, piRSquared's solution is faster.

2) But it only works for matching on one column. If you want to match on several columns - my solution works just as fine as with one column.

So it's up for you to decide.

Question 2

I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).

I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.

I am familiar with R and this is the code I would use to do this in R.

library("dplyr")

## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")

## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)

How would I achieve this in python?

Anti-Join Pandas

Answer

Related questions