I have two tables and I would like to append them so that only all the data in table A is retained and data from table B is only added if its key is unique (Key values are unique in table A and B however in some cases a Key will occur in both table A and B).
I think the way to do this will involve some sort of filtering join (anti-join) to get values in table B that do not occur in table A then append the two tables.
I am familiar with R and this is the code I would use to do this in R.
library("dplyr")
## Filtering join to remove values already in "TableA" from "TableB"
FilteredTableB <- anti_join(TableB,TableA, by = "Key")
## Append "FilteredTableB" to "TableA"
CombinedTable <- bind_rows(TableA,FilteredTableB)
How would I achieve this in python?
indicator = True
in merge
command will tell you which join was applied by creating new column _merge
with three possible values:
left_only
right_only
both
Keep right_only
and left_only
. That is it.
outer_join = TableA.merge(TableB, how = 'outer', indicator = True)
anti_join = outer_join[~(outer_join._merge == 'both')].drop('_merge', axis = 1)
easy!
Here is a comparison with a solution from piRSquared:
1) When run on this example matching based on one column, piRSquared's solution is faster.
2) But it only works for matching on one column. If you want to match on several columns - my solution works just as fine as with one column.
So it's up for you to decide.