Left Anti join in Spark dataframes

Alejandro Martinez Otal · Jul 25, 2018 · Viewed 9.6k times

I have two dataframes, and I would like to retrieve only the rows of one of them that are not found in the inner join; see the picture:

[Picture: Full outer join]

I have tried several approaches: an inner join followed by filtering the rows that contain at least one null, and all the join types described in the docs for Spark 1.6, but I failed to obtain the result with just one join.

Can anybody help?

Answer

Manoj Kumar Dhakad · Jul 25, 2018

This is called a right excluding join: it keeps the rows of df2 that have no match in df1. You can do it like this:

df1.join(df2, df1("column1") === df2("column2"), "right_outer").filter("column1 is null").show()
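For anyone who wants to try this end to end, here is a minimal sketch of the same idea, next to the "left_anti" join type from the question title. The sample data, the extra column names (left_value, right_value), and the object and method names are made up for illustration. As far as I know, "left_anti" is only accepted by the DataFrame API from Spark 2.0 onward, so the first variant is the one to use on Spark 1.6 (where you would also build the context differently, since SparkSession is a 2.x class).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Illustrative sketch only: object, method, and sample column names are hypothetical.
object AntiJoinSketch {

  // Right excluding join: right outer join, then keep the rows where the
  // left-side key is null, i.e. df2 rows that found no partner in df1.
  def rightExcluding(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("column1") === df2("column2"), "right_outer")
      .filter(df1("column1").isNull)
      .select(df2("column2"), df2("right_value")) // drop the all-null df1 columns

  // Same rows via an anti join (assumes Spark 2.0 or later).
  // The frame whose rows you want to keep goes on the left.
  def leftAnti(df1: DataFrame, df2: DataFrame): DataFrame =
    df2.join(df1, df2("column2") === df1("column1"), "left_anti")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: key 2 exists on both sides, key 3 only in df2.
    val df1 = Seq((1, "a"), (2, "b")).toDF("column1", "left_value")
    val df2 = Seq((2, "x"), (3, "y")).toDF("column2", "right_value")

    rightExcluding(df1, df2).show() // only the df2 row with column2 = 3
    leftAnti(df1, df2).show()       // same row, without the extra select

    spark.stop()
  }
}
```

The anti join returns only the columns of the left-hand frame, so it needs no null filter or select afterwards. To get the rows of df1 that are missing from df2 instead, swap the two frames (or use "left_outer" and filter on df2's key being null).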