I have two DataFrames, and I would like to retrieve only the rows of one of them that are not found in the inner join (see the picture):
I have tried several approaches: an inner join followed by filtering the rows that return at least one null, and all the join types described in the docs for Spark 1.6, but I failed to obtain the result with just one join.
Can anybody help?
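This is called a right excluding join. A right outer join keeps every row of df2, and the rows that have no match in df1 get null in df1's columns, so filtering on a null join key from df1 leaves only the unmatched df2 rows. You can do it like this: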
df1.join(df2, df1("column1") === df2("column2"), "right_outer").filter("column1 is null").show()
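To make the one-liner above easier to try out, here is a minimal standalone sketch against the Spark 1.6 API. The sample rows, the extra columns left_value and right_value, and the object name are made up for illustration; only column1, column2 and the "right_outer" join come from the answer. It also selects just df2's columns at the end, since the df1 side of every remaining row is null.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; the name is not from the original answer.
object RightExcludingJoin {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("right-excluding-join").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data: key 2 exists on both sides, key 3 only in df2.
    val df1 = Seq((1, "a"), (2, "b")).toDF("column1", "left_value")
    val df2 = Seq((2, "x"), (3, "y")).toDF("column2", "right_value")

    val onlyInDf2 = df1
      // Keep every row of df2; unmatched rows get null in df1's columns.
      .join(df2, df1("column1") === df2("column2"), "right_outer")
      // Keep only the rows of df2 that found no partner in df1.
      .filter(df1("column1").isNull)
      // Drop the all-null df1 columns from the result.
      .select(df2("column2"), df2("right_value"))

    onlyInDf2.show()

    sc.stop()
  }
}
```

With this sample data the only row left is the one with column2 = 3, because key 2 has a match in df1. If you are on Spark 2.0 or later, I believe the "left_anti" join type (df2.join(df1, df2("column2") === df1("column1"), "left_anti")) expresses the same thing directly, but it is not available in 1.6.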