I have two tables, table1 and table2. Each with the same columns:
key, c1, c2, c3
I want to check to see if these tables are equal to eachother (they have the same rows). So far I have these two queries (<> = not equal in HIVE):
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key
where t2.key is null or t1.c1<>t2.c1 or t1.c2<>t2.c2 or t1.c3<>t2.c3
And
select count(*) from table1 t1
left outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t2.key is null
So my idea is that, if a zero count is returned, the tables are the same. However, I'm getting a zero count for the first query, and a non-zero count for the second query. How exactly do they differ? If there is a better way to check this certainly let me know.
The first one excludes rows where t1.c1, t1.c2, t1.c3, t2.c1, t2.c2, or t2.c3 is null. That means that you effectively doing an inner join.
The second one will find rows that exist in t1 but not in t2.
To also find rows that exist in t2 but not in t1 you can do a full outer join. The following SQL assumes that all columns are NOT NULL
:
select count(*) from table1 t1
full outer join table2 t2
on t1.key=t2.key and t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3
where t1.key is null /* this condition matches rows that only exist in t2 */
or t2.key is null /* this condition matches rows that only exist in t1 */