I'm comparing the behavior between SQL engines. Oracle has the behavior I would expect from a SQL engine for full outer joins:
CREATE TABLE sql_test_a
(
ID VARCHAR2(4000 BYTE),
FIRST_NAME VARCHAR2(200 BYTE),
LAST_NAME VARCHAR2(200 BYTE)
);
CREATE TABLE sql_test_b
(
NUM VARCHAR2(4000 BYTE),
FIRST_NAME VARCHAR2(200 BYTE),
LAST_NAME VARCHAR2(200 BYTE)
);
INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('1', 'John', 'Snow');
INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('2', 'Mike', 'Tyson');
INSERT INTO sql_test_b (NUM, FIRST_NAME, LAST_NAME) VALUES ('20', 'Mike', 'Tyson');
When I execute the following, it gives me the expected result. The resulting table contains two rows, with one of the rows containing NULL
for the NUM
field, because there is no john snow in the table sql_test_b
.
SELECT A.FIRST_NAME, A.LAST_NAME, A.ID, B.NUM
FROM
SQL_TEST_A A
FULL OUTER JOIN
SQL_TEST_B B
ON
A.FIRST_NAME = B.FIRST_NAME
AND
A.LAST_NAME = B.LAST_NAME;
You can test the sql script here: http://sqltest.net/
In HIVE, however, if you were to try the same thing, the full outer join results in a table with two rows. The row that should be the "John Snow" row contains NULL
for the fields FIRST_NAME, LAST_NAME, and NUM. The 1
is filled in for ID
, but that's it.
Why such weird behavior in HIVE? Is this a bug? Or am I missing something...because Oracle 11g seems to handle this much better. Thanks.
I could not simulate the result reported by @Candic3
I used the below statements along with the same "select" query as in the question.
CREATE TABLE IF NOT EXISTS sql_test_a (ID String, FIRST_NAME String, LAST_NAME String) COMMENT 'sql_test_a'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS sql_test_b (NUM String, FIRST_NAME String, LAST_NAME String) COMMENT 'sql_test_b'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
INSERT INTO sql_test_a VALUES ('1', 'John', 'Snow');
INSERT INTO sql_test_a VALUES ('2', 'Mike', 'Tyson');
INSERT INTO sql_test_b VALUES ('20', 'Mike', 'Tyson');
SELECT A.FIRST_NAME, A.LAST_NAME, A.ID, B.NUM
FROM
SQL_TEST_A A
FULL OUTER JOIN
SQL_TEST_B B
ON
A.FIRST_NAME = B.FIRST_NAME
AND
A.LAST_NAME = B.LAST_NAME;
Please find the result attached.
However, select query would return NULL due to unnoticed minor mistakes like data-type mismatch between the DDL and the actual data (say, from flat files) or mismatch among the delimiter mentioned in the DDL and the ones in the actual data.