Loading Data from a .txt file to Table Stored as ORC in Hive

Neels picture Neels · Feb 12, 2014 · Viewed 72.7k times · Source

I have a data file which is in .txt format. I am using the file to load data into Hive tables. When I load the file in a table like

CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS TEXTFILE;

the data is loaded correctly using

LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;

and I can run a SELECT * FROM test_details_txt; on the table in Hive.

However If I try to load the data in a table that is

CREATE TABLE test_details_txt(
visit_id INT,
store_id SMALLINT) STORED AS ORC; 

I receive the following error on trying to run a SELECT:

Failed with exception java.io.IOException:java.io.IOException: Malformed ORC file hdfs://master:6000/user/hive/warehouse/test.db/transaction_details/test_details.txt. Invalid postscript.

While loading the data using above LOAD statement I do not receive any error or exception.

Is there anything else that needs to be done while using the LOAD DATA IN PATH.. command to store data into an ORC table?

Answer

Sunny Nanda picture Sunny Nanda · Feb 12, 2014

LOAD DATA just copies the files to hive datafiles. Hive does not do any transformation while loading data into tables.

So, in this case the input file /home/user/test_details.txt needs to be in ORC format if you are loading it into an ORC table.

A possible workaround is to create a temporary table with STORED AS TEXT, then LOAD DATA into it, and then copy data from this table to the ORC table.

Here is an example:

CREATE TABLE test_details_txt( visit_id INT, store_id SMALLINT) STORED AS TEXTFILE;
CREATE TABLE test_details_orc( visit_id INT, store_id SMALLINT) STORED AS ORC;

-- Load into Text table
LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;

-- Copy to ORC table
INSERT INTO TABLE test_details_orc SELECT * FROM test_details_txt;