I ran the following code in Hive v0.12.0, expecting to get three tables compressed with different methods, so the size and content of their files should differ.
--- Create table and compress it with ZLIB
create table zzz_test_szlib
stored as orc
tblproperties ("orc.compress"="ZLIB")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;
--- Create table and compress it with SNAPPY
create table zzz_test_ssnap
stored as orc
tblproperties ("orc.compress"="SNAPPY")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;
--- Create table and DO NOT compress it
create table zzz_test_snone
stored as orc
tblproperties ("orc.compress"="NONE")
as
select * from uk_pers_dev.orc_dib_trans limit 100000000;
When I check the tables' metadata using DESCRIBE (or through Hue), I get:
Property         Value (ZLIB)                                     Value (SNAPPY)                                   Value (NONE)
---------------- ------------------------------------------------ ------------------------------------------------ ------------------------------------------------
tableName        test_orc_zlib                                    test_orc_snappy                                  test_orc_none
location:hdfs    /user/hive/warehouse/test_orc_zlib               /user/hive/warehouse/test_orc_snappy             /user/hive/warehouse/test_orc_none
inputFormat      org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
outputFormat     org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
compressed       FALSE                                            FALSE                                            FALSE
serializationLib org.apache.hadoop.hive.ql.io.orc.OrcSerde        org.apache.hadoop.hive.ql.io.orc.OrcSerde        org.apache.hadoop.hive.ql.io.orc.OrcSerde
orc.compress     ZLIB                                             SNAPPY                                           NONE
numFiles         1                                                1                                                1
totalSize        289970088                                        289970088                                        289970088
tableType        MANAGED_TABLE                                    MANAGED_TABLE                                    MANAGED_TABLE
The metadata shows compressed=FALSE, but I don’t know how to change this or how it would affect the result.
Moreover, when I compare the tables' data files, they are all binary identical:
[~]$ hadoop fs -ls /user/hive/warehouse/test_orc_*
-rw-r--r-- 3 andrey supergroup 289970088 2014-05-07 13:19 /user/hive/warehouse/test_orc_none/000000_0
-rw-r--r-- 3 andrey supergroup 289970088 2014-05-07 12:34 /user/hive/warehouse/test_orc_snappy/000000_0
-rw-r--r-- 3 andrey supergroup 289970088 2014-05-07 11:48 /user/hive/warehouse/test_orc_zlib/000000_0
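One way to check which codec each of these files was actually written with, independent of the table-level metadata, is to read the ORC PostScript. Per the ORC file format, the last byte of the file holds the PostScript length, and the PostScript itself (which is never compressed) records the compression kind in protobuf field 2 (NONE=0, ZLIB=1, SNAPPY=2, LZO=3). The script below is a minimal sketch assuming the files were first copied to distinct local names (e.g. with `hadoop fs -get`); the function names are hypothetical, and it hand-decodes only the protobuf wire types the PostScript uses. `hive --orcfiledump <path>` should report the same information without custom code.

```python
import sys

# CompressionKind enum values from the ORC PostScript protobuf definition.
COMPRESSION_KINDS = {0: "NONE", 1: "ZLIB", 2: "SNAPPY", 3: "LZO"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def postscript_compression(tail: bytes) -> str:
    """Return the compression kind stored in an ORC PostScript.

    `tail` must end with the PostScript followed by one byte holding its length.
    """
    ps_len = tail[-1]
    ps = tail[-1 - ps_len:-1]
    pos = 0
    while pos < len(ps):
        key, pos = read_varint(ps, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint field (footerLength, compression, ...)
            value, pos = read_varint(ps, pos)
            if field == 2:  # the compression field
                return COMPRESSION_KINDS.get(value, f"UNKNOWN({value})")
        elif wire_type == 2:  # length-delimited (packed version list, magic)
            length, pos = read_varint(ps, pos)
            pos += length
        else:
            raise ValueError(f"unexpected protobuf wire type {wire_type}")
    return "NONE"  # field absent: proto2 default is the first enum value


def file_compression(path: str) -> str:
    """Read the last 256 bytes of a local ORC file (PostScript is at most 255 bytes)."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        f.seek(max(0, size - 256))
        tail = f.read()
    return postscript_compression(tail)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, file_compression(path))
```

If all three files print the same codec, the `orc.compress` table property was not applied when the CTAS wrote the data, which would be consistent with the identical sizes above.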
I tried changing or removing the following settings, but it made no difference:
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
I also tried a different source table (stored as TEXTFILE), with the same result.
Any thoughts or suggestions?