I have a Hive table backed by an Avro schema. The table was created with the following query:
CREATE EXTERNAL TABLE datatbl
PARTITIONED BY (date STRING, time INT)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.url'='path to schema file on HDFS')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '<path on hdfs>';
So far we have been inserting data into the table after first setting the following properties:
hive> set hive.exec.compress.output=true;
hive> set avro.output.codec=snappy;
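A typical insert in that same session then looks something like the sketch below; the source table, columns, and partition values are placeholders rather than our real names:

-- placeholder source table/columns; partition values are only an example
INSERT OVERWRITE TABLE datatbl PARTITION (date='2016-01-01', time=1)
SELECT col1, col2
FROM source_tbl;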
However, if someone forgets to set the two properties above, the data is written uncompressed. Is there a way to enforce compression at the table level, so that the data is always compressed even when these properties are not set?
Yes, you can put the codec into the table definition itself so it applies to every insert. Note that orc.compress is only read by ORC tables and has no effect on an Avro-backed table; for Avro the codec property is avro.output.codec. Try the following:

CREATE EXTERNAL TABLE datatbl PARTITIONED BY (date STRING, time INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.url'='path to schema file on HDFS')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '<path on hdfs>'
TBLPROPERTIES ('avro.output.codec'='snappy');