Currently I have a HIVE 0.7 instance on Amazon EMR. I am trying to create a duplicate of this instance on a new EMR cluster using Hive 0.11.
In my 0.7 instance I have an external table that will set empty strings to NULL. Here is how I create the table:
CREATE EXTERNAL TABLE IF NOT EXISTS tablename
(column1 string,
column2 string)
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES ('serialization.null.format' = '');
Data is added to the table like this:
ALTER TABLE tablename
ADD PARTITION (year = '2013', month = '10', day='01')
LOCATION '/location_in_hdfs';
This works great in 0.7 but in 0.11 it doesn't seem to be evaluating my empty strings as NULLS. Interestingly, creating a normal table with the same data and table definition seems to evaluate empty strings as NULLs as expected.
Is there are different way to do this with an external table in 0.11?
Hive default partition properties overriding the table properties. Include SERDE properties in your alter statement:
ALTER TABLE tablename ADD PARTITION (year = '2013', month = '10', day='01') SET
SERDEPROPERTIES ('serialization.null.format' = '');