1- I'm trying to delete multiple partitions at once, but struggling to do it with either Impala or Hive. I tried the following query, with and without '
:
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
PARTITION (pr_load_time='20170701000831')
The error I'm getting is as follow:
AnalysisException: Syntax error in line 3: PARTITION (pr_load_time='20170701000831') ^ Encountered: PARTITION Expected: CACHED, LOCATION, PURGE, SET, UNCACHED CAUSED BY: Exception: Syntax error
The partition column is bigint type, query for deleting only one partition works as expected:
ALTER TABLE cz_prd_corrti_st.s1mme_transstats_info DROP IF EXISTS
PARTITION (pr_load_time='20170701000317')
2- Is it a good practice delete the hdfs data first and then drop the partitions in Impala/Hive, or is it supposed to be done vice versa?
Your syntax is wrong.
In the DROP command the partitions should be separated by commas.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p=1),partition (p=2),partition(p=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5
You can drop a range.
Demo
hive> create table t (i int) partitioned by (p int);
OK
hive> alter table t add partition (p=1) partition(p=2) partition(p=3) partition(p=4) partition(p=5);
OK
hive> show partitions t;
OK
partition
p=1
p=2
p=3
p=4
p=5
hive> alter table t drop if exists partition (p<=3);
Dropped the partition p=1
Dropped the partition p=2
Dropped the partition p=3
OK
hive> show partitions t;
OK
partition
p=4
p=5