Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT....." in Hive?

Question 1

Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT....." in Hive?

hadoop hive hiveql bucket hadoop-partitioning

Andrew · Jul 22, 2014 · Viewed 8.3k times · Source

Answer

Answer

Came across this question and saw there was no answer provided. I looked further and found the answer in the Hive documentation.

This will never work, because of the following restrictions on CTAS:

The target table cannot be a partitioned table.
The target table cannot be an external table.
The target table cannot be a list bucketing table.

Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect%28CTAS

Furthermore https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
...
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...
[AS select_statement];

Clustering requires the column to be defined and then the cfg goes to the As select_statement Therefore at this time it is not possible.

Optionally, you can ALTER the table and add buckets, but this does not change existing data.

CREATE TABLE BUCKET_TABLE 
STORED AS ORC AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll limit 0;
ALTER TABLE BUCKET_TABLE CLUSTERED BY (key) INTO 1000 BUCKETS;
ALTER TABLE BUCKET_TABLE SET TBLPROPERTIES ('transactional'='true');
INSERT INTO BUCKET_TABLE 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll;

Question 2

I am trying to create a table in Hive

CREATE TABLE BUCKET_TABLE AS 
SELECT a.* FROM TABLE1 a LEFT JOIN TABLE2 b ON (a.key=b.key) WHERE b.key IS NUll
CLUSTERED BY (key) INTO 1000 BUCKETS;

This syntax is failing - but I am not sure if it is even possible to do this combined statement. Any ideas?

Can I cluster by/bucket a table created via "CREATE TABLE AS SELECT....." in Hive?

Answer

Related questions