How to create table in partition data in hive?

Mukesh Gupta picture Mukesh Gupta · Mar 29, 2018 · Viewed 29k times · Source
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/_impala_insert_staging
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI
[mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI
Found 27 items
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201602
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201603
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201604
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201605
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201606
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201607
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201608
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201609
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201610
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201611
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201612
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201701
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201702
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201703
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201704
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201705
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201706
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:17 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201707
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201708
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201709
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201710
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201711
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201712
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201801
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201802
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:18 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201803
[mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601
Found 3 items
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=38527
drwxr-xr-x   - mgupta supergroup          0 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=__HIVE_DEFAULT_PARTITION__
[mgupta@sjc-dev-binn01 ~]$ hadoop fs -ls /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0
Found 1 items
-rw-r--r--   3 mgupta supergroup    2069014 2018-03-26 22:16 /kylin/retailer/qi_basket_brand_bucket_fact/product_hierarchy_type=CI/month_id=201601/company_sid=0/f9466a0068b906cf-6ace7f8500000049_294515768_data.0.parq
[mgupta@sjc-dev-binn01 ~]$

Answer

Satya Pavan picture Satya Pavan · Mar 29, 2018

You may try the steps given below.

Approach 1

  1. Identify the schema (column names and types, including the partitioned column)
  2. Create a hive partitioned table (Make sure to add partition column & delimiter information)
  3. Load data into the partitioned table.(In this case, the loading file will not have partition column as you will hard-code it via the load command)

    create table <table_name> (col1 data_type1, col2 data_type2..)
    partitioned by(part_col data_type3)
    row format delimited
    fields terminated by '<field_delimiter_in_your_data>'
    
    load data inpath '/hdfs/loc/file1' into table <table_name>
    partition (<part_col>='201601');
    
    load data inpath '/hdfs/loc/file1' into table <table_name>
    partition (<part_col>='201602')
    
    load data inpath '/hdfs/loc/file1' into table <table_name>
    partition (<part_col>='201603')
    

    and so on.


Approach 2

  1. Create a staging table (temporary table) with same schema as of main table but without any partitions
  2. Load your entire data into this table (Make sure you have the 'partition column' as one of the fields in these files)
  3. Load data to your main table from staging table using dynamic partition insert.

    create table <staging_table> (col1 data_type1, col2 data_type2..)
    row format delimited
    fields terminated by '<field_delimiter_in_your_data>'
    
    create table <main_table> (col1 data_type1, col2 data_type2..)
    partitioned by(part_col data_type3);
    
    load data inpath '/hdfs/loc/directory/' into table <staging_table>;
    
    SET hive.exec.dynamic.partition=true; 
    SET hive.exec.dynamic.partition.mode=nonstrict;
    
    insert into table <main_table> 
    partition(part_col) 
    select col1,col2,....part_col from <staging_table>;
    

Key aspects in approach 2 are:

  • Making the 'part_col' available as a field in the loading file
  • In the final insert statement, get 'part_col' as the last field from select clause.