Hive: changing a table's fileformat from ORC to Parquet is not supported?

wuchang · Feb 8, 2017 · Viewed 7.7k times

I have a hive table like this:

CREATE TABLE `abtestmsg_orc`(
  `eventname` string COMMENT 'AB test plan reporting event: ABTest', 
  `eventtime` string COMMENT 'event reporting time', 
  `sessionid` string COMMENT 'session identifier', 
  `appkey` string COMMENT 'app identifier', 
  `deviceid` string COMMENT 'device identifier', 
  `content` string COMMENT 'content of the AB test plan, formatted as a JSON string', 
  `item` array<struct<key:string,value:string>> COMMENT 'extension fields')
PARTITIONED BY ( 
  `dt` string COMMENT '??')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_orc'

Now I want to change the file format from ORC to Parquet, and change the location to another HDFS directory that contains the Parquet files. So first I try to change the format:

ALTER TABLE abtestmsg_orc1 SET FILEFORMAT PARQUET;

Unfortunately, it throws an exception:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1 (state=08S01,code=1)

I guess this exception means that when I change the file format, Hive not only changes the table metadata but also tries to convert all of the existing data from ORC to Parquet. But the official doc says:

ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;

**This statement changes the table's (or partition's) file format. For available file_format options, see the section above on CREATE TABLE. The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive.**
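The doc also allows a partition-level form of the same statement. A sketch of what that would look like here (the dt value is made up for illustration); by my reading, this too only changes metadata, not the ORC files already sitting in that partition:

-- hypothetical partition-level variant; only that partition's metadata would change
ALTER TABLE abtestmsg_orc1 PARTITION (dt='20170101') SET FILEFORMAT PARQUET;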

But what I want to achieve is to point its location at a directory of Parquet files.
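I know the location half can be done with a metadata-only ALTER, something like the following (path shown only for illustration), but that alone would leave the table still declared as ORC:

-- repoints the table metadata at another directory; no files are moved or converted
ALTER TABLE abtestmsg_orc1 SET LOCATION 'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_parquet';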

So, what can I do to achieve this?

Answer

belostoky · Feb 8, 2017

You need to create another table, abtestmsg_parquet, that is STORED AS PARQUET and whose location is the directory for your desired Parquet files:

CREATE TABLE abtestmsg_parquet(
  eventname string COMMENT 'AB test plan reporting event: ABTest', 
  eventtime string COMMENT 'event reporting time', 
  sessionid string COMMENT 'session identifier', 
  appkey string COMMENT 'app identifier', 
  deviceid string COMMENT 'device identifier', 
  content string COMMENT 'content of the AB test plan, formatted as a JSON string', 
  item array<struct<key:string,value:string>> COMMENT 'extension fields')
PARTITIONED BY ( 
  dt string COMMENT '??')
STORED AS PARQUET
LOCATION
  'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_parquet/'
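If you prefer the explicit style of your ORC DDL, STORED AS PARQUET should be equivalent to spelling out the Parquet serde and input/output formats, roughly:

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'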

Then you can populate it with dynamic partitioning. Run:

-- enable dynamic partition inserts
set hive.exec.dynamic.partition=true;
-- allow all partition columns to be determined dynamically (no static partition needed)
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE abtestmsg_parquet PARTITION(dt)
SELECT eventname, eventtime, sessionid, appkey, deviceid, content, item, dt
FROM abtestmsg_orc;
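Once the INSERT finishes, a quick sanity check confirms the copy before you retire the ORC table:

-- optional verification: partition list and row counts should match the ORC table
SHOW PARTITIONS abtestmsg_parquet;
SELECT COUNT(*) FROM abtestmsg_orc;
SELECT COUNT(*) FROM abtestmsg_parquet;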