I have a Hive table like this:
CREATE TABLE `abtestmsg_orc`(
`eventname` string COMMENT 'AB test plan reporting event: ABTest',
`eventtime` string COMMENT 'event reporting time',
`sessionid` string COMMENT 'session identifier',
`appkey` string COMMENT 'app identifier',
`deviceid` string COMMENT 'device identifier',
`content` string COMMENT 'content of the AB test plan; the format is a JSON string',
`item` array<struct<key:string,value:string>> COMMENT 'extended fields')
PARTITIONED BY (
`dt` string COMMENT '??')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_orc'
Now I want to change the file format from ORC to Parquet, and change the location to another HDFS directory that contains the Parquet files. So first I tried to change the format:
alter table abtestmsg_orc1 set fileformat parquet;
but unfortunately it throws an exception:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1 (state=08S01,code=1)
I guess this exception means that when I change the file format, Hive not only changes the table metadata but also tries to convert all the data from ORC to Parquet. But the official doc says:
ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;
**This statement changes the table's (or partition's) file format. For available file_format options, see the section above on CREATE TABLE. The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive.**
But what I want to achieve is to point its location at a Parquet directory.
So, what can I do to achieve this?
You need to create another table abtestmsg_parquet that is STORED AS PARQUET, with its location set to your desired parquet directory:
CREATE TABLE abtestmsg_parquet(
eventname string COMMENT 'AB test plan reporting event: ABTest',
eventtime string COMMENT 'event reporting time',
sessionid string COMMENT 'session identifier',
appkey string COMMENT 'app identifier',
deviceid string COMMENT 'device identifier',
content string COMMENT 'content of the AB test plan; the format is a JSON string',
item array<struct<key:string,value:string>> COMMENT 'extended fields')
PARTITIONED BY (
dt string COMMENT '??')
STORED AS PARQUET
LOCATION
'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_parquet/'
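Since the question says the target HDFS directory already contains parquet files, there is an alternative worth mentioning: if those files are already laid out in `dt=...` partition subdirectories, you can create the table as EXTERNAL over that location and register the existing partitions with MSCK REPAIR instead of rewriting the data. A sketch (the table name `abtestmsg_parquet_ext` is just an example, and the assumption about the directory layout must hold for this to work):

```sql
-- Assumption: parquet files already sit under partition subdirectories,
-- e.g. .../abtestmsg_parquet/dt=20170101/part-00000.parquet
CREATE EXTERNAL TABLE abtestmsg_parquet_ext(
  eventname string,
  eventtime string,
  sessionid string,
  appkey string,
  deviceid string,
  content string,
  item array<struct<key:string,value:string>>)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_parquet/';

-- Scan the location and add any dt=... directories as partitions
MSCK REPAIR TABLE abtestmsg_parquet_ext;
```

EXTERNAL also means Hive will not delete those files if the table is later dropped, which is usually what you want for data that was produced outside Hive.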
Then you can load it with dynamic partitioning. Run:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE abtestmsg_parquet PARTITION(dt)
SELECT eventname, eventtime, sessionid, appkey, deviceid, content, item, dt
FROM abtestmsg_orc;
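After the insert finishes, a quick sanity check helps confirm the migration copied everything (these are just illustrative queries, not part of the required steps):

```sql
-- Confirm the dt partitions were created in the new table
SHOW PARTITIONS abtestmsg_parquet;

-- Spot-check that row counts match between the old and new tables
SELECT COUNT(*) FROM abtestmsg_orc;
SELECT COUNT(*) FROM abtestmsg_parquet;
```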