How to load data to hive from HDFS without removing the source file?

Suge · Sep 27, 2011 · Viewed 105.5k times

When loading data from HDFS into Hive using the

LOAD DATA INPATH 'hdfs_file' INTO TABLE tablename;

command, it looks like Hive moves hdfs_file into the hive/warehouse directory. Is it possible (and how?) to copy the file instead of moving it, so that it can still be used by another process?

Answer

Dag · Sep 30, 2011

From your question I assume that you already have your data in HDFS, so you don't need LOAD DATA, which moves the files to the default Hive location /user/hive/warehouse. You can simply define the table using the external keyword, which leaves the files in place but creates the table definition in the Hive metastore. See here: Create Table DDL, e.g.:

create external table table_name (
  id int,
  myfields string
)
location '/my/location/in/hdfs';
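
If you want to confirm that the table really points at your existing files (table and column names above are placeholders, adjust them to your data), you can inspect the metastore entry. An external table's data is neither moved nor deleted by Hive:

-- Shows the table type and location; for an external table you should see
-- Table Type: EXTERNAL_TABLE and the HDFS path you specified.
DESCRIBE FORMATTED table_name;

-- Querying reads the files directly from '/my/location/in/hdfs';
-- the source files stay where they are and remain usable by other processes.
SELECT * FROM table_name LIMIT 10;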

Please note that the format of your files might differ from the default (as JigneshRawal mentioned in the comments). You can specify your own delimiter, for example when the data was exported with Sqoop:

row format delimited fields terminated by ','
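
Putting it together, a sketch of the full DDL for comma-delimited text files (the stored as textfile clause is the default for delimited text and is only spelled out here for clarity):

-- External table over existing comma-delimited files in HDFS;
-- dropping the table only removes the metadata, not the files.
create external table table_name (
  id int,
  myfields string
)
row format delimited fields terminated by ','
stored as textfile
location '/my/location/in/hdfs';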