Hive load CSV with commas in quoted fields

Martijn Lenderink picture Martijn Lenderink · Nov 29, 2012 · Viewed 104.3k times · Source

I am trying to load a CSV file into a Hive table like so:

CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

LOAD DATA LOCAL INPATH '/data.csv'
OVERWRITE INTO TABLE mytable;    


The csv is delimited by an comma (,) and looks like this:

1, "some text, with comma in it", 123, "more text"

This will return corrupt data since there is a ',' in the first string.
Is there a way to set an text delimiter or make Hive ignore the ',' in strings?

I can't change the delimiter of the csv since it gets pulled from an external source.

Answer

libjack picture libjack · Nov 30, 2012

If you can re-create or parse your input data, you can specify an escape character for the CREATE TABLE:

ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';

Will accept this line as 4 fields

1,some text\, with comma in it,123,more text