MySQL LOAD DATA INFILE: works, but unpredictable line terminator

SDC · Jun 7, 2012

MySQL has a nice CSV import statement, LOAD DATA INFILE.

I have a large dataset that needs to be imported from CSV on a regular basis, so this feature is exactly what I need. I've got a working script that imports my data perfectly.

...except that I don't know in advance what the end-of-line terminator will be.

My SQL code currently looks something like this:

LOAD DATA INFILE '{fileName}'
 INTO TABLE {importTable}
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 LINES TERMINATED BY '\n'
 IGNORE 1 LINES
( {fieldList} );

This works great for some import files.

However, the import data is coming from multiple sources. Some of them have the \n terminator; others have \r\n. I can't predict which one I'll have.

Is there a way using LOAD DATA INFILE to specify that my lines may be terminated with either \n or \r\n? How do I deal with this?

Answer

Devart · Jun 12, 2012

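You can specify the line terminator as '\n' and, where necessary, strip the trailing '\r' from the last field during loading. Because '\r\n' also ends in '\n', both kinds of line are split correctly; a '\r\n' line simply leaves a stray '\r' at the end of its last field.

For example:

Suppose we have a file 'entries.txt' whose line terminator is '\r\n' everywhere except after the line ITEM2 | CLASS3 | DATE2, where it is '\n':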

COL1  | COL2   | COL3
ITEM1 | CLASS1 | DATE1
ITEM2 | CLASS3 | DATE2
ITEM3 | CLASS1 | DATE3
ITEM4 | CLASS2 | DATE4

CREATE TABLE statement:

CREATE TABLE entries(
  column1 VARCHAR(255) DEFAULT NULL,
  column2 VARCHAR(255) DEFAULT NULL,
  column3 VARCHAR(255) DEFAULT NULL
);

Our LOAD DATA INFILE query:

LOAD DATA INFILE 'entries.txt' INTO TABLE entries
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(column1, column2, @var)
SET column3 = TRIM(TRAILING '\r' FROM @var);

Show results:

SELECT * FROM entries;
+---------+----------+---------+
| column1 | column2  | column3 |
+---------+----------+---------+
| ITEM1   |  CLASS1  |  DATE1  |
| ITEM2   |  CLASS3  |  DATE2  |
| ITEM3   |  CLASS1  |  DATE3  |
| ITEM4   |  CLASS2  |  DATE4  |
+---------+----------+---------+
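
The same pattern should carry over to the CSV statement in the question. The following is only a sketch: 'import.csv', import_table and col_a, col_b, col_c are hypothetical stand-ins for the {fileName}, {importTable} and {fieldList} placeholders. The last field in the list is read into a user variable (here @last_field, also a made-up name) so it can be trimmed before being stored. The second query is an optional sanity check, not part of the import itself.

```sql
-- Hypothetical names throughout: substitute your own file, table and columns.
-- The last field goes into a user variable so a trailing CR can be stripped.
LOAD DATA INFILE 'import.csv'
INTO TABLE import_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col_a, col_b, @last_field)
SET col_c = TRIM(TRAILING '\r' FROM @last_field);

-- Sanity check after loading: count rows whose last column still ends in CR.
SELECT COUNT(*) AS rows_with_cr
FROM import_table
WHERE col_c LIKE '%\r';
```

If the check returns 0, no stray carriage returns were stored. This sketch has not been tried against files where the last column is enclosed in quotes and the line ends in '\r\n', so it is worth inspecting the loaded values for that case before relying on it.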