I am using Commons CSV to parse CSV content relating to TV shows. One of the shows has a show name which includes double quotes;
116,6,2,29 Sep 10,""JJ" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"
The showname is "JJ" (60 min) which is already in double quotes. This is throwing an IOException java.io.IOException: (line 1) invalid char between encapsulated token and delimiter.
ArrayList<String> allElements = new ArrayList<String>();
CSVFormat csvFormat = CSVFormat.DEFAULT;
CSVParser csvFileParser = new CSVParser(new StringReader(line), csvFormat);
List<CSVRecord> csvRecords = null;
csvRecords = csvFileParser.getRecords();
for (CSVRecord record : csvRecords) {
int length = record.size();
for (int x = 0; x < length; x++) {
allElements.add(record.get(x));
}
}
csvFileParser.close();
return allElements;
CSVFormat.DEFAULT already sets withQuote('"')
I think that this CSV is not properly formatted as ""JJ" (60 min)" should be """JJ"" (60 min)" - but is there a way to get commons CSV to handle this or do I need to fix this entry manually?
Additional information: Other show names contain spaces and commas within the CSV entry and are placed within double quotes.
The problem here is that the quotes are not properly escaped. Your parser doesn't handle that. Try univocity-parsers as this is the only parser for java I know that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:
//configure the parser to handle your situation
CsvParserSettings settings = new CsvParserSettings();
settings.setUnescapedQuoteHandling(STOP_AT_CLOSING_QUOTE);
//create the parser
CsvParser parser = new CsvParser(settings);
//parse your line
String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");
for(String e : out){
System.out.println(e);
}
This will print:
116
6
2
29 Sep 10
"JJ" (60 min)
http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj
Hope it helps.
Disclosure: I'm the author of this library, it's open source and free (Apache 2.0 license)