Why does Scala crash when reading my CSV?

deltanovember · Aug 20, 2011

The file is here

http://dl.dropbox.com/u/12337149/history.csv

I try to read the data as follows:

  for (line <- Source.fromFile(new File(file)).getLines) {
    println(line)
  }

I get the following error:

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at java.io.BufferedReader.fill(BufferedReader.java:136)
    at java.io.BufferedReader.readLine(BufferedReader.java:299)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at scala.io.BufferedSource$BufferedLineIterator.<init>(BufferedSource.scala:32)
    at scala.io.BufferedSource.getLines(BufferedSource.scala:43)
    at com.alluvia.reports.RunIGConverter$$anonfun$main$1.apply(RunIGConverter.scala:17)
    at com.alluvia.reports.RunIGConverter$$anonfun$main$1.apply(RunIGConverter.scala:15)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
    at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
    at com.alluvia.reports.RunIGConverter$.main(RunIGConverter.scala:15)
    at com.alluvia.reports.RunIGConverter.main(RunIGConverter.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

The file opens just fine in Excel. I think it is some kind of encoding issue, but I do not know the workaround.

Answer

Rex Kerr · Aug 20, 2011

I'd try the ISO8859_1 encoding, or Cp1252 if that doesn't work, like so:

Source.fromFile(new File(file), "ISO-8859-1").getLines()
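For context, here is a minimal sketch of that call in a complete program that also closes the source afterwards. The object name `ReadWithEncoding`, the helper `readLines`, and the temp-file sample are made up for illustration; only `Source.fromFile(file, enc)` comes from the answer above. It assumes the file really is Latin-1, which nothing in the question confirms. The sample contains the single byte 0xE9, which is "é" in ISO-8859-1 but not valid UTF-8 on its own, so it should trigger the same kind of decoding failure when read as UTF-8.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object ReadWithEncoding {
  // Hypothetical helper: read all lines, decoding the bytes with the
  // named charset, and close the file handle when done.
  def readLines(file: File, encoding: String): List[String] = {
    val source = Source.fromFile(file, encoding)
    try source.getLines().toList
    finally source.close()
  }

  // Hypothetical sample data: one CSV row written as ISO-8859-1 bytes,
  // so the accented character is stored as the single byte 0xE9.
  def writeSample(): File = {
    val tmp = File.createTempFile("sample", ".csv")
    tmp.deleteOnExit()
    Files.write(tmp.toPath, "caf\u00e9,1\n".getBytes(StandardCharsets.ISO_8859_1))
    tmp
  }

  def main(args: Array[String]): Unit = {
    // Decoding with the charset the bytes were written in succeeds.
    readLines(writeSample(), "ISO-8859-1").foreach(println)
  }
}
```

Passing the encoding explicitly avoids depending on the platform default charset, which is what the one-argument `Source.fromFile(new File(file))` picks up through the implicit `Codec`.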

You can see which encodings Sun Java supports here. I forget whether you're supposed to use the nio or io versions. (As you can see from my answer, which has used both.)
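One way to settle the io-versus-nio naming question on a given JVM is to ask `java.nio.charset.Charset` to resolve each spelling. The sketch below assumes a standard JDK in which `ISO8859_1` and `Cp1252` are registered as aliases of `ISO-8859-1` and `windows-1252`; the object name `CharsetNames` and the helper `canonical` are hypothetical.

```scala
import java.nio.charset.Charset

object CharsetNames {
  // Resolve a charset name or alias to the canonical name the JVM reports.
  // Throws UnsupportedCharsetException if the JVM does not know the name.
  def canonical(name: String): String = Charset.forName(name).name()

  def main(args: Array[String]): Unit = {
    for (n <- Seq("ISO8859_1", "ISO-8859-1", "Cp1252", "windows-1252"))
      println(s"$n -> ${canonical(n)}")
  }
}
```

If both spellings resolve to the same canonical name, either one should be safe to pass as the `enc` argument of `Source.fromFile`.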