Printing Unicode from Scala interpreter

Martin Sturm · Dec 22, 2009

When using the Scala interpreter (i.e. running the command 'scala' on the command line), I am not able to print Unicode characters correctly. Characters such as a-z and A-Z are printed correctly, but € or ƒ, for example, is printed as a ?.

print(8364.toChar)

results in ? instead of €. I am probably doing something wrong. My terminal supports UTF-8 characters, and even when I pipe the output to a separate file and open it in a text editor, a ? is displayed.

This is all happening on Mac OS X (Snow Leopard, 10.6.2) with Scala 2.8 (nightly build) and Java 1.6.0_17.

Answer

Martin Sturm · Dec 30, 2009

I found the cause of the problem, and a solution to make it work as it should. As I suspected after posting my question, reading Calum's answer, and running into encoding issues on the Mac in another project (which was in Java), the cause is the default encoding used by Mac OS X. When you start the Scala interpreter, it uses the default encoding of the platform it runs on. On Mac OS X this is MacRoman; on Windows it is probably CP1252. You can check this by typing the following command in the Scala interpreter:

scala> System.getProperty("file.encoding");
res3: java.lang.String = MacRoman
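
To make the effect of the encoding more concrete, here is a minimal sketch that prints the JVM's default charset and shows which bytes the euro sign becomes under two charsets. The object name EncodingCheck and the helper hexBytes are my own, made up for illustration. ISO-8859-1 is only a stand-in for "a charset without €" because every JVM ships it; I have not checked that MacRoman behaves the same way.

```scala
import java.nio.charset.Charset

object EncodingCheck {
  // Hex dump of the bytes a string is encoded to under the named charset.
  // Characters the charset cannot represent are replaced by its default
  // replacement byte (String.getBytes does this silently).
  def hexBytes(s: String, charsetName: String): String =
    s.getBytes(Charset.forName(charsetName))
      .map(b => "%02X".format(b & 0xff))
      .mkString(" ")

  def main(args: Array[String]): Unit = {
    // What the JVM picked up from the platform (or from -Dfile.encoding).
    println("file.encoding   = " + System.getProperty("file.encoding"))
    println("default charset = " + Charset.defaultCharset())

    val euro = 0x20AC.toChar.toString
    println("UTF-8      : " + hexBytes(euro, "UTF-8"))      // E2 82 AC
    println("ISO-8859-1 : " + hexBytes(euro, "ISO-8859-1")) // 3F, i.e. '?'
  }
}
```

The 3F in the second line is the byte for a literal question mark, which is one way a ? can end up both on the terminal and in a piped file.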

According to the scala help text, it is possible to provide Java properties using the -D option. However, this does not work for me. I ended up setting the environment variable

JAVA_OPTS="-Dfile.encoding=UTF-8"

After starting scala again, the previous command gives the following result:

scala> System.getProperty("file.encoding")
res0: java.lang.String = UTF-8

Now, printing special characters works as expected:

print(0x20AC.toChar)
€
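
If changing file.encoding is not an option, a possible alternative (not what I ended up doing, just a sketch) is to write through a PrintStream that is told explicitly to encode as UTF-8. The object name Utf8Out and the helper utf8Stream are hypothetical; the three-argument java.io.PrintStream constructor is part of the standard Java library.

```scala
import java.io.{OutputStream, PrintStream}

object Utf8Out {
  // Wrap an output stream in a PrintStream that always encodes as UTF-8,
  // independent of the file.encoding property. autoFlush is enabled so
  // println calls show up immediately.
  def utf8Stream(target: OutputStream): PrintStream =
    new PrintStream(target, true, "UTF-8")

  def main(args: Array[String]): Unit = {
    val out = utf8Stream(System.out)
    out.println(0x20AC.toChar) // writes the bytes E2 82 AC plus a newline
  }
}
```

This only helps if the terminal (or whatever reads the output) actually interprets the bytes as UTF-8.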

So, it is not a bug in Scala, but an issue with default encodings. In my opinion, it would be better if UTF-8 were used by default on all platforms. While searching for whether this is being considered, I came across a discussion on the Scala mailing list about this issue. The first message proposes using UTF-8 by default on Mac OS X when file.encoding reports MacRoman, since UTF-8 is the default charset on Mac OS X (which leaves me wondering why file.encoding is set to MacRoman by default; probably this is a legacy from Mac OS before version 10?). I don't think this proposal will be part of Scala 2.8, since Martin Odersky wrote that it is probably best to keep things as they are in Java (i.e. honor the file.encoding property).