Java Charset problem on Linux

Inv3r53 · Jan 30, 2010 · Viewed 30.4k times

Problem: I have a string containing special characters, which I convert to bytes and back again. The conversion works properly on Windows, but on Linux the special character is not converted properly. The default charset on Linux is UTF-8, as reported by Charset.defaultCharset().displayName().

However, if I run on Linux with the option -Dfile.encoding=ISO-8859-1, it works properly.

How can I make it work with the UTF-8 default charset, without setting the -D option in a Unix environment?

Edit: I use jdk1.6.13.

Edit: the code snippet below works with cs = "ISO-8859-1"; or cs = "UTF-8"; on Windows, but not on Linux:

        String x = "½";
        System.out.println(x);
        byte[] ba = x.getBytes(Charset.forName(cs));
        for (byte b : ba) {
            System.out.println(b);
        }
        String y = new String(ba, Charset.forName(cs));
        System.out.println(y);

~regards daed

Answer

McDowell · Jan 30, 2010

Your characters are probably being corrupted by the compilation process and you're ending up with junk data in your class file.
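One way to see what this kind of corruption looks like is to decode the bytes of "½" by hand with the wrong charset, which is roughly what a compiler does when it guesses the source encoding incorrectly. The sketch below is illustrative only: the class and method names are made up for this example, and it sticks to Charset.forName so that it also compiles on JDK 6.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: simulates what a compiler sees when it decodes
// source-file bytes with a given charset.
class SourceDecodingDemo {

    // Decode raw "source file" bytes using the named charset.
    static String decode(byte[] sourceBytes, String charsetName) {
        return new String(sourceBytes, Charset.forName(charsetName));
    }

    // Render each char as a U+ code so the result does not depend on the console.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] latin1Half = { (byte) 0xBD };             // "½" saved as ISO-8859-1 or windows-1252
        byte[] utf8Half = { (byte) 0xC2, (byte) 0xBD };  // "½" saved as UTF-8

        // Correct guess: the character survives.
        System.out.println(codeUnits(decode(latin1Half, "ISO-8859-1"))); // U+00BD
        // Latin-1 bytes read as UTF-8: malformed input becomes the replacement character.
        System.out.println(codeUnits(decode(latin1Half, "UTF-8")));      // U+FFFD
        // UTF-8 bytes read as Latin-1: one character turns into two.
        System.out.println(codeUnits(decode(utf8Half, "ISO-8859-1")));   // U+00C2 U+00BD
    }
}
```

In both mismatched cases the wrong string is already baked into the class file, so no choice of cs at run time can recover the original character.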

"if i run on linux with option -Dfile.encoding=ISO-8859-1 it works properly.."

The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

In short, don't use -Dfile.encoding=...

    String x = "½";

Since U+00bd (½) will be represented by different values in different encodings:

windows-1252     BD
UTF-8            C2 BD
ISO-8859-1       BD

...you need to tell your compiler which encoding your source file was saved in:

    javac -encoding ISO-8859-1 Foo.java
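If passing the compiler flag is awkward, one alternative (a suggestion added here, not part of the advice above) is to write the character as the Unicode escape \u00bd. The source file is then pure ASCII and compiles to the same string whatever encoding the compiler assumes. The sketch below uses a made-up class name and helper, and prints the bytes that the escaped literal produces in two of the encodings listed above.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: the literal is written as a Unicode escape, so the
// source file contains only ASCII and does not depend on the -encoding setting.
class HalfBytesDemo {

    // Encode text with the named charset and format the bytes as upper-case hex.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String x = "\u00bd"; // the same character as the literal in the question

        System.out.println("ISO-8859-1  " + hex(x, "ISO-8859-1")); // BD
        System.out.println("UTF-8       " + hex(x, "UTF-8"));      // C2 BD
    }
}
```

Printing the bytes as hex, rather than printing the string itself, keeps the check independent of whatever encoding the console happens to use.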

Now we get to this one:

    System.out.println(x);

System.out is a PrintStream, so it encodes the characters using the system's default encoding before emitting the bytes. The effect is roughly this:

    System.out.write(x.getBytes(Charset.defaultCharset()));

That may or may not work as you expect on some platforms - the byte encoding must match the encoding the console is expecting for the characters to show up correctly.
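If you know which encoding the console expects, you can wrap System.out in a PrintStream that encodes with that charset explicitly instead of relying on the default. This is only a sketch: the class and helper names are invented for the example, and the choice of "UTF-8" in main is an assumption about the terminal, not something the program can verify.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class: prints through a PrintStream with an explicit charset.
class ConsoleEncodingDemo {

    // Wrap a stream in an auto-flushing PrintStream that encodes with the named charset.
    static PrintStream printerFor(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Rethrow unchecked so callers do not have to declare it.
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the terminal is configured for UTF-8.
        PrintStream console = printerFor(System.out, "UTF-8");
        console.println("\u00bd");
    }
}
```

If the terminal actually expects a different encoding, the character will still appear garbled; the program only controls which bytes it sends, not how they are displayed.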