How do I deal with a UTF-16LE encoded text file in Java, or convert it to ASCII?

Bhushan · May 31, 2011

I am sorry if this has been asked before. I am trying to process a text file using Java. The text file is exported from MS SQL Server. When I open it in PSPad (a text editor that can also show any file in hex), it tells me that my text file is in UTF-16LE. Since I am getting it from someone else, that is quite possible.

My Java program is not able to deal with that format, so I wanted to know whether there is any way I can either convert the text file to ASCII, do some preprocessing, or anything else. I CAN modify the file.

Any help is greatly appreciated.

Thanks.
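If converting to ASCII really is acceptable, one minimal sketch (assuming Java 7 or later and a file small enough to load into memory) is to decode the bytes as UTF-16LE and re-encode the text as US-ASCII. The class name Utf16LeToAscii and the command-line arguments are made up for illustration. Characters with no ASCII equivalent cannot survive this step: String.getBytes replaces them with '?'.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only
public class Utf16LeToAscii {

    // Decode UTF-16LE bytes and re-encode the text as US-ASCII.
    // String.getBytes(Charset) replaces every character that US-ASCII cannot
    // represent with the charset's replacement byte, '?'.
    static byte[] toAscii(byte[] utf16le) {
        String text = new String(utf16le, StandardCharsets.UTF_16LE);
        // A leading byte-order mark (FF FE) decodes to U+FEFF; drop it
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java Utf16LeToAscii <input> <output>");
            return;
        }
        byte[] input = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toAscii(input));
    }
}
```

Run it as `java Utf16LeToAscii input.txt output.txt`. If the data may contain accented or other non-ASCII characters, re-encoding to UTF-8 instead of US-ASCII keeps them.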

EDIT 1

I wrote this program, but it is not working as expected. When I view the output file in PSPad, each character shows up as 2 bytes: e.g. '2' is 32 00 instead of just 32, 'M' is 4D 00 instead of just 4D, and so on. PSPad, though, says the encoding of the output file is UTF-8. I am kind of confused here. Can anyone tell me what I am doing wrong?

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Utf16ToUtf8 {

    public static void main(String[] args) throws Exception {
        try {
            // Open the input file (the name is hard-coded here)
            FileInputStream fstream = new FileInputStream("input.txt");
            // Decode the bytes as UTF-16LE; no DataInputStream wrapper is needed
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF-16LE"));
            String strLine;
            // Read the file line by line (readLine() strips the line terminator)
            while ((strLine = br.readLine()) != null) {
                // Append the decoded line to the output file
                writeToFile(strLine);
            }
            // Closing the reader also closes the underlying stream
            br.close();
        } catch (Exception e) { // Report any I/O or encoding error
            System.err.println("Error: " + e.getMessage());
        }

        System.out.println("done.");
    }

    public static void writeToFile(String str) {
        try {
            // Append mode (true): anything already in output.txt from an earlier run is kept
            OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream("output.txt", true), "UTF-8");
            BufferedWriter fbw = new BufferedWriter(writer);
            fbw.write(str);
            // Restore the line break that readLine() removed
            fbw.newLine();
            fbw.close();
        } catch (Exception e) { // Report any I/O or encoding error
            System.err.println("Error: " + e.getMessage());
        }
    }
}
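For comparison, here is a sketch of the same UTF-16LE to UTF-8 conversion on Java 7 or later (the class and method names are hypothetical). It opens the output once and truncates it instead of appending per line, writes back the line breaks that readLine() removes, and drops the U+FEFF character that a leading byte-order mark becomes when the input is decoded as UTF-16LE. It is one way to do the conversion, not a confirmed explanation of the symptom above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name, for illustration only
public class ConvertUtf16LeToUtf8 {

    // Re-encode a UTF-16LE text file as UTF-8, line by line
    static void convert(Path in, Path out) throws IOException {
        // newBufferedWriter creates or truncates the output by default,
        // so nothing from an earlier run is left behind
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_16LE);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // UTF-16LE decodes a leading BOM (FF FE) as U+FEFF; drop it
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                writer.write(line);
                // readLine() strips line terminators, so write one back
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ConvertUtf16LeToUtf8 <input> <output>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("done.");
    }
}
```

Run it as `java ConvertUtf16LeToUtf8 input.txt output.txt`. BufferedWriter.newLine() writes the platform line separator, so the output's line endings follow the machine the program runs on rather than the original file.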

EDIT 2

Here are the screenshots:

[Screenshot: input file in PSPad (a free hex viewer)]

[Screenshot: output file in PSPad]

[Screenshot: what I was expecting to see]
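Each ASCII character followed by 00 is how UTF-16LE text looks in a hex viewer, while UTF-8 stores the same characters in one byte each. A small sketch for checking what a file actually contains, assuming Java 7 or later (the class name, method names and the optional file argument are illustrative): it prints the bytes of "2M" in both encodings and reports any byte-order mark at the start of a given file. Because the program above opens output.txt in append mode, whether the file already held UTF-16LE data from an earlier run is one thing worth ruling out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic class, for illustration only
public class EncodingBytes {

    // Format bytes as space-separated upper-case hex pairs, like a hex viewer
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Name the byte-order mark at the start of the data, if any.
    // (FF FE is also how a UTF-32LE BOM starts; this sketch ignores UTF-32.)
    static String sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return "none";
    }

    public static void main(String[] args) throws IOException {
        String s = "2M";
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 32 00 4D 00
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // 32 4D
        if (args.length > 0) {
            // Inspect a real file, e.g. output.txt
            byte[] data = Files.readAllBytes(Paths.get(args[0]));
            System.out.println("BOM: " + sniffBom(data));
        }
    }
}
```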

Answer

bmargulies · May 31, 2011

Create an InputStreamReader with the charset UTF-16LE and you will be all set.
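One detail worth knowing when doing this: according to the java.nio.charset.Charset documentation, the UTF-16LE decoder does not strip a byte-order mark but decodes FF FE as the character U+FEFF, while the UTF-16 charset uses the BOM to pick the byte order (defaulting to big-endian when there is none) and consumes it. A small sketch showing the difference through InputStreamReader (class and method names are illustrative; Java 7 or later for try-with-resources):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only
public class Utf16ReaderDemo {

    // Read all bytes through an InputStreamReader using the given charset
    static String readAll(byte[] data, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "2M" in UTF-16LE, preceded by the byte-order mark FF FE
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 0x32, 0x00, 0x4D, 0x00};
        String le = readAll(data, StandardCharsets.UTF_16LE);
        String auto = readAll(data, StandardCharsets.UTF_16);
        System.out.println(le.length());   // 3: the BOM is kept as U+FEFF
        System.out.println(auto.length()); // 2: the BOM is consumed
    }
}
```

So if the file starts with FF FE, either read it as UTF-16 or remove the leading U+FEFF yourself; otherwise it ends up in the output, as EF BB BF when written as UTF-8.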