String Hex Encoding and Decoding

Arpit Aggarwal · Apr 16, 2015

I am converting a String from UTF-8 to CP1047 and then hex-encoding it, which works fine. Next, I convert back by decoding the hex String and printing the result to the console as UTF-8. The problem is that I do not get back the String I passed to the encoding method. Below is the code I wrote:

public class HexEncodeDecode {

    public static void main(String[] args) throws UnsupportedEncodingException,
            DecoderException {
        String reqMsg = "ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234";
        char[] hexed = getHex(reqMsg, "UTF-8", "Cp1047");

        System.out.println(hexed);

        System.out.println(getString(hexed));
    }

    public static char[] getHex(String source, String inputCharacterCoding,
            String outputCharacterCoding) throws UnsupportedEncodingException {
        return Hex.encodeHex(new String(source.getBytes(inputCharacterCoding),
                outputCharacterCoding).getBytes(), false);
    }

    public static String getString(char[] source) throws DecoderException,
            UnsupportedEncodingException {
        return new String(Hex.decodeHex(source), Charset.forName("UTF-8"));
    }
}

The output I am getting is:

    C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ñë|äâ

So I need help getting the original input String printed back.

Expected output would be:

C3B1C3AB7CC290C291C295C290C290C290C290C291C295C290C298C290C290C3A41616C290C290C290C290C290C298C290C290C290C290C290C290C290C290C294C290C290C290C290C290C295C290C290C290C290C290C290C29016C291C295C291C29016C291C299C290C290C290C290C290C290C290C290C291C294C290C294C291C296C291C295C291C294C291C298C290C290C290C290C291C2941604C296C299C290C291C296C291C280C290C3A2C290C280C280C280C280C290C290C290C29116C293C294C295C290C290C294C29116C293C294
ISO0150000150800C220000080000000040000050000000215102190000000014041615141800001427690161 0B0    000123450041234

Answer

fge · Apr 16, 2015
new String(source.getBytes(inputCharacterCoding), outputCharacterCoding)
    .getBytes()

This probably does not do what you think it does.
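
To see what the quoted expression does, it can help to split it into its separate steps. This is only a sketch, not the asker's code: the class `MisdecodeDemo` and the method `misdecode` are names made up for illustration, it assumes the JVM provides the `Cp1047` charset, and it spells out UTF-8 where the original relies on the platform default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {

    // Hypothetical helper: replays the first two steps of the quoted expression.
    static String misdecode(String source, Charset wrongCharset) {
        // Step 1: encode the chars as UTF-8 bytes ("ISO" becomes 49 53 4F).
        byte[] utf8Bytes = source.getBytes(StandardCharsets.UTF_8);
        // Step 2: DECODE those same bytes as if they were already in the
        // other charset. Nothing is converted; the bytes are simply misread.
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("Cp1047")) {
            System.out.println("Cp1047 is not available on this JVM");
            return;
        }
        String misread = misdecode("ISO", Charset.forName("Cp1047"));
        // Step 3: the original calls getBytes() with no argument, which
        // encodes with the platform default charset (UTF-8 assumed here).
        byte[] finalBytes = misread.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : finalBytes) {
            hex.append(String.format("%02X", b));
        }
        System.out.println(misread + " -> " + hex);
    }
}
```

On a JVM with that charset, this should reproduce the start of the output shown in the question: the characters ñë| and the hex digits C3B1C3AB7C, which are the UTF-8 bytes of the misread text rather than CP1047 bytes of "ISO".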

First things first: a String has no encoding. Repeat after me: a String has no encoding.

A String is simply a sequence of tokens which aim to represent characters. It just happens that for this purpose Java uses a sequence of chars. They could just as well be carrier pigeons.

UTF-8, CP1047 and others are just character encodings; two operations can be performed:

  • encoding: turn a stream of carrier pigeons (chars) into a stream of bytes;
  • decoding: turn a stream of bytes into a stream of carrier pigeons (chars).
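
In Java terms, the two operations above map onto `String.getBytes(Charset)` and `new String(byte[], Charset)`. A minimal sketch, with `EncodeDecodeDemo`, `encode` and `decode` being names invented for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDecodeDemo {

    // Encoding: chars -> bytes. The charset is chosen at this moment;
    // it is not a property of the String itself.
    static byte[] encode(String chars, Charset charset) {
        return chars.getBytes(charset);
    }

    // Decoding: bytes -> chars. The charset must be the one the bytes
    // were actually produced with.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // e with an acute accent
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);         // C3 A9
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);  // E9
        // Same String, different byte sequences: prints "2 vs 1".
        System.out.println(utf8.length + " vs " + latin1.length);
        // Decoding with the matching charset restores the chars: prints "true".
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
    }
}
```

The same String yields different bytes under different encodings, which is why the encoding belongs to the bytes and not to the String.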

Basically, your base assumption is wrong; you cannot associate an encoding with a String. Your real input should be a byte stream (more often than not a byte array) which you know is the result of a particular encoding (in your case, UTF-8), which you want to re-encode using another charset (in your case, CP1047).
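
The re-encoding described here could be sketched roughly as follows. Everything in it is illustrative: `TranscodeSketch`, `transcode`, `toHex` and `fromHex` are made-up names (the hex helpers are stand-ins for the `Hex` class in the question, whose code is not shown), and it assumes the JVM provides the `Cp1047` charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeSketch {

    // Re-encode: decode the input bytes with the charset they are known
    // to be in, then encode the resulting chars with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    // Hypothetical hex helper: two uppercase hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Hypothetical inverse of toHex; expects an even number of hex digits.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("Cp1047")) {
            System.out.println("Cp1047 is not available on this JVM");
            return;
        }
        Charset cp1047 = Charset.forName("Cp1047");
        byte[] utf8Input = "ISO0150".getBytes(StandardCharsets.UTF_8);
        String hex = toHex(transcode(utf8Input, StandardCharsets.UTF_8, cp1047));
        System.out.println(hex);
        // Going back: the hex digits stand for CP1047 bytes, so they have
        // to be decoded with CP1047, not with UTF-8.
        System.out.println(new String(fromHex(hex), cp1047));
    }
}
```

The point of the design is that the charset used to decode on the way back is the one used to encode on the way out; decoding CP1047 bytes as UTF-8 cannot restore the input.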

The "secret" behind a real answer here would be the code of your Hex.encodeHex() method, but you don't show it, so this is as good an answer as I can muster.