Encode String to Base36

Patrick Vogt · Jan 13, 2017

I am currently working on an algorithm to encode an arbitrary string, which may contain any character, as a Base36 string.

I have tried the following but it doesn't work.

public static String encode(String str) {
    return new BigInteger(str, 16).toString(36);
}

I suspect this is because the input is not a hexadecimal string. If I try to encode the string "Hello22334!", I get a NumberFormatException.
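A minimal sketch that confirms this guess: `BigInteger(String, int)` accepts only digits that are valid in the given radix, so parsing with radix 16 fails on characters such as `H`, `l` or `!`, while a genuine hex string parses fine. The class and method names are illustrative only.

```java
import java.math.BigInteger;

// Illustrative demo (hypothetical names): checks whether a string is a valid radix-16 number.
class HexParseDemo {
    static boolean parsesAsHex(String s) {
        try {
            new BigInteger(s, 16); // throws NumberFormatException on any non-hex digit
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsHex("Hello22334!")); // false: 'H', 'l', 'o', '!' are not hex digits
        System.out.println(parsesAsHex("48656c6c6f"));  // true: only 0-9 and a-f
    }
}
```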

My approach would be to convert each character to a number, convert those numbers to their hexadecimal representation, and then convert the resulting hex string to Base36.
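The approach described above can be sketched as follows, under one assumption not stated in the question: the "number per character" is taken to be the string's UTF-8 bytes, each written as exactly two hex digits so that byte boundaries are preserved. `HexRouteEncoder` and `encodeViaHex` are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Sketch of the hex-intermediate route (hypothetical class/method names).
// Assumption: characters are turned into numbers via their UTF-8 bytes.
class HexRouteEncoder {
    static String encodeViaHex(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 and always emit two digits, e.g. 0x0a -> "0a"
            hex.append(String.format("%02x", b & 0xff));
        }
        if (hex.length() == 0) {
            return "0"; // empty input: BigInteger rejects a zero-length hex string
        }
        return new BigInteger(hex.toString(), 16).toString(36);
    }

    public static void main(String[] args) {
        System.out.println(encodeViaHex("Hello22334!"));
    }
}
```

The hex string is only an intermediate text form of the bytes: parsing it with radix 16 yields the same number as reading the bytes directly as an unsigned big-endian value, so the hex step can be skipped.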

Is my approach okay or is there a simpler or better way?

Answer

Christoffer Hammarström · Jan 13, 2017

First, you need to convert your string to a number, represented as a sequence of bytes; that is what a character encoding is for. I highly recommend UTF-8.

Then you need to convert that number (the byte sequence) to a string in base 36.

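// Needs: import java.math.BigInteger; import java.nio.charset.StandardCharsets;
byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
// Signum 1 treats the bytes as an unsigned magnitude, so the number is never negative.
String base36 = new BigInteger(1, bytes).toString(36);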
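As a worked example of the two encoding steps (the class and method names here are illustrative), the string "Hi" has the UTF-8 bytes 0x48 0x69, which read as one unsigned number give 0x4869 = 18537. In base 36 that is 14 * 36^2 + 10 * 36 + 33, i.e. the digits e, a, x.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Worked example of string -> UTF-8 bytes -> number -> base 36 (hypothetical names).
class Base36Example {
    static String encode(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // "Hi" -> {0x48, 0x69}
        return new BigInteger(1, bytes).toString(36);      // 0x4869 = 18537 -> "eax"
    }

    public static void main(String[] args) {
        BigInteger n = new BigInteger(1, "Hi".getBytes(StandardCharsets.UTF_8));
        System.out.println(n + " -> " + encode("Hi")); // prints "18537 -> eax"
    }
}
```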

To decode:

byte[] bytes = new BigInteger(base36, 36).toByteArray();
// Thanks to @Alok for pointing out the need to remove leading zeroes.
// toByteArray() is two's complement, so it prepends a 0x00 sign byte when the first byte has its high bit set.
int zeroPrefixLength = zeroPrefixLength(bytes);
String string = new String(bytes, zeroPrefixLength, bytes.length - zeroPrefixLength, StandardCharsets.UTF_8);

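// Returns the number of leading 0x00 bytes; static so it can be called from a static decode method.
private static int zeroPrefixLength(final byte[] bytes) {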
    for (int i = 0; i < bytes.length; i++) {
        if (bytes[i] != 0) {
            return i;
        }
    }
    return bytes.length;
}
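Putting the encode and decode snippets together, a minimal self-contained sketch might look like this (`Base36Codec`, `encode` and `decode` are hypothetical names). One caveat worth knowing: leading U+0000 characters do not survive the round trip, because `new BigInteger(1, bytes)` discards leading zero bytes during encoding.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Sketch combining the answer's encode and decode steps (hypothetical names).
class Base36Codec {
    static String encode(String str) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        // Signum 1: interpret the bytes as an unsigned big-endian magnitude
        return new BigInteger(1, bytes).toString(36);
    }

    static String decode(String base36) {
        // toByteArray() may prepend a 0x00 sign byte, which must be skipped
        byte[] bytes = new BigInteger(base36, 36).toByteArray();
        int zeroPrefixLength = zeroPrefixLength(bytes);
        return new String(bytes, zeroPrefixLength, bytes.length - zeroPrefixLength,
                StandardCharsets.UTF_8);
    }

    // Number of leading 0x00 bytes in the array.
    private static int zeroPrefixLength(final byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != 0) {
                return i;
            }
        }
        return bytes.length;
    }

    public static void main(String[] args) {
        String original = "Hello22334! \u00e9\u20ac"; // includes multi-byte UTF-8 characters
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original)); // true
    }
}
```

For example, "\u00e9" encodes to the bytes 0xC3 0xA9; since 0xC3 has its high bit set, `toByteArray()` returns three bytes with a leading 0x00, and the zero-prefix skip restores the original two.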