Java: StringBuffer to byte[] without toString

mmascosta · Oct 20, 2013 · Viewed 11.1k times

The title says it all. Is there any way to convert from StringBuilder to byte[] without using a String in the middle?

The problem is that I'm handling REALLY large strings (millions of chars), and I have a loop that appends a char to the end and then obtains the byte[]. Converting the StringBuffer to a String on every iteration makes this loop very slow.

Is there any way to accomplish this? Thanks in advance!

Answer

VGR · Oct 20, 2013

As many have already suggested, you can use the CharBuffer class, but allocating a new CharBuffer would only make your problem worse.

Instead, you can directly wrap your StringBuilder in a CharBuffer, since StringBuilder implements CharSequence:

Charset charset = StandardCharsets.UTF_8;
CharsetEncoder encoder = charset.newEncoder();

// No allocation performed, just wraps the StringBuilder.
CharBuffer buffer = CharBuffer.wrap(stringBuilder);

ByteBuffer bytes = encoder.encode(buffer);
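
To try the snippet above in isolation, here is a minimal self-contained sketch. The class and method names (WrapAndEncode, encodeUtf8) are made up for illustration, and UTF-8 is assumed as in the snippet. Note that encode declares CharacterCodingException, which a freshly created encoder throws on malformed input such as an unpaired surrogate.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public final class WrapAndEncode {
    // Hypothetical helper: encodes any CharSequence (a StringBuilder
    // included) as UTF-8 without calling toString() first.
    static ByteBuffer encodeUtf8(CharSequence chars) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();

        // wrap() creates a read-only view over the sequence; the chars are not copied.
        CharBuffer view = CharBuffer.wrap(chars);

        // The returned buffer is ready for reading: its position is 0
        // and its limit marks the end of the encoded bytes.
        return encoder.encode(view);
    }
}
```

Calling encodeUtf8(stringBuilder) should yield the same bytes as stringBuilder.toString().getBytes(StandardCharsets.UTF_8), and remaining() on the result gives the encoded length.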

EDIT: Duarte correctly points out that the CharsetEncoder.encode method may return a buffer whose backing array is larger than the actual data, meaning that its capacity is larger than its limit. It is necessary either to read from the ByteBuffer itself, or to read a byte array out of the ByteBuffer that is guaranteed to be the right size. In the latter case, there's no avoiding having two copies of the bytes in memory, albeit briefly:

ByteBuffer byteBuffer = encoder.encode(buffer);

byte[] array;
int arrayLen = byteBuffer.limit();
// Reuse the backing array only if it holds exactly the encoded bytes.
if (byteBuffer.hasArray() && byteBuffer.arrayOffset() == 0
        && arrayLen == byteBuffer.array().length) {
    array = byteBuffer.array();
} else {
    // This will place two copies of the byte sequence in memory,
    // until byteBuffer gets garbage-collected (which should happen
    // pretty quickly once the reference to it is set to null).

    array = new byte[arrayLen];
    byteBuffer.get(array);
}

byteBuffer = null;
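
For the other option mentioned above, reading from the ByteBuffer itself, the following is a rough sketch that hands the encoded bytes straight to a channel so that no right-sized byte[] is needed. The names BufferConsumer and writeEncoded are hypothetical, and the WritableByteChannel destination is an assumption, since the question does not say where the bytes end up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public final class BufferConsumer {
    // Hypothetical helper: encodes the sequence as UTF-8 and writes the
    // bytes to the channel directly from the ByteBuffer.
    static void writeEncoded(CharSequence chars, WritableByteChannel out) throws IOException {
        ByteBuffer encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(chars));

        // A single write() call may consume only part of the buffer,
        // so keep writing until nothing remains between position and limit.
        while (encoded.hasRemaining()) {
            out.write(encoded);
        }
    }
}
```

Only the bytes between the buffer's position and limit are written, so any unused capacity in the backing array is ignored.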