There is a strange restriction in the java.io.DataOutputStream.writeUTF(String str) method, which limits the size of a UTF-8 encoded string to 65535 bytes:
    if (utflen > 65535)
        throw new UTFDataFormatException(
            "encoded string too long: " + utflen + " bytes");
It is strange, because:
static int writeUTF(String str, DataOutput out) method of this class
java.io.DataInputStream.readUTF().
Given the above, I cannot understand the purpose of such a restriction in the writeUTF method. What have I missed or misunderstood?
The Javadoc of DataOutputStream.writeUTF states:
First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string.
Two bytes means 16 bits, and the largest unsigned integer that fits in 16 bits is 2^16 - 1 = 65535.
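To see the boundary in practice, here is a minimal sketch. The class name WriteUtfLimitDemo and the helpers ascii and tryWriteUtf are made up for illustration; only DataOutputStream.writeUTF and UTFDataFormatException come from the JDK. It uses ASCII-only strings so that each character encodes to exactly one byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class WriteUtfLimitDemo {
    /** Hypothetical helper: builds a string of n 'a' characters (one encoded byte each). */
    static String ascii(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    /** Hypothetical helper: returns what writeUTF produced, or null if the string was rejected. */
    static byte[] tryWriteUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (UTFDataFormatException tooLong) {
            return null; // encoded length did not fit into the 2-byte prefix
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] atLimit = tryWriteUtf(ascii(65535));
        // Total size is the 2-byte prefix plus the encoded bytes.
        System.out.println("bytes written: " + atLimit.length);
        // The prefix holds the encoded length as an unsigned 16-bit value.
        System.out.printf("prefix: %02x %02x%n", atLimit[0] & 0xFF, atLimit[1] & 0xFF);
        System.out.println("65536 rejected: " + (tryWriteUtf(ascii(65536)) == null));
    }
}
```

With 65535 encoded bytes both prefix bytes are 0xFF, so there is no room left in the prefix to describe a longer payload.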
DataInputStream.readUTF has the exact same restriction, because it first reads the number of UTF-8 bytes to consume as an unsigned 2-byte integer, which again can only have a maximum value of 65535.
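The reading side can be sketched by parsing the prefix by hand and comparing the result with readUTF. The class ReadUtfPrefixDemo and its methods are hypothetical names for this example. It assumes the sample string contains no NUL characters and no characters outside the Basic Multilingual Plane, since for those the modified UTF-8 used by writeUTF differs from standard UTF-8 and the manual decoding below would not match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadUtfPrefixDemo {
    /** Encodes a string with writeUTF and returns the raw bytes. */
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the 2-byte unsigned length, then exactly that many payload bytes. */
    static String decodeByHand(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int byteCount = in.readUnsignedShort(); // 0..65535, nothing larger is representable
            byte[] payload = new byte[byteCount];
            in.readFully(payload);
            // Assumption: payload is also valid standard UTF-8 (no NUL, no supplementary chars).
            return new String(payload, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes the same bytes with the JDK's own readUTF. */
    static String decodeWithReadUtf(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode("h\u00e9llo");
        System.out.println(decodeByHand(data).equals(decodeWithReadUtf(data)));
    }
}
```

Because the reader only ever learns the payload size from those two bytes, a longer payload could not be read back even if the writer accepted it.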
writeUTF first writes two bytes with the length, which has the same result as calling writeShort with the length and then writing the UTF-encoded bytes. writeUTF doesn't actually call writeShort - it builds up a single byte[] with both the 2-byte length and the UTF bytes. But that is why the Javadoc says "as if by the writeShort method" rather than just "by the writeShort method".
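The "as if by" equivalence can be checked with a small sketch that compares the two byte sequences. The class AsIfByWriteShortDemo and its methods are hypothetical names. The comparison assumes an ASCII-only string shorter than 65536 bytes, so that String.getBytes with standard UTF-8 yields the same payload as the modified UTF-8 that writeUTF emits.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsIfByWriteShortDemo {
    /** Bytes produced by a single writeUTF call. */
    static byte[] viaWriteUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Bytes produced by writing the length with writeShort, then the encoded payload. */
    static byte[] viaWriteShortThenBytes(String s) {
        // Assumption: s is ASCII-only, so standard UTF-8 matches modified UTF-8 here.
        byte[] utf = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeShort(utf.length); // only the low 16 bits are written
            out.write(utf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(viaWriteUtf("hello"), viaWriteShortThenBytes("hello")));
    }
}
```

Note that writeShort keeps only the low 16 bits of its argument, so a hand-rolled version like this would silently write a wrong length for longer payloads, whereas writeUTF throws instead.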