I used RandomAccessFile
to read a byte
from a text file.
public static void readFile(RandomAccessFile fr) {
byte[] cbuff = new byte[1];
fr.read(cbuff,0,1);
System.out.println(new String(cbuff));
}
Why am I seeing one full character being read by this?
A char
represents a character in Java (*). It is 2 bytes large (at least that's what the valid value range suggests).
That doesn't necessarily mean that every representation of a character is 2 bytes long. In fact many encodings only reserve 1 byte for every character (or use 1 byte for the most common characters).
When you call the String(byte[])
constructor you ask Java to convert the byte[]
to a String
using the platform default encoding. Since the platform default encoding is usually a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8, it can easily convert that 1 byte to a single character.
If you run that code on a platform that uses UTF-16 (or UTF-32 or UCS-2 or UCS-4 or ...) as the platform default encoding, then you will not get a valid result (you'll get a String
containing the Unicode Replacement Character instead).
That's one of the reasons why you should not depend on the platform default encoding: when converting between byte[]
and char[]
/String
or between InputStream
and Reader
or between OutputStream
and Writer
, you should always specify which encoding you want to use. If you don't, then your code will be platform-dependent.
(*) that's not entirely true: a char
represents a UTF-16 codepoint. Either one or two UTF-16 codepoints represent a Unicode codepoint. A Unicode codepoint usually represents a character, but sometimes multiple Unicode codepoints are used to make up a single character. But the approximation above is close enough to discuss the topic at hand.