This code,
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
out.write("A".getBytes());
And this,
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
out.write("A".getBytes(StandardCharsets.UTF_8));
produce the same result(in my opinion), which is UTF-8 without BOM. However, Notepad++ is not showing any information about encoding. I'm expecting notepad++ to show here as Encode in UTF-8 without BOM
, but no encoding is being selected in the "Encoding" menu.
Now, this code write the file in UTF-8 with BOM encoding.
OutputStream out = new FileOutputStream(new File("C:/file/test.txt"));
byte[] bom = { (byte) 239, (byte) 187, (byte) 191 };
out.write(bom);
out.write("A".getBytes());
Notepad++ is also displaying the encoding type as Encode in UTF-8
.
Question: What is wrong with the first two codes which are suppose to write the file in UTF-8 without BOM? Is my Java code doing the right thing? If so, is there a problem with notepad++ trying to detect the encoding type?
Is notepad++ only guessing around?
"A" written using UTF-8 without a BOM produces exactly the same file as "A" written using ASCII or ISO-8859-* or any other ASCII-compatible encodings. That file contains a single byte with the decimal value 65.
Think of it this way:
"A".getBytes("UTF-8")
returns a new byte[] { 65 }
"A".getBytes("ISO-8859-1")
returns a new byte[] { 65 }
There's nothing in that file that suggests that UTF-8 needs to be used to decode it.
Try writing "Käsekuchen" or something else that's not encodable in ASCII and see if Notepad++ guesses the encoding correctly (because that's exactly what it does: it makes an educated guess, there's no metadata that tells it which encoding to use).