Reading linefeed characters in Java

Ravi.Kumar · Jul 13, 2012 · Viewed 10.6k times

When I open a file in Notepad, I see one continuous line without any carriage return/line feed.

I wrote a Java program to read the file. When I split the file's data on \n or System.getProperty("line.separator"), I get many lines.
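
A minimal sketch of the split described above, for illustration only. The class name, helper method, and sample string are hypothetical stand-ins, not the original program or file.

```java
import java.util.Arrays;

public class SplitOnLineFeed {
    // Splits on the single LF character (0x0A); no CR is needed for a match.
    static String[] splitOnLf(String text) {
        return text.split("\n");
    }

    public static void main(String[] args) {
        // Hypothetical content that uses UNIX-style line endings only.
        String data = "first\nsecond\nthird";
        System.out.println(Arrays.toString(splitOnLf(data)));

        // The platform separator is CR+LF on Windows and LF on UNIX-like
        // systems, so the result of this split depends on where it runs.
        String[] bySeparator = data.split(System.lineSeparator());
        System.out.println(bySeparator.length);
    }
}
```

The first print shows three elements on any platform, because "\n" matches a bare 0A byte. The second count depends on the operating system the program runs on.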

I found in a hex editor that the file uses '0A' for new lines (as on UNIX), which appears as a rectangle in Notepad.

My question is: if the file doesn't contain '0D' and '0A' (used on Windows for carriage return and line feed), how is my Java program splitting the data into lines? It should not split it.

Does anyone have any idea?

Answer

Marc-Christian Schulze · Jul 13, 2012

Java internally works with Unicode.

The Unicode standard defines a large number of characters that conforming applications should recognize as line terminators:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029

(Source: http://en.wikipedia.org/wiki/Newline) That's why it interprets \n as a newline.
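
As a hedged illustration of the list above: since Java 8, the regex linebreak matcher `\R` in `java.util.regex.Pattern` matches exactly these terminators and treats CR+LF as a single break. The class name, helper method, and sample string below are made up for the example. Note that a plain `split("\n")` matches the LF character on its own, and `BufferedReader.readLine()` recognizes only LF, CR, and CR+LF.

```java
import java.util.Arrays;

public class UnicodeLineBreaks {
    // \R is the regex linebreak matcher (Java 8+). It matches CR+LF as one
    // break, as well as LF, VT, FF, CR, NEL, LS and PS individually.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // One hypothetical sample containing each terminator from the list:
        // LF, VT, FF, CR, CR+LF, NEL, LS, PS (in that order).
        String sample = "a\nb\u000Bc\fd\re\r\nf\u0085g\u2028h\u2029i";
        System.out.println(Arrays.toString(splitLines(sample)));
        // prints [a, b, c, d, e, f, g, h, i]
    }
}
```

A file that contains only 0A bytes therefore still splits into lines in Java, even though older versions of Notepad expect 0D 0A and show it as one continuous line.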