Text file with 0D 0D 0A line breaks

Anders Abel · Aug 9, 2011 · Viewed 126.2k times

A customer is sending me a .csv file where the line breaks are made up of the sequence 0xD 0xD 0xA. As far as I know, line breaks are either 0xA on Mac or Unix, or 0xD 0xA on Windows.

Is 0xD 0xD 0xA a known line-ending convention? Is there a known sequence of save operations that corrupts a file's line endings in this way (I think the customer uses a Mac)?

The file doesn't start with any encoding markers; it starts directly with the text contents. The text is displayed correctly if opened with code page 1252.

Answer

BalusC · Aug 9, 2011

The CR CR LF sequence is a known result of a Windows XP Notepad word wrap bug.

For future reference, here's the relevant extract from the linked blog:

When you press the Enter key on Windows computers, two characters are actually stored: a carriage return (CR) and a line feed (LF). The operating system always interprets the character sequence CR LF the same way as the Enter key: it moves to the next line. However when there are extra CR or LF characters on their own, this can sometimes cause problems.

There is a bug in the Windows XP version of Notepad that can cause extra CR characters to be stored in the display window. The bug happens in the following situation:

If you have the word wrap option turned on and the display window contains long lines that wrap around, then saving the file causes Notepad to insert the characters CR CR LF at each wrap point in the display window, but not in the saved file.

The CR CR LF characters can cause oddities if you copy and paste them into other programs. They also prevent Notepad from properly re-wrapping the lines if you resize the Notepad window.

You can remove the CR CR LF characters by turning off the word wrap feature, then turning it back on if desired. However, the cursor is repositioned at the beginning of the display window when you do this.
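
If the file cannot be re-saved from Notepad, the stray CRs can also be cleaned up programmatically. The following is only a minimal sketch in Python, assuming the whole file fits in memory; the function names `count_line_endings` and `normalize_crcrlf` are made up for illustration and do not come from the question or the linked blog. It works on raw bytes so that no newline translation happens while reading, and it decodes as code page 1252 only because the question says the text displays correctly that way.

```python
import re

# Map each raw line-break sequence to a readable label.
_LABELS = {b"\r\r\n": "CRCRLF", b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}


def count_line_endings(data: bytes) -> dict:
    """Count each kind of line-break sequence found in raw bytes."""
    counts = {label: 0 for label in _LABELS.values()}
    # Alternatives are tried left to right, so CR CR LF is matched
    # before CR LF, and CR LF before a lone LF or CR.
    for match in re.finditer(rb"\r\r\n|\r\n|\n|\r", data):
        counts[_LABELS[match.group()]] += 1
    return counts


def normalize_crcrlf(data: bytes) -> bytes:
    """Replace every CR CR LF (0D 0D 0A) with a plain CR LF (0D 0A)."""
    return data.replace(b"\r\r\n", b"\r\n")


# Small in-memory sample standing in for the customer's .csv content.
sample = b"a,b\r\r\nc,d\r\r\ne,f\r\n"
counts = count_line_endings(sample)
cleaned = normalize_crcrlf(sample)
text = cleaned.decode("cp1252")  # assumption: code page 1252, as in the question
```

For a real file, the bytes would be read with `open(path, "rb")` and written back with `open(path, "wb")`, so that Python does not translate the line endings itself. Counting first is a cheap way to confirm that the file really contains CR CR LF before changing anything.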