Using only CR as linebreak inside pre tag doesn't work

schnaader · May 6, 2011

At work, we stumbled upon Bugzilla generating HTML output with lines that were much too long because the browser didn't break them. This happened in Chrome, but not in Firefox 3.5, so we didn't really care. But Firefox 4 behaves just like Chrome, so we had to find a workaround.

An example is:

<html>
  <body>
    <pre>
      Lorem ipsum dolor sit amet, consetetur sadipscing elitr,&#013;sed diam nonumy eirmod tempor invidunt ut labore et&#013;dolore magna aliquyam erat, sed diam voluptua. At vero eos&#013;et accusam et justo duo dolores et ea rebum. Stet clita kasd&#013;gubergren, no sea takimata sanctus est Lorem ipsum dolor sit&#013;amet.&#013;
    </pre>
  </body>
</html>

The server uses only CR as a linebreak, which is very uncommon. The usual alternatives (CR+LF or only LF) work correctly, so the right fix is to tell the Bugzilla server to use one of them. Still, I'm curious why a bare CR doesn't work and whether ignoring these linebreaks really is the "correct" behavior for browsers.

Also, I found a strange local workaround for Chrome and FF 4 using a Greasemonkey script (modified version of this one):

var els = document.getElementsByTagName("*");
for(var i = 0, l = els.length; i < l; i++) {
  var el = els[i];
  el.innerHTML = el.innerHTML;
}

It seems this should have no effect on the page, but with this script the linebreaks suddenly display correctly.
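
A narrower variant of that workaround can be sketched as follows. It assumes the cause is bare CR characters ending up in the text nodes of <pre> elements, and rewrites only those instead of re-parsing every element on the page. The names normalizeLineBreaks and fixPreElements are made up for illustration, and this is untested against Bugzilla itself.

```javascript
// Hypothetical helper: turn CR+LF pairs and bare CRs into a single LF.
function normalizeLineBreaks(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Hypothetical helper: apply the normalization to every text node inside <pre>.
function fixPreElements(doc) {
  var pres = doc.getElementsByTagName("pre");
  for (var i = 0; i < pres.length; i++) {
    // 4 is the numeric value of NodeFilter.SHOW_TEXT (text nodes only).
    var walker = doc.createTreeWalker(pres[i], 4, null, false);
    var node;
    while ((node = walker.nextNode())) {
      node.nodeValue = normalizeLineBreaks(node.nodeValue);
    }
  }
}

// Only touch the DOM when one is available, e.g. inside a userscript.
if (typeof document !== "undefined") {
  fixPreElements(document);
}
```

Unlike the innerHTML round trip, this leaves elements, attributes and event handlers untouched and only changes the characters inside the text.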

So my questions are:

  1. Is the Chrome/FF 4 way the "correct" way to handle these kinds of linebreaks inside <pre>?
  2. Why is this Greasemonkey script working?

Answer

Bill Brasky · May 6, 2011

The HTML 4.01 specification does count a bare CR as a line break. It defines a line break as follows (http://www.w3.org/TR/html401/struct/text.html#line-breaks):

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
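
Read literally, that definition corresponds to a pattern like the one below. This is only a sketch of the quoted sentence, not of how any browser actually parses or renders text, and splitLines is a name invented here.

```javascript
// Hypothetical helper: split on CR+LF, bare CR, or bare LF.
// CR+LF is listed first so that a pair counts as one line break, not two.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

var sample = "sed diam nonumy\reirmod tempor\r\ninvidunt ut labore\net dolore";
console.log(splitLines(sample).length); // 4
```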

However, a bare carriage return is extremely rare. I'm not surprised it doesn't work. But technically, I'd say that FF4 and Chrome are in the wrong.

I'm not sure why your Greasemonkey script works. My guess is that reading el.innerHTML and assigning it back converts CR to CR+LF or LF.
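
One way to check that guess would be to count the line-break characters before and after the innerHTML round trip. The helper below is a sketch (countLineBreaks is a made-up name). In a browser console it could be called on the nodeValue of a text node inside the <pre>, once before and once after running the script.

```javascript
// Hypothetical helper: count CR+LF pairs, bare CRs and bare LFs in a string.
function countLineBreaks(text) {
  var counts = { crlf: 0, cr: 0, lf: 0 };
  var re = /\r\n|\r|\n/g; // CR+LF first, so a pair is not counted as CR plus LF
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match[0] === "\r\n") {
      counts.crlf++;
    } else if (match[0] === "\r") {
      counts.cr++;
    } else {
      counts.lf++;
    }
  }
  return counts;
}
```

If the guess is right, the cr count should drop to zero after the script runs, with the lf or crlf count rising by the same amount.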