What exactly is linear whitespace (LWS/LWSP)?

Reflection · Jan 12, 2014

I saw mention of the term, along with CRLF, CR, LF, CTL (control characters) and SP (space).

If it's not the regular inline whitespace ( ), then what character(s) is it?

Answer

some · Jan 12, 2014

From STD68, "Augmented BNF for Syntax Specifications: ABNF":

LWSP    =  *(WSP / CRLF WSP)  ; Use of this linear-white-space rule permits
                              ; lines containing only white space*
WSP     =  SP / HTAB          ; white space
CRLF    =  CR LF              ; Internet standard newline
SP      =  %x20               ; space
HTAB    =  %x09               ; horizontal tab
CR      =  %x0D               ; carriage return
LF      =  %x0A               ; linefeed

The comment on LWSP changed in STD68 (aka RFC5234) relative to RFC2234 and RFC4234: it now says not to use this rule when defining mail headers and to use it with caution in other contexts.

In plain English, linear white space is any number of spaces or horizontal tabs, plus any newline (CRLF) that is immediately followed by a space or horizontal tab.
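As a rough illustration, the ABNF rule can be transcribed almost literally into a regular expression. The sketch below uses Python's `re` module; the names `LWSP_RE` and `is_lwsp` are made up for this example and do not come from any RFC.

```python
import re

# Direct transcription of  LWSP = *(WSP / CRLF WSP)
# WSP is a space or horizontal tab; a CRLF only counts when WSP follows it.
LWSP_RE = re.compile(r"(?:[ \t]|\r\n[ \t])*")


def is_lwsp(s: str) -> bool:
    """Return True if the whole string matches the LWSP rule."""
    return LWSP_RE.fullmatch(s) is not None


print(is_lwsp(" \r\n "))      # SP CRLF SP -> True
print(is_lwsp(" \r\n\r\n "))  # CRLF directly followed by CRLF -> False
```

Because the rule uses `*`, the empty string also matches, so a caller that needs at least one character of white space would have to check for that separately.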

Examples of strings that are, or contain, linear white space:

  • [SP]
  • [HTAB]
  • [SP][SP]
  • [HTAB][HTAB]
  • [SP][HTAB][SP]
  • [SP][CR][LF][SP]
  • [CR][LF][SP][CR][LF][SP][CR][LF][HTAB]
  • [SP][CR][LF][CR][LF][SP][CR][LF][SP][CR][LF] This is TWO linear white spaces: [SP] and [CR][LF][SP][CR][LF][SP]. The first and last [CR][LF] belong to neither, because [CR][LF] is only included if it is immediately followed by [SP] or [HTAB].
  • [SP][VTAB][SP] Two linear white spaces: [SP] and [SP], separated by a vertical tab.
  • [SP][CR][LF][CR][LF][CR][LF] Only the first [SP] is linear white space. A CRLF followed by another CRLF, or by nothing, is not matched by the rule.
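To check examples like these mechanically, one option is to pull every maximal non-empty run of linear white space out of a string. This is only a sketch under the same regex transcription of the ABNF rule; `LWSP_RUN` and `lwsp_runs` are hypothetical names chosen for the example.

```python
import re

# One or more repetitions of (WSP / CRLF WSP): a non-empty LWSP run.
LWSP_RUN = re.compile(r"(?:[ \t]|\r\n[ \t])+")


def lwsp_runs(s: str) -> list:
    """Return each maximal non-empty run of linear white space in s."""
    return [m.group(0) for m in LWSP_RUN.finditer(s)]


# A vertical tab (0x0B) is not WSP, so it splits the string into two runs.
print(lwsp_runs(" \x0b "))         # -> [' ', ' ']
# A CRLF not followed by SP or HTAB never joins a run.
print(lwsp_runs(" \r\n\r\n\r\n"))  # -> [' ']
```

The pattern uses `+` rather than the `*` of the ABNF rule so that empty matches are not reported between every pair of characters.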

Thanks to Jukka K. Korpela for reminding me to check for obsoleted RFCs and to unwind for clarification that CRLF must be followed by a space or htab to be part of LWSP.