Canonical vs. non-canonical terminal input

titaniumdecoy picture titaniumdecoy · Dec 11, 2008 · Viewed 44.2k times · Source

I am studying for an exam and I am confused as to how canonical vs. non-canonical input/output works in Unix (e.g., curses). I understand that there is a buffer to which "line disciplines" are applied for canonical input. Does this mean that the buffer is bypassed for non-canonical input, or does it simply mean that no line disciplines are applied? How does this process differ for input and output operations?

In the curses programs I have worked with that demonstrate canonical input, the input typed by a user is automatically entered either after a certain number of characters have been typed or a certain amount of time has passed. Are either of these things considered "line disciplines" or is this something else entirely?

Answer

Jonathan Leffler picture Jonathan Leffler · Dec 11, 2008

For canonical input — think shell; actually, think good old-fashioned Bourne shell, since Bash and relatives have command-line editing. You type a line of input; if you make a mistake, you use the erase character (default is Backspace, usually; sometimes Delete) to erase the previous character. If you mess up completely, you can cancel the whole line with the line kill character (not completely standardized, often Control-X). On some systems, you get a word erase with Control-W. All this is canonical input. The entire line is gathered and edited up until the end of line character — Return — is pressed. Thereupon, the whole line is made available to waiting programs. Depending on the read() system calls that are outstanding, the whole line will be available to be read (by one or more calls to read()).

For non-canonical input — think vi or vim or whatever — you press a character, and it is immediately available to the program. You aren't held up until you hit return. The system does no editing of the characters; they are made available to the program as soon as they are typed. It is up to the program to interpret things appropriately. Now, vim does do a number of things that look a bit like canonical input. For example, backspace moves backwards, and in input mode erases what was there. But that's because vim chooses to make it behave like that.

Canonical and non-canonical output is a much less serious business. There are a few bits and pieces of difference, related to things like whether to echo carriage-return before line-feed, and whether to do delays (not necessary with electronics; important in the days when the output device might have been a 110-baud teletype). It can also do things like handle case-insensitive output devices — teletypes, again. Lower-case letters are output in caps, and upper-case letters as backslash and caps.

It used to be that if you typed all upper-case letters to the login prompt, then the login program would automatically convert to the mode where all caps were output with a backslash in front of each actual capital. I suspect that this is no longer done on electronic terminals.


In a comment, TitaniumDecoy asked:

So with non-canonical input, is the input buffer bypassed completely? Also, where do line disciplines come in?

With non-canonical input, the input buffer is still used; if there is no program with a read() call waiting for input from the terminal, the characters are held in the input buffer. What doesn't happen is any editing of the input buffer.

Line disciplines are things like the set of manipulations that the input editing does. So, one aspect of the line discipline is that the erase character erases a prior character in canonical input mode. If you have icase (input case-mapping) set, then upper-case characters are mapped to lower-case unless preceded by a backslash; that is a line discipline, I believe, or an aspect of a line discipline.


I forgot to mention that EOF processing (Control-D) is handled in canonical mode; it actually means 'make the accumulated input available to read()'; if there is no accumulated input (if you type Control-D at the beginning of a line), then the read() will return zero bytes, which is then interpreted as EOF by programs. Of course, you can merrily type more characters on the keyboard after that, and programs that ignore EOF (or run in non-canonical mode) will be quite happy.

Of course, in canonical mode, the characters typed at the keyboard are normally echoed to the screen; you can control whether that echoing occurs. However, this is somewhat tangential to canonical input; the normal editing occurs even when echo is off.

Similarly, the interrupt and quit signals are artefacts of canonical mode processing. So too are the job control signals such as Control-Z to suspend the current process and return to the shell. Likewise, flow control (Control-S, Control-Q to stop and start output) is provided by canonical mode.

Chapter 4 of Rochkind's Advanced Unix Programming, 2nd Edn covers terminal I/O and gives much of this information — and a whole lot more. Other UNIX programming books (at least, the good ones) will also cover it.