Why can't variable names have spaces in them?

Volatility · Dec 25, 2013 · Viewed 9.1k times

Related: Why can't variable names start with numbers?

Is there a technical reason why spaces aren't allowed in variable names or is it down to convention?

For example, what's stopping us from doing something like this?

average score = sum of scores / number of scores

The only issue that comes to mind is keywords, but one could simply restrict the use of them in a variable name, and the lexer would be able to distinguish between part of a variable and a keyword.

Answer

Jon Purdy · Dec 25, 2013

There’s no fundamental reason, apart from the decisions of language designers and a history of single-token identifiers. Some languages in fact do allow multi-token identifiers: MultiMedia Fusion’s expression language, some Mac spreadsheet/notebook software whose name escapes me, and I’m sure there are others. There are several considerations that make the problem nontrivial, though.

Presuming the language is free-form, you need a canonical representation, so that an identifier like account name is treated the same regardless of whitespace. A compiler would probably need to use some mangling convention to please a linker. Then you have to consider the effect of that on foreign exports, which is why C++ has the extern "C" linkage specifier to disable mangling.
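
To make the canonicalization point concrete, here is a minimal Python sketch. The function names canonical and mangle are made up for illustration, and joining words with an underscore is only an assumed mangling scheme, not what any particular compiler does.

```python
def canonical(identifier: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return " ".join(identifier.split())


def mangle(identifier: str) -> str:
    """Assumed scheme: join the words with '_' so a linker sees one symbol."""
    return canonical(identifier).replace(" ", "_")


# Different spellings of the same multi-token identifier compare equal
# once they are canonicalized.
print(canonical("account   name") == canonical("account\n\tname"))  # True
print(mangle("account   name"))  # account_name
```

Note that this naive scheme collides with a single-token identifier that is already spelled account_name, so a real mangling convention would need an escape rule or a separator that cannot appear in source identifiers.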

Keywords are an issue, as you have seen. Most C-family languages treat keywords as a lexical class distinct from identifiers, and that class is not context-sensitive: you cannot name a variable class in C++, for example. This can be solved by disallowing keywords in multi-token identifiers:

if account age < 13 then child account = true;

Here, if and then cannot be part of an identifier, so there is no ambiguity with account age and child account. Alternatively, you can require punctuation everywhere:

if (account age < 13) {
  child account = true;
}
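
The keyword-exclusion approach can be sketched as a small tokenizer in Python. This is an illustration under stated assumptions, not a real language's lexer: the keyword set, the token categories, and the rule that true and false are reserved words are all choices made for the example. Consecutive non-keyword words are merged into one identifier, and any keyword, number, or punctuation mark ends the identifier being collected.

```python
import re

# Assumed reserved words for this sketch; none may appear inside an identifier.
KEYWORDS = {"if", "then", "else", "true", "false"}

# One token per match: a number, a word, or a punctuation/operator symbol.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|==|[<>=;(){}+\-*/]))")


def tokenize(source: str) -> list[tuple[str, str]]:
    tokens = []
    words = []  # words of the identifier currently being collected

    def flush():
        # Emit the collected words as a single multi-token identifier.
        if words:
            tokens.append(("ident", " ".join(words)))
            words.clear()

    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}: {source[pos]!r}")
        number, word, punct = match.groups()
        if word is not None and word not in KEYWORDS:
            words.append(word)  # extend the current identifier
        else:
            flush()  # anything else terminates the identifier
            if number is not None:
                tokens.append(("number", number))
            elif word is not None:
                tokens.append(("keyword", word))
            else:
                tokens.append(("punct", punct))
        pos = match.end()
    flush()
    return tokens


for token in tokenize("if account age < 13 then child account = true;"):
    print(token)
```

Because the words are rejoined with a single space, account age and account   age come out as the same identifier, which also takes care of the whitespace normalization mentioned earlier.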

The last option is to make keywords pervasively context-sensitive, leading to such monstrosities as:

IF IF = THEN THEN ELSE = THEN ELSE THEN = ELSE

The biggest issue is that juxtaposition is an extremely powerful syntactic construct, and you don’t want to occupy it lightly. Allowing multi-token identifiers prevents using juxtaposition for another purpose, such as function application or composition. Far better, I think, just to allow most nonwhitespace characters and thereby permit such identifiers as canonical-venomous-frobnicator. Still plenty readable but with fewer opportunities for ambiguity.
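
A rough way to see the juxtaposition conflict: if a language used juxtaposition for function application and also allowed spaces in names, a run of words could be read in more than one way. The Python sketch below enumerates every split of a word sequence into names that are in scope. The segmentations helper and the set of names are hypothetical, chosen only to show the ambiguity.

```python
def segmentations(words: list[str], names: set[str]) -> list[list[str]]:
    """Return every split of `words` into consecutive runs that are all known names."""
    if not words:
        return [[]]
    results = []
    for end in range(1, len(words) + 1):
        head = " ".join(words[:end])
        if head in names:
            # Try every way of splitting the remaining words.
            for rest in segmentations(words[end:], names):
                results.append([head] + rest)
    return results


# Hypothetical names in scope, some of which contain spaces.
names = {"print", "print total", "total", "total score", "score"}

for parse in segmentations("print total score".split(), names):
    print(parse)
```

With these names the three words have three readings: print applied to total and score, print applied to total score, or print total applied to score. A parser would need extra rules such as longest match to pick one, whereas a name like print-total-score has only one reading.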