Related: Why can't variable names start with numbers?
Is there a technical reason why spaces aren't allowed in variable names or is it down to convention?
For example, what's stopping us from doing something like this?:
average score = sum of scores / number of scores
The only issue that comes to mind is keywords, but one could simply restrict the use of them in a variable name, and the lexer would be able to distinguish between part of a variable and a keyword.
There’s no fundamental reason, apart from the decisions of language designers and a history of single-token identifiers. Some languages in fact do allow multi-token identifiers: MultiMedia Fusion’s expression language, some Mac spreadsheet/notebook software whose name escapes me, and I’m sure of others. There are several considerations that make the problem nontrivial, though.
Presuming the language is free-form, you need a canonical representation, so that an identifier like account name
is treated the same regardless of whitespace. A compiler would probably need to use some mangling convention to please a linker. Then you have to consider the effect of that on foreign exports—why C++ has the extern "C"
linkage specifier to disable mangling.
Keywords are an issue, as you have seen. Most C-family languages have a lexical class of keywords distinct from identifiers, which are not context-sensitive. You cannot name a variable class
in C++. This can be solved by disallowing keywords in multi-token identifiers:
if account age < 13 then child account = true;
Here, if
and then
cannot be part of an identifier, so there is no ambiguity with account age
and child account
. Alternatively, you can require punctuation everywhere:
if (account age < 13) {
child account = true;
}
The last option is to make keywords pervasively context-sensitive, leading to such monstrosities as:
IF IF = THEN THEN ELSE = THEN ELSE THEN = ELSE
The biggest issue is that juxtaposition is an extremely powerful syntactic construct, and you don’t want to occupy it lightly. Allowing multi-token identifiers prevents using juxtaposition for another purpose, such as function application or composition. Far better, I think, just to allow most nonwhitespace characters and thereby permit such identifiers as canonical-venomous-frobnicator
. Still plenty readable but with fewer opportunities for ambiguity.