What are the valid characters for macro names?

Andrew · Dec 15, 2008 · Viewed 52.4k times

Are C-style macro names subject to the same naming rules as identifiers? After a compiler upgrade, it is now emitting this warning for a legacy application:

warning #3649-D: white space is required between the macro name "CHAR_" and its replacement text
  #define         CHAR_&        38

This line of code is defining an ASCII value constant for an ampersand.

#define   DOL_SN        36
#define   PERCENT       37
#define   CHAR_&        38
#define   RT_SING       39
#define   LF_PAR        40

I assume that this definition (not actually referenced by any code, as far as I can tell) is buggy and should be changed to something like "CHAR_AMPERSAND"?

Answer

Adam Rosenfield · Dec 15, 2008

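Yes, macro names follow the same rules as identifiers: they should only consist of alphanumeric characters and underscores, i.e. 'a-z', 'A-Z', '0-9', and '_', and the first character should not be a digit. Some preprocessors also permit the dollar sign character '$', but you shouldn't use it. Unfortunately I can't quote the C standard since I don't have a copy of it.

In your case the '&' cannot be part of the name, so the preprocessor ends the macro name at "CHAR_" and treats "& 38" as the replacement text, which is exactly what the warning is reporting. Renaming it to something like CHAR_AMPERSAND is the right fix.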
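To make that concrete, here is a minimal sketch of what the legacy line likely amounts to, assuming the preprocessor splits it the way the warning describes. The names `CHAR_AMPERSAND`, `STR`, `STR_`, and `char_underscore_text` are hypothetical and only there for illustration; the stringification helper just exposes the replacement text so it can be inspected.

```c
/* Hypothetical replacement name for the constant from the question. */
#define CHAR_AMPERSAND 38

/* Roughly what "#define CHAR_& 38" amounts to: the name stops at '&',
   leaving a macro called CHAR_ whose replacement text is "& 38".
   The whitespace the warning asks for is written out explicitly here. */
#define CHAR_ & 38

/* Two-level stringification, so the argument is macro-expanded
   before it is turned into a string literal. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the replacement text of CHAR_ as a string, for inspection. */
const char *char_underscore_text(void) {
    return STR(CHAR_);
}
```

Printing `char_underscore_text()` should show that `CHAR_` expands to an ampersand followed by 38 rather than to a plain number, which is why any code using the old definition as an integer constant would not have compiled as intended.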

From the GCC documentation:

Preprocessing tokens fall into five broad classes: identifiers, preprocessing numbers, string literals, punctuators, and other. An identifier is the same as an identifier in C: any sequence of letters, digits, or underscores, which begins with a letter or underscore. Keywords of C have no significance to the preprocessor; they are ordinary identifiers. You can define a macro whose name is a keyword, for instance. The only identifier which can be considered a preprocessing keyword is defined. See Defined.

This is mostly true of other languages which use the C preprocessor. However, a few of the keywords of C++ are significant even in the preprocessor. See C++ Named Operators.

In the 1999 C standard, identifiers may contain letters which are not part of the “basic source character set”, at the implementation's discretion (such as accented Latin letters, Greek letters, or Chinese ideograms). This may be done with an extended character set, or the '\u' and '\U' escape sequences. The implementation of this feature in GCC is experimental; such characters are only accepted in the '\u' and '\U' forms and only if -fextended-identifiers is used.

As an extension, GCC treats '$' as a letter. This is for compatibility with some systems, such as VMS, where '$' is commonly used in system-defined function and object names. '$' is not a letter in strictly conforming mode, or if you specify the -$ option. See Invocation.
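The basic rule from the quoted documentation (letters, digits, or underscores, not starting with a digit, with '$' optionally treated as a letter) can be sketched as a small checker. This is only an illustration under simplifying assumptions: the function name `is_basic_identifier` and its `allow_dollar` flag are made up here, the "C" locale is assumed so that `isalpha` matches only 'a-z' and 'A-Z', and extended characters and the '\u' and '\U' forms are deliberately not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s follows the basic identifier rule, 0 otherwise.
   allow_dollar is a hypothetical flag that mimics the '$' extension.
   Extended characters and \u / \U escapes are not handled. */
int is_basic_identifier(const char *s, int allow_dollar) {
    if (s == NULL || *s == '\0') {
        return 0;
    }

    /* First character: a letter, an underscore, or (optionally) '$'. */
    unsigned char c = (unsigned char)*s;
    if (!(isalpha(c) || c == '_' || (allow_dollar && c == '$'))) {
        return 0;
    }

    /* Remaining characters: letters, digits, underscores, or '$'. */
    for (++s; *s != '\0'; ++s) {
        c = (unsigned char)*s;
        if (!(isalnum(c) || c == '_' || (allow_dollar && c == '$'))) {
            return 0;
        }
    }
    return 1;
}
```

With this sketch, `is_basic_identifier("CHAR_AMPERSAND", 0)` is accepted while `is_basic_identifier("CHAR_&", 0)` is rejected, and a VMS-style name such as "SYS$OPEN" passes only when `allow_dollar` is nonzero.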