What are declarations and declarators and how are their types interpreted by the standard?

Joseph Mansfield · Dec 10, 2012

How exactly does the standard define that, for example, float (*(*(&e)[10])())[5] declares a variable of type "reference to array of 10 pointer to function of () returning pointer to array of 5 float"?

Inspired by discussion with @DanNissenbaum

Answer

Joseph Mansfield · Dec 10, 2012

I refer to the C++11 standard throughout this post.

Declarations

Declarations of the type we're concerned with are known as simple-declarations in the grammar of C++, which are of one of the following two forms (§7/1):

decl-specifier-seq(opt) init-declarator-list(opt) ;
attribute-specifier-seq decl-specifier-seq(opt) init-declarator-list ;

The attribute-specifier-seq is a sequence of attributes ([[something]]) and/or alignment specifiers (alignas(something)). Since these don't affect the type of the declaration, we can ignore them and the second of the above two forms.

Declaration specifiers

The first part of our declaration, the decl-specifier-seq, is made up of declaration specifiers. These include some things that we can ignore, such as storage class specifiers (static, extern, etc.), function specifiers (inline, etc.), the friend specifier, and so on. The one kind of declaration specifier that interests us is the type specifier, which may include simple type keywords (char, int, unsigned, etc.), names of user-defined types, cv-qualifiers (const or volatile), and others that we don't care about.

Example: A simple decl-specifier-seq that is just a sequence of type specifiers is const int. Another is unsigned int volatile.

You may think "Oh, so something like const volatile int int float const is also a decl-specifier-seq?" You'd be right that it fits the rules of the grammar, but the semantic rules disallow it. In fact, only one type specifier is allowed, except for certain combinations (such as unsigned with int, or const with anything except itself), and at least one non-cv-qualifier is required (§7.1.6/2-3).

Quick Quiz (you might need to reference the standard)

  1. Is const int const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?

    Invalid by semantic rules! const cannot be combined with itself.

  2. Is unsigned const int a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?

    Valid! It doesn't matter that the const separates the unsigned from int.

  3. Is auto const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?

    Valid! auto is a declaration specifier, but it changed category in C++11. Before C++11 it was a storage class specifier (like static), but now it is a type specifier.

  4. Is int * const a valid declaration specifier sequence or not? If not, is it disallowed by the syntactic or semantic rules?

    Invalid by syntactic rules! While this may very well be the full type of a declaration, only the int is the declaration specifier sequence. The declaration specifiers only provide the base type, and not compound modifiers like pointers, references, arrays, etc.
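The quiz answers above can be checked with a compiler. The following is only a sketch: the variable names are made up for illustration, and the invalid sequences are left commented out so that everything else compiles. All checks are static_assert, so nothing needs to run.

```cpp
#include <type_traits>

// Valid: one base type plus cv-qualifiers, in any order.
const int a = 0;
unsigned int volatile b = 0;
unsigned const int c = 0;   // quiz 2: const may separate unsigned from int
auto const d = 0;           // quiz 3: auto is a type specifier in C++11

// Quiz 4: the declaration specifier sequence here is only `int`;
// `* const p` is the declarator.
int* const p = nullptr;

// Invalid, so left commented out:
// const int const bad1 = 0;                     // quiz 1: const repeated
// const volatile int int float const bad2 = 0;  // more than one base type

static_assert(std::is_same<decltype(c), const unsigned int>::value,
              "the position of const does not change the type");
static_assert(std::is_same<decltype(p), int* const>::value,
              "p is a const pointer to int");
```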

Declarators

The second part of a simple-declaration is the init-declarator-list. It is a sequence of declarators separated by commas, each with an optional initializer (§8). Each declarator introduces a single variable or function into the program. The simplest form of declarator is just the name you're introducing - the declarator-id. The declaration int x, y = 5; has a declaration specifier sequence that is just int, followed by two declarators, x and y, the second of which has an initializer. We will, however, ignore initializers for the rest of this post.

A declarator can have a particularly complex syntax because this is the part of the declaration that allows you to specify whether the variable is a pointer, reference, array, function pointer, etc. Note that these are all part of the declarator and not the declaration as a whole. This is precisely the reason why int* x, y; does not declare two pointers - the asterisk * is part of the declarator of x and not part of the declarator of y. One important rule is that every declarator must have exactly one declarator-id - the name it is declaring. The rest of the rules about valid declarators are enforced once the type of the declaration is determined (we'll come to it later).
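As a quick check of the int* x, y; point, here is a minimal sketch that asks the compiler for the declared type of each name. It assumes a C++11 compiler and uses only decltype and std::is_same.

```cpp
#include <type_traits>

// The * belongs to the declarator of x only; y gets the bare base type.
int* x, y;

static_assert(std::is_same<decltype(x), int*>::value, "x is a pointer to int");
static_assert(std::is_same<decltype(y), int>::value, "y is a plain int");
```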

Example: A simple example of a declarator is *const p, which declares a const pointer to... something. The type it points to is given by the declaration specifiers in its declaration. A more terrifying example is the one given in the question, (*(*(&e)[10])())[5], which declares a reference to an array of function pointers that return pointers to... again, the final part of the type is actually given by the declaration specifiers.

You're unlikely to ever come across such horrible declarators, but similar ones do sometimes appear. Being able to read a declaration like the one in the question is a useful skill that comes with practice, and it helps to understand how the standard interprets the type of a declaration.

Quick Quiz (you might need to reference the standard)

  1. Which parts of int const unsigned* const array[50]; are the declaration specifiers and the declarator?

    Declaration specifiers: int const unsigned
    Declarator: * const array[50]

  2. Which parts of volatile char (*fp)(float const), &r = c; are the declaration specifiers and the declarators?

    Declaration specifiers: volatile char
    Declarator #1: (*fp)(float const)
    Declarator #2: &r

Declaration Types

Now that we know that a declaration is made up of a declaration specifier sequence and a list of declarators, we can begin to think about how the type of a declaration is determined. For example, it might be obvious that int* p; defines p to be a "pointer to int", but for other types it's not so obvious.

A declaration with multiple declarators is treated as a separate declaration for each identifier. That is, int x, *y; is a declaration of identifier x, int x, and a declaration of identifier y, int *y.

Types are expressed in the standard as English-like sentences (such as "pointer to int"). The interpretation of a declaration's type in this English-like form is done in two parts. First, the type of the declaration specifier is determined. Second, a recursive procedure is applied to the declaration as a whole.

Declaration specifiers type

The type of a declaration specifier sequence is determined by Table 10 of the standard. It lists the types of the sequences given that they contain the corresponding specifiers in any order. So for example, any sequence that contains signed and char in any order, including char signed, has type "signed char". Any cv-qualifier that appears in the declaration specifier sequence is added to the front of the type. So char const signed has type "const signed char". This makes sure that regardless of what order you put the specifiers, the type will be the same.

Quick Quiz (you might need to reference the standard)

  1. What is the type of the declaration specifier sequence int long const unsigned?

    "const unsigned long int"

  2. What is the type of the declaration specifier sequence char volatile?

    "volatile char"

  3. What is the type of the declaration specifier sequence auto const?

    It depends! auto will be deduced from the initializer. If it is deduced to be int, for example, the type will be "const int".
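To see that the order of the specifiers makes no difference, the sketch below declares the same type several ways and compares the results. The variable names are invented for the example, and the auto cases only show two possible initializers.

```cpp
#include <type_traits>

// Three spellings of the same declaration specifier sequence.
char const signed s1 = 0;
signed char const s2 = 0;
const signed char s3 = 0;

int long const unsigned u = 0;  // quiz 1
char volatile v = 0;            // quiz 2
auto const k1 = 0;              // quiz 3: deduced from an int initializer
auto const k2 = 1.5;            // quiz 3: deduced from a double initializer

static_assert(std::is_same<decltype(s1), decltype(s2)>::value, "order is irrelevant");
static_assert(std::is_same<decltype(s2), decltype(s3)>::value, "order is irrelevant");
static_assert(std::is_same<decltype(s3), const signed char>::value, "const signed char");
static_assert(std::is_same<decltype(u), const unsigned long int>::value, "const unsigned long int");
static_assert(std::is_same<decltype(v), volatile char>::value, "volatile char");
static_assert(std::is_same<decltype(k1), const int>::value, "const int");
```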

Declaration type

Now that we have the type of the declaration specifier sequence, we can work out the type of an entire declaration of an identifier. This is done by applying a recursive procedure defined in §8.3. To explain this procedure, I'll use a running example. We'll work out the type of e in float const (*(*(&e)[10])())[5].

Step 1 The first step is to split the declaration into the form T D where T is the declaration specifier sequence and D is the declarator. So we get:

T = float const
D = (*(*(&e)[10])())[5]

The type of T is, of course, "const float", as we determined in the previous section. We then look for the subsection of §8.3 that matches the current form of D. You'll find that this is §8.3.4 Arrays, because it states that it applies to declarations of the form T D where D has the form:

D1 [ constant-expression(opt) ] attribute-specifier-seq(opt)

Our D is indeed of that form where D1 is (*(*(&e)[10])()).

Now imagine a declaration T D1 (we've gotten rid of the [5]).

T D1 = const float (*(*(&e)[10])())

Its type is "<some stuff> T". This section states that the type of our identifier, e, is "<some stuff> array of 5 T", where <some stuff> is the same as in the type of the imaginary declaration. So to work out the remainder of the type, we need to work out the type of T D1.

This is the recursion! We recursively work out the type of an inner part of the declaration, stripping a bit of it off at every step.
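One way to picture <some stuff> in this first step is as a type with a hole in it. The sketch below is not how the standard words it: SomeStuff is a hypothetical helper, and the two typedef names are invented. It shows that the imaginary declaration T D1 and the full declaration T D differ only in what fills the hole.

```cpp
#include <type_traits>

// Hypothetical helper: "<some stuff>" with Inner as the hole. It spells
// "reference to array of 10 pointer to function of () returning pointer to Inner".
template <typename Inner>
struct SomeStuff {
    typedef Inner* (*(&type)[10])();
};

// The imaginary declaration T D1 (the [5] has been stripped off).
typedef const float (*(*(&WithoutArray)[10])());

// The full declaration T D.
typedef const float (*(*(&WithArray)[10])())[5];

// T D1 has type "<some stuff> T" ...
static_assert(std::is_same<WithoutArray, SomeStuff<const float>::type>::value,
              "hole filled with const float");
// ... so T D has type "<some stuff> array of 5 T".
static_assert(std::is_same<WithArray, SomeStuff<const float[5]>::type>::value,
              "hole filled with array of 5 const float");
```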

Step 2 So, as before, we split our new declaration into the form T D:

T = const float
D = (*(*(&e)[10])())

This matches paragraph §8.3/6 where D is of the form ( D1 ). This case is simple: the type of T D is just the type of T D1:

T D1 = const float *(*(&e)[10])()
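The parentheses rule can be tried on small declarations. In this sketch (names invented for the example) the first two cases show redundant parentheses being dropped without changing the type. The last two show why parentheses still matter: they decide which declarator rule gets matched first.

```cpp
#include <type_traits>

int (n) = 0;                   // same type as: int n = 0;
const float (*p1) = nullptr;   // same type as the next line
const float *p2 = nullptr;

int *arr_of_ptr[3] = {};       // D1 [3] is matched first: array of 3 pointer to int
int (*ptr_to_arr)[3] = nullptr; // ( D1 ) [3]: pointer to array of 3 int

static_assert(std::is_same<decltype(n), int>::value, "int (n) is just int");
static_assert(std::is_same<decltype(p1), decltype(p2)>::value, "parentheses dropped");
static_assert(!std::is_same<decltype(arr_of_ptr), decltype(ptr_to_arr)>::value,
              "grouping changes the type");
```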

Step 3 Let's call this T D now and split it up again:

T = const float
D = *(*(&e)[10])()

This matches §8.3.1 Pointers where D is of the form * D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> pointer to T". So now we need the type of T D1:

T D1 = const float (*(&e)[10])()

Step 4 We call it T D and split it up:

T = const float
D = (*(&e)[10])()

This matches §8.3.5 Functions where D is of the form D1 (). If T D1 has type "<some stuff> T", then T D has type "<some stuff> function of () returning T". So now we need the type of T D1:

T D1 = const float (*(&e)[10])

Step 5 We can apply the same rule we did for step 2, where the declarator is simply parenthesised to end up with:

T D1 = const float *(&e)[10]

Step 6 Of course, we split it up:

T = const float
D = *(&e)[10]

We match §8.3.1 Pointers again with D of the form * D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> pointer to T". So now we need the type of T D1:

T D1 = const float (&e)[10]

Step 7 Split it up:

T = const float
D = (&e)[10]

We match §8.3.4 Arrays again, with D of the form D1 [10]. If T D1 has type "<some stuff> T", then T D has type "<some stuff> array of 10 T". So what is T D1's type?

T D1 = const float (&e)

Step 8 Apply the parentheses step again:

T D1 = const float &e

Step 9 Split it up:

T = const float
D = &e

Now we match §8.3.2 References where D is of the form & D1. If T D1 has type "<some stuff> T", then T D has type "<some stuff> reference to T". So what is the type of T D1?

T D1 = const float e

Step 10 Well it's just "T" of course! There is no <some stuff> at this level. This is given by the base case rule in §8.3/5.

And we're done!

So now if we look at the type we determined at each step, substituting the <some stuff>s from each level below, we can determine the type of e in float const (*(*(&e)[10])())[5]:

<some stuff> array of 5 T
│          └──────────┐
<some stuff> pointer to T
│          └────────────────────────┐
<some stuff> function of () returning T
│          └──────────┐
<some stuff> pointer to T
│          └───────────┐
<some stuff> array of 10 T
│          └────────────┐
<some stuff> reference to T
│          │
<some stuff> T

If we combine this all together, what we get is:

reference to array of 10 pointer to function of () returning pointer to array of 5 const float
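The result can be double-checked by building the same type from the inside out with one typedef per step and comparing it with the declared type of e. This is only a sketch: the typedef names and the table object are invented so that the reference has something to bind to.

```cpp
#include <type_traits>

typedef const float Base;        // T: "const float"
typedef Base        Arr5[5];     // array of 5 const float
typedef Arr5*       PtrArr5;     // pointer to array of 5 const float
typedef PtrArr5     Fn();        // function of () returning that pointer
typedef Fn*         PtrFn;       // pointer to function
typedef PtrFn       Arr10[10];   // array of 10 pointer to function
typedef Arr10&      RefArr10;    // reference to array of 10 pointer to function

// Hypothetical object for the reference to bind to.
Arr10 table = {};

// The declaration from the question (with const added, as in the walkthrough).
float const (*(*(&e)[10])())[5] = table;

static_assert(std::is_same<decltype(e), RefArr10>::value,
              "e has the type worked out above");
```

Reading the typedefs from bottom to top gives the English sentence in the same order as the final type.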

Nice! So that shows how the compiler determines the type of a declaration. Remember that this is applied to each declaration of an identifier if there are multiple declarators. Try figuring out these:

Quick Quiz (you might need to reference the standard)

  1. What is the type of x in the declaration bool **(*x)[123];?

    "pointer to array of 123 pointer to pointer to bool"

  2. What are the types of y and z in the declaration int const signed *(*y)(int), &z = i;?

    y is a "pointer to function of (int) returning pointer to const signed int"
    z is a "reference to const signed int"
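The answers to this last quiz can be verified the same way. In the sketch below the expected types are built step by step with typedefs whose names are invented, and i is assumed to be a const int, since the quiz does not say how it is declared.

```cpp
#include <type_traits>

bool **(*x)[123];

int const signed i = 0;               // assumed declaration of i
int const signed *(*y)(int), &z = i;

// Expected type of x: pointer to array of 123 pointer to pointer to bool.
typedef bool** Elem;
typedef Elem   ElemArr[123];
typedef ElemArr* ExpectedX;

// Expected type of y: pointer to function of (int) returning pointer to const int.
typedef const int* Ret;
typedef Ret  RetFn(int);
typedef RetFn* ExpectedY;

static_assert(std::is_same<decltype(x), ExpectedX>::value, "type of x");
static_assert(std::is_same<decltype(y), ExpectedY>::value, "type of y");
static_assert(std::is_same<decltype(z), const int&>::value, "type of z");
```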

If anybody has any corrections, please let me know!