The use of size_t in an array iterator

J Collins picture J Collins · Mar 28, 2014 · Viewed 7.4k times · Source

I have learned recently that size_t was introduced to help future-proof code against native bit count increases and increases in available memory. The specific use definition seems to be on the storing of the size of something, generally an array.

I now must wonder how far this future proofing should be taken. Surely it is pointless to have an array length defined using the future-proof and appropriately sized size_t if the very next task of iterating over the array uses say an unsigned int as the index array:

void (double* vector, size_t vectorLength) {
    for (unsigned int i = 0; i < vectorLength; i++) {
        //...
    }
}

In fact in this case I might expect the syntax strictly should up-convert the unsigned int to a size_t for the relation operator.

Does this imply the iterator variable i should simply be a size_t?

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values? i.e.

double foo[100];
//...
int a = 4;
int b = -10;
int c = 50;

int index = a + b + c;
double d = foo[(size_t)index];

Surely though since my code logic creates a fixed bound, up-converting to the size_t provides no additional protection.

Answer

Shahbaz picture Shahbaz · Mar 28, 2014

You should keep in mind the automatic conversion rules of the language.

Does this imply the iterator variable i should simply be a size_t?

Yes it does, because if size_t is larger than unsigned int and your array is actually larger than can be indexed with an unsigned int, then your variable (i) can never reach the size of the array.

Does this imply that any integer in any program must become functionally identified as to whether it will ever be used as an array index?

You try to make it sound drastic, while it's not. Why do you choose a variable as double and not float? Why would you make a variable as unsigned and one not? Why would you make a variable short while another is int? Of course, you always know what your variables are going to be used for, so you decide what types they should get. The choice of size_t is one among many and it's similarly decided.

In other words, every variable in a program should be functionally identified and given the correct type.

Does it imply any code using logic that develops the index programmatically should then create a new result value of type size_t, particularly if the logic relies on potentially signed integer values?

Not at all. First, if the variable can never have negative values, then it could have been unsigned int or size_t in the first place. Second, if the variable can have negative values during computation, then you should definitely make sure that in the end it's non-negative, because you shouldn't index an array with a negative number.

That said, if you are sure your index is non-negative, then casting it to size_t doesn't make any difference. C11 at 6.5.2.1 says (emphasis mine):

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2th element of E1 (counting from zero).

Which means whatever type of index for which some_pointer + index makes sense, is allowed to be used as index. In other words, if you know your int has enough space to contain the index you are computing, there is absolutely no need to cast it to a different type.