LLVM's integer types

Ali J · Feb 6, 2013

The LLVM language specifies integer types as iN, where N is the bit-width of the integer and ranges from 1 to 2^23-1 (according to http://llvm.org/docs/LangRef.html#integer-type).

I have 2 questions:

  1. When compiling a C program down to LLVM IR level, what types may be lowered to i1, i2, i3, etc? It seems like the types i8, i16, i32, i64 must be enough, so I was wondering what all the other nearly 8 million integer types are for.

  2. Is it true that both signed and unsigned integer types are lowered to i32? What is the reason for that, and why does it not apply to something like 32-bit float (which is represented as f32 in LLVM)?

Answer

Oak · Feb 6, 2013

First of all, be aware that both arbitrary-width integers and the lack of a signed/unsigned distinction are changes that were introduced in LLVM 2.0. Earlier versions had only a few integer types, each with a signed and an unsigned variant.

Now, to your questions:

  1. LLVM, though designed with C/C++ in mind, is not specific to these languages. Having more possible integer types gives you more flexibility. You don't have to use these types, of course, and I'm guessing that, as you've mentioned, any C/C++ frontend to LLVM (e.g. Clang) would probably only generate i1, i8, i16, i32 and i64.

    Edit: apparently I'm mistaken and Clang does use some other integer types as well, see Jens's comment below.

  2. Yes, LLVM does not distinguish between signed and unsigned integer types, so a 32-bit signed integer and a 32-bit unsigned integer will both be lowered to i32. The operations on those integers, though, are chosen according to the original type; e.g. a division between unsigned integers becomes udiv, while one between signed integers becomes sdiv. Because integers are represented in two's complement, many operations (e.g. add) produce the same bits either way and so only have a single version.

    As for why no distinction was made in LLVM between signed and unsigned, read the details on this enhancement request. In short, having both signed and unsigned versions of each type led to significant IR bloat and was detrimental to some optimizations, so the distinction was dropped.

    Finally, about f32: LLVM has no such type; a 32-bit floating-point value is spelled float and a 64-bit one double. As for why there is no parameterized fN form, the answer is that I don't know, maybe it was deemed to be less useful than arbitrarily-sized integers. However, notice that f32 is not really descriptive: if you want arbitrary floating-point types you need to specify at least the size of the significand and the size of the exponent, something like f23e8 instead of float and f52e11 instead of double. That's a bit cumbersome if you ask me, though I guess float and double could have been made synonymous with those.
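
To make the signed/unsigned point from item 2 concrete, here is a minimal C sketch (not taken from LLVM or Clang; all function names are made up for illustration). It shows that the same 32-bit pattern gives different results under signed and unsigned division, which is why two opcodes are needed, while addition yields identical bits either way. The comments about which instruction a frontend emits describe what you would typically expect; the exact output depends on the compiler version and flags, and you can inspect it yourself with clang -S -emit-llvm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 32 bits as a signed value. */
static int32_t as_signed(uint32_t bits) {
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical helper: get the bit pattern of a signed value back. */
static uint32_t as_bits(int32_t v) {
    return (uint32_t)v; /* conversion to unsigned is modular, so well defined */
}

/* Unsigned division: a frontend would typically emit udiv on i32. */
static uint32_t div_u(uint32_t a, uint32_t b) { return a / b; }

/* Signed division: a frontend would typically emit sdiv on i32. */
static int32_t div_s(int32_t a, int32_t b) { return a / b; }

/* Both additions map to the same add opcode on i32
   (a frontend may attach extra flags to the signed one). */
static uint32_t add_u(uint32_t a, uint32_t b) { return a + b; }
static int32_t add_s(int32_t a, int32_t b) { return a + b; }

/* Walks through one bit pattern: 0xFFFFFFF8 is 4294967288 unsigned, -8 signed. */
static void check_examples(void) {
    uint32_t bits = 0xFFFFFFF8u;

    /* Division depends on the interpretation, hence udiv vs sdiv. */
    assert(div_u(bits, 2u) == 0x7FFFFFFCu);                 /* 2147483644 */
    assert(as_bits(div_s(as_signed(bits), 2)) == 0xFFFFFFFCu); /* -4 */

    /* Addition gives the same bits under both interpretations. */
    assert(add_u(bits, 3u) == 0xFFFFFFFBu);
    assert(as_bits(add_s(as_signed(bits), 3)) == 0xFFFFFFFBu); /* -5 */
}
```

Calling check_examples() from a main function should pass silently. The two division results differ in their bit patterns, whereas the two addition results are bit-for-bit equal, which matches the claim that only some operations need separate signed and unsigned versions.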