Lookup table vs switch in C embedded software

Plouff picture Plouff · Mar 7, 2016 · Viewed 9.3k times · Source

In another thread, I was told that a switch may be better than a lookup table in terms of speed and compactness.

So I'd like to understand the differences between this:

Lookup table

static void func1(){}
static void func2(){}

typedef enum
{
    FUNC1,
    FUNC2,
    FUNC_COUNT
} state_e;

typedef void (*func_t)(void);

const func_t lookUpTable[FUNC_COUNT] =
{
    [FUNC1] = &func1,
    [FUNC2] = &func2
};

void fsm(state_e state)
{
    if (state < FUNC_COUNT) 
        lookUpTable[state]();
    else
        ;// Error handling
}

and this:

Switch

static void func1(){}
static void func2(){}

void fsm(int state)
{
    switch(state)
    {
        case FUNC1: func1(); break;
        case FUNC2: func2(); break;
        default:    ;// Error handling
    }
}

I thought that a lookup table was faster since compilers try to transform switch statements into jump tables when possible. Since this may be wrong, I'd like to know why!

Thanks for your help!

Answer

too honest for this site picture too honest for this site · Mar 7, 2016

As I was the original author of the comment, I have to add a very important issue you did not mention in your question. That is, the original was about an embedded system. Presuming this is a typical bare-metal system with integrated Flash, there are very important differences from a PC on which I will concentrate.

Such embedded systems typically have the following constraints.

  • no CPU cache.
  • Flash requires waitstates for higher (i.e. >ca. 32MHz) CPU clocks. The actual ratio depends on the die design, low power/high speed process, operating voltage, etc.
  • To hide waitstates, Flash has wider read-lines than the CPU-bus.
  • This only works well for linear code with instruction prefetch.
  • Data accesses disturb instruction prefetch or are stalled until it finished.
  • Flash might have an internal very small instruction cache.
  • If any at all, there is an even smaller data-cache.
  • The small caches result in more frequent trashing (replacing a previous entry before that has been used another time).

For e.g. the STM32F4xx a read takes 6 clocks at 150MHz/3.3V for 128 bits (4 words). So if a data-access is required, chances are good it adds more than 12 clocks delay for all data to be fetched (there are additional cycles involved).

Presuming compact state-codes, for the actual problem, this has the following effects on this architecture (Cortex-M4):

  • Lookup-table: Reading the function address is a data-access. With all implications mentioned above.
  • A switch otoh uses a special "table-lookup" instruction which uses code-space data right behind the instruction. So the first entries are possibly already prefetched. Other entries don't break the prefetch. Also the access is a code-acces, thus the data goes into the Flash's instruction cache.

Also note that the switch does not need functions, thus the compiler can fully optimise the code. This is not possible for a lookup table. At least code for function entry/exit is not required.


Due to the aforementioned and other factors, an estimate is hard to tell. It heavily depends on your platform and the code structure. But assuming the system given above, the switch is very likely faster (and clearer, btw.).