How to generate an SSE4.2 popcnt machine instruction

Alan Moskowitz · Jun 21, 2011 · Viewed 11k times

Using the C program:

int main(int argc , char** argv)
{

  return  __builtin_popcountll(0xf0f0f0f0f0f0f0f0);

}

and the compiler line (gcc 4.4 - Intel Xeon L3426):

gcc -msse4.2 poptest.c -o poptest

I do NOT get the builtin popcnt instruction; instead, the compiler generates a lookup table and computes the popcount that way. The resulting binary is over 8000 bytes. (Yuk!)

Thanks so much for any assistance.

Answer

Torkel Bjørnson-Langen · Nov 3, 2012

You have to tell GCC to generate code for an architecture that supports the popcnt instruction:

gcc -march=corei7 popcnt.c

Or just enable support for popcnt:

gcc -mpopcnt popcnt.c
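
If you want to confirm that a given set of flags really enabled popcnt code generation, one option is to test the __POPCNT__ macro, which GCC predefines when the instruction is available to the compiler. This is only a minimal sketch; the function name popcnt_codegen_enabled is made up for illustration.

```c
/* Hypothetical helper: reports whether this translation unit was compiled
 * with popcnt code generation enabled (for example via -mpopcnt or
 * -march=corei7). GCC predefines __POPCNT__ in that case. */
int popcnt_codegen_enabled(void)
{
#ifdef __POPCNT__
    return 1;   /* the compiler may emit the popcnt instruction */
#else
    return 0;   /* __builtin_popcountll falls back to a software routine */
#endif
}
```

You can also list the macro without writing any code: gcc -mpopcnt -dM -E - < /dev/null | grep POPCNT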

In your example program the argument to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even when it is not asked to optimize the program.

So try passing it something that it can't know at compile time:

int main (int argc, char** argv)
{
    return  __builtin_popcountll ((long long) argv);
}

$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
  400454:       f3 48 0f b8 c6          popcnt %rsi,%rax
  400459:       c3                      retq
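
A slightly more reusable variant of the same idea, as a sketch: put the builtin behind a function that takes its operand as a parameter, so the value is only known at run time. The names count_bits and count_bits_portable are hypothetical, and the portable loop is included only as a reference to compare results against, not as what GCC generates.

```c
/* Hypothetical wrapper: x arrives as a run-time argument, so the compiler
 * cannot fold the result into a constant for the out-of-line copy of this
 * function. When popcnt is enabled, GCC can implement the builtin with a
 * single popcnt instruction. */
int count_bits(unsigned long long x)
{
    return __builtin_popcountll(x);
}

/* Portable reference implementation: each iteration clears the lowest
 * set bit, so the loop runs once per set bit. */
int count_bits_portable(unsigned long long x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Assuming the file is saved as bits.c (a made-up name), compiling it with gcc -mpopcnt -O -c bits.c and inspecting it with objdump -d bits.o should show whether popcnt was emitted for count_bits.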