I need to scan for a 16 bit word in a bit stream. It is not guaranteed to be aligned on byte or word boundaries.
What is the fastest way of achieving this? There are various brute-force methods using tables and/or shifts, but are there any "bit twiddling shortcuts" that can cut down the number of calculations by giving a yes/no/maybe answer on whether the flag is present for each byte or word as it arrives?
C code, intrinsics, x86 machine code would all be interesting.
Using simple brute force is sometimes good.
I would precalculate all 16 shifted values of the word and store them in 16 ints, so you get arrays like this (assuming int is twice as wide as short):
unsigned short pattern = 1234;
unsigned int preShifts[16];
unsigned int masks[16];
int i;
for(i=0; i<16; i++)
{
preShifts[i] = (unsigned int)pattern << i; // widen first so the shift is done in unsigned int
masks[i] = 0xffffu << i;
}
Then, for every unsigned short you get out of the stream, combine it with the previous short into one unsigned int and compare that value, under each mask, against the 16 pre-shifted values. If any of them match, you have a hit.
So basically like this:
int numMatch(unsigned short curWord, unsigned short prevWord)
{
int numHits = 0;
unsigned int combinedWords = ((unsigned int)prevWord<<16) | curWord; // unsigned: avoids signed overflow when prevWord >= 0x8000
int i=0;
for(i=0; i<16; i++)
{
if((combinedWords & masks[i]) == preShifts[i]) numHits++;
}
return numHits;
}
Note that this can report multiple hits when the pattern is detected more than once in overlapping bits:
e.g. if the input is 32 bits of 0s and the pattern you want to detect is 16 0s, the pattern is detected 16 times!
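To make the overlap case concrete, here is a minimal self-contained sketch of the table approach run over a small buffer. The names `init_tables`, `count_matches` and `scan_words` are hypothetical, and it assumes the fixed-width types from `<stdint.h>` and a stream that is consumed as 16-bit words, most significant bit first.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t pre_shifts[16];
static uint32_t masks[16];

/* Precompute the pattern and its mask at each of the 16 bit offsets. */
static void init_tables(uint16_t pattern)
{
    for (int i = 0; i < 16; i++) {
        pre_shifts[i] = (uint32_t)pattern << i;
        masks[i] = (uint32_t)0xffffu << i;
    }
}

/* Count matches in the 16 windows that end inside cur, given the previous word. */
static int count_matches(uint16_t cur, uint16_t prev)
{
    uint32_t combined = ((uint32_t)prev << 16) | cur;
    int hits = 0;
    for (int i = 0; i < 16; i++) {
        if ((combined & masks[i]) == pre_shifts[i])
            hits++;
    }
    return hits;
}

/* Scan a whole buffer. The first word has no predecessor, so only its
   aligned position is checked; every later word is paired with the one
   before it. */
static int scan_words(const uint16_t *words, size_t n)
{
    int hits = 0;
    if (n > 0 && words[0] == pre_shifts[0])
        hits++;
    for (size_t k = 1; k < n; k++)
        hits += count_matches(words[k], words[k - 1]);
    return hits;
}
```

With the pattern set to 0 and two zero words as input, `count_matches` reports 16 hits for the pair, and `scan_words` reports 17 once the aligned position of the first word is included. If overlapping hits are not wanted, the caller would have to skip ahead after the first match.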
The time cost of this, assuming it compiles approximately as written, is 16 checks per input word. Per input bit, this does one & and one ==, plus a branch or other conditional increment, as well as a table lookup for the mask.
The table lookup is unnecessary; by instead right-shifting combinedWords we get significantly more efficient asm, as shown in another answer which also shows how to vectorize this with SIMD on x86.
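The shift-based idea can be sketched roughly as follows. This is only an illustration, not the code from that other answer: the name `count_matches_shift` is hypothetical, and it assumes the fixed-width types from `<stdint.h>`.

```c
#include <stdint.h>

/* Count matches in the 16 windows that end inside cur.
   No lookup tables: shift the combined word right one bit per step
   and compare its low 16 bits against the pattern. */
static int count_matches_shift(uint16_t cur, uint16_t prev, uint16_t pattern)
{
    uint32_t combined = ((uint32_t)prev << 16) | cur;
    int hits = 0;
    for (int i = 0; i < 16; i++) {
        hits += ((uint16_t)combined == pattern); /* comparison yields 0 or 1 */
        combined >>= 1;
    }
    return hits;
}
```

This checks the same 16 windows as the table version, so the two should return identical counts; the difference is that the loop body needs only a shift, a truncation and a compare, with no memory loads.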