Bit packing of array of integers

pajton picture pajton · Mar 7, 2010 · Viewed 15.4k times · Source

I have an array of integers, lets assume they are of type int64_t. Now, I know that only every first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).

What is the most efficient way to convert the array in the way that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on) ?

I would like it to be general as much as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n like powers of 2 or sth.

Of course I know that I can just iterate value over value, I just want to ask you StackOverflowers if you can think of some more clever way.

Edit:

This question is not about compressing the array to take as least space as possible. I just need to "cut" n bits from every integer and given the array I know the exact n of bits I can safely cut.

Answer

Gregory Pakosz picture Gregory Pakosz · Aug 4, 2013

Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).

It implements a random access container where items are packed at the bit-level. In other words, it acts as if you were able to manipulate a e.g. uint9_t or uint17_t array:

PackedArray principle:
  . compact storage of <= 32 bits items
  . items are tightly packed into a buffer of uint32_t integers

PackedArray requirements:
  . you must know in advance how many bits are needed to hold a single item
  . you must know in advance how many items you want to store
  . when packing, behavior is undefined if items have more than bitsPerItem bits

PackedArray general in memory representation:
  |-------------------------------------------------- - - -
  |       b0       |       b1       |       b2       |
  |-------------------------------------------------- - - -
  | i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
  |-------------------------------------------------- - - -

  . items are tightly packed together
  . several items end up inside the same buffer cell, e.g. i0, i1, i2
  . some items span two buffer cells, e.g. i3, i6