How to write a bitstream

user1530192 · Jul 17, 2012 · Viewed 7.2k times

I'm thinking about writing some data into a bit stream using C. Two ways come to mind. One is to concatenate variable bit-length symbols into a contiguous bit sequence, but then my decoder will probably have a hard time separating the symbols in this continuous bit stream. The other is to allot the same number of bits to each symbol, so that the decoder can easily recover the original data, but this may waste bits: the symbols have different values, so many bits in the stream would just be zero padding.

Any hint what I should do?

I'm new to programming. Any help will be appreciated.

Answer

Patrick Raphael · Jul 17, 2012

Sounds like you're trying to do something similar to a Huffman compression scheme? I would just go byte by byte (char) and keep track of the offset within the byte where I read off the last symbol.

Assuming none of your symbols is bigger than a char, it would look something like this:

struct bitstream {
   unsigned char *data;     // backing buffer; unsigned so shifts and masks never sign-extend
   int data_size;           // size of 'data' array, in bytes
   int last_bit_offset;     // last bit in the stream

   int current_data_offset; // position in 'data', i.e. data[current_data_offset] is the byte currently being read/written
   int current_bit_offset;  // which bit of that byte we are currently reading/writing
};

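// C needs the 'struct' keyword here unless a typedef is added.
char decodeNextSymbol(struct bitstream *bs) {
   // body not given in this answer
}

int encodeNextSymbol(struct bitstream *bs, char symbol) {
   // body not given in this answer
}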
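
The two stubs above are left empty. As a rough illustration of the bit-level work they would need to do, here is a minimal sketch of appending and reading individual bits. It is not the answer's own implementation: the names bit_buf, put_bits and get_bit are made up for this example, and it assumes bits are packed most significant bit first into a buffer that was zeroed beforehand.

```c
#include <stddef.h>

/* Hypothetical helper type for this sketch (not the struct from the answer). */
struct bit_buf {
    unsigned char *data; /* backing bytes; must be zeroed before writing */
    size_t size;         /* capacity of 'data' in bytes */
    size_t pos;          /* index of the next bit, counted from the MSB of data[0] */
};

/* Append the low 'nbits' bits of 'value', most significant bit first.
   Returns 0 on success, -1 if the buffer runs out of room. */
static int put_bits(struct bit_buf *bb, unsigned value, int nbits)
{
    for (int i = nbits - 1; i >= 0; i--) {
        if (bb->pos / 8 >= bb->size)
            return -1;
        if ((value >> i) & 1u)
            /* '|' sets the target bit; pos / 8 picks the byte, pos % 8 the bit. */
            bb->data[bb->pos / 8] |= (unsigned char)(0x80u >> (bb->pos % 8));
        bb->pos++;
    }
    return 0;
}

/* Read one bit and advance. Returns 0 or 1, or -1 past the end of the buffer. */
static int get_bit(struct bit_buf *bb)
{
    if (bb->pos / 8 >= bb->size)
        return -1;
    /* Shift the wanted bit down to position 0, then '&' masks off the rest. */
    int bit = (bb->data[bb->pos / 8] >> (7 - bb->pos % 8)) & 1;
    bb->pos++;
    return bit;
}
```

Because the position is kept as a single bit index, a symbol that starts near the end of one byte simply continues into the next. For example, writing 101 (3 bits), 1101 (4 bits) and 11 (2 bits) into a zeroed two-byte buffer leaves the bytes 0xBB 0x80, with the last symbol split across the byte boundary.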

The code for decodeNextSymbol and encodeNextSymbol would have to use the C bitwise operators, for example '&' (bitwise AND) and '|' (bitwise OR). I would then come up with a list of all my symbols, shortest first, and write a while loop that tries to match the shortest symbol first. For example, if one of your symbols is '101' and the stream is '1011101', it would match the first '101' and then continue matching against the rest of the stream, '1101'. For this to be unambiguous, no symbol may be a prefix of another symbol, which is the property Huffman codes have. You would also have to handle the case where a symbol overflows from one byte into the next.
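
The matching loop described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the code table (0, 100, 101, 1101) is invented, the names code_entry, bit_at and decode_symbols are hypothetical, and bits are taken most significant bit first. It relies on the table being prefix-free, so the first code that matches the bits read so far is the only one that can match.

```c
#include <stddef.h>

/* Hypothetical code table for this sketch; no code is a prefix of another. */
struct code_entry {
    unsigned bits; /* code value, right-aligned */
    int len;       /* code length in bits */
    char symbol;   /* decoded symbol */
};

static const struct code_entry code_table[] = {
    {0x0, 1, 'a'}, /* 0    */
    {0x4, 3, 'd'}, /* 100  */
    {0x5, 3, 'b'}, /* 101  */
    {0xD, 4, 'c'}, /* 1101 */
};
#define CODE_COUNT (sizeof code_table / sizeof code_table[0])
#define MAX_CODE_LEN 4

/* Return bit number 'pos' of 'data', counting from the MSB of data[0]. */
static int bit_at(const unsigned char *data, size_t pos)
{
    return (data[pos / 8] >> (7 - pos % 8)) & 1;
}

/* Decode up to 'max_out' symbols from the first 'nbits' bits of 'data'.
   Returns the number of symbols stored in 'out', or -1 if the bits
   do not form a complete, known code. */
static int decode_symbols(const unsigned char *data, size_t nbits,
                          char *out, int max_out)
{
    size_t pos = 0;
    int count = 0;

    while (pos < nbits && count < max_out) {
        unsigned acc = 0; /* bits collected for the current symbol */
        int len = 0;
        int matched = 0;

        /* Grow the candidate one bit at a time, so shorter codes are tried first. */
        while (!matched && pos < nbits && len < MAX_CODE_LEN) {
            acc = (acc << 1) | (unsigned)bit_at(data, pos++);
            len++;
            for (size_t i = 0; i < CODE_COUNT; i++) {
                if (code_table[i].len == len && code_table[i].bits == acc) {
                    out[count++] = code_table[i].symbol;
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched)
            return -1;
    }
    return count;
}
```

With this table, the seven bits 1011101 from the example decode as '101' followed by '1101', that is, the symbols 'b' and 'c'. A linear scan of the table is fine for a handful of symbols; a larger alphabet would more commonly walk a binary tree or use a lookup table.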