Bit Array to String and back to Bit Array

Gopikrishna S · Feb 3, 2013

Possible Duplicate: Converting byte array to string and back again in C#

I am using Huffman coding, with the implementation from here, to compress and decompress some text.

That code builds a Huffman tree and uses it for encoding and decoding. Everything works fine when I use the code directly.

In my situation, I need to get the compressed content, store it, and decompress it whenever needed.

Both the encoder's output and the decoder's input are BitArray objects.

When I try to convert this BitArray to a string, then back to a BitArray, and decode it using the following code, I get a garbled result.

using System;
using System.Collections;
using System.Text;

// Read the text first, then build the Huffman tree from it
// (Tree is the Huffman tree class from the tutorial)
string input = Console.ReadLine();

Tree huffmanTree = new Tree();
huffmanTree.Build(input);

BitArray encoded = huffmanTree.Encode(input);

// Print the bits
Console.Write("Encoded Bits: ");
foreach (bool bit in encoded)
{
    Console.Write(bit ? 1 : 0);
}
Console.WriteLine();

// Convert the bit array to bytes (round up so a partial last byte fits)
byte[] e = new byte[(encoded.Length + 7) / 8];
encoded.CopyTo(e, 0);

// Convert the bytes to a string
string output = Encoding.UTF8.GetString(e);

// Convert the string back to bytes
e = Encoding.UTF8.GetBytes(output);

// Convert the bytes back to a bit array
BitArray todecode = new BitArray(e);

string decoded = huffmanTree.Decode(todecode);

Console.WriteLine("Decoded: " + decoded);

Console.ReadLine();
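For reference, the BitArray-to-byte[] steps can be checked in isolation. This sketch is an illustration added for clarity and is not part of the tutorial: it uses a hypothetical 11-bit pattern in place of the encoder's output and is written as C# 9 top-level statements. It suggests that CopyTo and the BitArray(byte[]) constructor preserve the bits, except that the rebuilt array is padded up to a whole number of bytes:

```csharp
using System;
using System.Collections;

// Hypothetical bit pattern standing in for the encoder's output: 11 bits,
// which is not a multiple of 8.
var bits = new BitArray(new[] { true, false, true, true, false, false, true, false, true, true, true });

// Pack the bits into bytes, rounding the byte count up so every bit fits.
// CopyTo stores bit 0 in the least significant bit of byte 0, and so on.
byte[] packed = new byte[(bits.Length + 7) / 8];
bits.CopyTo(packed, 0);

// Unpack: the constructor always produces 8 bits per byte,
// so the result carries padding bits that were never in the original.
var unpacked = new BitArray(packed);
Console.WriteLine($"original: {bits.Length} bits, unpacked: {unpacked.Length} bits");

// Truncating to the original length restores the exact bit sequence.
unpacked.Length = bits.Length;
Console.WriteLine($"equal after truncation: {SameBits(bits, unpacked)}");

// Compares two bit arrays element by element.
static bool SameBits(BitArray a, BitArray b)
{
    if (a.Length != b.Length) return false;
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

It reports 11 bits before and 16 bits after unpacking. Those padding bits would also reach Decode if only the bytes were stored, so the original bit count needs to be kept alongside them.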

The output of the original code from the tutorial is:

[Screenshot: output of the original tutorial code]

The output of my code is:

[Screenshot: output of my code]

Where am I going wrong? Thanks in advance.

Answer

usr · Feb 3, 2013

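You cannot stuff arbitrary bytes into a string. That concept is just undefined: strings hold text, and bytes are converted to text (and back) by an Encoding, which only round-trips byte sequences that are valid in that encoding.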

string output = Encoding.UTF8.GetString(e);

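e is just binary garbage at this point; it is not a UTF-8 string, so calling UTF-8 methods on it does not make sense. Encoding.UTF8 replaces every invalid byte sequence with the replacement character U+FFFD, so Encoding.UTF8.GetBytes cannot give you the original bytes back.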
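A quick way to see this, as a sketch with made-up sample bytes standing in for the packed Huffman output (C# 9 top-level statements):

```csharp
using System;
using System.Linq;
using System.Text;

// Made-up binary data, like a packed Huffman bit stream.
// 0xFF and a lone 0x80 are never valid in UTF-8.
byte[] original = { 0x48, 0xFF, 0x00, 0x80 };

// Decoding invalid UTF-8 does not throw with Encoding.UTF8: each invalid
// sequence is replaced with U+FFFD (the replacement character).
string text = Encoding.UTF8.GetString(original);

// Re-encoding the string produces the UTF-8 bytes of U+FFFD (EF BF BD)
// where the invalid bytes used to be, so the data has been changed.
byte[] roundTripped = Encoding.UTF8.GetBytes(text);

Console.WriteLine(BitConverter.ToString(original));
Console.WriteLine(BitConverter.ToString(roundTripped));
Console.WriteLine($"round trip preserved bytes: {original.SequenceEqual(roundTripped)}");
```

The last line prints False: valid UTF-8 output never contains the byte 0xFF, so no string can be re-encoded into the original array.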

Solution: don't convert to a string and back. That conversion does not round-trip. Why are you doing that in the first place? If you need a string, use a round-trippable format like base-64 or base-85.
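A minimal sketch of the base-64 route for a BitArray, assuming the bit count is stored next to the string. The helper names ToBase64 and FromBase64 are hypothetical and not part of the tutorial's code; Convert.ToBase64String and Convert.FromBase64String are the standard .NET methods. Written as C# 9 top-level statements:

```csharp
using System;
using System.Collections;

// Hypothetical encoder output: 9 bits, so the last byte is only partly used.
var encoded = new BitArray(new[] { true, true, false, true, false, false, false, true, true });

// Store both the text and the bit count (e.g. in two fields or columns).
var (stored, storedBits) = ToBase64(encoded);
BitArray restored = FromBase64(stored, storedBits);

Console.WriteLine($"{stored} ({storedBits} bits)");

// Pack the bits into bytes and encode those bytes as base-64 text.
static (string Text, int BitCount) ToBase64(BitArray bits)
{
    byte[] bytes = new byte[(bits.Length + 7) / 8];
    bits.CopyTo(bytes, 0);
    return (Convert.ToBase64String(bytes), bits.Length);
}

// Decode the base-64 text and trim the padding bits added by byte rounding.
static BitArray FromBase64(string text, int bitCount)
{
    var result = new BitArray(Convert.FromBase64String(text));
    result.Length = bitCount;
    return result;
}
```

Keeping the bit count matters because the bytes are rounded up to a whole number; without it, the padding bits at the end would be handed to the decoder as if they were data.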