I'm writing a program for an assignment which has to implement LZW compression/decompression. I'm using the following algorithms for this:
-compression
w = NIL;
while ( read a character k )
{
if wk exists in the dictionary
w = wk;
else
add wk to the dictionary;
output the code for w;
w = k;
}
-decompression
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
entry = dictionary entry for k;
output entry;
add w + entry[0] to dictionary;
w = entry;
}
For the compression stage I'm just outputing ints representing the index for the dictionary entry, also the starting dictionary consists of ascii characters (0 - 255). But when I come to the decompression stage I get this error for example if I compress a text file consisting of only "booop" it will go through these steps to produce an output file:
w k Dictionary Output
- b - -
b o bo (256) 98 (b)
o o oo (257) 111 (o)
o o - -
oo p oop (258) 257 (oo)
p - - 112 (p)
output.txt: 98 111 257 112
Then when I come to decompress the file
w k entry output Dictionary
98 (b) b
b 111 (o) o o bo (256)
o 257 (error)
257 (oo) hasn't been added yet. Can anyone see where I'm going wrong here cause I'm stumped. Is the algorithm wrong?
Your compression part is right and complete but the decompression part is not complete. You only include the case when the code is in the dictionary. Since the decompression process is always one step behind the compression process, there is the possibility when the decoder find a code which is not in the dictionary. But since it's only one step behind, it can figure out what the encoding process will add next and correctly output the decoded string, then add it to the dictionary. To continue your decompression process like this:
-decompression
read a character k;
output k;
w = k;
while ( read a character k )
/* k could be a character or a code. */
{
if k exists in the dictionary
entry = dictionary entry for k;
output entry;
add w + entry[0] to dictionary;
w = entry;
else
output entry = w + firstCharacterOf(w);
add entry to dictionary;
w = entry;
}
Then when you come to decompress the file and see 257, you find it's not there in the dictionary. But you know the previous entry is 'o' and it's first character is 'o' too, put them together, you get "oo". Now output oo and add it to dictionary. Next you get code 112 and sure you know it's p. DONE!
w k entry output Dictionary
98 (b) b
b 111 (o) o o bo (256)
o 257 (oo) oo oo(257)
oo 112(p) p
See: this explanation by Steve Blackstock for more information. A better page with flow chart for the actual decoder and encoder implementation on which the "icafe" Java image library GIF encoder and decoder are based.