Grayscale image compression using Huffman Coding in MATLAB

parvathy · Sep 22, 2014 · Viewed 12k times

I am trying to compress a grayscale image using Huffman coding in MATLAB, and have tried the following code.

I have used a 512x512 grayscale image in TIFF format. My problem is that the size of the compressed image (the length of the compressed codeword) is coming out bigger than the size of the uncompressed image, so the compression ratio is less than 1.

clc;
clear all;
A1 = imread('fig1.tif');
[M N]=size(A1);
A = A1(:);
count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');                  
    p(i)=count(i)/total;
end

[dict,avglen]=huffmandict(count,p) % build the Huffman dictionary
comp= huffmanenco(A,dict);         %encode your original image with the dictionary you just built
compression_ratio= (512*512*8)/length(comp)   %computing the compression ratio

%% DECODING
Im = huffmandeco(comp,dict); % Decode the code
I11=uint8(Im);

decomp=reshape(I11,M,N);
imshow(decomp);

Answer

rayryeng · Sep 22, 2014

There is a slight error in your code. I'm assuming you want to calculate the probability of encountering each pixel, which is the normalized histogram. You're not computing it properly. Specifically:

count = [0:1:255]; % Distinct data symbols appearing in sig
total=sum(count);
for i=1:1:size((count)');                  
    p(i)=count(i)/total;
end

total is summing the symbol values 0 through 255 themselves (which gives 32640), not how often each intensity occurs, so p is not a valid probability distribution. You're supposed to compute the probability distribution of your image, and imhist does exactly that. As such, you should do this instead:

count = 0:255;
p = imhist(A1) / numel(A1);

This will correctly calculate the probability distribution for your image. Remember, when you're doing Huffman coding, you need to specify the probability of encountering each pixel intensity. Treating every pixel as an equally likely independent sample, this is captured by calculating the image's histogram, then normalizing by the total number of pixels in your image. Try that and see if you get any better results.
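
Putting it together, here is a minimal sketch of the whole corrected pipeline (assuming the same fig1.tif input; huffmandict may reject zero-probability symbols, so the sketch keeps only intensities that actually occur):

A1 = imread('fig1.tif');
[M, N] = size(A1);

p_full = imhist(A1) / numel(A1);     % normalized histogram: P(intensity)
symbols = find(p_full > 0) - 1;      % intensities that actually occur (0-based)
p = p_full(p_full > 0);              % drop the zero-probability bins

[dict, avglen] = huffmandict(symbols, p);   % build the Huffman dictionary
comp = huffmanenco(double(A1(:)), dict);    % encode the pixel stream
compression_ratio = (M*N*8) / length(comp)  % raw bits / coded bits

decomp = reshape(uint8(huffmandeco(comp, dict)), M, N);  % decode
isequal(decomp, A1)                         % should print 1: coding is lossless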


However, Huffman will only give you good compression ratios if you have frequently occurring symbols. Did you happen to take a look at the histogram or the spread of your pixels in your image?
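
A quick way to check (assuming the same A1 from the question) is to plot it:

imhist(A1);   % 256-bin intensity histogram; a flat, wide spread means poor Huffman savings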

If the spread is quite large, with very few entries per bin, then Huffman will not give you any compression savings. In fact, it may give you a larger size as a result. Bear in mind that the TIFF compression standard only uses Huffman as part of the algorithm; there is also some pre- and post-processing done to further drive down the size.

As a further example, suppose I had an image that consisted of [0, 1, 2, ..., 255; 0, 1, 2, ..., 255; 0, 1, 2, ..., 255]. I have 3 rows that each run through [0,255], but really it could be any number of such rows. This means every symbol is equiprobable, with probability 1/256, so Huffman would need 8 bits per symbol... which is essentially the raw pixel value anyway!
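
You can verify this with a small sketch: for 256 equiprobable symbols, huffmandict reports an average code length of exactly 8 bits.

symbols = 0:255;
p = ones(1, 256) / 256;                   % uniform: every intensity equally likely
[dict, avglen] = huffmandict(symbols, p);
disp(avglen)                              % 8 bits/symbol -- no savings over raw bytes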

The key behind Huffman is that a group of bits together encodes one symbol, and frequently occurring symbols get assigned shorter bit sequences. Because the intensities in this particular image are equiprobable, every symbol's codeword is as long as the raw value itself. Not only would you have to transmit the dictionary, you would effectively be sending one uncompressed character at a time, and this is no better than sending the raw byte stream.

If you want your image to be compressed well by raw Huffman, the distribution of pixel intensities has to be skewed: for example, most of the intensities in your image being dark, or most being bright. If your image has good contrast, or the spread of pixel intensities is flat throughout the image, then Huffman will not give you any compression savings.
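
As an illustrative sketch (the 0.5 weight below is an arbitrary assumption, not taken from your image), skewing half the probability mass onto a single intensity drops the average code length well below 8 bits:

symbols = 0:255;
p = [0.5, repmat(0.5/255, 1, 255)];       % skewed: half of all pixels are intensity 0
[dict, avglen] = huffmandict(symbols, p);
disp(avglen)                              % well below 8 bits/symbol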