Compression Program in C

Delandilon picture Delandilon · Nov 3, 2013 · Viewed 20.6k times · Source

I want to compress a series of characters. For example if i type

Input : FFFFFBBBBBBBCCBBBAABBGGGGGSSS (27 x 8 bits = 216 bits) Output: F5B7C2B3A2B2G5S3 (14 x 8 bits = 112bits)

So far this is what i have, i can count the number of Characters in the Array. But the most important task is to count them in the same sequence. I can't seem to figure that out :( Ive stared doing C just a few weeks back, i have knowledge on Array, pointers, ASCII value but in any case can't seem to count these characters in a sequence. Ive try a bit of everything. This approach is no good but it the closest i came to it.

#include <stdio.h>
#include <conio.h>

int main()
{

 int charcnt=0,dotcnt=0,commacnt=0,blankcnt=0,i, countA, countB;
 char str[125];



 printf("*****String Manipulations*****\n\n");
 printf("Enter a string\n\n");

 scanf("%[^'\n']s",str);


 printf("\n\nEntered String is \" %s \" \n",str);


 for(i=0;str[i]!='\0';i++)
 {

 // COUNTING EXCEPTION CHARS                         
 if(str[i]==' ')
    blankcnt++;

 if(str[i]=='.')
    dotcnt++;

 if(str[i]==',')
    commacnt++;

 if (str[i]=='A' || str[i]=='a')

  countA++;

      if (str[i]=='B' || str[i]=='b')

  countA++;

 }

 //PRINT RESULT OF COUNT
 charcnt=i;
 printf("\n\nTotal Characters : %d",charcnt);
 printf("\nTotal Blanks     : %d",blankcnt);
 printf("\nTotal Full stops : %d",dotcnt);
 printf("\nTotal Commas     : %d\n\n",commacnt);
 printf("A%d\n", countA);

 }

Answer

Darren Stone picture Darren Stone · Nov 3, 2013

What you're trying to do is called Run-Length Encoding.

I think the counting of overall characters and, specifically, of any particular character (e.g. dots, commas, spaces) is an unnecessary distraction if your goal is to simply run-length compress the string. So let's ignore that for now.

Here's how you can easily do run-length encoding of an ASCII string in place. i.e. the original string will be overwritten with the compressed string. This may or may not be what you want, but it saves allocation of another buffer and is easy to code.

char *compress(char *str) {
    char *start = str;
    char *c_first = str;
    char *c_last = str;
    char *c_write = str;
    int run_len = 0;
    while (*str) {
        ++c_last;
        ++run_len;
        if (!(*c_last) || *c_last != *c_first || run_len == 9) { 
            // end of run
            *(c_write++) = *c_first;
            if (run_len > 1)
                *(c_write++) = '0' + run_len;
            // start next run
            run_len = 0; 
            c_first = c_last;
        }
        ++str;
    }
    *c_write = 0;
    return start;
}

If the count or exclusion of any special characters along the way is necessary, you can do that easily within the while loop.

Add this to allow testing from the command-line. Run with your original string as the single argument.

int main(int argc, char **argv) {
    if (argc != 2)
        return 1;
    printf("%s\n", compress(argv[1]));
    return 0;
}

Your requirements for output are not fully specified, so my assumptions are:

  1. Optimizing assumption: Runs of length 1 are not compressed. This is easy to detect upon decompression and ensures the compressed string is never longer than the original. e.g. "ABBCDEF" compresses to "AB2CDEF" (instead of "A1B2C1D1E1F1")

  2. Simplifying assumption: Runs longer than 9 characters will be compressed into several pieces. This ensures a run length can always be expressed in a single ASCII digit. i.e. "AAAAAAAAAAAABBBB" compresses to "A9A3B4" If you need the output to be "A12B4", it is not difficult. Remove the run_len == 9 comparison and expand the code under run_len > 1 to use iota for string rendering.