Is there a way to compress a string into a smaller string with reversibility?

Adam Frank · Dec 26, 2018 · Viewed 9.2k times

I am trying to transmit strings over the Iridium network, and the cost of sending data is pretty large. I am wondering if there is a way to compress a large string, for example: {"packet":01,"reporting time":1500, "altitude":6500,"latitude":0,"longitude": 0,"ballast":34,"parachute":0}

into a much smaller string, like: f5fk43d2. The process must be reversible, so that the data can be decoded and read on the other end. Is this possible? If so, how would I go about doing this?

I have tried this answer by j.w.r: Shortening a string in Java. It does convert a large string into a smaller one; however, it seems irreversible.

The process must result in a string smaller than the original.

Any help is appreciated!

Answer

Saleh Mostafa · Dec 26, 2018

Consider the mathematics of attempting to convert some X-character string to a Y-character string, such that X > Y (i.e. you're trying to shorten the length of the string).

Then, let's say that the string is alphanumeric; this gives us 26 possible lowercase letters, 26 possible uppercase letters, and 10 possible digits that we can use (i.e. 62 possibilities per character). This means that for an X-character string, we would have 62^X possible strings, and for a Y-character string, we would have 62^Y possible strings.

Now, consider what happens if we try to map all of our X-character strings to Y-character strings. Let the function f(S) map a string S (an X-character string) to a Y-character string. Then, because X > Y, we will necessarily have to map some different X-character strings to the same Y-character string. Consider the following simple example:

X = 3, Y = 2. Then we have 62^3 = 238,328 possible 3-character strings and 62^2 = 3,844 possible 2-character strings, so there are 234,484 more 3-character strings than 2-character strings.
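To check the counts above, here is a small sketch in Java. The class and method names (StringCounts, count) are made up for illustration; the only assumption is the 62-character alphanumeric alphabet used in this answer.

```java
import java.math.BigInteger;

class StringCounts {
    // Number of distinct strings of the given length over an alphabet of the given size.
    static BigInteger count(int alphabetSize, int length) {
        return BigInteger.valueOf(alphabetSize).pow(length);
    }

    public static void main(String[] args) {
        BigInteger three = count(62, 3);
        BigInteger two = count(62, 2);
        System.out.println(three);               // 238328
        System.out.println(two);                 // 3844
        System.out.println(three.subtract(two)); // 234484
    }
}
```

Changing the arguments to count shows the same gap for any other pair of lengths X > Y.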

Now, imagine some function f(S) that turns every 3-character string into a 2-character string. We'd naturally have an issue when converting a 2-character string back into a 3-character string, because f(S) must send some distinct 3-character strings to the same 2-character string (so we couldn't know which one to map back to!). This is the pigeonhole principle: the set of 2-character strings is smaller than the set of 3-character strings, so f(S) cannot be injective, meaning it has no valid inverse.

Thus, there aren't enough 2-character strings to possibly map back to every 3-character string, and you'll find that this generalizes to all X > Y.

You could make this work by restricting which characters (or which strings) are allowed as input, so that there are no more possible inputs than possible outputs; but exactly as you have stated the problem, for arbitrary strings, this is not possible.
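As a rough illustration of what restricting the input can buy you, suppose (purely as an assumption) that the input contains only the digits 0-9. A digit string can then be re-encoded in base 36 (digits plus lowercase letters), which is reversible and shorter for longer inputs. DigitPacker, pack and unpack are hypothetical names; BigInteger's radix conversions are standard Java.

```java
import java.math.BigInteger;

class DigitPacker {
    // Re-encode a digits-only string in base 36 (0-9, a-z).
    static String pack(String digits) {
        if (!digits.matches("[0-9]*")) {
            throw new IllegalArgumentException("input must contain only the digits 0-9");
        }
        // Prefix a '1' so that leading zeros survive the numeric round trip.
        return new BigInteger("1" + digits).toString(36);
    }

    // Reverse of pack: decode base 36, then drop the '1' prefix.
    static String unpack(String packed) {
        return new BigInteger(packed, 36).toString().substring(1);
    }

    public static void main(String[] args) {
        String digits = "0015006500034";
        String packed = pack(digits);
        System.out.println(packed + " (" + packed.length() + " chars instead of " + digits.length() + ")");
        System.out.println(unpack(packed).equals(digits));
    }
}
```

For long inputs the packed form is roughly two thirds of the original length. Very short inputs (one or two digits) do not get shorter, which is consistent with the counting argument above.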

Edit, because I feel as though I should mention this: there are algorithms that compress strings drawn from a smaller set of characters into shorter strings drawn from a larger set of characters. With that being said, I'd recommend taking a look at this: An efficient compression algorithm for short text strings
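For what it's worth, here is one way this could look in Java. This is not the algorithm from the linked question; it is a sketch using the standard java.util.zip Deflater and Inflater with a preset dictionary. The dictionary contents and the names DictionaryCodec, compress and decompress are assumptions for illustration: both ends would have to agree on the same dictionary ahead of time, and the result is raw bytes rather than a printable string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DictionaryCodec {
    // Hypothetical shared dictionary: the key names both sides expect to see.
    static final byte[] DICT = ("{\"packet\":,\"reporting time\":, \"altitude\":,\"latitude\":,"
            + "\"longitude\": ,\"ballast\":,\"parachute\":}").getBytes(StandardCharsets.UTF_8);

    // Compress a short message (sketch: assumes the output fits in 1 KB).
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setDictionary(DICT);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] buffer = new byte[1024];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    // Reverse of compress (sketch: assumes the message fits in 1 KB).
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        byte[] buffer = new byte[1024];
        try {
            int length = inflater.inflate(buffer);
            // The stream signals that it was built with a preset dictionary.
            if (length == 0 && inflater.needsDictionary()) {
                inflater.setDictionary(DICT);
                length = inflater.inflate(buffer);
            }
            return new String(buffer, 0, length, StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid compressed message", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        String message = "{\"packet\":01,\"reporting time\":1500, \"altitude\":6500,\"latitude\":0,"
                + "\"longitude\": 0,\"ballast\":34,\"parachute\":0}";
        byte[] packed = compress(message);
        System.out.println(message.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(message));
    }
}
```

This does not contradict the counting argument: it only helps because messages of this shape are a tiny subset of all possible strings, and inputs that look nothing like the dictionary can come out longer than they went in. If the link can only carry text, encoding the bytes (for example with Base64) adds roughly a third to the size.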