I'm looking for a method that encodes an string to shortest possible length and lets it be decodable (pure PHP, no SQL). I have working script but I'm unsatisfied with length of the encoded string.
SCENARIO:
Link to an image (depends on the file resolution I want to show to the user):
Encoded link (so the user can't guess how to get the larger image):
So, basicaly I'd like to encode only the search query part of the url:
The method I use right now will encode the above query string to:
The method I use is:
$raw_query_string = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$encoded_query_string = base64_encode(gzdeflate($raw_query_string));
$decoded_query_string = gzinflate(base64_decode($encoded_query_string));
How do I shorten the encoded result and still have the possibility to decode it using only PHP?
I suspect that you will need to think more about your method of hashing if you don't want it to be decodable by the user. The issue with base64
is that a base64 string looks like a base64 string. There's a good chance that someone that's savvy enough to be looking at your page source will probably recognise it too.
Part one:
a method that encodes an string to shortest possible length
If you're flexible on your URL vocab/characters, this will be a good starting place. Since gzip makes a lot of its gains using back references, there is little point as the string is so short.
Consider your example - you've only saved 2 bytes in the compression, which are lost again in base64 padding:
Non-gzipped: string(52) "aW1nPS9kaXIvZGlyL2hpLXJlcy1pbWcuanBnJnc9NzAwJmg9NTAw"
Gzipped: string(52) "y8xNt9VPySwC44xM3aLUYt3M3HS9rIJ0tXJbcwMDtQxbUwMDAA=="
If you reduce your vocab size, this will naturally allow you better compression. Let's say we remove some redundant information
Take a look at the functions:
function compress($input, $ascii_offset = 38){
$input = strtoupper($input);
$output = '';
//We can try for a 4:3 (8:6) compression (roughly), 24 bits for 4 chars
foreach(str_split($input, 4) as $chunk) {
$chunk = str_pad($chunk, 4, '=');
$int_24 = 0;
for($i=0; $i<4; $i++){
//Shift the output to the left 6 bits
$int_24 <<= 6;
//Add the next 6 bits
//Discard the leading ascii chars, i.e make
$int_24 |= (ord($chunk[$i]) - $ascii_offset) & 0b111111;
}
//Here we take the 4 sets of 6 apart in 3 sets of 8
for($i=0; $i<3; $i++) {
$output = pack('C', $int_24) . $output;
$int_24 >>= 8;
}
}
return $output;
}
And
function decompress($input, $ascii_offset = 38) {
$output = '';
foreach(str_split($input, 3) as $chunk) {
//Reassemble the 24 bit ints from 3 bytes
$int_24 = 0;
foreach(unpack('C*', $chunk) as $char) {
$int_24 <<= 8;
$int_24 |= $char & 0b11111111;
}
//Expand the 24 bits to 4 sets of 6, and take their character values
for($i = 0; $i < 4; $i++) {
$output = chr($ascii_offset + ($int_24 & 0b111111)) . $output;
$int_24 >>= 6;
}
}
//Make lowercase again and trim off the padding.
return strtolower(rtrim($output, '='));
}
What's going on there is basically a removal of redundant information, followed by the compression of 4 bytes into 3. This is achieved by effectively having a 6-bit subset of the ascii table. This window is moved so that the offset starts at useful characters and includes all the characters you're currently using.
With the offset I've used, you can use anything from ASCII 38 to 102. This gives you a resulting string of 30 bytes, that's a 9-byte (24%) compression! Unfortunately, you'll need to make it URL-safe (probably with base64), which brings it back up to 40 bytes.
I think at this point, you're pretty safe to assume that you've reached the "security through obscurity" level required to stop 99.9% of people. Let's continue though, to the second part of your question
so the user can't guess how to get the larger image
It's arguable that this is already solved with the above, but what you need to do is pass this through a secret on the server, preferably with php openssl. The following code shows the complete usage flow of functions above and the encryption:
$method = 'AES-256-CBC';
$secret = base64_decode('tvFD4Vl6Pu2CmqdKYOhIkEQ8ZO4XA4D8CLowBpLSCvA=');
$iv = base64_decode('AVoIW0Zs2YY2zFm5fazLfg==');
$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
var_dump($input);
$compressed = compress($input);
var_dump($compressed);
$encrypted = openssl_encrypt($compressed, $method, $secret, false, $iv);
var_dump($encrypted);
$decrypted = openssl_decrypt($encrypted, $method, $secret, false, $iv);
var_dump($decrypted);
$decompressed = decompress($compressed);
var_dump($decompressed);
The output of this script is the following:
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(44) "xozYGselci9i70cTdmpvWkrYvGN9AmA7djc5eOcFoAM="
string(30) "<��(��tJ��@�xH��G&(�%��%��xW"
string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
You'll see the whole cycle: compression > encryption > base64 encode/decode > decryption > decompression. The output of this would be as close as possible as you could really get, at near the shortest length you could get.
Everything aside, I feel obliged to conclude this with the fact that it is theoretical only, and this was a nice challenge to think about. There are definitely better ways to achieve your desired result - I'll be the first to admit that my solution is a little bit absurd!