What is the best compression algorithm that allows random reads/writes in a file?

Brian R. Bondy · Oct 25, 2008 · Viewed 7.2k times

What is the best compression algorithm that allows random reads/writes in a file?

I know that any adaptive compression algorithm would be out of the question.

And I know Huffman encoding would be out of the question.

Does anyone have a better compression algorithm that would allow random reads/writes?

I think you could use any compression algorithm if you write it in blocks, but ideally I would not like to have to decompress a whole block at a time. But if you have suggestions on an easy way to do this, and on how to keep track of the block boundaries, please let me know. If this is part of your solution, please also let me know how you handle a read whose data crosses a block boundary.
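To make that concrete, here is roughly the block-plus-index scheme I have in mind, sketched in Python. The 64 KB block size, the header layout, and the use of zlib are all arbitrary placeholder choices, not an existing format:

    import struct
    import zlib

    BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (arbitrary choice)

    def compress_file(src_path, dst_path):
        # Layout: [8-byte block count][8-byte offset per block][compressed blocks]
        # Every block is compressed independently, so any block can be
        # decompressed without touching its neighbours.
        blocks = []
        with open(src_path, "rb") as src:
            while True:
                chunk = src.read(BLOCK_SIZE)
                if not chunk:
                    break
                blocks.append(zlib.compress(chunk))
        with open(dst_path, "wb") as dst:
            dst.write(struct.pack("<Q", len(blocks)))
            offset = 8 + 8 * len(blocks)        # data begins after the index
            for b in blocks:
                dst.write(struct.pack("<Q", offset))
                offset += len(b)
            for b in blocks:
                dst.write(b)

    def read_range(path, start, length):
        # Decompress only the blocks that overlap [start, start + length);
        # a read that crosses a block boundary simply touches two (or more) blocks.
        first = start // BLOCK_SIZE
        last = (start + length - 1) // BLOCK_SIZE
        out = bytearray()
        with open(path, "rb") as f:
            (count,) = struct.unpack("<Q", f.read(8))
            offsets = list(struct.unpack("<%dQ" % count, f.read(8 * count)))
            offsets.append(f.seek(0, 2))        # file size acts as the final offset
            for i in range(first, min(last, count - 1) + 1):
                f.seek(offsets[i])
                out += zlib.decompress(f.read(offsets[i + 1] - offsets[i]))
        skip = start - first * BLOCK_SIZE
        return bytes(out[skip:skip + length])

This only covers the read side; a random write under this scheme would mean re-compressing the affected block, which can change its size and force the following blocks and the index to be shifted or padded.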

In the context of your answers, please assume the file in question is 100GB, and sometimes I'll want to read the first 10 bytes, sometimes I'll want to read the last 19 bytes, and sometimes I'll want to read 17 bytes in the middle.

Answer

David Cary · Aug 8, 2010

I am stunned at the number of responses that imply that such a thing is impossible.

Have these people never heard of "compressed file systems", which have been around since before Microsoft was sued in 1993 by Stac Electronics over compressed file system technology?

I hear that LZS and LZJB are popular algorithms for people implementing compressed file systems, which necessarily require both random-access reads and random-access writes.

Perhaps the simplest and best thing to do is to turn on file system compression for that file, and let the OS deal with the details. But if you insist on handling it manually, perhaps you can pick up some tips by reading about NTFS transparent file compression.
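For example, on Windows you can toggle NTFS compression on a single file with the stock compact utility; a minimal way to drive it from code (the path below is just a placeholder) might look like this:

    import subprocess

    # Ask NTFS to transparently compress this one file. Afterwards the OS
    # decompresses on read and recompresses on write, so the program never
    # has to think about block boundaries. The path is a made-up example.
    subprocess.run(["compact", "/C", r"C:\data\huge_file.bin"], check=True)

The same thing can be done from the file's Properties dialog in Explorer, or programmatically via the FSCTL_SET_COMPRESSION control code.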

Also check out: "StackOverflow: Compression formats with good support for random access within archives?"