I have a large array with ranges of integers that are mostly continuous, e.g. 1-100, 110-160, etc. All integers are positive.
What would be the best algorithm to compress this?
I tried the deflate algorithm but that gives me only 50% compression.
Note that the algorithm cannot be lossy.
All numbers are unique and progressively increasing.
Also, if you can point me to a Java implementation of such an algorithm, that would be great.
We have written recent research papers that survey the best schemes for this problem. Please see:
Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software: Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137
Daniel Lemire, Nathan Kurz, Leonid Boytsov, SIMD Compression and the Intersection of Sorted Integers, Software: Practice and Experience (to appear) http://arxiv.org/abs/1401.6399
They include an extensive experimental evaluation.
You can find a complete implementation of all techniques in C++11 online: https://github.com/lemire/FastPFor and https://github.com/lemire/SIMDCompressionAndIntersection
There are also C libraries: https://github.com/lemire/simdcomp and https://github.com/lemire/MaskedVByte
If you prefer Java, please see https://github.com/lemire/JavaFastPFOR
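To illustrate the core idea shared by these schemes (the libraries above use far more sophisticated, vectorized variants), here is a minimal, hypothetical sketch in Java: because your values are unique and increasing, you can store the gaps between consecutive values instead of the values themselves, and since the data is mostly continuous those gaps are tiny, so each one fits in a single variable-byte (VByte) code most of the time. The class and method names below are made up for this example; they are not the FastPFOR/JavaFastPFOR API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of delta coding + variable-byte (VByte) encoding.
// Not the JavaFastPFOR API; the real libraries use faster, vectorized schemes.
public class VarIntDelta {

    // Encode a strictly increasing array of positive ints.
    // Each gap is written 7 bits at a time, least-significant group first,
    // with the high bit set on every byte except the last one of a gap.
    public static byte[] encode(int[] sorted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int v : sorted) {
            int gap = v - prev;  // small when the values are mostly continuous
            prev = v;
            while (gap >= 128) {
                out.write((gap & 0x7F) | 0x80);  // continuation bit set
                gap >>>= 7;
            }
            out.write(gap);  // final byte of this gap, high bit clear
        }
        return out.toByteArray();
    }

    // Decode n values by reading each VByte gap and summing.
    public static int[] decode(byte[] in, int n) {
        int[] result = new int[n];
        int pos = 0, prev = 0;
        for (int i = 0; i < n; i++) {
            int gap = 0, shift = 0, b;
            while (((b = in[pos++] & 0xFF) & 0x80) != 0) {
                gap |= (b & 0x7F) << shift;
                shift += 7;
            }
            gap |= b << shift;
            prev += gap;
            result[i] = prev;
        }
        return result;
    }
}
```

On mostly continuous data like yours, nearly every gap is 1 and costs a single byte instead of four, so even this naive sketch should beat the roughly 50% you got from Deflate; the papers and libraries above then go much further with bit packing and SIMD.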