The application keeps receiving Report objects and puts them into a Disruptor for three different consumers.
According to Eclipse Memory Analyzer, the retained heap size of each Report object is 20 KB on average. The application starts with -Xmx2048m, i.e. a 2 GB heap.
However, there are around 100,000 objects at a time, which means the total size of all the objects is roughly 2 GB (100,000 × 20 KB).
The requirement is that all 100,000 objects be loaded into the Disruptor so that the consumers can consume the data asynchronously. But that's not possible if each object is as large as 20 KB.
So I'd like to serialize each object to a String and compress it:
private static byte[] toBytes(Serializable o) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(o);
    oos.close();
    return baos.toByteArray();
}

private static String compress(byte[] bytes) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    gzip.write(bytes);
    gzip.close();
    return new String(Base64Coder.encode(out.toByteArray()));
}
After compress(toBytes(report)), each object is much smaller:

(screenshot: heap usage before compression)
(screenshot: heap usage after compression)

The String for each object is now around 6 KB, which is a big improvement.
Here are my questions:

Is there any other data format that is more compact than a Base64 String?
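One thing worth noting: the Base64 encoding step inflates the compressed bytes by about a third (4 output chars per 3 input bytes), and a Java String of that era stores each char as 2 bytes on the heap, so keeping the raw compressed byte[] is already a more compact format than the String. A small sketch to illustrate the size difference (the class name is mine, for illustration only):

```java
import java.util.Base64;
import java.util.Random;

// Illustration only: compares a raw byte payload against its Base64 form.
public class Base64Overhead {
    public static void main(String[] args) {
        // Stand-in for a compressed payload; random bytes, like gzip output.
        byte[] compressed = new byte[6000];
        new Random(42).nextBytes(compressed);

        String base64 = Base64.getEncoder().encodeToString(compressed);

        System.out.println("byte[] : " + compressed.length + " bytes");
        // 4 chars per 3 bytes, and each char occupies 2 bytes on the heap
        System.out.println("Base64 : " + base64.length() + " chars ("
                + (base64.length() * 2) + " bytes as char data)");
    }
}
```

So storing the compressed byte[] in the Disruptor event directly, instead of a Base64 String, should roughly halve the per-object heap cost before any change of serialization format.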
Serializing and compressing each time creates objects like ByteArrayOutputStream and ObjectOutputStream. I don't want to create too many of these, because I need to iterate 100,000 times. How can I design the code so that objects like ByteArrayOutputStream and ObjectOutputStream are created once and reused on each iteration?
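One way to cut the allocation down, sketched below (the class name is mine): the ByteArrayOutputStream can be reused across iterations via reset(), which keeps its internal byte array. The ObjectOutputStream, however, writes a stream header and caches back-references to objects it has seen, so the simplest correct approach is a fresh ObjectOutputStream per object over the shared buffer:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: one backing buffer reused for every object.
class ReusableSerializer {
    // Pre-sized so it rarely has to grow after the first few objects.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream(32 * 1024);

    byte[] serialize(Serializable o) throws IOException {
        buffer.reset(); // reuse the internal byte array instead of allocating a new one
        // ObjectOutputStream writes a header and caches references, so it is
        // recreated per object; GZIPOutputStream likewise cannot be restarted.
        GZIPOutputStream gzip = new GZIPOutputStream(buffer);
        ObjectOutputStream oos = new ObjectOutputStream(gzip);
        oos.writeObject(o);
        oos.close(); // flushes through gzip and finishes the compressed stream
        return buffer.toByteArray();
    }
}
```

This avoids reallocating the large byte buffer 100,000 times; the smaller stream wrappers are still short-lived, but they are cheap compared to the buffer.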
Consumers need to decompress and deserialize the String from the Disruptor. With three consumers, I would have to decompress and deserialize three times. Is there any way around that?
Update:
As @BoristheSpider suggested, serialization and compression should be performed in one action:
private static byte[] compressObj(Serializable o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    GZIPOutputStream zos = new GZIPOutputStream(bos);
    ObjectOutputStream ous = new ObjectOutputStream(zos);
    ous.writeObject(o);
    ous.flush();  // push any buffered object data into the gzip stream
    zos.finish(); // complete the gzip stream without closing it
    return bos.toByteArray();
}
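The consumers need the inverse operation. A sketch of the matching single-pass decompress-and-deserialize (the method name decompressObj is mine, mirroring compressObj above):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical inverse of compressObj: decompresses and deserializes in one pass.
private static Object decompressObj(byte[] bytes) throws IOException, ClassNotFoundException {
    ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    GZIPInputStream zis = new GZIPInputStream(bis);
    ObjectInputStream ois = new ObjectInputStream(zis);
    return ois.readObject();
}
```

Chaining the streams the same way in both directions means no intermediate byte[] copy of the decompressed data is ever materialized.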
Using ObjectOutputStream and compression is so much more expensive than using the Disruptor that it defeats the purpose of using it. It is likely to be 1000x more expensive.
You are far better off limiting how many objects you queue at once. Unless something is seriously wrong with your design, a queue of just 1000 of these 20 KB objects should be more than enough to keep all your consumers working efficiently.
BTW if you need persistence, I would use Chronicle (partly because I wrote it). It doesn't need compression, byte[], or Strings for storage; it persists all messages, and your queue is unbounded and entirely off heap, i.e. your 100K objects will use << 1 MB of heap.