How to clone an inputstream in java in minimal time

Classified · Nov 9, 2012 · Viewed 15.8k times

Can someone tell me how to clone an InputStream (IS) while spending as little time as possible on creating the clones? I need to clone an InputStream several times so that multiple methods can each process it. I've tried three approaches, and each fails for one reason or another.

Method #1: Thanks to the Stack Overflow community, I found the following question helpful and incorporated its code snippet into my program:

How to clone an InputStream?

However, this code can take up to one minute to create the cloned InputStreams for a 10MB file, and my program needs to be as fast as possible.

    int read = 0;
    byte[] bytes = new byte[1024*1024*2];

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    while ((read = is.read(bytes)) != -1)
        bos.write(bytes,0,read);
    byte[] ba = bos.toByteArray();

    InputStream is1 = new ByteArrayInputStream(ba);
    InputStream is2 = new ByteArrayInputStream(ba);
    InputStream is3 = new ByteArrayInputStream(ba);
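
On Java 9 or later (an assumption, since this question predates it), `InputStream.readAllBytes()` performs the same read-until-end-of-stream loop in a single call and works when the size is unknown. A minimal sketch of the same idea: drain the source once, then give every consumer its own `ByteArrayInputStream` over the shared array, so each one keeps its own read position. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopies {

    // Hypothetical helper: reads the source once, then returns `count` independent streams.
    // Requires Java 9+ for InputStream.readAllBytes().
    public static InputStream[] copies(InputStream source, int count) throws IOException {
        byte[] data = source.readAllBytes(); // reads until end of stream; size need not be known
        InputStream[] result = new InputStream[count];
        for (int i = 0; i < count; i++) {
            // Each ByteArrayInputStream has its own read position over the same shared array.
            result[i] = new ByteArrayInputStream(data);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        for (InputStream copy : copies(source, 3)) {
            System.out.println(new String(copy.readAllBytes(), StandardCharsets.UTF_8)); // hello
        }
    }
}
```

`ByteArrayInputStream` does not copy the array it is given, so adding more consumers costs no extra memory beyond the one buffered copy of the data.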

Method #2: I also tried using BufferedInputStream to clone the IS. Creation was fast (1 ms at worst, 0 ms at best). However, after I passed is1 on to be processed, the methods processing is2 and is3 threw an error saying there was nothing to process, as if all three variables below referenced the same IS.

    is = getFileFromBucket(path,filename);
    ...
    ...
    InputStream is1 = new BufferedInputStream(is);
    InputStream is2 = new BufferedInputStream(is);
    InputStream is3 = new BufferedInputStream(is);
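
A `BufferedInputStream` does not copy its source; it only puts a buffer in front of it. All three wrappers in Method #2 pull bytes from the same `is`, so whichever one reads first drains it for the others. The sketch below reproduces the symptom, using an in-memory stream as a hypothetical stand-in for whatever `getFileFromBucket` returns.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SharedSourceDemo {

    // Reads everything remaining in the stream and returns the number of bytes read.
    public static int drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the stream returned by getFileFromBucket(path, filename).
        InputStream is = new ByteArrayInputStream(new byte[10_000]);

        InputStream is1 = new BufferedInputStream(is);
        InputStream is2 = new BufferedInputStream(is);

        System.out.println(drain(is1)); // 10000: is1 consumed the shared source
        System.out.println(drain(is2)); // 0: nothing is left for is2
    }
}
```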

Method #3: I think the compiler is lying to me. I checked markSupported() on is1 in both examples above. It returned true, so I thought I could run

    is1.mark(readlimit); // mark(int readlimit): bytes readable before the mark may be invalidated
    is1.reset();

or just

    is1.reset();

before passing the IS to my respective methods. In both of the above examples, I get an error saying it's an invalid mark.

I'm out of ideas now, so thanks in advance for any help you can give me.

P.S. Based on the comments I've received, I need to clarify a few things about my situation:
1) This program is running on a VM.
2) The InputStream is being passed to me from another method; I'm not reading from a local file.
3) The size of the InputStream is not known.

Answer

BalusC · Nov 9, 2012

how to clone an inputstream, taking as little creation time as possible? I need to clone an inputstream multiple times for multiple methods to process the IS

You could create a custom ReusableInputStream class that, during the first full read, also writes every byte to an internal ByteArrayOutputStream, wraps the collected bytes in a ByteBuffer once the last byte has been read, and then serves all subsequent full reads from that same ByteBuffer, which is flipped back to the start each time its limit is reached. Compared with your first attempt, this saves one full pass over the data, because the bytes are cached while the first consumer reads them instead of in a separate read beforehand.
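
The replay step relies on `ByteBuffer` bookkeeping: once every byte has been read with `get`, `position` equals `limit`, and `flip()` sets `limit` to the current position and `position` back to 0, which in this situation amounts to a rewind. A small stand-alone illustration (the class name is hypothetical):

```java
import java.nio.ByteBuffer;

public class FlipDemo {

    // Copies all remaining bytes out of the buffer and returns how many were copied.
    public static int drain(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out.length;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[] {10, 20, 30});

        System.out.println(drain(buffer));      // 3
        System.out.println(buffer.remaining()); // 0: position == limit
        buffer.flip();                          // limit = position (3), position = 0
        System.out.println(drain(buffer));      // 3 again
    }
}
```

`rewind()` would behave the same here and states the intent more directly; `flip()` only works as a rewind because `position == limit` at that point.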

Here's a basic kickoff example:

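import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ReusableInputStream extends InputStream {

    private InputStream input;            // source stream, read only once
    private ByteArrayOutputStream output; // records the bytes during the first full read
    private ByteBuffer buffer;            // replays the recorded bytes on later full reads

    public ReusableInputStream(InputStream input) throws IOException {
        this.input = input;
        this.output = new ByteArrayOutputStream(input.available()); // Note: it's resizable anyway.
    }

    @Override
    public int read() throws IOException {
        byte[] b = new byte[1];
        int read = read(b, 0, 1);
        // -1 signals end of stream; otherwise return the byte as an unsigned value (0-255).
        return (read == -1) ? -1 : (b[0] & 0xFF);
    }

    @Override
    public int read(byte[] bytes) throws IOException {
        return read(bytes, 0, bytes.length);
    }

    @Override
    public int read(byte[] bytes, int offset, int length) throws IOException {
        if (length == 0) {
            return 0; // InputStream contract: a zero-length read returns 0, not end of stream.
        }

        if (buffer == null) {
            // First full read: pass the bytes through and record them.
            int read = input.read(bytes, offset, length);

            if (read == -1) {
                input.close();
                input = null;
                buffer = ByteBuffer.wrap(output.toByteArray());
                output = null;
                return -1;
            } else {
                output.write(bytes, offset, read);
                return read;
            }
        } else {
            // Subsequent full reads: serve the recorded bytes.
            int read = Math.min(length, buffer.remaining());

            if (read <= 0) {
                buffer.flip(); // position == limit here, so this rewinds to the start.
                return -1;
            } else {
                buffer.get(bytes, offset, read);
                return read;
            }
        }
    }

    // You might want to @Override close() to close input if it has not been fully read yet.
}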

(Note that the actual work is done in int read(byte[], int, int) rather than in int read(), so it's expected to be faster when the caller itself also streams using a byte[] buffer.)
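
For illustration, this is the kind of caller that benefits: a copy loop that moves data in 8 KB chunks, so each call goes through `read(byte[], int, int)` instead of one `read()` call per byte. It is similar in spirit to what `IOUtils.copy` does, and Java 9+ offers `InputStream.transferTo` for the same job; the class name and buffer size here are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {

    // Copies all remaining bytes from in to out using an 8 KB buffer; returns the byte count.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[20_000]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out)); // 20000
    }
}
```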

You could use it as follows:

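// IOUtils is org.apache.commons.io.IOUtils from Apache Commons IO.
InputStream input = new ReusableInputStream(getFileFromBucket(path, filename));
for (String copy : new String[] {"/copy1.ext", "/copy2.ext", "/copy3.ext"}) {
    try (OutputStream output = new FileOutputStream(copy)) { // closes each file when done
        IOUtils.copy(input, output); // each full pass replays the same bytes
    }
}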

As for performance, 1 minute per 10MB is more likely a hardware problem than a software problem. My 7200rpm laptop hard disk does it in less than 1 second.