Fastest way to incrementally read a large file

James Raitsev picture James Raitsev · Jan 28, 2012 · Viewed 37.5k times · Source

When given a buffer of MAX_BUFFER_SIZE, and a file that far exceeds it, how can one:

  1. Read the file in blocks of MAX_BUFFER_SIZE?
  2. Do it as fast as possible

I tried using NIO

    RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buffer = ByteBuffer.allocate(CAPARICY);

    int bytesRead = inChannel.read(buffer);

    buffer.flip();

        while (buffer.hasRemaining()) {
            buffer.get();
        }

        buffer.clear();
        bytesRead = inChannel.read(buffer);

    aFile.close();

And regular IO

    InputStream in = new FileInputStream(fileName);

    long length = fileName.length();

    if (length > Integer.MAX_VALUE) {
        throw new IOException("File is too large!");
    }

    byte[] bytes = new byte[(int) length];

    int offset = 0;

    int numRead = 0;

    while (offset < bytes.length
            && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
        offset += numRead;
    }

    if (offset < bytes.length) {
        throw new IOException("Could not completely read file " + fileName);
    }

    in.close();

Turns out that regular IO is about 100 times faster in doing the same thing as NIO. Am i missing something? Is this expected? Is there a faster way to read the file in buffer chunks?

Ultimately i am working with a large file i don't have memory for to read it all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.

Answer

Peter Lawrey picture Peter Lawrey · Jan 29, 2012

If you want to make your first example faster

FileChannel inChannel = new FileInputStream(fileName).getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);

while(inChannel.read(buffer) > 0)
    buffer.clear(); // do something with the data and clear/compact it.

inChannel.close();

If you want it to be even faster.

FileChannel inChannel = new RandomAccessFile(fileName, "r").getChannel();
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
// access the buffer as you wish.
inChannel.close();

This can take 10 - 20 micro-seconds for files up to 2 GB in size.