Reading and processing a big text file of 25 GB

user1142292 · Jan 11, 2012 · Viewed 36.3k times

I have to read a big text file of, say, 25 GB and need to process it within 15-20 minutes. The file has multiple header and footer sections.

I tried csplit to split the file on each header, but it takes around 24 to 25 minutes to split it into a number of files, which is not acceptable.

I tried sequential reading and writing using BufferedReader and BufferedWriter along with FileReader and FileWriter. It takes more than 27 minutes, which again is not acceptable.

I tried another approach: get the start index of each header, then run multiple threads that each read the file from a specific location using RandomAccessFile. But I had no luck with this either.
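For readers following along, here is a minimal sketch of the multi-threaded idea described above. It is not the asker's code: the class name ParallelSectionReader, the method names, and the per-section work (counting newline bytes) are all hypothetical, and it assumes the header offsets are already known and sorted. It uses positional FileChannel reads in place of RandomAccessFile, so that one open channel can be shared between threads. Whether this beats a single sequential pass depends on the disk, since parallel reads can compete for the same I/O bandwidth.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSectionReader {

    /** Counts '\n' bytes in [start, end) with positional reads, which leave the channel's own position untouched. */
    static long countLinesInRange(FileChannel channel, long start, long end, int bufferSize) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize); // heap buffer, one per task
        long lines = 0;
        long position = start;
        while (position < end) {
            buffer.clear();
            // Never read past the end of this section.
            buffer.limit((int) Math.min(bufferSize, end - position));
            int read = channel.read(buffer, position);
            if (read <= 0) {
                break; // file ended earlier than expected
            }
            byte[] data = buffer.array();
            for (int i = 0; i < read; i++) {
                if (data[i] == '\n') {
                    lines++;
                }
            }
            position += read;
        }
        return lines;
    }

    /**
     * sectionStarts (hypothetical input) holds the byte offset of each header, sorted ascending.
     * Each section runs up to the next offset; the last one runs to the end of the file.
     */
    static long[] countLinesPerSection(Path file, long[] sectionStarts, int threads, int bufferSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < sectionStarts.length; i++) {
                long start = sectionStarts[i];
                long end = (i + 1 < sectionStarts.length) ? sectionStarts[i + 1] : fileSize;
                Callable<Long> task = () -> countLinesInRange(channel, start, end, bufferSize);
                futures.add(pool.submit(task));
            }
            long[] result = new long[futures.size()];
            for (int i = 0; i < result.length; i++) {
                result[i] = futures.get(i).get(); // wait for each section to finish
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Finding the header offsets still needs one pass over the file, so the gain, if any, comes from overlapping the per-section processing rather than from the reads themselves.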

How can I achieve my requirement?

Possible duplicate of:

Read large files in Java

Answer

xikkub · Jan 11, 2012

Try using a larger read buffer (for example, 20 MB instead of 2 MB) to process your data faster. Also, avoid BufferedReader here: it is slower because it converts bytes to characters.
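A minimal sketch of that suggestion, assuming the processing can work on raw bytes. The class name LargeBufferScan and the newline count are hypothetical stand-ins for the real work, and the 20 MB figure is simply the example size from above; the best size should be found by measuring on the target machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LargeBufferScan {

    // Example size taken from the suggestion above; treat it as a starting point, not a tuned value.
    static final int BUFFER_SIZE = 20 * 1024 * 1024;

    /** Counts '\n' bytes by reading raw bytes in large chunks, with no byte-to-character decoding. */
    static long countLines(Path file, int bufferSize) throws IOException {
        long lines = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LargeBufferScan /path/to/file
        System.out.println(countLines(Paths.get(args[0]), BUFFER_SIZE));
    }
}
```

Working on bytes like this is only safe when the delimiters are single-byte values, such as '\n' in ASCII or UTF-8 text. Header matching on multi-byte encodings would need more care.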

This question has been asked before: Read large files in Java