Fastest way for line-by-line reading STDIN?

Maarten Kesselaers picture Maarten Kesselaers · Jan 25, 2012 · Viewed 48.9k times · Source

I'm looking for the most time-efficient way to read STDIN line-by-line.

The first line is the number of conditions to test. All the following lines are conditions (strings) with a maximum of 100 000 characters.

I have already tried the following (plus result for 4 times 90 000 characters:

  • Scanner with a while-loop (7255 ms)

    Scanner sc = new Scanner(System.in);
    int numberOfLines = Integer.parseInt(sc.nextLine());
    long start = 0;
    int i = 1;
    while (i<=numberOfLines){
        start = System.currentTimeMillis();
        sc.nextLine();
        Debug.println((System.currentTimeMillis()-start) + "ms for scanner while");
        i++;
    }
    
    • Results :
      1. 3228ms for scanner while
      2. 2264ms for scanner while
      3. 1309ms for scanner while
      4. 454ms for scanner while
  • Scanner with a for-loop (7078 ms)

    Scanner sc = new Scanner(System.in);
    int numberOfLines = Integer.parseInt(sc.nextLine());
    long start = 0;
    for (int i = 1; i<= numberOfLines;i++){
        start = System.currentTimeMillis();
        sc.nextLine();
        Debug.println((System.currentTimeMillis()-start) + "ms for scanner for");
        //i++;     
    }
    
    • Results :
      1. 3168ms for scanner for
      2. 2207ms for scanner for
      3. 1236ms for scanner for
      4. 467ms for scanner for
  • BufferedReader with a for-loop (7403 ms)

    try {
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    
    int numberOfLines = Integer.parseInt(br.readLine());
    long start = 0;
    for (int i = 0; i< numberOfLines;i++){
        start = System.currentTimeMillis();
        br.readLine();
        Debug.println((System.currentTimeMillis()-start) + "ms for bufferreader for");
        //i++;
    }
     } catch (Exception e) {
    System.err.println("Error:" + e.getMessage());
    

    }

    • Results :
      1. 3273ms for bufferreader for
      2. 2330ms for bufferreader for
      3. 1293ms for bufferreader for
      4. 507ms for bufferreader for
  • BufferedReader with a while-loop (7461 ms)

    try {
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    
    int numberOfLines = Integer.parseInt(br.readLine());
    int i=0;
    long start = 0;
    while(i< numberOfLines){
        start = System.currentTimeMillis();
        br.readLine();
        Debug.println((System.currentTimeMillis()-start) + "ms for bufferreader while");
        i++;
    }
     } catch (Exception e) {
    System.err.println("Error:" + e.getMessage());
    

    }

    • Results :
      1. 3296ms for bufferreader while
      2. 2358ms for bufferreader while
      3. 1307ms for bufferreader while
      4. 500ms for bufferreader while

While debugging the time taken, i noticed that the time-taken decreases after each read. Is it possible to restrict the bytes that are initialized (f.e. : If you have a maximum of 100.000 chars, limit the scanner/bufferedreader to only initialize 100 000 chars. After a read it will need to refill itself with the next 100 000 chars)

Any ideas on this matter are more than welcome.

EDIT : Added the code for each scenario along with the time-taken per line read. Also changed 100.000 to 100 000 to read more easily.

Answer

Andrei LED picture Andrei LED · Jan 25, 2012

Looked inside BufferedReader#readLine source. There're several problems I see:

  1. It uses StringBuffer instead of StringBuilder, which creates synchronization overhead.
  2. Also there seems to be data copy overhead - not entirely sure, better check it out.
  3. Dedicated monitor object in the BufferedReader and even more synchronization overhead.

You may take your chances with two things:

  1. Writing your own buffering, which could save some time on double copying of the data.
  2. Writing your own nextLine method that would use StringBuilder and go over source data with simple cycle.