Fastest way to print a single line in a file

JBoy · Mar 26, 2013

I have to fetch one specific line out of a big file (1,500,000 lines), many times in a loop over multiple files, and I was asking myself what the best option would be in terms of performance. There are many ways to do this; I mainly use these two:

cat "${file}" | head -1

or

cat "${file}" | sed -n '1p'

I could not find an answer to this: do both of them read only the first line, or does one of them (or both) read the whole file first and then print line 1?

Answer

Chris Seymour · Mar 26, 2013

Drop the useless use of cat and do:

$ sed -n '1{p;q}' file

The q command makes sed quit as soon as line 1 has been printed, so it does not go on to read the rest of the file.
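The question asks for one specific line, not necessarily the first, and the same quit-early idea extends to line N. Below is a minimal sketch, assuming bash; `get_line`, `get_line_awk`, `a.txt` and `b.txt` are hypothetical names for illustration, and the awk variant is an alternative that is not part of the original answer. The extra `;` before `}` is there because some stricter sed implementations (e.g. BSD/macOS sed) reject `{p;q}` without it; GNU sed accepts both forms.

```shell
#!/bin/bash
# Hypothetical helper: print line n of a file, then quit so the rest is never read.
get_line() {
    local n=$1 file=$2
    sed -n "${n}{p;q;}" "$file"
}

# Hypothetical awk alternative: print record n and exit immediately.
get_line_awk() {
    local n=$1 file=$2
    awk -v n="$n" 'NR == n { print; exit }' "$file"
}

# Small sample files standing in for the real 1,500,000-line inputs.
seq 1 10 > a.txt
seq 101 110 > b.txt

# Loop over several files, fetching line 3 of each (one process per file).
for f in a.txt b.txt; do
    get_line 3 "$f"
    get_line_awk 3 "$f"
done
```

If the file has fewer than N lines, both helpers simply print nothing.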


Benchmarking script:

#!/bin/bash

TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')

# files up to a hundred million lines (if you're on a slow machine, decrease this!)
# note: no commas in the limit, inside (( )) a comma is the comma operator
for (( j=1; j<=100000000; j*=10 ))
do
    echo "Lines in file: $j"
    # create file containing j lines
    seq 1 "$j" > file
    # initial read of file so it is already in the page cache
    cat file > /dev/null

    for comm in {0..3}
    do
        avg=0
        echo
        echo "${heading[$comm]}"
        for (( i=1; i<=n; i++ ))
        do
            case $comm in
                0)
                    t=$( { time head -1 file > /dev/null; } 2>&1);;
                1)
                    t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                3)
                    t=$( { time { read line < file && echo "$line"; } > /dev/null; } 2>&1);;
            esac
            # accumulate a sum expression such as 0+0.001+0.002 for bc
            avg=$avg+$t
        done
        echo "scale=3;($avg)/$n" | bc
    done
done

Save it as benchmark.sh and run it with bash benchmark.sh.

Results (average wall-clock time in seconds over 25 runs, file with 1,000,000 lines):

head -1 file
.001

sed -n 1p file
.048

sed -n '1{p;q}' file
.002

read line < file && echo $line
0

So the time for sed -n 1p grows linearly with the length of the file, because without q it keeps reading to the end of the input, while the timings for the other variations stay constant (and negligible) because they all stop after reading the first line:
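One way to see this difference without timing anything is to feed each command an endless stream from `yes`: the variants that stop after line 1 return at once, while plain `sed -n 1p` never reaches end of input. This is a minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the time limit is hit); the 2-second limit is an arbitrary choice.

```shell
#!/bin/bash
# Commands that stop after line 1 finish even on endless input.
yes | head -n 1            # prints y
yes | sed -n '1{p;q;}'     # prints y

# sed -n 1p keeps reading until end of input, which never comes here,
# so timeout has to kill it after 2 seconds.
status=0
timeout 2 sh -c 'yes | sed -n 1p' > /dev/null || status=$?
echo "timeout exit status: $status"   # 124 means the time limit was hit
```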

[Figure: timing of each command plotted against the number of lines in the file]

Note: the timings differ from the original post because they were taken on a faster Linux box.
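The fastest entry above, the `read` builtin, only gets line 1. If you want to stay inside bash for line N as well, one possible extension (not benchmarked in the original answer, and assuming bash 4 or later for `mapfile`) is to let `mapfile` skip the first N-1 lines and store exactly one. The builtin still has to read through the preceding lines itself, so for large N it may well be slower than sed or head; measure it with the benchmark script before relying on it. `get_line_builtin` and `sample.txt` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper using only bash builtins (bash 4+ for mapfile).
get_line_builtin() {
    local n=$1 file=$2
    local -a lines
    # -s: discard the first n-1 lines, -n 1: store at most one line,
    # -t: strip the trailing newline from the stored line
    mapfile -t -s "$((n - 1))" -n 1 lines < "$file"
    # Print nothing if the file has fewer than n lines.
    if (( ${#lines[@]} > 0 )); then
        printf '%s\n' "${lines[0]}"
    fi
}

seq 1 10 > sample.txt
get_line_builtin 1 sample.txt    # prints 1
get_line_builtin 7 sample.txt    # prints 7
```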