I need to repeatedly remove the first line from a huge text file using a bash script.
Right now I am using sed -i -e "1d" $FILE
- but it takes around a minute to do the deletion.
Is there a more efficient way to accomplish this?
Try tail:
tail -n +2 "$FILE"
-n x
: Just print the last x
lines. tail -n 5
would give you the last 5 lines of the input. The +
sign kind of inverts the argument and make tail
print anything but the first x-1
lines. tail -n +1
would print the whole file, tail -n +2
everything but the first line, etc.
GNU tail
is much faster than sed
. tail
is also available on BSD and the -n +2
flag is consistent across both tools. Check the FreeBSD or OS X man pages for more.
The BSD version can be much slower than sed
, though. I wonder how they managed that; tail
should just read a file line by line while sed
does pretty complex operations involving interpreting a script, applying regular expressions and the like.
Note: You may be tempted to use
# THIS WILL GIVE YOU AN EMPTY FILE!
tail -n +2 "$FILE" > "$FILE"
but this will give you an empty file. The reason is that the redirection (>
) happens before tail
is invoked by the shell:
$FILE
tail
tail
process to $FILE
tail
reads from the now empty $FILE
If you want to remove the first line inside the file, you should use:
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"
The &&
will make sure that the file doesn't get overwritten when there is a problem.