I'm learning bash and I saw this construction:
cat file | while IFS= read -r line;
do
...
done
Can anyone explain what IFS= does? I know it's input field separator, but why is it being set to nothing?
IFS does many things, but you are asking about that particular loop.
The effect in that loop is to preserve leading and trailing white space in line. To illustrate, first observe with IFS set to nothing:
$ echo " this is a test " | while IFS= read -r line; do echo "=$line=" ; done
= this is a test =
The line variable contains all the white space it received on its stdin. Now, consider the same statement with the default IFS:
$ echo " this is a test " | while read -r line; do echo "=$line=" ; done
=this is a test=
In this version, the white space internal to the line is still preserved, but the leading and trailing white space has been removed.
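As a minimal sketch of why writing IFS= in front of read is safe: the assignment is a prefix to that one command, so it applies only while read runs and the shell's own IFS is left alone. The variable name ifs_before and the sample text are made up for illustration.

```shell
# Remember the shell's IFS before calling read (normally space, tab, newline).
ifs_before=$IFS

# IFS= is a prefix assignment: it is in effect only for this read command.
IFS= read -r line <<'EOF'
   indented text
EOF

# The three leading blanks survive because IFS was empty during the read.
printf '=%s=\n' "$line"

# The shell's IFS still has its earlier value.
if [ "$IFS" = "$ifs_before" ]; then
    echo "IFS is unchanged after read"
fi
```

This is why the loop does not need to save and restore IFS around each iteration.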
What does -r do in read -r?
The -r option prevents read from treating backslash as a special character.
To illustrate, we use two echo commands that supply two lines to the while loop. Observe what happens with -r:
$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read -r line; do echo "=$line=" ; done
=this \\ line is \=
=continued=
Now, observe what happens without -r:
$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read line; do echo "=$line=" ; done
=this \ line is continued=
Without -r, two changes happened. First, the double-backslash was converted to a single backslash. Second, the backslash on the end of the first line was interpreted as a line-continuation character and the two lines were merged into one.
In sum, if you want backslashes in the input to have special meaning, don't use -r. If you want backslashes in the input to be taken as plain characters, then use -r.
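A common place where this matters is input that contains literal backslashes, such as a Windows-style path. The following is only a sketch with a made-up path; it reads the same line once with -r and once without, so the two results can be compared.

```shell
# A made-up path containing backslashes (single quotes keep them literal).
path='C:\temp\new'

# With -r the backslashes are taken as plain characters.
raw=$(printf '%s\n' "$path" | { IFS= read -r line; printf '%s' "$line"; })

# Without -r each backslash is consumed as an escape character.
cooked=$(printf '%s\n' "$path" | { IFS= read line; printf '%s' "$line"; })

printf 'with -r:    %s\n' "$raw"       # C:\temp\new
printf 'without -r: %s\n' "$cooked"    # C:tempnew
```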
Since read takes input one line at a time, IFS affects each line of multi-line input in the same way that it affects single-line input. The -r option behaves similarly, with the exception that, without -r, multiple lines can be combined into one line using the trailing backslash as shown above.
The behavior with multi-line input, however, can be changed drastically using read's -d flag. The -d flag changes the delimiter character that read uses to mark the end of an input line. For example, we can terminate lines with a tab character:
$ echo $'line one \n line\t two \n line three\t ends here'
line one
line two
line three ends here
$ echo $'line one \n line\t two \n line three\t ends here' | while IFS= read -r -d$'\t' line; do echo "=$line=" ; done
=line one
line=
= two
line three=
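Here, the $'...' construct was used to enter special characters like newline, \n, and tab, \t. Observe that with -d$'\t', read divides its input into "lines" based on tab characters. The text after the final tab never reaches the loop body: read returns a nonzero status when it hits the end of the input before finding the delimiter, so the loop ends.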
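If the text after the last delimiter should be processed too, one widely used idiom is to add a fallback test on the variable. When read reaches the end of the input it returns a nonzero status but still stores whatever it has read so far. This is a bash sketch; the names input, chunk and chunks are just for illustration.

```shell
#!/usr/bin/env bash
# Same sample text as above, but without a trailing newline.
input=$'line one \n line\t two \n line three\t ends here'

chunks=()
# read fails at end of input, but chunk still holds the leftover text,
# so the extra test lets the loop body run one last time.
while IFS= read -r -d $'\t' chunk || [[ -n $chunk ]]; do
    chunks+=("$chunk")
done < <(printf '%s' "$input")

printf '%d chunks\n' "${#chunks[@]}"    # 3 chunks
printf '=%s=\n' "${chunks[@]}"
```

The loop reads from a process substitution rather than a pipe so that the chunks array is still set after the loop finishes.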
The most important use of the features described above is to process difficult file names. Since the one character that cannot appear in path/filenames is the null character, the null character can be used to separate a list of file names. As an example:
while IFS= read -r -d $'\0' file
do
# do something to each file
done < <(find ~/music -type f -print0)
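To try this without touching real files, here is a small self-contained bash sketch. It creates a scratch directory with some awkward file names (all invented for this example) and counts them with the same kind of loop. It writes the delimiter as -d '', which bash treats the same way as -d $'\0'.

```shell
#!/usr/bin/env bash
# Scratch directory so the sketch does not depend on ~/music existing.
dir=$(mktemp -d)

# File names with leading blanks, a backslash, and an embedded newline.
touch "$dir/plain.mp3" \
      "$dir/  leading spaces.mp3" \
      "$dir/back\\slash.mp3" \
      "$dir"/$'new\nline.mp3'

count=0
# IFS= keeps leading/trailing blanks, -r keeps backslashes,
# and -d '' makes read stop at each NUL byte emitted by -print0.
while IFS= read -r -d '' file; do
    count=$((count + 1))
    printf 'found: %q\n' "$file"
done < <(find "$dir" -type f -print0)

echo "$count files"    # 4 files
rm -rf "$dir"
```

Because the loop reads from a process substitution instead of a pipe, it runs in the current shell and count keeps its value after the loop.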