Using awk to remove the Byte-order mark

Boldewyn · Jul 1, 2009 · Viewed 80.9k times

What would an awk script (presumably a one-liner) for removing a BOM look like?

Specification:

  • print every line after the first (NR > 1)
  • for the first line: If it starts with #FE #FF or #FF #FE, remove those and print the rest

Answer

Denilson Sá Maia · Sep 1, 2010

Using GNU sed (on Linux or Cygwin):

# Remove the UTF-8 BOM (bytes EF BB BF) from all .txt files in the current directory:
sed -i '1 s/^\xef\xbb\xbf//' *.txt

On FreeBSD:

sed -i .bak '1 s/^\xef\xbb\xbf//' *.txt

Advantage of using GNU or FreeBSD sed: the -i option means "in place", so files are updated directly without the need for redirections or weird tricks.

On Mac:

The awk solution in another answer works, but the sed command above does not. At least on Mac (Sierra), the sed documentation does not mention support for hexadecimal escapes such as \xef.
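For reference, here is a minimal sketch of what such an awk filter could look like. It is not necessarily the exact command from the other answer. It assumes a UTF-8 BOM (bytes EF BB BF) and writes them as octal escapes, since hexadecimal escapes are not available in every awk; strip_bom is just a hypothetical helper name.

```shell
# strip_bom (hypothetical helper name): print FILE to stdout
# without a leading UTF-8 BOM. Lines after the first are untouched.
strip_bom() {
    # LC_ALL=C asks awk to treat the input as plain bytes.
    # \357\273\277 is the octal spelling of EF BB BF.
    LC_ALL=C awk 'NR == 1 { sub(/^\357\273\277/, "") } { print }' "$1"
}
```

Usage would be something like strip_bom INFILE > OUTFILE. The output has to go to a different file, because redirecting onto INFILE itself would truncate it before awk reads it.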

The same in-place effect can be achieved with any program by piping its output to the sponge tool from moreutils:

awk '…' INFILE | sponge INFILE
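If sponge is not installed, a temporary file gives roughly the same result. A minimal sketch, assuming mktemp is available; in_place is a hypothetical helper name and not part of moreutils:

```shell
# in_place (hypothetical helper): run a filter command with FILE on
# stdin and, if the filter succeeds, replace FILE's contents with its output.
in_place() {
    file=$1
    shift
    tmp=$(mktemp) || return 1
    if "$@" < "$file" > "$tmp"; then
        # Write back with cat so the original file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Filter failed: leave the original file untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would be in_place INFILE followed by a filter command that reads stdin, for example in_place INFILE tr -d '\r'. Unlike sed -i, this only handles one file per call.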