Adding BOM to UTF-8 files

Stephane · Jun 27, 2010 · Viewed 42k times

I'm looking (so far without success) for a script that can run as a batch job and prepend a BOM to a UTF-8 text file if it doesn't already have one.

Neither the language it's written in (Perl, Python, C, bash) nor the OS it runs on matters to me. I have access to a wide range of computers.

I've found a lot of scripts that do the reverse (strip the BOM), which seems kind of silly to me, since many Windows programs have trouble reading UTF-8 text files that lack a BOM.

Did I miss the obvious?

Thanks!

Answer

Steven R. Loomis · Jul 20, 2010

I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.

#!/bin/sh

if [ $# -eq 0 ]
then
        echo "usage: $0 files ..." 1>&2
        exit 1
fi

for file in "$@"
do
        echo "# Processing: $file" 1>&2
        if [ ! -f "$file" ]
        then
                echo Not a file: "$file" 1>&2
                exit 1
        fi
        TYPE=`file - < "$file" | cut -d: -f2`
        if echo "$TYPE" | grep -q '(with BOM)'
        then
                echo "# $file already has BOM, skipping." 1>&2
        else
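                # Braces, not parentheses: "exit 1" inside ( ... ) would only leave the subshell and the loop would carry on.
                { mv "${file}" "${file}~" && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ; } || { echo Error processing "$file" 1>&2 ; exit 1 ; }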
        fi
done

edit: Added quotes around the mv arguments. Thanks @DirkR and glad this script has been so helpful!
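
If 'file' or ICU's 'uconv' isn't installed, the same check-then-prepend idea can be sketched in plain sh. This is only a rough sketch, not a replacement for the script above: the function name add_bom is made up for illustration, it assumes the input really is UTF-8 (nothing is validated or converted), and it relies on head -c and mktemp, which are widely available but not strictly POSIX.

```shell
#!/bin/sh
# add_bom FILE (hypothetical helper): prepend the UTF-8 BOM (bytes EF BB BF)
# unless FILE already starts with it. Assumes FILE is UTF-8; no validation is done.
add_bom() {
        f=$1
        if [ ! -f "$f" ]
        then
                echo "Not a file: $f" 1>&2
                return 1
        fi
        # Dump the first three bytes as hex, e.g. "efbbbf" when a BOM is present.
        first3=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')
        if [ "$first3" = "efbbbf" ]
        then
                return 0        # already has a BOM, nothing to do
        fi
        tmp=$(mktemp) || return 1
        # Write BOM + original content to a temp file, then copy it back over
        # the original so its permissions and inode are kept.
        if { printf '\357\273\277' && cat "$f" ; } > "$tmp" && cat "$tmp" > "$f"
        then
                rm -f "$tmp"
        else
                echo "Error processing $f" 1>&2
                rm -f "$tmp"
                return 1
        fi
}

# Example use inside a script:
#   for f in "$@"; do add_bom "$f" || exit 1; done
```

Unlike the uconv version, this leaves no file~ backup behind, so try it on copies first.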