Use awk or sed to Detect or Remove Byte Order Mark (BOM).

##
# Using awk or sed to Detect or Remove Byte Order Mark (BOM)
#
# Reasons for *not* having a BOM in UTF-8 encoded files:
#
# - It breaks shell scripts: with a BOM in front of it, the `#!` line is no
#   longer recognized.
# - It breaks all kinds of text processing.
# - It takes up three whole bytes!
# - It looks ugly in your editor. Unless it thinks it should be smart and
#   decides it needs to hide it from you.
# - The UTF-8 BOM bytes (EF BB BF) are not valid ASCII, so a file that is
#   otherwise pure ASCII stops being compatible with ASCII.
#
# REFERENCES
#
# - [Using awk or sed to Detect or Remove Byte Order Mark](http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom)
# - [Bomstrip](http://www.ueber.net/who/mjl/projects/bomstrip/)
##

# To remove the BOM using AWK (the \x escapes in the regex need an awk that
# supports them, such as GNU awk):
awk '{ if (NR == 1) sub(/^\xef\xbb\xbf/, ""); print }' INFILE > OUTFILE

# To remove the BOM using SED (in place; `-i` without a suffix and the \x
# escapes are GNU sed syntax):
sed -i -e '1s/^\xEF\xBB\xBF//' FILE
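
# To remove the BOM without GNU extensions, a minimal sketch: read the first
# three bytes with od, then drop them with tail. `strip_bom` is a hypothetical
# helper name, not part of any tool. It assumes `head -c` and `mktemp` are
# available (both are common, but POSIX does not require them).

```shell
strip_bom() {
    file=$1
    # First three bytes as hex with blanks removed; "efbbbf" means a UTF-8 BOM.
    first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 outputs the file starting at its fourth byte.
        # Writing back with cat keeps the original inode and permissions.
        tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}
```

# Usage: strip_bom FILE (a file without a BOM is left untouched)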

# To recursively *DETECT* files with a BOM:
find . -type f -print0 | xargs -0r awk '/^\xEF\xBB\xBF/ {print FILENAME} {nextfile}'
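
# If your awk lacks `nextfile` or \x escapes, detection can be sketched with
# od instead. `find_bom_files` is a hypothetical helper name; it takes an
# optional start directory (default: the current one) and assumes `head -c`
# is available.

```shell
find_bom_files() {
    # find hands the file names to a small inline sh script in batches.
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            # Compare the first three bytes, as hex, with EF BB BF.
            if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d " \n")" = "efbbbf" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

# Usage: find_bom_files [DIR] (prints one matching path per line)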

# To recursively *REMOVE* the BOM from files (in place):
find . -type f -exec sed -i -e '1s/^\xEF\xBB\xBF//' {} \;
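
# GNU sed -i replaces every file it processes with a new copy, whether or not
# anything changed. A sketch that rewrites only the files that actually start
# with a BOM and reports each one. `strip_bom_tree` is a hypothetical helper
# name; it assumes `head -c` and `mktemp` are available.

```shell
strip_bom_tree() {
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            # Skip any file whose first three bytes are not EF BB BF.
            [ "$(head -c 3 "$f" | od -An -tx1 | tr -d " \n")" = "efbbbf" ] || continue
            tmp=$(mktemp) || exit 1
            # Copy everything from the fourth byte on, then write it back.
            if tail -c +4 "$f" > "$tmp" && cat "$tmp" > "$f"; then
                printf "stripped: %s\n" "$f"
            fi
            rm -f "$tmp"
        done
    ' sh {} +
}
```

# Usage: strip_bom_tree [DIR] (prints "stripped: PATH" for each file changed)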