Byte Order Mark (BOM) - FEFF - EFBBBF

This section provides a brief introduction to the Byte Order Mark (BOM) character, U+FEFF, which serves as a Unicode signature when prepended to a character stream. When encoded in UTF-8, U+FEFF becomes the 3-byte sequence EF BB BF.

What Is BOM (Byte Order Mark)? BOM is the informal name of the special Unicode character U+FEFF "ZERO WIDTH NO-BREAK SPACE" when it is prepended to a stream of Unicode characters as a "signature". This signature tells the receiver that the stream contains Unicode characters and indicates the serialization order (big-endian or little-endian) of the encoding octets.
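To illustrate how a receiver might use the signature, here is a minimal sketch in Python. The function name `sniff_bom` and the returned labels are hypothetical choices made for this example; the signature constants come from Python's standard `codecs` module. The UTF-32 signatures are tested first because the UTF-32LE signature (FF FE 00 00) begins with the same two octets as the UTF-16LE signature (FF FE).

```python
import codecs

# Longest signatures first, so that FF FE 00 00 is not mistaken for FF FE.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return a label for the signature found at the start of data, or None."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(sniff_bom(b"\xfe\xff\x00A"))   # big-endian UTF-16 stream
print(sniff_bom(b"\xef\xbb\xbfA"))   # UTF-8 stream with signature
print(sniff_bom(b"A"))               # no signature
```

A result of None only means that no signature was found; the stream may still be Unicode text. The check is also a heuristic: a UTF-16LE stream whose first character after the BOM is U+0000 would be reported as UTF-32LE.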

When the BOM character, U+FEFF, is serialized in UTF-8 encoding, it becomes the octet sequence EF BB BF (\xEF\xBB\xBF).
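The arithmetic behind EF BB BF can be checked with a short Python sketch. The helper `utf8_three_bytes` is a hypothetical name used only for this illustration. It applies the 3-octet UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx, which covers code points U+0800 through U+FFFF (surrogates are not checked here), and the result is compared with Python's built-in encoder.

```python
def utf8_three_bytes(code_point: int) -> bytes:
    """Encode a code point in the range U+0800..U+FFFF as 3 UTF-8 octets."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point is outside the 3-octet UTF-8 range")
    b1 = 0xE0 | (code_point >> 12)          # 1110xxxx: top 4 bits
    b2 = 0x80 | ((code_point >> 6) & 0x3F)  # 10xxxxxx: middle 6 bits
    b3 = 0x80 | (code_point & 0x3F)         # 10xxxxxx: low 6 bits
    return bytes([b1, b2, b3])


manual = utf8_three_bytes(0xFEFF)
builtin = "\ufeff".encode("utf-8")

print(manual.hex())       # efbbbf
print(manual == builtin)  # True
```

U+FEFF is 1111 1110 1111 1111 in binary. Splitting it into groups of 4, 6 and 6 bits gives 1111, 111011 and 111111, which become 11101111 (EF), 10111011 (BB) and 10111111 (BF).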

As you can see from the previous tutorial, Notepad prepends U+FEFF to the text and converts it to EF BB BF when saving the text in UTF-8 encoding. This is why I got these 3 extra bytes, EF BB BF, at the beginning of the saved UTF-8 text file.

With the introduction of the BOM character, we now need to be ready to support two variations of UTF-8 text files: files that begin with the BOM (EF BB BF) and files that do not.
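A reader that accepts UTF-8 text with or without a leading EF BB BF could be sketched in Python as follows. The function names `strip_utf8_bom` and `decode_utf8_text` are hypothetical; `codecs.BOM_UTF8` and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF signature, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_text(data: bytes) -> str:
    """Decode UTF-8 octets, dropping one leading BOM when present."""
    # The "utf-8-sig" codec accepts input both with and without the signature.
    return data.decode("utf-8-sig")


with_bom = b"\xef\xbb\xbfHello"
without_bom = b"Hello"

print(strip_utf8_bom(with_bom))       # b'Hello'
print(decode_utf8_text(with_bom))     # Hello
print(decode_utf8_text(without_bom))  # Hello
```

Decoding with plain "utf-8" instead would keep U+FEFF as the first character of the resulting string, which is usually not what the caller wants.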

Read RFC 3629, "UTF-8, a transformation format of ISO 10646", November 2003 at http://tools.ietf.org/html/rfc3629 for more information.

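RFC 3629 permits, but does not require, the use of U+FEFF as a signature at the beginning of UTF-8 text. Section 6 of the RFC discusses when a protocol should allow or forbid it.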

Last update: 2009.
