Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Saving Files in "Unicode Big Endian" Option
This section provides a tutorial example on how to save text files with Notepad by selecting the "Unicode big endian" encoding option in the "Save As" dialog box.
In the next test, I want to try the save function with the "Unicode big endian" encoding.
1. Run Notepad and open hello.utf-8 correctly with the UTF-8 encoding option selected.
2. Click the File > "Save As" menu. The "Save As" dialog box comes up.
3. Enter notepad_utf-16be as the new file name and select the "Unicode big endian" option in the Encoding field.
4. Click the Save button. Notepad saves the text to a new file named notepad_utf-16be.txt.
5. To see how my text is saved by Notepad, I need to run my HEX dump program on notepad_utf-16be.txt:
C:\herong\unicode>java HexWriter notepad_utf-16be.txt notepad_utf-16be.hex
Number of input bytes: 170

C:\herong\unicode>type notepad_utf-16be.hex
FEFF00480065006C006C006F00200063
006F006D007000750074006500720021
0020002D00200045006E0067006C0069
00730068000D000A753581114F60597D
FF010020002D002000530069006D0070
006C0069006600690065006400200043
00680069006E006500730065000D000A
96FB81664F60597DFE570020002D0020
0054007200610064006900740069006F
006E0061006C0020004300680069006E
006500730065000D000A
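Very nice. This is a valid UTF-16 encoded file in the Big-Endian with BOM format. The leading 2 bytes, FE FF, are the BOM (Byte Order Mark), which signals big-endian byte order and is not part of the text. Each character after the BOM is stored high byte first; for example, 00 48 is U+0048, the letter "H".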
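The BOM check described above can be sketched in Java. This is only an illustration, not part of the HexWriter program used in this tutorial; the class name Utf16BeBomCheck and its two helper methods are hypothetical names chosen for this example. It tests whether a byte array starts with FE FF and then decodes the remaining bytes as UTF-16BE, using the first 12 bytes of the hex dump as input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written for illustration only.
public class Utf16BeBomCheck {
    // Returns true if the data starts with the UTF-16 big-endian BOM: FE FF.
    static boolean hasBigEndianBom(byte[] data) {
        return data.length >= 2
            && (data[0] & 0xFF) == 0xFE
            && (data[1] & 0xFF) == 0xFF;
    }

    // Skips the BOM, if present, and decodes the rest as UTF-16BE.
    static String decode(byte[] data) {
        int offset = hasBigEndianBom(data) ? 2 : 0;
        return new String(data, offset, data.length - offset,
            StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // First 12 bytes of the hex dump: FEFF 0048 0065 006C 006C 006F
        byte[] head = {
            (byte) 0xFE, (byte) 0xFF, 0x00, 0x48, 0x00, 0x65,
            0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F
        };
        System.out.println(hasBigEndianBom(head)); // prints: true
        System.out.println(decode(head));          // prints: Hello
    }
}
```

The BOM is stripped before decoding because the UTF_16BE charset assumes the byte order is already known and would otherwise keep U+FEFF as the first character of the decoded string.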
Conclusion - The "Unicode big endian" encoding option of Notepad matches the "Big-Endian with BOM" format of Unicode UTF-16 encoding.
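To cross-check this conclusion, here is a minimal Java sketch that builds the same kind of output: the BOM bytes FE FF followed by the text encoded as UTF-16BE. The class name Utf16BeWithBom and the method encodeWithBom are hypothetical names for this example, and the test string is reconstructed by decoding the hex dump above rather than taken from the original hello.utf-8 file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written for illustration only.
public class Utf16BeWithBom {
    // Writes the BOM bytes FE FF, then the text encoded as UTF-16BE.
    static byte[] encodeWithBom(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFE);
        out.write(0xFF);
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Text reconstructed from the hex dump (an assumption of this sketch).
        // Unicode escapes avoid any dependency on the source file encoding.
        String text = "Hello computer! - English\r\n"
            + "\u7535\u8111\u4F60\u597D\uFF01 - Simplified Chinese\r\n"
            + "\u96FB\u8166\u4F60\u597D\uFE57 - Traditional Chinese\r\n";
        byte[] bytes = encodeWithBom(text);
        // 84 characters * 2 bytes + 2 BOM bytes
        System.out.println("Number of bytes: " + bytes.length); // prints: Number of bytes: 170
        System.out.printf("%02X%02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // prints: FEFF
    }
}
```

The byte count of 170 agrees with the "Number of input bytes" reported for notepad_utf-16be.txt, which is consistent with Notepad writing a 2-byte BOM followed by 2 bytes per character for this text.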