Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Character Set Encoding Maps - Unicode UTF-16, UTF-16BE, UTF-16LE
This section provides a tutorial example of analyzing and printing the character set encoding maps of 3 Unicode encodings: UTF-16, UTF-16BE, and UTF-16LE.
Here is the output of my sample program, EncodingAnalyzer2.java, for UTF-16 encoding with Java SE 7:
C:\herong>java EncodingAnalyzer2 UTF-16
UTF-16 encoding:
00000000 > FE FF 00 00 - 000000FF > FE FF 00 FF
00000100 > FE FF 01 00 - 000001FF > FE FF 01 FF
00000200 > FE FF 02 00 - 000002FF > FE FF 02 FF
......
0000D700 > FE FF D7 00 - 0000D7FF > FE FF D7 FF
0000D800 > FE FF FF FD - 0000DFFF > FE FF FF FD (invalid range)
0000E000 > FE FF E0 00 - 0000E0FF > FE FF E0 FF
0000E100 > FE FF E1 00 - 0000E1FF > FE FF E1 FF
0000E200 > FE FF E2 00 - 0000E2FF > FE FF E2 FF
......
0000FF00 > FE FF FF 00 - 0000FFFF > FE FF FF FF
00010000 > FE FF D8 00 DC 00 - 000100FF > FE FF D8 00 DC FF
00010100 > FE FF D8 00 DD 00 - 000101FF > FE FF D8 00 DD FF
00010200 > FE FF D8 00 DE 00 - 000102FF > FE FF D8 00 DE FF
......
0010FF00 > FE FF DB FF DF 00 - 0010FFFF > FE FF DB FF DF FF
Code Point > Byte Sequence - Code Point > Byte Sequence
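The encoding map of UTF-16, which is another encoding used for the Unicode character set, is much simpler than that of UTF-8. Every byte sequence starts with the BOM (Byte Order Mark) FE FF, which marks big-endian byte order. Code points 0x0000 to 0xD7FF and 0xE000 to 0xFFFF are encoded as 2 bytes holding the code point value, most significant byte first. Code points 0xD800 to 0xDFFF are reserved for surrogates and are not valid characters, so the encoder outputs FF FD, the replacement character U+FFFD, for all of them. Code points 0x10000 to 0x10FFFF are encoded as 4 bytes forming a surrogate pair: a high surrogate in the range D800 to DBFF followed by a low surrogate in the range DC00 to DFFF.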
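The surrogate-pair arithmetic behind the 4-byte rows can be checked by hand. Below is a minimal sketch, assuming only the standard java.nio.charset.StandardCharsets API (Java 7 or later). The hypothetical class Utf16SurrogateSketch is not the EncodingAnalyzer2 program; it computes UTF-16 code units with the standard formula (subtract 0x10000, put the top 10 bits into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00) and prints them next to the bytes Java's own UTF-16 encoder produces for the same code point.

```java
import java.nio.charset.StandardCharsets;

public class Utf16SurrogateSketch {

    // Converts one code point into UTF-16 code units (no BOM).
    // Rejects surrogate code points, which are not valid characters.
    static int[] toCodeUnits(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException(
                String.format("Not a Unicode scalar value: %X", codePoint));
        }
        if (codePoint < 0x10000) {
            return new int[] { codePoint };      // BMP: a single 16-bit unit
        }
        int v = codePoint - 0x10000;             // 20-bit offset
        int high = 0xD800 + (v >> 10);           // top 10 bits
        int low = 0xDC00 + (v & 0x3FF);          // bottom 10 bits
        return new int[] { high, low };
    }

    // Formats bytes as space-separated hex pairs, e.g. "FE FF D8 00".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats code units in big-endian byte order, e.g. "D8 00 DC 00".
    static String hexUnits(int[] units) {
        StringBuilder sb = new StringBuilder();
        for (int u : units) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X %02X", u >> 8, u & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = { 0x00FF, 0xD7FF, 0xE000, 0xFFFF, 0x10000, 0x10FFFF };
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            byte[] javaBytes = s.getBytes(StandardCharsets.UTF_16);
            // e.g. 00010000 > manual: D8 00 DC 00 | Java UTF-16: FE FF D8 00 DC 00
            System.out.printf("%08X > manual: %s | Java UTF-16: %s%n",
                cp, hexUnits(toCodeUnits(cp)), hex(javaBytes));
        }
    }
}
```

The manual column matches the Java column with the leading FE FF removed, which is exactly the pattern in the map above.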
Here is the output for UTF-16BE encoding, the big-endian variation of UTF-16 encoding:
C:\herong>java EncodingAnalyzer2 UTF-16BE
UTF-16BE encoding:
00000000 > 00 00 - 000000FF > 00 FF
00000100 > 01 00 - 000001FF > 01 FF
00000200 > 02 00 - 000002FF > 02 FF
......
0000D700 > D7 00 - 0000D7FF > D7 FF
0000D800 > FF FD - 0000DFFF > FF FD (invalid range)
0000E000 > E0 00 - 0000E0FF > E0 FF
0000E100 > E1 00 - 0000E1FF > E1 FF
0000E200 > E2 00 - 0000E2FF > E2 FF
......
0000FF00 > FF 00 - 0000FFFF > FF FF
00010000 > D8 00 DC 00 - 000100FF > D8 00 DC FF
00010100 > D8 00 DD 00 - 000101FF > D8 00 DD FF
00010200 > D8 00 DE 00 - 000102FF > D8 00 DE FF
......
0010FF00 > DB FF DF 00 - 0010FFFF > DB FF DF FF
Code Point > Byte Sequence - Code Point > Byte Sequence
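The encoding map of UTF-16BE is identical to that of UTF-16, except that there is no leading BOM 0xFEFF. Java's UTF-16 encoder already writes big-endian byte order, so dropping the BOM is the only difference.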
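To confirm that the BOM is the only difference, the sketch below (hypothetical class name Utf16BomSketch, standard java.nio.charset API only) encodes the same string with both charsets and checks that the UTF-16 bytes are FE FF followed by the UTF-16BE bytes. When a whole string is encoded in one call, Java writes the BOM only once, at the very start; the map above shows it on every row presumably because each code point is encoded separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BomSketch {

    // Big-endian Byte Order Mark written by Java's UTF-16 encoder.
    static final byte[] BOM_BE = { (byte) 0xFE, (byte) 0xFF };

    // Returns true if 'withBom' is exactly BOM_BE followed by 'body'.
    static boolean isBomPlus(byte[] withBom, byte[] body) {
        if (withBom.length != BOM_BE.length + body.length) return false;
        byte[] head = Arrays.copyOfRange(withBom, 0, BOM_BE.length);
        byte[] tail = Arrays.copyOfRange(withBom, BOM_BE.length, withBom.length);
        return Arrays.equals(head, BOM_BE) && Arrays.equals(tail, body);
    }

    public static void main(String[] args) {
        // One BMP character plus one supplementary character (surrogate pair).
        String text = "A" + new String(Character.toChars(0x10000));
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        byte[] utf16be = text.getBytes(StandardCharsets.UTF_16BE);
        System.out.println("UTF-16   length: " + utf16.length);    // 8
        System.out.println("UTF-16BE length: " + utf16be.length);  // 6
        System.out.println("UTF-16 == BOM + UTF-16BE: " + isBomPlus(utf16, utf16be)); // true
    }
}
```

"A" takes 2 bytes and U+10000 takes 4, so UTF-16BE needs 6 bytes and UTF-16 needs 8: the same 6 bytes plus a single 2-byte BOM.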
Here is the output for UTF-16LE encoding, the little-endian variation of UTF-16 encoding:
C:\herong>java EncodingAnalyzer2 UTF-16LE
UTF-16LE encoding:
00000000 > 00 00 - 0000D7FF > FF D7
0000D800 > FD FF - 0000DFFF > FD FF (invalid range)
0000E000 > 00 E0 - 0000FFFF > FF FF
00010000 > 00 D8 00 DC - 000103FF > 00 D8 FF DF
00010400 > 01 D8 00 DC - 000107FF > 01 D8 FF DF
00010800 > 02 D8 00 DC - 00010BFF > 02 D8 FF DF
......
0010FC00 > FF DB 00 DC - 0010FFFF > FF DB FF DF
Code Point > Byte Sequence - Code Point > Byte Sequence
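The encoding map of UTF-16LE is identical to that of UTF-16BE, except that the two bytes of each 16-bit code unit are swapped. For a supplementary code point, the high surrogate still comes first; only the bytes within each surrogate are reversed. For example, 0x10000 is encoded as 00 D8 00 DC instead of D8 00 DC 00, and the invalid range maps to FD FF, the little-endian form of U+FFFD.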
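The byte-swap relationship can be written as a small helper. In this sketch, the hypothetical method swapPairs() exchanges the two bytes of every 16-bit unit in a UTF-16BE byte array, and the result is compared with the output of Java's UTF-16LE encoder. Swapping per unit, rather than reversing the whole array, is what keeps the high surrogate ahead of the low surrogate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16ByteSwapSketch {

    // Swaps the two bytes of every 16-bit code unit (UTF-16BE <-> UTF-16LE).
    // The order of the units themselves is unchanged, so a surrogate pair
    // keeps its high surrogate first.
    static byte[] swapPairs(byte[] in) {
        if (in.length % 2 != 0) {
            throw new IllegalArgumentException("UTF-16 data must have an even length");
        }
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i += 2) {
            out[i] = in[i + 1];
            out[i + 1] = in[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // BMP characters plus the last supplementary code point U+10FFFF.
        String text = "A\u00FF" + new String(Character.toChars(0x10FFFF));
        byte[] be = text.getBytes(StandardCharsets.UTF_16BE); // 00 41 00 FF DB FF DF FF
        byte[] le = text.getBytes(StandardCharsets.UTF_16LE); // 41 00 FF 00 FF DB FF DF
        System.out.println("swap(BE) equals LE: " + Arrays.equals(swapPairs(be), le)); // true
    }
}
```

Because the swap is its own inverse, the same helper also converts UTF-16LE bytes back to UTF-16BE.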