Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Character Set Encoding Maps - Unicode UTF-32, UTF-32BE, UTF-32LE
This section provides a tutorial example of analyzing and printing character set encoding maps for 3 encodings of the Unicode character set: UTF-32, UTF-32BE, and UTF-32LE.
Here is the output of my sample program, EncodingAnalyzer2.java, for UTF-32 encoding with Java SE 7:
C:\herong>java EncodingAnalyzer2 UTF-32
UTF-32 encoding:
00000000 > 00 00 00 00 - 000000FF > 00 00 00 FF
00000100 > 00 00 01 00 - 000001FF > 00 00 01 FF
00000200 > 00 00 02 00 - 000002FF > 00 00 02 FF
......
0000D700 > 00 00 D7 00 - 0000D7FF > 00 00 D7 FF
0000D800 > 00 00 FF FD - 0000DFFF > 00 00 FF FD (invalid range)
0000E000 > 00 00 E0 00 - 0000E0FF > 00 00 E0 FF
0000E100 > 00 00 E1 00 - 0000E1FF > 00 00 E1 FF
0000E200 > 00 00 E2 00 - 0000E2FF > 00 00 E2 FF
....
0010FF00 > 00 10 FF 00 - 0010FFFF > 00 10 FF FF
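The encoding map of UTF-32, another encoding used for the Unicode character set, is the simplest of all. In the output above, every code point is written as its own value in 4 bytes, most significant byte first. The only exception is the surrogate range, 0000D800 to 0000DFFF, where every code point is encoded as 00 00 FF FD, the replacement character U+FFFD.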
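The mapping can be spot-checked for individual code points with a few lines of Java. This is only a minimal sketch, not part of EncodingAnalyzer2.java; the class name Utf32EncodeSketch and the method encodeToHex are made up for illustration, and it assumes the JVM supports the "UTF-32" charset name, as the output above suggests.

```java
import java.nio.charset.Charset;

class Utf32EncodeSketch {

    // Hypothetical helper: encode one code point with the named charset
    // and return the bytes as space-separated hex digits.
    static String encodeToHex(int codePoint, String charsetName) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A few sample code points, including one surrogate (0xD800).
        int[] samples = {0x41, 0xD7FF, 0xD800, 0xE000, 0x10FFFF};
        for (int cp : samples) {
            System.out.println(String.format("%08X > %s",
                cp, encodeToHex(cp, "UTF-32")));
        }
    }
}
```

Passing "UTF-32BE" or "UTF-32LE" as the second argument lets you compare the byte order of the three encodings for the same code point.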
Here is the output of my sample program, EncodingAnalyzer2.java, for UTF-32BE encoding with Java SE 7:
C:\herong>java EncodingAnalyzer2 UTF-32BE
UTF-32BE encoding:
00000000 > 00 00 00 00 - 000000FF > 00 00 00 FF
00000100 > 00 00 01 00 - 000001FF > 00 00 01 FF
00000200 > 00 00 02 00 - 000002FF > 00 00 02 FF
......
0000D700 > 00 00 D7 00 - 0000D7FF > 00 00 D7 FF
0000D800 > 00 00 FF FD - 0000DFFF > 00 00 FF FD (invalid range)
0000E000 > 00 00 E0 00 - 0000E0FF > 00 00 E0 FF
0000E100 > 00 00 E1 00 - 0000E1FF > 00 00 E1 FF
0000E200 > 00 00 E2 00 - 0000E2FF > 00 00 E2 FF
....
0010FF00 > 00 10 FF 00 - 0010FFFF > 00 10 FF FF
The output of UTF-32BE is identical to that of UTF-32, which shows that the UTF-32 encoder in Java SE 7 writes bytes in big-endian order.
Here is the output of my sample program, EncodingAnalyzer2.java, for UTF-32LE encoding with Java SE 7:
C:\herong>java EncodingAnalyzer2 UTF-32LE
UTF-32LE encoding:
00000000 > 00 00 00 00 - 0010FFFF > FF FF 10 00
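Obviously, my sample program does not do a good job on UTF-32LE. It uses the last byte of the encoded sequence to detect changes in the encoding pattern. With UTF-32LE, the last byte is the most significant byte of the code point, which is always 00, so no change is ever detected and the whole code space is reported as a single range.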
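The difference between the two byte orders can be confirmed by counting how many distinct values the first and the last byte take over all non-surrogate code points. The sketch below is illustrative only; LastByteSketch and distinctValues are hypothetical names, not part of the sample program.

```java
import java.nio.charset.Charset;
import java.util.BitSet;

class LastByteSketch {

    // Hypothetical helper: count the distinct values seen in the first
    // byte (lastByte == false) or the last byte (lastByte == true) of
    // the encoded sequence, over all non-surrogate code points.
    static int distinctValues(String charsetName, boolean lastByte) {
        Charset cs = Charset.forName(charsetName);
        BitSet seen = new BitSet(256);
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            if (cp >= 0xD800 && cp <= 0xDFFF) continue; // skip surrogates
            byte[] b = new String(Character.toChars(cp)).getBytes(cs);
            seen.set(b[lastByte ? b.length - 1 : 0] & 0xFF);
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        String[] names = {"UTF-32BE", "UTF-32LE"};
        for (String name : names) {
            System.out.println(name
                + ": first byte takes " + distinctValues(name, false)
                + " value(s), last byte takes " + distinctValues(name, true)
                + " value(s)");
        }
    }
}
```

A last byte that takes only one value across the whole code space carries no information about where the encoding pattern changes, which is why a last-byte test cannot split the UTF-32LE output into ranges.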
Exercise: Find a better way to print out encoding mapping tables.
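One possible answer to this exercise, offered as a sketch rather than a finished tool, is to stop looking at individual bytes and instead ask the encoder whether each code point can be encoded at all. A new range then starts whenever that answer changes, which works the same way for both byte orders. This is a different technique from the one used in EncodingAnalyzer2.java, and it produces coarser ranges than the 256-code-point rows shown earlier. The names EncodingRangeSketch, hex, and ranges are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class EncodingRangeSketch {

    // Encode one code point and format the bytes as hex digits.
    static String hex(int codePoint, Charset cs) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(cs);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Build one output line per run of code points that are either all
    // encodable or all not encodable with the named charset.
    static List<String> ranges(String charsetName) {
        Charset cs = Charset.forName(charsetName);
        CharsetEncoder enc = cs.newEncoder();
        List<String> lines = new ArrayList<String>();
        int start = 0;
        boolean valid = enc.canEncode(new String(Character.toChars(0)));
        for (int cp = 1; cp <= 0x110000; cp++) {
            boolean atEnd = (cp == 0x110000);
            // At the end of the code space, force the last range to close.
            boolean now = atEnd
                ? !valid
                : enc.canEncode(new String(Character.toChars(cp)));
            if (now != valid) {
                lines.add(String.format("%08X > %s - %08X > %s%s",
                    start, hex(start, cs), cp - 1, hex(cp - 1, cs),
                    valid ? "" : " (invalid range)"));
                start = cp;
                valid = now;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "UTF-32LE";
        System.out.println(name + " encoding:");
        for (String line : ranges(name)) {
            System.out.println(line);
        }
    }
}
```

Because the check relies on CharsetEncoder.canEncode() and not on byte positions, the same code can be tried with other charset names, although the ranges it reports only distinguish encodable from non-encodable code points.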