GB2312 Encoding for GB2312 Character Set

This section provides a quick introduction to the GB2312 encoding for the GB2312 character set. GB2312 is a 2-byte (8 bits per byte) encoding.

GB2312 encoding is the main encoding for the GB2312 character set. GB2312 encoding is based on native code values of GB2312 characters.

The native code value of each GB2312 character contains 2 bytes. The first byte is called the high byte, containing the row number plus 32; the second byte is called the low byte, containing the column number plus 32. For example, if a character is located at row 16 and column 1, its high byte will be 16 + 32 = 48 (0x30), and its low byte will be 1 + 32 = 33 (0x21). Putting them together, its native code value will be 0x3021.
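The calculation above can be sketched in a few lines of Python. The function name native_code is only an illustration, and the range check assumes the standard 94 x 94 layout of the GB2312 table.

```python
def native_code(row, col):
    """Return the 2-byte GB2312 native code for a (row, column) position."""
    # Assumption: rows and columns are numbered 1 to 94.
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must each be in 1..94")
    high = row + 32  # high byte: row number plus 32
    low = col + 32   # low byte: column number plus 32
    return (high << 8) | low

print(hex(native_code(16, 1)))  # 0x3021
```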

I guess that the reason for adding 32 to both the row number and the column number is to keep the byte values out of the low value range (0x00 to 0x1F), which is usually reserved for control characters in many computer systems.

However, the byte values of GB2312 native codes are not used directly as GB2312 encoding byte sequences, because they still collide with the byte values used by the ASCII encoding. To resolve this problem, a value of 128 (0x80) is added to both bytes of the native code. For example, if a character is located at row 16 and column 1, its native code will be 0x3021, and its modified code will be 0x3021 + 0x8080 = 0xB0A1.
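Here is a minimal sketch of the same arithmetic in Python, checked against Python's built-in "gb2312" codec. The helper name gb2312_bytes is hypothetical; the character at row 16 and column 1 is 啊 (U+554A).

```python
def gb2312_bytes(row, col):
    """Return the GB2312 (EUC-CN) byte sequence for a (row, column) position."""
    high = row + 32 + 128  # native high byte plus 128
    low = col + 32 + 128   # native low byte plus 128
    return bytes([high, low])

encoded = gb2312_bytes(16, 1)
print(encoded.hex())             # b0a1
print(encoded.decode("gb2312"))  # 啊
```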

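These modified codes are adopted as the GB2312 encoding. Because both bytes of every modified code are 128 or higher, while ASCII bytes are always below 128, GB2312 encoded characters can be safely mixed together with ASCII encoded characters.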
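To illustrate why the mixing is safe, the sketch below walks through a byte sequence and uses only the value of each leading byte to decide whether it is a 1-byte ASCII character or the start of a 2-byte GB2312 character. The function name split_mixed is an assumption made for this example, and the input is assumed to be well formed.

```python
def split_mixed(data):
    """Split a mixed ASCII/GB2312 byte sequence into a list of characters."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Byte below 128: a single ASCII character.
            chars.append(data[i:i + 1].decode("ascii"))
            i += 1
        else:
            # Byte of 128 or higher: first byte of a 2-byte GB2312 character.
            chars.append(data[i:i + 2].decode("gb2312"))
            i += 2
    return chars

print(split_mixed(b"A\xb0\xa1B"))  # ['A', '啊', 'B']
```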

GB2312 encoding is also called EUC-CN (Extended Unix Code for China).

The GB2312 character set has another encoding called HZ, which maps each GB2312 character to two 7-bit bytes and uses ~{...~} to separate GB2312 characters from ASCII characters.
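As a rough illustration, the sketch below builds the HZ form of a single character by wrapping its two native code bytes in ~{ and ~}, and then decodes it with Python's built-in "hz" codec. The helper name hz_encode_char is hypothetical, and the sketch ignores other details of HZ such as how a literal ~ is escaped.

```python
def hz_encode_char(row, col):
    """Return the HZ byte sequence for one GB2312 character at (row, column)."""
    pair = bytes([row + 32, col + 32])  # the two 7-bit native code bytes
    return b"~{" + pair + b"~}"         # ~{ opens and ~} closes GB2312 mode

hz = hz_encode_char(16, 1)
print(hz)                          # b'~{0!~}'
print((b"Hi " + hz).decode("hz"))  # Hi 啊
```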
