GB2312 Location Codes and Native Codes

GB2312 Location Codes represent the locations of characters in the GB2312 table. GB2312 Native Codes are pairs of 7-bit bytes derived from Location Codes.

As part of the GB2312 standard, each character is assigned two codes:

• Location Code (区位码) - A two-number code that represents the location of the character in the GB2312 character table.
• Native Code (国标码) - A sequence of two 7-bit bytes derived from the Location Code to represent the character in computer systems.

Here are more detailed descriptions of Location Code and Native Code:

1. What Is Location Code? - The Location Code of a GB2312 character is the combination of the row number (区) and the column number (位) of the character's location in the GB2312 table.

For example, the Chinese character 啊 is located at row 16 and column 1 in the GB2312 table. So the Location Code of 啊 is (16,1).

Since there are 94 rows and 94 columns in the GB2312 table, Location Codes range from (1,1) to (94,94).

2. What Is Native Code? - The Native Code of a GB2312 character is a sequence of 2 bytes that represents the character in computer systems. The first byte of the code is called the high byte, and the second byte is called the low byte.

The high byte is derived from the row number of the character by adding 32 to the row number value.

The low byte is derived from the column number of the character by adding 32 to the column number value.

For example, the Chinese character 啊 has a Location Code of (16,1). So its high byte is 0x30, because 16 + 32 = 48, or 0x30. Its low byte is 0x21, because 1 + 32 = 33, or 0x21. Putting them together, the Native Code of 啊 is 0x3021.
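The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "add 32" rule; the function name location_to_native is my own choice and is not part of the GB2312 standard.

```python
def location_to_native(row, col):
    """Convert a GB2312 Location Code (row, col) to Native Code bytes.

    Hypothetical helper for illustration: both numbers must be in 1..94.
    """
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1..94")
    # Add 32 (0x20) to each number to get the high byte and the low byte.
    return row + 32, col + 32


high, low = location_to_native(16, 1)
native_hex = f"0x{high:02X}{low:02X}"
print(native_hex)  # prints 0x3021
```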

I guess the reason for adding 32 to both the row number and the column number is to keep the resulting byte values out of the lowest value range. In computer systems, low-value bytes (0x00 to 0x1F) are usually reserved for control characters.

Native Codes range from (0x21,0x21) to (0x7E,0x7E), since there are only 94 (0x5E) rows and 94 (0x5E) columns in the GB2312 table: 1 + 32 = 0x21 and 94 + 32 = 0x7E.
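Going the other way, a Native Code can be mapped back to a Location Code by subtracting 32 from each byte. Here is a minimal Python sketch that also checks the byte range; the name native_to_location is hypothetical and used only for this example.

```python
def native_to_location(high, low):
    """Convert GB2312 Native Code bytes back to a Location Code.

    Hypothetical helper for illustration: each byte must be in 0x21..0x7E.
    """
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("each byte must be in the range 0x21..0x7E")
    # Subtract 32 (0x20) from each byte to recover the row and the column.
    return high - 32, low - 32


print(native_to_location(0x21, 0x21))  # prints (1, 1)
print(native_to_location(0x7E, 0x7E))  # prints (94, 94)
```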

GB2312 Native Codes are perfectly good for storing Chinese documents as computer files and transmitting them over computer networks without any problem, because:

• The highest bit of each byte is not used. Only the 7 low bits are used in each byte.
• Lower value bytes (< 0x21) are not used.

However, GB2312 Native Codes are not compatible with ASCII Codes. In other words, GB2312 Native Codes and ASCII Codes cannot be mixed together in a single file. This is because there is no way to tell whether a byte is an ASCII Code or the high or low byte of a GB2312 Native Code.

For example, the byte 0x30 in a file that mixes GB2312 Native Codes and ASCII Codes could be the ASCII '0' character, or the high byte of a GB2312 character such as 啊 (Native Code 0x3021).
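The ambiguity can be shown with a small Python sketch. The same 2 bytes are read once as ASCII text and once as a single Native Code; the variable names are my own, and nothing in the bytes themselves says which reading is intended.

```python
# Two bytes taken from a hypothetical file mixing ASCII and Native Codes.
data = bytes([0x30, 0x21])

# Reading 1: two ASCII characters.
as_ascii = data.decode("ascii")
print(as_ascii)  # prints 0!

# Reading 2: one GB2312 Native Code, mapped back to its Location Code
# by subtracting 32 from each byte.
as_location = (data[0] - 32, data[1] - 32)
print(as_location)  # prints (16, 1)
```

Both readings are valid, so a program reading the file has no way to choose between them.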

The next section describes some solutions to this problem.

Last update: 2015.