GB2312Unicode.java - GB2312 to Unicode Mapping

GB2312Unicode.java is a Java program that generates a table mapping every GB2312 character from its GB2312 code to its Unicode code.

If we compare the GB2312 codes with the Unicode codes of the same Chinese characters, we will not find any mathematical relation between them. So anyone who wants to convert a Chinese text file from the GB2312 encoding to a Unicode encoding needs a big mapping table that covers all 7445 GB2312 characters.
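
To illustrate the point, here is a minimal sketch that decodes three consecutive GB2312 codes and prints the resulting Unicode code points. The class name AdjacentCodes and the helper toCodePoint() are hypothetical names introduced only for this example, and the sketch assumes the GBK charset is available in your JDK, as in GB2312Unicode.java below.

```java
import java.nio.charset.Charset;

class AdjacentCodes {
    // Decode one two-byte GB2312 code and return its Unicode code point.
    // Assumption: the JDK provides the "GBK" charset, which decodes
    // GB2312 byte pairs.
    static int toCodePoint(int gbCode) {
        byte[] bytes = {(byte) (gbCode >> 8), (byte) gbCode};
        String s = new String(bytes, Charset.forName("GBK"));
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // Three neighboring GB2312 codes at the start of row 16
        for (int gb = 0xB0A1; gb <= 0xB0A3; gb++) {
            System.out.printf("GB %04X -> U+%04X%n", gb, toCodePoint(gb));
        }
    }
}
```

The first two codes, B0A1 and B0A2, decode to U+554A and U+963F. They are neighbors in GB2312 but far apart in Unicode, which is why a lookup table is needed instead of a formula.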

If we search the Internet, we can probably find copies of such mapping tables in different formats.

But if you have the JDK (Java Development Kit) installed on your computer, you can build a GB2312 to Unicode mapping table yourself with a simple program.

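Here is a Java program I wrote to build a GB2312 to Unicode mapping table, GB2312Unicode.java. The output of the program includes 5 columns per character: the Q.W. number (2-digit row followed by 2-digit column), the character itself as GB2312 bytes, the GB2312 code in hex, the Unicode (UTF-16BE) code in hex, and the UTF-8 code in hex.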

/* GB2312Unicode.java
 - Copyright (c) 2015, HerongYang.com, All Rights Reserved.
 */
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
class GB2312Unicode {
   static OutputStream out = null;
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   static int b_out[] = {201,267,279,293,484,587,625,657,734,782,827,
      874,901,980,5590};
   static int e_out[] = {216,268,280,294,494,594,632,694,748,794,836,
      894,903,994,5594};
   public static void main(String[] args) {
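      // try-with-resources closes the file even if writeCode() fails
      try (OutputStream fos = new BufferedOutputStream(
            new FileOutputStream("gb2312_unicode.gb"))) {
         out = fos;
         writeCode();
      } catch (IOException e) {
         System.out.println(e.toString());
      }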
   }
   public static void writeCode() throws IOException {
      boolean reserved = false;
      String name = null;
      // GB2312 is not supported by JDK. So I am using GBK.
      CharsetDecoder gbdc = Charset.forName("GBK").newDecoder();
      CharsetEncoder uxec = Charset.forName("UTF-16BE").newEncoder();
      CharsetEncoder u8ec = Charset.forName("UTF-8").newEncoder();
      ByteBuffer gbbb = null;
      ByteBuffer uxbb = null;
      ByteBuffer u8bb = null;
      CharBuffer cb = null;
      int count = 0;
      for (int i=1; i<=94; i++) {
         // Defining row settings
         if (i>=1 && i<=9) {
            reserved = false;
            name = "Graphic symbols";
         } else if (i>=10 && i<=15) {
            reserved = true;
            name = "Reserved";
         } else if (i>=16 && i<=55) {
            reserved = false;
            name = "Level 1 characters";
         } else if (i>=56 && i<=87) {
            reserved = false;
            name = "Level 2 characters";
         } else if (i>=88 && i<=94) {
            reserved = true;
            name = "Reserved";
         }
         // writing row title
         writeln();
         writeString("<p>");
         writeNumber(i);
         writeString(" Row: "+name);
         writeln();
         writeString("</p>");
         writeln();
         if (!reserved) {
            writeln();
            writeHeader();
            // looping through all characters in one row
            for (int j=1; j<=94; j++) {
               byte hi = (byte)(0xA0 + i);
               byte lo = (byte)(0xA0 + j);
               if (validGB(i,j)) {
                  // getting GB, UTF-16BE, UTF-8 codes
                  gbbb = ByteBuffer.wrap(new byte[]{hi,lo});
                  try {
                     cb = gbdc.decode(gbbb);
                     uxbb = uxec.encode(cb);
                     cb.rewind();
                     u8bb = u8ec.encode(cb);
                  } catch (CharacterCodingException e) {
                     cb = null;
                     uxbb = null;
                     u8bb = null;
                  }
               } else {
                  cb = null;
                  uxbb = null;
                  u8bb = null;
               }
               writeNumber(i);
               writeNumber(j);
               writeString(" ");
               if (cb!=null) {
                  writeByte(hi);
                  writeByte(lo);
                  writeString(" ");
                  writeHex(hi);
                  writeHex(lo);
                  count++;
               } else {
                  writeGBSpace();
                  writeString(" null");
               }
               writeString(" ");
               writeByteBuffer(uxbb,2);
               writeString(" ");
               writeByteBuffer(u8bb,3);
               if (j%2 == 0) {
                  writeln();
               } else {
                  writeString("   ");
               }
            }
            writeFooter();
         }
      }
      System.out.println("Number of GB characters written: "+count);
   }
   public static void writeln() throws IOException {
      out.write(0x0D);
      out.write(0x0A);
   }
   public static void writeByte(byte b) throws IOException {
      out.write(b & 0xFF);
   }
   public static void writeByteBuffer(ByteBuffer b, int l)
      throws IOException {
      int i = 0;
      if (b==null) {
         writeString("null");
         i = 2;
      } else {
         for (i=0; i<b.limit(); i++) writeHex(b.get(i));
      }
      for (int j=i; j<l; j++) writeString("  ");
   }
   public static void writeGBSpace() throws IOException {
      out.write(0xA1);
      out.write(0xA1);
   }
   public static void writeString(String s) throws IOException {
      if (s!=null) {
         for (int i=0; i<s.length(); i++) {
            out.write((int) (s.charAt(i) & 0xFF));
         }
      }
   }
   public static void writeNumber(int i) throws IOException {
      String s = "00" + String.valueOf(i);
      writeString(s.substring(s.length()-2,s.length()));
   }
   public static void writeHex(byte b) throws IOException {
      out.write((int) hexDigit[(b >> 4) & 0x0F]);
      out.write((int) hexDigit[b & 0x0F]);
   }
   public static void writeHeader() throws IOException {
      writeString("<pre>");
      writeln();
      writeString("Q.W. ");
      writeGBSpace();
      writeString(" GB   Uni. UTF-8 ");
      writeString("   ");
      writeString("Q.W. ");
      writeGBSpace();
      writeString(" GB   Uni. UTF-8 ");
      writeln();
      writeln();
   }
   public static void writeFooter() throws IOException {
      writeString("</pre>");
      writeln();
   }
   public static boolean validGB(int i,int j) {
      for (int l=0; l<b_out.length; l++) {
         if (i*100+j>=b_out[l] && i*100+j<=e_out[l]) return false;
      }
      return true;
   }
}
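
The validGB() check and the b_out/e_out ranges can also be exercised on their own. The sketch below copies those ranges and counts the positions that are not excluded. The class name CountGB and the helper count() are hypothetical names used only here. Unlike GB2312Unicode.java, the sketch does not decode anything, so it assumes every position that passes validGB() is an assigned character.

```java
class CountGB {
    // Unassigned ranges copied from b_out / e_out, as row*100 + column
    static final int[] B_OUT = {201,267,279,293,484,587,625,657,734,782,827,
        874,901,980,5590};
    static final int[] E_OUT = {216,268,280,294,494,594,632,694,748,794,836,
        894,903,994,5594};

    // Same test as validGB() in GB2312Unicode.java
    static boolean validGB(int row, int col) {
        int qw = row * 100 + col;
        for (int k = 0; k < B_OUT.length; k++) {
            if (qw >= B_OUT[k] && qw <= E_OUT[k]) return false;
        }
        return true;
    }

    // Count positions that pass validGB() in rows firstRow..lastRow
    static int count(int firstRow, int lastRow) {
        int n = 0;
        for (int row = firstRow; row <= lastRow; row++) {
            for (int col = 1; col <= 94; col++) {
                if (validGB(row, col)) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int symbols = count(1, 9);    // graphic symbol rows
        int hanzi = count(16, 87);    // level 1 and level 2 rows
        System.out.println(symbols + " + " + hanzi + " = " + (symbols + hanzi));
    }
}
```

Running it prints "682 + 6763 = 7445", which matches the 7445 GB2312 characters mentioned earlier.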

The entire output of GB2312Unicode.java is included later in the book.
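
To see how a single table entry is put together, here is a minimal sketch that builds the Q.W., GB, Unicode and UTF-8 columns for one row/column position. It follows the same byte arithmetic as GB2312Unicode.java (0xA0 plus the row or column number), but the class name OneEntry and the helpers hex() and entry() are hypothetical, and the character column is left out to keep the output plain ASCII. It again assumes the GBK charset is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OneEntry {
    // Format a byte array as upper-case hex digits
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    // Build "Q.W. GB Uni. UTF-8" for one position (rows and columns are 1..94)
    static String entry(int row, int col) {
        byte[] gb = {(byte) (0xA0 + row), (byte) (0xA0 + col)};
        // Assumption: "GBK" is available and decodes this GB2312 byte pair
        String ch = new String(gb, Charset.forName("GBK"));
        return String.format("%02d%02d %s %s %s", row, col, hex(gb),
            hex(ch.getBytes(StandardCharsets.UTF_16BE)),
            hex(ch.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(entry(16, 1));
    }
}
```

For row 16, column 1, the first Level 1 character, this prints "1601 B0A1 554A E5958A".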
