String.toCharArray() Returns the UTF-16BE Sequence

This section provides a tutorial example showing that the output of toCharArray() is the same as the output of getBytes("UTF-16BE") at the bit level.

Another way to look at a "String" object is to dump it into a "char" sequence, or into a "byte" sequence using different encoding algorithms.

/**
 * UnicodeStringEncoding.java
 * Copyright (c) 2012, HerongYang.com, All Rights Reserved.
 */
import java.io.*;
class UnicodeStringEncoding {
   static int[] unicodeList = {0x43, 0x2103, 0x1F132, 0x1F1A0};
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] arg) {
      try {

// Constructing a String from a list of code points
         int num = unicodeList.length;
         String str = new String(unicodeList, 0, num);

// String length and code point count
         System.out.print("\n # of Unicode characters: "+num);
         System.out.print("\n        codePointCount(): "
            +str.codePointCount(0,str.length()));
         System.out.print("\n                length(): "
            +str.length());

// Getting the char sequence
         char[] charSeq = str.toCharArray();
         System.out.print("\n           toCharArray():");
         printChars(charSeq);

// Getting Unicode encoding sequences
         byte[] byteSeq8 = str.getBytes("UTF-8");
         System.out.print("\n         getBytes(UTF-8):");
         printBytes(byteSeq8);
         byte[] byteSeq16 = str.getBytes("UTF-16BE");
         System.out.print("\n      getBytes(UTF-16BE):");
         printBytes(byteSeq16);
         byte[] byteSeq32 = str.getBytes("UTF-32BE");
         System.out.print("\n      getBytes(UTF-32BE):");
         printBytes(byteSeq32);
      } catch (Exception e) {
         System.out.print("\n"+e.toString());
      }
   }
   public static void printBytes(byte[] b) {
      for (int j=0; j<b.length; j++)
         System.out.print(" "+byteToHex(b[j]));
   }
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
   public static void printChars(char[] c) {
      for (int j=0; j<c.length; j++)
         System.out.print(" "+charToHex(c[j]));
   }
   public static String charToHex(char c) {
      byte hi = (byte) (c >>> 8);
      byte lo = (byte) (c & 0xff);
      return byteToHex(hi) + byteToHex(lo);
   }
}

Compile and run it with Java SE 7:

C:\herong>javac UnicodeStringEncoding.java

C:\herong>java UnicodeStringEncoding
 # of Unicode characters: 4
        codePointCount(): 4
                length(): 6
           toCharArray(): 0043 2103 D83C DD32 D83C DDA0
         getBytes(UTF-8): 43 E2 84 83 F0 9F 84 B2 F0 9F 86 A0
      getBytes(UTF-16BE): 00 43 21 03 D8 3C DD 32 D8 3C DD A0
      getBytes(UTF-32BE): 00 00 00 43 00 00 21 03 00 01 F1 32 00 01...

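The output confirms that:

 - length() returns 6, not 4. Each of the two supplementary characters, U+1F132 and U+1F1A0, is stored as a surrogate pair of two "char" values, while codePointCount() returns the real number of characters, 4.
 - toCharArray() returns the UTF-16BE sequence. Every "char" value matches, bit for bit, two consecutive bytes returned by getBytes("UTF-16BE"), including the surrogate pairs D83C DD32 and D83C DDA0.
 - getBytes("UTF-8") uses 1, 3, 4 and 4 bytes for the four characters, while getBytes("UTF-32BE") uses 4 bytes for each of them.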
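The bit-level match between toCharArray() and getBytes("UTF-16BE") can also be checked in code rather than by eye. The sketch below assumes Java 7 or later (for StandardCharsets). It splits every "char" into a high byte and a low byte, compares the result with the encoder's output using Arrays.equals(), and rebuilds each surrogate pair from the standard UTF-16 formula. The class name CharArrayVsUtf16be and its helper methods are hypothetical, written only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class (assumes Java 7+ for StandardCharsets).
public class CharArrayVsUtf16be {

   // Splits each 16-bit char into two bytes, high byte first
   // (big-endian), which is the byte order UTF-16BE uses.
   static byte[] charsToBigEndianBytes(char[] chars) {
      byte[] out = new byte[chars.length * 2];
      for (int i = 0; i < chars.length; i++) {
         out[2 * i] = (byte) (chars[i] >>> 8);   // high byte
         out[2 * i + 1] = (byte) chars[i];       // low byte
      }
      return out;
   }

   // Encodes a supplementary code point (U+10000..U+10FFFF) as a
   // high/low surrogate pair using the UTF-16 formula.
   static char[] toSurrogatePair(int codePoint) {
      int v = codePoint - 0x10000;                // 20-bit offset
      char high = (char) (0xD800 + (v >>> 10));   // top 10 bits
      char low = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits
      return new char[] {high, low};
   }

   public static void main(String[] args) {
      int[] codePoints = {0x43, 0x2103, 0x1F132, 0x1F1A0};
      String str = new String(codePoints, 0, codePoints.length);

      // Compare the char sequence with the UTF-16BE byte sequence.
      byte[] fromChars = charsToBigEndianBytes(str.toCharArray());
      byte[] fromEncoder = str.getBytes(StandardCharsets.UTF_16BE);
      System.out.println("Same bits: " + Arrays.equals(fromChars, fromEncoder));

      // Rebuild each surrogate pair and compare it with Character.toChars().
      for (int cp : codePoints) {
         if (cp >= 0x10000) {
            char[] pair = toSurrogatePair(cp);
            boolean matches = Arrays.equals(pair, Character.toChars(cp));
            System.out.printf("U+%X -> %04X %04X (matches Character.toChars: %b)%n",
                  cp, (int) pair[0], (int) pair[1], matches);
         }
      }
   }
}
```

With the same four code points as UnicodeStringEncoding.java, this prints "Same bits: true", followed by "U+1F132 -> D83C DD32 (matches Character.toChars: true)" and "U+1F1A0 -> D83C DDA0 (matches Character.toChars: true)". Passing StandardCharsets.UTF_16BE to getBytes() instead of the name "UTF-16BE" also avoids the checked UnsupportedEncodingException that getBytes(String) declares, which is why the original program wraps its calls in try/catch.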

Last update: 2012.