Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
String.toCharArray() Returns the UTF-16BE Sequence
This section provides a tutorial example showing that the output of toCharArray() is the same as that of getBytes("UTF-16BE") at the bit level.
Another way to look at a "String" object is to dump it into a "char" sequence or a "byte" sequence with different encoding algorithms:
/* UnicodeStringEncoding.java
 * Copyright (c) 2019 HerongYang.com. All Rights Reserved.
 */
import java.io.*;
class UnicodeStringEncoding {
   static int[] unicodeList = {0x43, 0x2103, 0x1F132, 0x1F1A0};
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] arg) {
      try {

         // Constructing a String from a list of code points
         int num = unicodeList.length;
         String str = new String(unicodeList, 0, num);

         // String length and code point count
         System.out.print("\n # of Unicode characters: "+num);
         System.out.print("\n codePointCount(): "
            +str.codePointCount(0,str.length()));
         System.out.print("\n length(): "
            +str.length());

         // Getting the char sequence
         char[] charSeq = str.toCharArray();
         System.out.print("\n toCharArray():");
         printChars(charSeq);

         // Getting Unicode encoding sequences
         byte[] byteSeq8 = str.getBytes("UTF-8");
         System.out.print("\n getBytes(UTF-8):");
         printBytes(byteSeq8);

         byte[] byteSeq16 = str.getBytes("UTF-16BE");
         System.out.print("\n getBytes(UTF-16BE):");
         printBytes(byteSeq16);

         byte[] byteSeq32 = str.getBytes("UTF-32BE");
         System.out.print("\n getBytes(UTF-32BE):");
         printBytes(byteSeq32);

      } catch (Exception e) {
         System.out.print("\n"+e.toString());
      }
   }

   // Prints each byte as two hex digits
   public static void printBytes(byte[] b) {
      for (int j=0; j<b.length; j++)
         System.out.print(" "+byteToHex(b[j]));
   }

   // Converts a byte to two hex digits: high nibble, then low nibble
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }

   // Prints each char as four hex digits
   public static void printChars(char[] c) {
      for (int j=0; j<c.length; j++)
         System.out.print(" "+charToHex(c[j]));
   }

   // Converts a 16-bit char to four hex digits: high byte, then low byte
   public static String charToHex(char c) {
      byte hi = (byte) (c >>> 8);
      byte lo = (byte) (c & 0xff);
      return byteToHex(hi) + byteToHex(lo);
   }
}
Compile and run it with Java 11:
C:\herong>javac UnicodeStringEncoding.java
C:\herong>java UnicodeStringEncoding
# of Unicode characters: 4
codePointCount(): 4
length(): 6
toCharArray(): 0043 2103 D83C DD32 D83C DDA0
getBytes(UTF-8): 43 E2 84 83 F0 9F 84 B2 F0 9F 86 A0
getBytes(UTF-16BE): 00 43 21 03 D8 3C DD 32 D8 3C DD A0
getBytes(UTF-32BE): 00 00 00 43 00 00 21 03 00 01 F1 32 00 01...
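The output confirms that the "char" sequence returned by toCharArray() is identical to the byte sequence returned by getBytes("UTF-16BE") at the bit level: each "char" value, such as D83C, corresponds to one byte pair, D8 3C. It also shows that length() returns 6 while codePointCount() returns 4, because each of the two supplementary characters, U+1F132 and U+1F1A0, is stored as a surrogate pair of two "char" values.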
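The same comparison can be done in code instead of by reading the hex dumps. The following is a minimal sketch, not part of the original tutorial program: the class name CharArrayVsUtf16BE and the helper methods charsToBigEndianBytes() and matchesUtf16BE() are hypothetical names introduced here for illustration. It packs each "char" into two bytes, high byte first, and compares the result with getBytes() using the standard UTF-16BE charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharArrayVsUtf16BE {

   // Hypothetical helper: packs each 16-bit char into two bytes,
   // high byte first (big-endian order)
   static byte[] charsToBigEndianBytes(char[] chars) {
      byte[] out = new byte[chars.length * 2];
      for (int i = 0; i < chars.length; i++) {
         out[2 * i] = (byte) (chars[i] >>> 8);
         out[2 * i + 1] = (byte) (chars[i] & 0xff);
      }
      return out;
   }

   // Hypothetical helper: true if the char sequence of the string,
   // written out big-endian, equals its UTF-16BE encoding
   static boolean matchesUtf16BE(String str) {
      byte[] fromChars = charsToBigEndianBytes(str.toCharArray());
      byte[] fromEncoder = str.getBytes(StandardCharsets.UTF_16BE);
      return Arrays.equals(fromChars, fromEncoder);
   }

   public static void main(String[] args) {
      // Same code points as in UnicodeStringEncoding.java
      int[] codePoints = {0x43, 0x2103, 0x1F132, 0x1F1A0};
      String str = new String(codePoints, 0, codePoints.length);
      System.out.println("match: " + matchesUtf16BE(str));
   }
}
```

For the four code points used in this section, the program should print "match: true". The sketch assumes a well-formed string. A string holding an unpaired surrogate is expected to compare differently, because getBytes() substitutes a replacement sequence for input it cannot encode, while toCharArray() returns the stored values unchanged.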