"String" Class with Unicode Utility Methods

This section provides an introduction to the "String" class methods added or modified in J2SE 5.0 to support Unicode character processing.

Because the designers of J2SE 5.0 did not change the internal storage mechanism of the "String" class, which holds a sequence of 16-bit "char" values, Unicode supplementary characters (code points above U+FFFF) are stored as surrogate "char" pairs in "String" objects. In other words, a single supplementary character occupies 2 "char" positions in a "String" object. If all characters in a "String" object are supplementary characters, the length of the "String" object, as returned by length(), is twice the number of characters.
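A minimal sketch of this storage behavior, assuming a Java 5 or later runtime. The class name SurrogateStorageDemo and its helper repeat() are hypothetical; U+1D11E (MUSICAL SYMBOL G CLEF) is used only as a convenient supplementary character.

```java
class SurrogateStorageDemo {
    // Hypothetical helper: builds a String holding `count` copies of one code point.
    static String repeat(int codePoint, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.appendCodePoint(codePoint); // appends 1 char (BMP) or 2 chars (supplementary)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D11E lies above U+FFFF, so it is a supplementary character
        String s = repeat(0x1D11E, 3);

        // length() counts "char" storage positions, not Unicode characters
        System.out.println("length()         = " + s.length());                    // 6
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length())); // 3

        // Each character is stored as a high surrogate followed by a low surrogate
        char high = s.charAt(0);
        char low = s.charAt(1);
        System.out.printf("charAt(0) = U+%04X, high surrogate: %b%n", (int) high, Character.isHighSurrogate(high));
        System.out.printf("charAt(1) = U+%04X, low surrogate: %b%n", (int) low, Character.isLowSurrogate(low));
    }
}
```

Running it should report a length of 6 for 3 characters, and show that U+1D11E is stored as the surrogate pair U+D834 U+DD1E.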

If a "String" object contains both BMP characters and supplementary characters, there is no 1-to-1 relation between Unicode character positions and "char" storage positions. The n-th Unicode character may be stored neither at the n-th nor at the 2*n-th "char" position in a "String" object; its actual position depends on how many supplementary characters precede it.
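The following sketch (hypothetical class CodePointIndexDemo) prints where each Unicode character of a mixed string starts. It assumes Java 5 or later and relies on offsetByCodePoints() to convert a 0-based character position into a "char" position; charIndexOfCodePoint() is an illustrative helper, not a "String" method.

```java
class CodePointIndexDemo {
    // Hypothetical helper: "char" index where the n-th (0-based) Unicode character starts
    static int charIndexOfCodePoint(String s, int n) {
        return s.offsetByCodePoints(0, n);
    }

    public static void main(String[] args) {
        // Mixed content: BMP letters A, B, C and two copies of the supplementary U+1D11E
        String s = "A\uD834\uDD1EB\uD834\uDD1EC";
        int count = s.codePointCount(0, s.length()); // 5 characters stored in 7 chars
        for (int n = 0; n < count; n++) {
            int i = charIndexOfCodePoint(s, n);
            System.out.printf("character %d: U+%04X at char index %d%n", n, s.codePointAt(i), i);
        }
    }
}
```

Here character 2 ('B') starts at "char" index 3 and character 4 ('C') at index 6, so neither the n-th nor the 2*n-th position rule holds.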

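To help manage this inconvenience, the designers of J2SE 5.0 enhanced some existing methods and added some new methods in the "String" class. Here are some examples:

 String(int[] codePoints, int offset, int count) - Constructs a "String" from a subarray of Unicode code points.

 codePointAt(int index) - Returns the Unicode code point at the given "char" index, combining a surrogate pair into one code point.

 codePointBefore(int index) - Returns the Unicode code point that ends just before the given "char" index.

 codePointCount(int beginIndex, int endIndex) - Returns the number of Unicode code points in the given "char" range.

 offsetByCodePoints(int index, int codePointOffset) - Returns the "char" index that is codePointOffset code points away from the given "char" index.

 indexOf(int ch) and lastIndexOf(int ch) - Existing methods whose "int" argument now accepts any Unicode code point, including supplementary characters.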
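A short sketch exercising these methods on a 3-character string that contains one supplementary character, assuming Java 5 or later. The class name StringCodePointMethods and its toCodePoints() helper are illustrative assumptions, not part of the "String" API; the helper shows the usual pattern of walking a string with codePointAt() and Character.charCount().

```java
class StringCodePointMethods {
    // Hypothetical helper: collects the code points of s.
    // Character.charCount() tells how many chars (1 or 2) each code point used.
    static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "A\uD834\uDD1EB"; // 3 characters stored in 4 chars

        System.out.println(s.codePointAt(1));                // 119070 (0x1D11E), not just the high surrogate
        System.out.println(s.codePointBefore(3));            // 119070: the pair ending before index 3
        System.out.println(s.codePointCount(0, s.length())); // 3
        System.out.println(s.offsetByCodePoints(0, 2));      // 3: "char" index of 'B'
        System.out.println(s.indexOf(0x1D11E));              // 1: indexOf(int) accepts a supplementary code point

        // Round trip through the int[] constructor
        int[] cps = toCodePoints(s);
        String copy = new String(cps, 0, cps.length);
        System.out.println(copy.equals(s));                  // true
    }
}
```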
