Unicode Versions Supported in Java-History

This section provides a quick summary of different Unicode versions supported in Java history. A major change was introduced in J2SE 5 to support Unicode 4.0 which contains supplementary characters in the range of U+10000 to U+10FFFF.

Unicode has been supported in the Java language since its first release in 1996. Both Unicode and Java have evolved with multiple versions and releases since then. Here is a summary of which version of Unicode is supported in which Java releases.

Java version   Release date         Unicode version

JDK 1.0        January 23, 1996     Unicode 1.1.5
JDK 1.1        February 19, 1997    Unicode 2.0
JDK 1.1.7      September 12, 1997   Unicode 2.1
J2SE 1.2       December 8, 1998     Unicode 2.1
J2SE 1.3       May 8, 2000          Unicode 2.1
J2SE 1.4       February 6, 2002     Unicode 3.0
J2SE 5.0       September 30, 2004   Unicode 4.0
Java SE 6      December 11, 2006    Unicode 4.0
Java SE 7      July 28, 2011        Unicode 6.0

From JDK 1.0 to JDK 1.4, Java can only support Unicode code points in the range of U+0000 to U+FFFF. Unicode characters in this range are called BMP (Basic Multilingual Plane) characters.

Designers of JDK 1.0 used the following primitive types and built-in class types to support Unicode BMP characters in very natural way:

char - A primitive type with 16-bit storage size, which is perfected designed to store the code point value of a single BMP character.

Character - A built-in class that wraps a value of the primitive type char in an object. The Character class is perfectly designed to represent a single BMP character as an object. The Character class also provides some useful static methods for managing individual BMP characters.

String - A built-in class that wraps an array of primitive type char in an object. The String class is perfectly designed to represent a sequence of BMP characters as an object. The String class also provides some useful static and instance methods for managing a string of BMP characters.

Starting in J2SE 5.0, support of Unicode code points in the range of U+10000 to U+10FFFF was introduced in Java. Unicode characters in this range are called supplementary characters, implemented in Unicode 3.1 in 2001.

Designers of J2SE 5.0 did not introduce any new primitive types and built-in class types to support Unicode supplementary characters. What they did was:

The surrogate char pair of a supplementary character is really the UTF-16BE encoded char sequence of the supplementary character. See other sections in this book for the UTF-16BE encoding algorithm.

Using UTF-16BE encoded char sequences to represent supplementary characters does give one main advantage to the existing class type String: A String object can still be viewed as a char sequence that represent a sequence of any Unicode characters: BMP characters and/or supplementary characters.

If you are interested in why J2SE 5.0 designers did not choose to add a new primitive type like char32 to help support supplementary characters, read the article "Supplementary Characters in the Java Platform by Norbert Lindenberg and Masayoshi Okutsu.

Last update: 2012.

Table of Contents

 About This Book

 Character Sets and Encodings

 ASCII Character Set and Encoding

 GB2312 Character Set and Encoding

 GB18030 Character Set and Encoding

 JIS X0208 Character Set and Encodings

 Unicode Character Set

 UTF-8 (Unicode Transformation Format - 8-Bit)

 UTF-16, UTF-16BE and UTF-16LE Encodings

 UTF-32, UTF-32BE and UTF-32LE Encodings

Java Language and Unicode Characters

Unicode Versions Supported in Java-History

 'int' and 'String' - Basic Data Types for Unicode

 "Character" Class with Unicode Utility Methods

 Character.toChars() - "char" Sequence of Code Point

 Character.getNumericValue() - Numeric Value of Code Point

 "String" Class with Unicode Utility Methods

 String.length() Is Not Number of Characters

 String.toCharArray() Returns the UTF-16BE Sequence

 Character Encoding in Java

 Character Set Encoding Maps

 Encoding Conversion Programs for Encoded Text Files

 Using Notepad as a Unicode Text Editor

 Using Microsoft Word as a Unicode Text Editor

 Using Microsoft Excel as a Unicode Text Editor

 Unicode Fonts

 Unicode Code Point Blocks - Code Charts

 Outdated Tutorials

 References

 PDF Printing Version