Unicode Versions Supported in Java-History

This section provides a quick summary of different Unicode versions supported in Java history. A major change was introduced in J2SE 5 to support Unicode 4.0 which contains supplementary characters in the range of U+10000 to U+10FFFF.



Unicode has been supported in the Java language since its first release in 1996. Both Unicode and Java have evolved with multiple versions and releases since then. Here is a summary of which version of Unicode is supported in which Java releases.

Java version   Release date         Unicode version
------------   ------------         ---------------
Java 11        March 2018           Unicode 10.0
Java 10        September 2018	    Unicode 8.0
Java 9         September 2017       Unicode 8.0
Java 8         March 2014	        Unicode 6.2
Java SE 7      July 28, 2011        Unicode 6.0
Java SE 6      December 11, 2006    Unicode 4.0
J2SE 5.0       September 30, 2004   Unicode 4.0
J2SE 1.4       February 6, 2002     Unicode 3.0
J2SE 1.3       May 8, 2000          Unicode 2.1
J2SE 1.2       December 8, 1998     Unicode 2.1
JDK 1.1        February 19, 1997    Unicode 2.0
JDK 1.1.7      September 12, 1997   Unicode 2.1
JDK 1.1        February 19, 1997    Unicode 2.0
JDK 1.0        January 23, 1996     Unicode 1.1.5

From JDK 1.0 to JDK 1.4, Java can only support Unicode code points in the range of U+0000 to U+FFFF. Unicode characters in this range are called BMP (Basic Multilingual Plane) characters.

Designers of JDK 1.0 used the following primitive types and built-in class types to support Unicode BMP characters in very natural way:

char - A primitive type with 16-bit storage size, which is perfected designed to store the code point value of a single BMP character.

Character - A built-in class that wraps a value of the primitive type char in an object. The Character class is perfectly designed to represent a single BMP character as an object. The Character class also provides some useful static methods for managing individual BMP characters.

String - A built-in class that wraps an array of primitive type char in an object. The String class is perfectly designed to represent a sequence of BMP characters as an object. The String class also provides some useful static and instance methods for managing a string of BMP characters.

Starting in J2SE 5.0, support of Unicode code points in the range of U+10000 to U+10FFFF was introduced in Java. Unicode characters in this range are called supplementary characters, implemented in Unicode 3.1 in 2001.

Designers of J2SE 5.0 did not introduce any new primitive types and built-in class types to support Unicode supplementary characters. What they did was:

The surrogate char pair of a supplementary character is really the UTF-16BE encoded char sequence of the supplementary character. See other sections in this book for the UTF-16BE encoding algorithm.

Using UTF-16BE encoded char sequences to represent supplementary characters does give one main advantage to the existing class type String: A String object can still be viewed as a char sequence that represent a sequence of any Unicode characters: BMP characters and/or supplementary characters.

If you are interested in why J2SE 5.0 designers did not choose to add a new primitive type like char32 to help support supplementary characters, read the article "Supplementary Characters in the Java Platform" by Norbert Lindenberg and Masayoshi Okutsu at oracle.com/us/technologies/java/supplementary-142654.html.



 

Table of Contents

 About This Book

 Character Sets and Encodings

 ASCII Character Set and Encoding

 GB2312 Character Set and Encoding

 GB18030 Character Set and Encoding

 JIS X0208 Character Set and Encodings

 Unicode Character Set

 UTF-8 (Unicode Transformation Format - 8-Bit)

 UTF-16, UTF-16BE and UTF-16LE Encodings

 UTF-32, UTF-32BE and UTF-32LE Encodings

Java Language and Unicode Characters

Unicode Versions Supported in Java-History

 'int' and 'String' - Basic Data Types for Unicode

 "Character" Class with Unicode Utility Methods

 Character.toChars() - "char" Sequence of Code Point

 Character.getNumericValue() - Numeric Value of Code Point

 "String" Class with Unicode Utility Methods

 String.length() Is Not Number of Characters

 String.toCharArray() Returns the UTF-16BE Sequence

 String Literals and Source Code Encoding

 Character Encoding in Java

 Character Set Encoding Maps

 Encoding Conversion Programs for Encoded Text Files

 Using Notepad as a Unicode Text Editor

 Using Microsoft Word as a Unicode Text Editor

 Using Microsoft Excel as a Unicode Text Editor

 Unicode Fonts

 Unicode Code Point Blocks - Code Charts

 Outdated Tutorials

 References

 Full Version in PDF/EPUB