Unicode Characters Supported in XML 1.1 Names

This section provides a tutorial example showing that characters added in later Unicode versions are allowed in XML 1.1 names.

XML 1.1 also allows more Unicode characters to be used in XML element names or attribute names. Since these characters are not easy to enter in a text file, I created this test program, UnicodeNameXml.java, to generate the XML file and parse it:

/* UnicodeNameXml.java
 -  Copyright (c) 2014, HerongYang.com, All Rights Reserved.
 */
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
class UnicodeNameXml {
   public static void main(String[] args) {
      try {
         // args[0]: XML version to declare ("1.0" or "1.1")
         String ver = args[0];
         // Each element name ends with the Unicode character under test
         String xmlString =
            "<?xml version=\""+ver+"\" encoding=\"UTF-16BE\"?><code>"
            +"<u0180\u0180>\u0180 - v1.1.0</u0180\u0180>"
            +"<u20AB\u20AB>\u20AB - v2.0.0</u20AB\u20AB>"
            +"<u0233\u0233>\u0233 - v3.0.0</u0233\u0233>"
            +"<u0236\u0236>\u0236 - v4.0.0</u0236\u0236>"
            +"<u0237\u0237>\u0237 - v4.1.0</u0237\u0237>"
            +"</code>";
         // args[1]: output file, written in UTF-16BE to match the declaration
         File xmlFile = new File(args[1]);
         FileOutputStream fos = new FileOutputStream(xmlFile);
         OutputStreamWriter osw =
            new OutputStreamWriter(fos,"UTF-16BE");
         osw.write(xmlString);
         osw.close();

         // Parse the file back with the default DOM parser and dump the tree
         DocumentBuilderFactory fct
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder bld = fct.newDocumentBuilder();
         Document doc = bld.parse(xmlFile);
         dumpNode(doc, "");

      } catch (Exception e) {
         System.out.println(e.toString());
      }
   }
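   // Prints the name, type, child count, attribute count and value of a node,
   // then recurses into its attributes and child nodes
   static void dumpNode(Node n, String p) throws Exception {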
      NodeList l = n.getChildNodes();
      NamedNodeMap m = n.getAttributes();
      int ml = -1;
      if (m!=null) ml = m.getLength(); 
      System.out.println(p+n.getNodeName()+": "
         +n.getNodeType()+", "+l.getLength()+", "
         +ml+", "+n.getNodeValue());
      for (int i=0; i<ml; i++) {
         Node c = m.item(i);
         dumpNode(c,p+" |-");
      }
      for (int i=0; i<l.getLength(); i++) {
         Node c = l.item(i);
         dumpNode(c,p+" ");
      }
   }
}

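Some notes on UnicodeNameXml.java:

The first command line argument is the XML version to put in the XML declaration. The second argument is the name of the XML file to generate.

Each test element name ends with the Unicode character being tested, and the element content shows the same character followed by the Unicode version it is labeled with.

The XML file is written in UTF-16BE encoding to match the encoding given in the XML declaration. It is then parsed with the default DOM parser and printed by dumpNode().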

Let's try XML 1.0 first with JDK 1.8:

C:\herong\xml>java UnicodeNameXml 1.0 unicode-name-1-0.xml
[Fatal Error] unicode-name-1-0.xml:1:81: Element type "u20AB" must be
followed by either attribute specifications, ">" or "/>".
org.xml.sax.SAXParseException; systemId: 
file:/C:/herong/xml/unicode-name-1-0.xml; 
lineNumber: 1; columnNumber: 81; Element type "u20AB" must be followed 
by either attribute specifications, ">" or "/>".

The output shows that the Unicode 1.1.0 character #x0180 is allowed in XML 1.0 names, because the parser accepted the first element without any error. But the Unicode 2.0.0 character #x20AB is not allowed. The other Unicode characters included in the test program are not allowed either. You can remove the #x20AB line to test them.
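Instead of removing lines one at a time, each character can be tested in its own small document, so that one rejected name does not hide the results for the others. The following is only a sketch of that idea: the class name NameProbe, the method accepts() and the short element name "n" are made up for illustration, and it parses from memory in UTF-8 instead of writing a UTF-16BE file as the test program does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NameProbe {
   // Hypothetical helper: returns true if the default parser accepts
   // the given character at the end of an element name
   static boolean accepts(String ver, int codePoint) throws Exception {
      String c = new String(Character.toChars(codePoint));
      String xml = "<?xml version=\""+ver+"\" encoding=\"UTF-8\"?>"
         +"<n"+c+">test</n"+c+">";
      DocumentBuilder bld
         = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing them
      bld.setErrorHandler(new DefaultHandler());
      try {
         bld.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }
   public static void main(String[] args) throws Exception {
      // Same characters as in UnicodeNameXml.java
      int[] chars = {0x0180, 0x20AB, 0x0233, 0x0236, 0x0237};
      for (String ver : new String[] {"1.0", "1.1"}) {
         for (int cp : chars) {
            System.out.println("XML "+ver+", U+"
               +String.format("%04X", cp)+": "
               +(accepts(ver, cp) ? "accepted" : "rejected"));
         }
      }
   }
}
```

The result depends on the XML parser bundled with your JDK, so treat the printed list as a report on that parser and not on the XML specification itself.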

Here is the output of XML 1.1:

C:\herong\xml>java UnicodeNameXml 1.1 unicode-name-1-1.xml
#document: 9, 1, -1, null
 code: 1, 5, 0, null
  u0180?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v1.1.0
  u20AB?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v2.0.0
  u0233?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v3.0.0
  u0236?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v4.0.0
  u0237?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v4.1.0
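The "?" marks in the output are most likely the console replacing characters it cannot display, not characters lost by the parser. One way to check this is to print non-ASCII characters as escape sequences. Here is a minimal sketch, assuming a helper class named NameEscaper that is not part of the original program:

```java
class NameEscaper {
   // Replaces each non-ASCII UTF-16 code unit with a 4-digit hex escape
   static String escape(String s) {
      StringBuilder sb = new StringBuilder();
      for (int i=0; i<s.length(); i++) {
         char ch = s.charAt(i);
         if (ch < 0x80) sb.append(ch);
         else sb.append(String.format("\\u%04X", (int) ch));
      }
      return sb.toString();
   }
   public static void main(String[] args) {
      // The second element name used in UnicodeNameXml.java
      System.out.println(escape("u20AB\u20AB"));
   }
}
```

To use it in dumpNode(), the node name could be passed through escape() before printing. The node value needs a null check first, because getNodeValue() returns null for element nodes.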

Cool. All Unicode characters included in the program are allowed in XML 1.1 names.
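This result agrees with the NameStartChar and NameChar productions of the XML 1.1 specification, which define names by broad code point ranges instead of by lists of characters known in one Unicode version. The sketch below transcribes those ranges into a lookup table. The class name Xml11Names and its method names are chosen for illustration only; check the ranges against the specification before relying on them.

```java
class Xml11Names {
   // NameStartChar ranges from the XML 1.1 name productions,
   // stored as inclusive {low, high} pairs
   static final int[][] START = {
      {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
      {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
      {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
      {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
      {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
   };
   // Extra ranges that NameChar allows after the first character
   static final int[][] EXTRA = {
      {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
      {0x300, 0x36F}, {0x203F, 0x2040}
   };
   static boolean inRanges(int cp, int[][] ranges) {
      for (int[] r : ranges) {
         if (cp >= r[0] && cp <= r[1]) return true;
      }
      return false;
   }
   static boolean isNameStartChar(int cp) {
      return inRanges(cp, START);
   }
   static boolean isNameChar(int cp) {
      return inRanges(cp, START) || inRanges(cp, EXTRA);
   }
   public static void main(String[] args) {
      // Same characters as in UnicodeNameXml.java
      int[] chars = {0x0180, 0x20AB, 0x0233, 0x0236, 0x0237};
      for (int cp : chars) {
         System.out.println(String.format("U+%04X", cp)
            +": NameChar = "+isNameChar(cp));
      }
   }
}
```

All five test characters fall inside the ranges #xF8-#x2FF or #x2070-#x218F, so the table reports each of them as a valid name character.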

Last update: 2014.
