XML Tutorials - Herong's Tutorial Examples - v5.25, by Herong Yang
Unicode Characters Supported in XML 1.1 Names
This section provides a tutorial example showing that characters from newer Unicode versions are allowed in XML 1.1 names.
XML 1.1 also allows more Unicode characters to be used in XML element names and attribute names. Since these characters are not easy to type into a text file, I created this test program, UnicodeNameXml.java, to generate the XML file and parse it:
```java
/* UnicodeNameXml.java
 * Copyright (c) 2002-2018 HerongYang.com. All Rights Reserved.
 */
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
class UnicodeNameXml {
   public static void main(String[] args) {
      try {
         // args[0]: XML version to declare; args[1]: XML file to create
         String ver = args[0];
         // Each element name ends with a character from a different Unicode version
         String xmlString = "<?xml version=\""+ver+"\" encoding=\"UTF-16BE\"?><code>"
            +"<u0180\u0180>\u0180 - v1.1.0</u0180\u0180>"
            +"<u20AB\u20AB>\u20AB - v2.0.0</u20AB\u20AB>"
            +"<u0233\u0233>\u0233 - v3.0.0</u0233\u0233>"
            +"<u0236\u0236>\u0236 - v4.0.0</u0236\u0236>"
            +"<u0237\u0237>\u0237 - v4.1.0</u0237\u0237>"
            +"</code>";
         // Write the file in UTF-16BE to match the encoding declaration
         File xmlFile = new File(args[1]);
         FileOutputStream fos = new FileOutputStream(xmlFile);
         OutputStreamWriter osw = new OutputStreamWriter(fos,"UTF-16BE");
         osw.write(xmlString);
         osw.close();
         // Parse the file back with the DOM parser and dump the tree
         DocumentBuilderFactory fct = DocumentBuilderFactory.newInstance();
         DocumentBuilder bld = fct.newDocumentBuilder();
         Document doc = bld.parse(xmlFile);
         dumpNode(doc, "");
      } catch (Exception e) {
         System.out.println(e.toString());
      }
   }
   // Prints "name: node type, number of children, number of attributes, value"
   // for each node; attributes are prefixed with " |-"
   static void dumpNode(Node n, String p) throws Exception {
      NodeList l = n.getChildNodes();
      NamedNodeMap m = n.getAttributes();
      int ml = -1;  // -1 means the node cannot have attributes
      if (m!=null) ml = m.getLength();
      System.out.println(p+n.getNodeName()+": "
         +n.getNodeType()+", "+l.getLength()+", "
         +ml+", "+n.getNodeValue());
      for (int i=0; i<ml; i++) {
         Node c = m.item(i);
         dumpNode(c,p+" |-");
      }
      for (int i=0; i<l.getLength(); i++) {
         Node c = l.item(i);
         dumpNode(c,p+" ");
      }
   }
}
```
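Some notes on UnicodeNameXml.java:
- The first argument is the XML version to declare, "1.0" or "1.1". The second argument is the name of the XML file to create.
- Each element name ends with a character introduced in a different Unicode version: #x0180 (1.1.0), #x20AB (2.0.0), #x0233 (3.0.0), #x0236 (4.0.0) and #x0237 (4.1.0). The same character is repeated in the element's text, followed by its Unicode version.
- The XML string is written in UTF-16BE to match the encoding declaration, so the non-ASCII characters are stored correctly.
- The file is then parsed with the JDK DOM parser. dumpNode() prints every node as "name: node type, number of children, number of attributes, value", with -1 for nodes that cannot have attributes.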
Let's try XML 1.0 first with the JDK:
```
herong> java UnicodeNameXml 1.0 unicode-name-1-0.xml
[Fatal Error] unicode-name-1-0.xml:1:81: Element type "u20AB" must be followed by either attribute specifications, ">" or "/>".
org.xml.sax.SAXParseException; systemId: file:/C:/herong/unicode-name-1-0.xml; lineNumber: 1; columnNumber: 81; Element type "u20AB" must be followed by either attribute specifications, ">" or "/>".
```
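The output shows that #x0180, a Unicode 1.1.0 character, is allowed in XML 1.0 names, but #x20AB, a Unicode 2.0.0 character, is not. The other characters included in the test program (#x0233, #x0236 and #x0237) are not allowed either. However, the parser stops at the first fatal error, so you need to remove the #x20AB line from the program (and then each failing line in turn) to test them.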
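Instead of editing the program for each character, you can parse a separate one-element document per character so that one failure does not hide the others. The sketch below shows one way to do this. The class name UnicodeNameCheck and the acceptsName() helper are hypothetical. The sketch assumes the JDK's built-in DOM parser, as used above, and installs a SAX DefaultHandler as the error handler so that the default "[Fatal Error]" console message is not printed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: tests one name character per document, so an
// error on one character does not hide the results for the others.
class UnicodeNameCheck {

   // Returns true if the JDK parser accepts an element named "u" + ch
   // in a document that declares the given XML version.
   static boolean acceptsName(String ver, char ch) throws Exception {
      String xml = "<?xml version=\"" + ver + "\" encoding=\"UTF-8\"?>"
         + "<u" + ch + "/>";
      DocumentBuilder bld = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler ignores warnings and errors and rethrows fatal errors,
      // which also keeps the parser from printing "[Fatal Error]" to stderr.
      bld.setErrorHandler(new DefaultHandler());
      try {
         bld.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
         return true;
      } catch (SAXException e) {
         return false;  // the name was rejected (or the document was otherwise bad)
      }
   }

   public static void main(String[] args) throws Exception {
      String ver = args.length > 0 ? args[0] : "1.0";
      char[] chars = {'\u0180', '\u20AB', '\u0233', '\u0236', '\u0237'};
      for (char ch : chars) {
         System.out.printf("#x%04X: %s%n", (int) ch,
            acceptsName(ver, ch) ? "allowed" : "not allowed");
      }
   }
}
```

Run it as "java UnicodeNameCheck 1.0" and "java UnicodeNameCheck 1.1" to compare the two versions side by side. The output uses only ASCII, so it displays correctly on any console.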
Here is the output of XML 1.1:
```
herong> java UnicodeNameXml 1.1 unicode-name-1-1.xml
#document: 9, 1, -1, null
 code: 1, 5, 0, null
  u0180?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v1.1.0
  u20AB?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v2.0.0
  u0233?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v3.0.0
  u0236?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v4.0.0
  u0237?: 1, 1, 0, null
   #text: 3, 0, -1, ? - v4.1.0
```
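Cool. All Unicode characters included in the program are allowed in XML 1.1 names. The "?" marks in the output do not indicate parsing errors. They appear because the console cannot display these non-ASCII characters, so Java prints "?" in their place.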
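XML 1.1 defines names using broad code point ranges rather than per-character tables, which is why characters added in later Unicode versions are accepted. The following minimal sketch checks a string against the NameStartChar and NameChar productions in section 2.3 of the XML 1.1 specification. The class and method names are hypothetical. The sketch checks only this grammar and ignores other rules a parser may apply, such as namespace restrictions on colons.

```java
// Hypothetical checker for the XML 1.1 Name production:
//   Name ::= NameStartChar (NameChar)*
class Xml11Names {

   // NameStartChar ranges from the XML 1.1 specification.
   static boolean isNameStartChar(int c) {
      return c == ':' || (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
         || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
         || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
         || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
         || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
         || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
         || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
   }

   // NameChar adds '-', '.', digits, #xB7 and two combining ranges.
   static boolean isNameChar(int c) {
      return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
         || c == 0xB7 || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
   }

   // Checks a whole name, walking code points so supplementary characters work.
   static boolean isName(String s) {
      if (s.isEmpty()) return false;
      int[] cps = s.codePoints().toArray();
      if (!isNameStartChar(cps[0])) return false;
      for (int i = 1; i < cps.length; i++) {
         if (!isNameChar(cps[i])) return false;
      }
      return true;
   }

   public static void main(String[] args) {
      // The five element names used in UnicodeNameXml.java
      String[] names = {"u0180\u0180", "u20AB\u20AB", "u0233\u0233",
                        "u0236\u0236", "u0237\u0237"};
      for (String n : names) {
         // Print only the ASCII prefix so the console shows no "?" marks
         System.out.println(n.substring(0, 5) + ": " + isName(n));
      }
   }
}
```

All five names pass the check. For example, #x20AB falls within the [#x2070-#x218F] NameStartChar range, which explains why the XML 1.1 parser accepts "u20AB" followed by that character.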