Basic Rules of Using Non-ASCII Characters in HTML Documents

This section describes basic rules for managing non-ASCII character strings at each step, so that localized text strings can be used in HTML documents and displayed correctly in the browser window.

As you can see from the previous chapters, a Web-based application always delivers information to the user interface as an HTML document. The application can either take a static HTML document from the file system, or generate a dynamic HTML document from a PHP script.

First, let's concentrate on how to handle non-ASCII characters in static HTML documents. Here are the steps and technologies involved in entering an HTML document and delivering it to the user interface:

H1. Key Sequences from keyboard
      |
      |- Text editor
      v
H2. HTML Document
      |
      |- Web server
      v
H3. HTTP Response
      |
      |- Internet TCP/IP Connection
      v
H4. HTTP Response
      |
      |- Web browser
      v
H5. Visual characters on the screen

Based on my experience, here are some basic rules related to those steps:

1. You must first decide which character encoding scheme to use in the HTML document. For most written human languages, you have two options: a) use an encoding scheme specific to that language; b) use a Unicode encoding scheme. For example, for Chinese characters you can use either GB2312 (a simplified Chinese encoding scheme) or UTF-8 (a Unicode encoding scheme). My suggestion used to be "a". But from now on, I am suggesting "b", because Unicode supports the characters of all languages.

2. PHP is a nice language in this respect. Its string data type is defined as a sequence of bytes, as in the C language. This is different from the Java language, where a string is defined as a sequence of Unicode characters. String literals in PHP can hold any sequence of bytes. Therefore, you can enter non-ASCII characters as PHP string literals in any encoding scheme.

3. From step "H1" to "H2", you need to select a good text editor that supports the encoding scheme you have chosen. The end goal of this step is simple: characters in the HTML document must be stored in a file using the selected encoding scheme. Don't underestimate the difficulty of this step. It can be very frustrating, because most computer keyboards support alphabetic letters only. You may have to use language-specific input software to translate alphabetic letters into language-specific characters. The editor may also hold characters in memory in one encoding scheme, and offer you a different encoding scheme when saving files to the hard disk.

4. From step "H3" to "H4", it is the job of the Internet to send data from the Web server to the Web browser. The HTTP response is transmitted to the browser as is. The bytes of the HTML document carried in the HTTP response are therefore also preserved as is.

5. From step "H4" to "H5", the browser opens the received HTML document and displays the encoded characters as written characters of the specific language. To do this, the browser needs your help in two ways. First, specify the name of the character encoding, "charset", used in the HTML document in a <meta> tag. Second, make sure the browser can access a character font file designed for the specified encoding scheme.
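The rules above can be sketched in PHP. This is only a minimal illustration under stated assumptions, not a definitive implementation: isValidUtf8() and buildPage() are hypothetical helper names introduced for this example, and the Chinese character "中" is written as explicit byte escapes so that the result does not depend on how the script file itself is saved.

```php
<?php
// Rule 1: the same character has different bytes in different encodings.
$utf8   = "\xE4\xB8\xAD"; // "中" in UTF-8 (3 bytes)
$gb2312 = "\xD6\xD0";     // "中" in GB2312 (2 bytes)

// Rule 2: a PHP string is a sequence of bytes, so strlen() counts bytes,
// not characters.
echo strlen($utf8), "\n";   // 3
echo strlen($gb2312), "\n"; // 2

// Rule 3: check that stored bytes really are well-formed UTF-8.
// preg_match() with the "u" modifier returns false when the subject
// is not valid UTF-8, and 1 when the empty pattern matches.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($gb2312)); // bool(false)

// Rule 5: name the encoding in a <meta> tag so the browser knows how to
// decode the bytes. For brevity, this sketch does not escape HTML
// special characters in the body text.
function buildPage(string $charset, string $bodyBytes): string
{
    return "<html><head>\n"
        . "<meta http-equiv=\"Content-Type\""
        . " content=\"text/html; charset={$charset}\">\n"
        . "</head><body>\n"
        . "<p>{$bodyBytes}</p>\n"
        . "</body></html>\n";
}

$page = buildPage('UTF-8', $utf8);
```

The charset name passed to buildPage() must match the encoding of the bytes placed in the body. Passing 'GB2312' together with $gb2312 would be equally consistent; mixing the two would not.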

If no character encoding name is specified in the <meta> tag, some browsers will try to detect the encoding scheme based on the content of the HTML document. If detection is not successful, browsers fall back to a default encoding scheme. For example, Internet Explorer (IE) uses "Western European" as the default encoding scheme. "Western European" seems to refer to the ISO-8859-1 standard.
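To see why a wrong fallback matters, here is a small PHP sketch of what happens when UTF-8 bytes are read as ISO-8859-1. The helper readAsLatin1() is a hypothetical name introduced for this example. It assumes a strict ISO-8859-1 mapping, where each byte is one character with the same code point; real browsers may map some bytes in the 0x80 to 0x9F range differently.

```php
<?php
// Hypothetical helper: treat every input byte as one ISO-8859-1 character
// and return those characters re-encoded as UTF-8, so the misreading can
// be printed or compared.
function readAsLatin1(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $code = ord($bytes[$i]);
        if ($code < 0x80) {
            $out .= chr($code);                // ASCII stays a single byte
        } else {
            $out .= chr(0xC0 | ($code >> 6))   // two-byte UTF-8 sequence
                  . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

$eAcute = "\xC3\xA9";            // the French letter "é" stored as UTF-8
$seen   = readAsLatin1($eAcute); // what an ISO-8859-1 reader would show

echo bin2hex($seen), "\n";       // c383c2a9, the two characters "Ã©"
```

One stored character turns into two displayed characters, which is the typical symptom of a missing or wrong "charset" declaration.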
