UTF-8 Encoding Pages with Big5 Characters

This section describes an error case where a UTF-8 encoded page contains Big5 character strings.

The most common errors on Chinese Web pages generated by PHP scripts are character strings that use an encoding different from the page's declared encoding. For example, a PHP script sets its output Web page to charset=utf-8, but some character strings in the script are entered in Big5 encoding. In this case, the browser decodes those Big5 bytes as UTF-8, and the characters are not displayed correctly.
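To see why a browser cannot display such strings, here is a minimal sketch that checks whether a byte sequence is valid UTF-8. It is not part of the original test script. It assumes the mbstring extension is enabled, and the variable names $big5 and $utf8 are illustrative only. The Big5 bytes are written as hex escapes so that the sketch itself can be saved as a UTF-8 file.

```php
<?php
// "中文" as raw Big5 bytes (A4 A4, A4 E5), written with hex escapes.
$big5 = "\xA4\xA4\xA4\xE5";
// The same two characters as UTF-8 bytes (E4 B8 AD, E6 96 87).
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// 0xA4 is a UTF-8 continuation byte, so it cannot start a character.
// A browser told to decode the page as UTF-8 cannot interpret it.
var_dump(mb_check_encoding($utf8, 'UTF-8'));  // bool(true)
var_dump(mb_check_encoding($big5, 'UTF-8'));  // bool(false)
```

Note that this check is only a heuristic: some Big5 byte sequences also happen to be valid UTF-8, so a string passing the check is not guaranteed to have been entered in UTF-8.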

To demonstrate this problem, I created the following test PHP script. The output Web page is set to charset=utf-8 and most Chinese characters are entered in UTF-8 encoding, but some Chinese characters are entered in Big5 encoding.

<?php 
#- String-UTF-8-Error.php
#- Copyright (c) 2005 HerongYang.com. All Rights Reserved.
#
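  # Simplified and traditional Chinese strings, both entered in UTF-8
  $help_simplified = '这是一份非常简单的说明书…';
  $help_tradition = '這是一份非常簡單的說明書…';
  # String entered in Big5 in the script file; its bytes cannot be
  # shown in this UTF-8 text and appear here as '?'
  $help_big5 = '?????????????';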
  print('<html>');
  print('<meta http-equiv="Content-Type"'.
    ' content="text/html; charset=utf-8"/>');
  print('<body>');
  print('<b>Chinese string in UTF-8 in PHP</b><br/>');
  print($help_simplified.'<br/>');
  print($help_tradition.'<br/>');
  print('<b>Big5 string included in a UTF-8 page</b><br/>');
  print($help_big5.'<br/>');
  print('</body>');
  print('</html>');
?>

As expected, this Web page, http://localhost/String-UTF-8-Error.html, does not display those Big5 characters correctly:

[Figure: Chinese Web Page Generated by PHP using UTF-8 with Big5 Characters]
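One possible way to repair such a page is to convert the Big5 strings to UTF-8 before printing them. The sketch below is an assumption-laden illustration, not the approach used in this tutorial: the helper name to_utf8_from_big5() is hypothetical, it requires the mbstring extension, and it assumes that any string failing UTF-8 validation was entered in Big5. The stand-in value for $help_big5 is "中文" written as Big5 hex escapes.

```php
<?php
// Hypothetical helper (not part of the original script): return $text as
// UTF-8, assuming any string that fails UTF-8 validation is Big5.
function to_utf8_from_big5(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;  // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'BIG5');
}

// Stand-in value: "中文" as raw Big5 bytes (A4 A4, A4 E5).
$help_big5 = "\xA4\xA4\xA4\xE5";
print(to_utf8_from_big5($help_big5) . '<br/>');
```

The more reliable fix is to save the script file in UTF-8 so that every string literal already matches the page's charset setting. A runtime conversion like this only helps when the source encoding of each string is actually known.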
