Building Chinese Web Sites using PHP
Dr. Herong Yang, Version 2.11

Fetching Non-ASCII Text from MySQL

This section describes how to set character_set_results properly to retrieve non-ASCII character strings from MySQL server.

Based on the analysis provided in the previous section, I revised my PHP script to set character_set_results to utf8. Hope this will allow me to fetch non-ASCII characters correctly from the String_UTF8 column:

<?php #MySQL-Fetch-Non-ASCII-UTF8.php
# Copyright (c) 2007 by Dr. Herong Yang, http://www.herongyang.com/
#
  $con = mysql_connect("localhost", "Herong", "TopSecret");
  $ok = mysql_select_db("HerongDB", $con);
  $test_name = "Non-ASCII Test";

# Set character_set_results
  $sql = "SET character_set_results=utf8";
  mysql_query($sql, $con);

# Show character set encoding variables
  $sql = "SHOW VARIABLES LIKE 'character_set_%'";
  $res = mysql_query($sql, $con);
  while ($row = mysql_fetch_array($res)) {
    print($row['Variable_name']." = ".$row['Value']."\n");
  }  
  mysql_free_result($res);

# Delete the record
  $sql = "DELETE FROM Comment_Mixed WHERE Test_Name ='$test_name'";
  mysql_query($sql, $con);
  print("\nNumber of rows deleted: ".mysql_affected_rows()."\n");

# Build the SQL INSERT statement
  $sql = <<<END_OF_MESSAGE
INSERT INTO Comment_Mixed (Test_name, String_ASCII, 
    String_Latin1, String_UTF8, String_GBK, String_Big5)
  VALUES ('$test_name', 'Television', 
    X'54E96CE9766973696E6F', 
    X'E794B5E8A786E69CBA2FE99BBBE8A696E6A99F', 
    X'B5E7CAD3BBFA', 
    X'B971B5F8BEF7');
END_OF_MESSAGE;

# Run the SQL statement
  if (mysql_query($sql, $con)) {
    print("\nNumber of rows inserted: ".mysql_affected_rows()."\n");
  } else {
    print("SQL statement failed.\n");
    print(mysql_errno($con).": ".mysql_error($con)."\n"); 
  }

# Get the recod back
  $sql = "SELECT * FROM Comment_Mixed"
    . " WHERE Test_Name = '$test_name'";
  $res = mysql_query($sql, $con);
  if ($row = mysql_fetch_array($res)) {
    print("\nTest Name = ".$row['Test_Name']."\n");
    print("   String_ASCII: 0x".bin2hex($row['String_ASCII'])."\n");
    print("   String_Latin1: 0x".bin2hex($row['String_Latin1'])."\n");
    print("   String_UTF8: 0x".bin2hex($row['String_UTF8'])."\n");
    print("   String_GBK: 0x".bin2hex($row['String_GBK'])."\n");
    print("   String_Big5: 0x".bin2hex($row['String_Big5'])."\n");
  }  
  mysql_free_result($res);

  mysql_close($con); 
?>

The output confirms that character_set_results=utf8 helped to stop the wrong conversion on non-ASCII characters fetched from the String_UTF8 column:

C:\>\local\php\php MySQL-Fetch-Non-ASCII-UTF8.php
character_set_client = latin1
character_set_connection = latin1
character_set_database = latin1
character_set_filesystem = binary
character_set_results = utf8
character_set_server = latin1
character_set_system = utf8
character_sets_dir = \local\mysql\share\charsets\

Number of rows deleted: 1

Number of rows inserted: 1

Test Name = Non-ASCII Test
   String_ASCII: 0x54656c65766973696f6e
   String_Latin1: 0x54c3a96cc3a9766973696e6f
   String_UTF8: 0xe794b5e8a786e69cba2fe99bbbe8a696e6a99f
   String_GBK: 0xe794b5e8a786e69cba
   String_Big5: 0xe99bbbe8a696e6a99f

However, other encoding conversions were performed on other non-ASCII columns:

  • ASCII characters fetched from String_ASCII column were converted to UTF8. No problem.
  • Latin1 characters fetched from String_Latin1 column were converted to UTF8. MySQL did a real conversion here. Notice that the French character 0xE9 were converted to 0xC3A9.
  • UTF-8 characters fetched from String_UTF8 column were converted to UTF8. Perfect match. No conversion.
  • GBK characters fetched from String_GBK column were converted to UTF8. MySQL did a real conversion and did it correctly. Notice that the GBK string 0xB5E7CAD3BBFA was converted to 0xE794B5E8A786E69CBA, which match the UTF-8 string I entered for the String_UTF8 column.
  • Big5 characters fetched from String_Big5 column were converted to UTF8. MySQL did a real conversion and did it correctly. Notice that the Big5 string 0xB971B5F8BEF7 was converted to 0xE99BBBE8A696E6A99F, which match the UTF-8 string I entered for the String_UTF8 column.

My conclusion is that:

  • If you are working with a single character set encoding, set character_set_results to that encoding.
  • If you are working with multiple character set encodings, set character_set_results to the UTF-8, which is bigger than othe encodings. MySQL will convert fetched text from the encoding of the original column to UTF-8 correctly for you.

Table of Contents

 About This Book

 PHP Installation on Windows Systems

 Integrating PHP with Apache Web Server

 charset="*" - Encodings on Chinese Web Pages

 Chinese Characters in PHP String Literals

 Multibyte String Functions in UTF-8 Encoding

 Input Text Data from Web Forms

 Input Chinese Text Data from Web Forms

 MySQL - Installation on Windows

 MySQL - Connecting PHP to Database

MySQL - Character Set and Encoding

 Specifying Character Set for Text Columns

 Creating a Table with Multiple Character Sets

 Checking Character Set Setting

 Storing ASCII Characters in Non-ASCII Columns

 Storing Non-ASCII Characters with Encoded Bytes

 Transmitting Non-ASCII Characters between PHP and MySQL

 Viewing Character Set Variables

 Non-ASCII Test Result Analysis

Fetching Non-ASCII Text from MySQL

 MySQL - Sending Non-ASCII Text to MySQL

 Retrieving Chinese Text from Database to Web Pages

 Input Chinese Text Data to MySQL Database

 References

 PDF Printing Version

Dr. Herong Yang, updated in 2007
Fetching Non-ASCII Text from MySQL