Chinese Web Sites Using PHP - v2.24, by Herong Yang
MySQL Encoding Issue Example
This section provides an example of MySQL server encoding issue with Chinese characters, when incorrectly encoded string is stored in database and wrong character_set_* settings are used.
Here is a real example of MySQL encoding issue I have encountered recently. Take a look at the following query statement used in a PHP script and its output:
SELECT Code FROM Dictionary WHERE Code = '鸡' Code: (踡)
Why does MySQL server match and return a Chinese character that is different than the character given in the query statement?
To find out the root cause of the problem, I modified the query statement as shown below:
SELECT Code, HEX(Code) AS CodeHex, HEX('踡') AS OutHex, HEX('鸡') AS InHex
FROM Dictionary WHERE Code = '鸡'
Code: (踡)
CodeHex: (C3A8C2B8C2A1)
OutHex: (E8B8A1)
InHex: (E9B8A1)
Based on my knowledge to Chinese character encodings, I can see what really happened when the above statement was executed.
1. Since Hex string 0xC3A8C2B8C2A1 from the database was able to be displayed as correct Chinese character of 踡, I can assume that the following:
2. Since the condition "Code = '鸡'" results to "True", I can also assume that the following:
To validate my assumptions, I wrote the following simulation PHP script.
<?php
#- MySQL-Encoding-Issue.php
#- Copyright (c) 2005 HerongYang.com. All Rights Reserved.
mb_internal_encoding("utf8");
$con = mysqli_connect('127.0.0.1', 'test', 'test', 'test');
mysqli_query($con, "SET character_set_results=latin1");
mysqli_query($con, "SET character_set_connection=latin1");
mysqli_query($con, "SET character_set_client=latin1");
$res = mysqli_query($con, "SHOW VARIABLES LIKE 'character_set_%'");
while ($row = mysqli_fetch_array($res)) {
print($row['Variable_name'].": ".$row['Value']."\n");
}
mysqli_free_result($res);
mysqli_query($con, "DROP TABLE Dictionary");
mysqli_query($con,
"CREATE TABLE Dictionary (Code VARCHAR(64)) DEFAULT CHARSET=utf8");
mysqli_query($con,
"INSERT INTO Dictionary (Code) VALUES (X'C3A8C2B8C2A1')");
$sql = "SELECT Code, HEX(Code) AS CodeHex,"
. " HEX('踡') AS OutHex, HEX('鸡') AS InHex"
. " FROM Dictionary WHERE Code = X'C3A9C2B8C2A1'";
$res = mysqli_query($con, $sql);
$row = mysqli_fetch_array($res);
print("SQL: (".$sql.")\n");
print("Code: (".$row['Code'].")\n");
print("CodeHex: (".$row['CodeHex'].")\n");
print("OutHex: (".$row['OutHex'].")\n");
print("InHex: (".$row['InHex'].")\n");
mysqli_free_result($res);
$sql = "SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8)"
. " = CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False')"
. " AS Comparison";
$res = mysqli_query($con, $sql);
$row = mysqli_fetch_array($res);
print("SQL: (".$sql.")\n");
print("Comparison: (".$row['Comparison'].")\n");
mysqli_free_result($res);
mysqli_close($con);
?>
You can try the above PHP script on your MySQL server. And you will get similar output shown below:
herong$ php MySQL-Encoding-Issue.php
character_set_client: latin1
character_set_connection: latin1
character_set_database: utf8mb4
character_set_filesystem: binary
character_set_results: latin1
character_set_server: utf8
character_set_system: utf8
character_sets_dir: /usr/local/mysql-8.0.17-.../share/charsets/
SQL: (SELECT Code, HEX(Code) AS CodeHex, HEX('踡') AS OutHex, HEX('鸡')
AS InHex FROM Dictionary WHERE Code = X'C3A9C2B8C2A1')
Code: (踡)
CodeHex: (C3A8C2B8C2A1)
OutHex: (E8B8A1)
InHex: (E9B8A1)
SQL: (SELECT IF(CONVERT(X'C3A8C2B8C2A1' USING utf8)
= CONVERT(X'C3A9C2B8C2A1' USING utf8), 'True', 'False') AS Comparison)
Comparison: (True)
Obviously, the issue can be resolved by:
However, one thing that I will don't understand today is why MySQL server evaluate Boolean expression of "CONVERT(X'C3A8C2B8C2A1' USING utf8) = CONVERT(X'C3A9C2B8C2A1' USING utf8)" to true. Please share it with me, if you know the answer.
Table of Contents
PHP Installation on Windows Systems
Integrating PHP with Apache Web Server
charset="*" - Encodings on Chinese Web Pages
Chinese Characters in PHP String Literals
Multibyte String Functions in UTF-8 Encoding
Input Text Data from Web Forms
Input Chinese Text Data from Web Forms
MySQL - Installation on Windows
MySQL - Connecting PHP to Database
►MySQL - Character Set and Encoding
Specifying Character Set for Text Columns
Creating a Table with Multiple Character Sets
Checking Character Set Setting
Storing ASCII Characters in Non-ASCII Columns
Storing Non-ASCII Characters with Encoded Bytes
Transmitting Non-ASCII Characters between PHP and MySQL
Viewing Character Set Variables
Non-ASCII Test Result Analysis
Fetching Non-ASCII Text from MySQL
MySQL - Sending Non-ASCII Text to MySQL
Retrieving Chinese Text from Database to Web Pages
Input Chinese Text Data to MySQL Database