Processing Chinese Input on Web Forms in UTF-8

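This section describes how to display a Web form and process Chinese input data from that form in UTF-8.

Since UTF-8 encoding can handle both simplified and traditional Chinese characters, I wrote my first test PHP script for processing Chinese input text with two features: a default input value that mixes simplified and traditional characters, and a reply section that prints the received text as a hex dump next to the expected hex value: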

<?php 
#- Web-Form-Input-Chinese-UTF8.php
#- Copyright (c) 2005 HerongYang.com. All Rights Reserved.
#
  print('<html><head>');
  print('<meta http-equiv="Content-Type"'.
    ' content="text/html; charset=utf-8"/>');
  print('</head><body>'."\n");

# Default input text
  $input = '电视机/電視機';
  $input_hex = 'E794B5E8A786E69CBA2FE99BBBE8A696E6A99F'; 

# Form reply determination
  $reply = isset($_REQUEST["Submit"]);

# Process form input data
  if ($reply) {
    if (isset($_REQUEST["Input"])) {
      $input = $_REQUEST["Input"];
    }
  }

# Display form
  print('<form>');
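# Escape the value so quotes or angle brackets in the input
# cannot break the HTML attribute
  print('<input type="Text" size="40" maxlength="64"'
   . ' name="Input" value="'
   . htmlspecialchars($input, ENT_QUOTES, 'UTF-8').'"/><br/>');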
  print('<input type="Submit" name="Submit" value="Submit"/>');
  print('</form>'."\n");

# Display reply
  if ($reply) {
    print('<pre>'."\n");
    print('Content-Type:'."\n");
    print('  text/html; charset=utf-8'."\n");
    print('You have submitted:'."\n");
    print('  Text = '.htmlspecialchars($input, ENT_QUOTES, 'UTF-8')."\n");
    print('  Text in HEX = '.strtoupper(bin2hex($input))."\n");
    print('  Default HEX = '.$input_hex."\n");
    print('</pre>'."\n");
  } 

  print('</body></html>');
?>
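
The $input_hex value in the script is the hex dump of the default text's UTF-8 bytes. As a minimal sketch of where it comes from, the snippet below recomputes it with bin2hex() and prints the bytes of each character separately. It assumes the file is saved in UTF-8; the variable names $text, $hex and $chars are only for illustration.

```php
<?php
// Default input text from the test script (source file saved as UTF-8)
$text = '电视机/電視機';

// bin2hex() dumps the raw bytes of the string
$hex = strtoupper(bin2hex($text));
print($hex . "\n");

// Split into characters with a UTF-8 aware pattern, then dump each one.
// Each Chinese character here takes 3 bytes; the slash takes 1 byte.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
foreach ($chars as $ch) {
    print($ch . ' = ' . strtoupper(bin2hex($ch)) . "\n");
}
```

If the "Text in HEX" line in the reply section matches this value, the bytes received by PHP are the same as the bytes in the script source.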

After copying this PHP script file to the Apache server document directory, I tested it in Internet Explorer (IE) with this URL: http://localhost/Web-Form-Input-Chinese-UTF8.php. I saw a Web page with a form that has the default input text and a submit button.

The default input Chinese characters were displayed correctly.

After clicking the submit button, I saw the returned Web page with the same form and a reply section. The Chinese input characters were received by PHP correctly:

Figure: Processing Web Form Chinese Input in UTF-8

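It is interesting to note that the returned Web page has a URL which contains the input text inside the query string, because the form has no method attribute and is therefore submitted with GET. The Chinese characters are percent-encoded: each byte of their UTF-8 byte sequences is written as % followed by its hex value: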

http://localhost/Web-Form-Input-Chinese-UTF8.php
  ?Input=%E7%94%B5%E8%A7%86%E6%9C%BA%2F%E9%9B%BB%E8%A6%96%E6%A9%9F
  &Submit=Submit
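
The query string above can be reproduced outside the browser. The following is a rough sketch, not what IE does internally: it builds the same query string with http_build_query() and decodes it again with parse_str(). The variable names $text, $query and $fields are illustrative, and the file is assumed to be saved in UTF-8.

```php
<?php
// Default input text from the test script (source file saved as UTF-8)
$text = '电视机/電視機';

// Percent-encode the form fields; the separator is passed explicitly
// so the result does not depend on the arg_separator.output setting
$query = http_build_query(
    array('Input' => $text, 'Submit' => 'Submit'), '', '&');
print($query . "\n");

// Decode the query string back into an array of fields
parse_str($query, $fields);
print(strtoupper(bin2hex($fields['Input'])) . "\n");
```

The decoded Input field contains the same UTF-8 bytes as the original text, which is what PHP places in $_REQUEST["Input"].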

Conclusion: IE handles Chinese input text in UTF-8 encoding correctly. PHP receives Chinese input text in UTF-8 encoding from Web forms correctly.
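
The test above relies on comparing hex dumps by eye. If a script needs to check that received input is well-formed UTF-8, one possible approach is sketched below. The function name is_valid_utf8 is hypothetical and not part of the test script; it relies on preg_match() returning false when the /u modifier meets malformed UTF-8.

```php
<?php
// Hypothetical helper: true if $s is a well-formed UTF-8 byte sequence
function is_valid_utf8($s) {
    // With the /u modifier, preg_match() returns false (not 0 or 1)
    // when the subject is not valid UTF-8
    return preg_match('//u', $s) === 1;
}

// First 9 bytes of the default input text: 3 characters in UTF-8
$good = hex2bin('E794B5E8A786E69CBA');

// Bytes that are not valid UTF-8, as could arrive from a page
// submitted in a different encoding
$bad = hex2bin('B5E7');

print(is_valid_utf8($good) ? "valid\n" : "invalid\n");
print(is_valid_utf8($bad) ? "valid\n" : "invalid\n");
```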
