Managing HTTP Input and Output Encoding

This section provides a tutorial example on how to manage HTTP input and output encoding in 3 different ways: using the same encoding for input, output and internal storage; using different encodings and managing conversion manually; or using different encodings and managing conversion automatically.

There are 3 approaches to managing HTTP input and output encodings:

1. Set HTTP input encoding, HTTP output encoding and PHP script internal encoding to be exactly the same, like UTF-8 or GB2312. I strongly recommend this approach, since it avoids the need for conversion when receiving HTTP input and generating HTTP output.

2. Set HTTP input encoding and HTTP output encoding to be the same, and PHP script internal encoding to be a different one. But do not let the PHP engine do automated conversion on HTTP input and output. Let your PHP script manage it explicitly.

3. Set HTTP input encoding and HTTP output encoding to be the same, and PHP script internal encoding to be a different one. But let the PHP engine do automated conversion on HTTP input and output.

For approach #1, you need to turn off HTTP input and output encoding conversion with these php.ini settings:

mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off

While writing your script, you must always remember that you are dealing with UTF-8 encoded strings.
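
With approach #1, nothing converts or validates the incoming bytes, so a script may want to confirm that request data really is UTF-8 before using it. Here is a minimal sketch, assuming the mbstring extension is loaded. The helper name isValidUtf8Input() is hypothetical and not part of the original tutorial:

```php
<?php
// Hypothetical helper: returns true if a request value
// (a string or a nested array of strings) is valid UTF-8.
function isValidUtf8Input($value): bool {
    if (is_array($value)) {
        foreach ($value as $item) {
            if (!isValidUtf8Input($item)) {
                return false;
            }
        }
        return true;
    }
    return mb_check_encoding((string)$value, 'UTF-8');
}

mb_internal_encoding('UTF-8');

// The 5-character string "你好世界!" written with Unicode escapes
$text = "\u{4F60}\u{597D}\u{4E16}\u{754C}!";
$byteCount = strlen($text);     // counts bytes
$charCount = mb_strlen($text);  // counts characters in the internal encoding
```

The two counts differ for this string, which is the practical meaning of "remember that you are dealing with UTF-8 encoded strings": byte-oriented functions like strlen() and substr() do not see characters.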

Approach #2 is useful if you want your Web page to be GB2312 encoded while using UTF-8 as your script internal encoding, and you want your script to control the HTTP input and output conversion process. The php.ini settings are the same as for approach #1, because the PHP engine must still leave the input and output untouched:

mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off

Approach #3 is useful if you want your Web page to be GB2312 encoded while using UTF-8 as your script internal encoding, and you trust the PHP engine to do HTTP input and output encoding conversion. Here are the php.ini settings:

mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = GB2312
mbstring.http_output = GB2312
mbstring.encoding_translation = On
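
Whichever approach you choose, it can help to confirm what the PHP engine is actually using at run time, since php.ini may be overridden elsewhere. Note also that in PHP 5.6 and later the mbstring.internal_encoding, mbstring.http_input and mbstring.http_output directives are deprecated in favor of default_charset. The following is only a sketch. The helper name currentEncodingSettings() is hypothetical, and the values it returns depend on your installation:

```php
<?php
// Hypothetical helper: collects the encoding settings
// that the mbstring extension is currently using.
function currentEncodingSettings(): array {
    return [
        // Encoding assumed by mb_* functions for strings in the script
        'internal_encoding'    => mb_internal_encoding(),
        // Encoding that mb_output_handler converts output to
        'http_output'          => mb_http_output(),
        // Detected HTTP input encoding, or false if none was detected
        'http_input'           => mb_http_input(),
        // Raw php.ini value of the automatic translation switch
        'encoding_translation' => ini_get('mbstring.encoding_translation'),
    ];
}

$settings = currentEncodingSettings();
```

Dumping $settings with var_dump() at the top of a test page is a quick way to see whether automatic conversion is really on or off.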

Since approach #2 is more challenging than the others, I wrote the following script to give you some ideas:

<?php # MbStringHttp.php
#- Copyright (c) 2015, HerongYang.com, All Rights Reserved.
#
   mb_internal_encoding("UTF-8");

#- Taking care of HTTP input conversion
   $myRequest['English'] = "";
   $myRequest['ChineseUtf8'] = "";
   $myRequest['ChineseGb2312'] = "";
   foreach ($_REQUEST as $k => $v) {
      $myRequest[$k] = mb_convert_encoding($v,"UTF-8", "GB2312");
   }
   $r_English = $myRequest['English'];
   $r_ChineseUtf8 = $myRequest['ChineseUtf8'];
   $r_ChineseGb2312 = $myRequest['ChineseGb2312'];

#- Taking care of HTTP output conversion
   mb_http_output("GB2312");
   ob_start("mb_output_handler");

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=gb2312"/>');
   print("<body>\n");
   print("<form action=MbStringHttp.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   print("{$myRequest['English']}\n");
   print("{$myRequest['ChineseUtf8']}\n");
   print("{$myRequest['ChineseGb2312']}\n");
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\MbStringHttp.txt", 'ab');
   $str = "--- Query String ---\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      $str = "";   # use an empty string, so strlen() below receives a string
   }
   fwrite($file, $str, strlen($str));

   $str = "--- Raw reqeust input ---\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }

   $str = "--- Converted reqeust input ---\n";
   fwrite($file, $str, strlen($str));
   foreach ($myRequest as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>
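
The input conversion loop in the script can also be written as a small reusable function that handles nested arrays, which HTTP requests produce for field names like name[]. This is only a sketch of the same idea. The function name convertRequestValue() is hypothetical, and the sample bytes are the percent-decoded ChineseUtf8 value from my test recorded in this section:

```php
<?php
// Hypothetical helper: converts a request value (a string or a nested
// array of strings) from the Web page encoding to the internal encoding.
function convertRequestValue($value, string $to = 'UTF-8',
                             string $from = 'GB2312') {
    if (is_array($value)) {
        $out = [];
        foreach ($value as $k => $v) {
            $out[$k] = convertRequestValue($v, $to, $from);
        }
        return $out;
    }
    return mb_convert_encoding((string)$value, $to, $from);
}

// GB2312 bytes of "你好世界!" as a browser would send them
$raw = ['ChineseUtf8' => "\xC4\xE3\xBA\xC3\xCA\xC0\xBD\xE7!"];
$converted = convertRequestValue($raw);
```

Keeping the conversion in one function means $_REQUEST is never used directly by the rest of the script, so there is only one place where the page encoding is named.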

I tested this script with IE, and entered the following strings:

English ASCII: Hello world!
Spanish UTF-8: ¡Hola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: ÊÀ½çÄãºÃ£¡

The returned page showed the input strings correctly. But the page source was very interesting:

<html><meta http-equiv="Content-Type" content="text/html;
   charset=gb2312"/><body>
<form action=MbStringHttp.php method=get>
English ASCII: <input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='&iexcl;Hola mundo!'
   size=16><br>
Korean UTF-8: <input name=Korean value='&#50668;&#48372;&#49464;
   &#50836; &#49464;&#44228; !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='ÄãºÃÊÀ½ç!' size=16><br>
Chinese GB2312: <input name=ChineseGb2312 value='&Ecirc;&Agrave;
   &frac12;&ccedil;&Auml;&atilde;&ordm;&Atilde;&pound;&iexcl;'
   size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>Hello world!
&iexcl;Hola mundo!
&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !
ÄãºÃÊÀ½ç!
&Ecirc;&Agrave;&frac12;&ccedil;&Auml;&atilde;&ordm;&Atilde;
   &pound;&iexcl;
</pre></body></html>

I looked at the dump file, \temp\MbStringHttp.txt:

--- Query String ---
English=Hello+world%21&
Spanish=%26iexcl%3BHola+mundo%21&
Korean=%26%2350668%3B%26%2348372%3B%26%2349464%3B%26%2350836%3B+
   %26%2349464%3B%26%2344228%3B+%21&
ChineseUtf8=%C4%E3%BA%C3%CA%C0%BD%E7%21&
ChineseGb2312=%26Ecirc%3B%26Agrave%3B%26frac12%3B%26ccedil%3B
   %26Auml%3B%26atilde%3B%26ordm%3B%26Atilde%3B%26pound%3B%26iexcl%3B&
submit=Submit
--- Raw reqeust input ---
English = (Hello world!)
Spanish = (&iexcl;Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (ÄãºÃÊÀ½ç!)
ChineseGb2312 = (&Ecirc;&Agrave;&frac12;&ccedil;&Auml;&atilde;&ordm;
   &Atilde;&pound;&iexcl;)
submit = (Submit)
--- Converted reqeust input ---
English = (Hello world!)
Spanish = (&iexcl;Hola mundo!)
Korean = (&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !)
ChineseUtf8 = (你好世界!)
ChineseGb2312 = (&Ecirc;&Agrave;&frac12;&ccedil;&Auml;&atilde;&ordm;
   &Atilde;&pound;&iexcl;)
submit = (Submit)

My script handled HTTP input and output encoding correctly when the input strings were sent in GB2312 by the Web browser. Characters that cannot be represented in GB2312 were sent by the browser as HTML entities, and my script passed them through without decoding them. You need to avoid them by telling your users to enter only characters that GB2312 supports.
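
If asking users to avoid such characters is not practical, the script could instead decode the entities after converting the input to UTF-8. The following is a minimal sketch, not what my test script does. The function name decodeBrowserEntities() is hypothetical, and it cannot tell a browser-generated entity from one that a user typed literally:

```php
<?php
// Hypothetical helper: converts raw input from the page encoding to UTF-8,
// then turns HTML entities back into real UTF-8 characters.
function decodeBrowserEntities(string $raw,
                               string $pageEncoding = 'GB2312'): string {
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $pageEncoding);
    return html_entity_decode($utf8, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Values taken from the dump file above
$spanish = decodeBrowserEntities('&iexcl;Hola mundo!');
$korean  = decodeBrowserEntities(
    '&#50668;&#48372;&#49464;&#50836; &#49464;&#44228; !');
```

Decoded values can contain characters like < and ', so they should be passed through htmlspecialchars() before being printed back into an HTML page. They also cannot be converted back to GB2312 for output, which is one more reason to prefer approach #1 with UTF-8.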

Last update: 2015.
