PHP Tutorials - Herong's Tutorial Examples - Version 4.20, by Dr. Herong Yang
Managing HTTP Input and Output Encoding
This section provides a tutorial example on how to manage HTTP input and output encoding in 3 different ways: using the same encoding for input, output and internal storage; using different encodings and managing conversion manually; or using different encodings and letting the PHP engine manage conversion automatically.
There are 3 approaches to managing HTTP input and output encodings:
1. Set HTTP input encoding, HTTP output encoding and PHP script internal encoding to be exactly the same, like UTF-8 or GB2312. I strongly recommend this approach, since it avoids the need for conversion when receiving HTTP input and generating HTTP output.
2. Set HTTP input encoding and HTTP output encoding to be the same, and PHP script internal encoding to be a different one. Do not let the PHP engine do automated conversion on HTTP input and output; let your PHP script manage it explicitly.
3. Set HTTP input encoding and HTTP output encoding to be the same, and PHP script internal encoding to be a different one. Let the PHP engine do automated conversion on HTTP input and output.
For approach #1, you need to turn off HTTP input and output encoding conversion with these php.ini settings:
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
While writing your script, you must always remember that you are dealing with UTF-8 encoded strings.
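As a minimal sketch of why this matters, the following lines compare byte-oriented and character-oriented string functions on a UTF-8 string. It assumes the mbstring extension is loaded; the variable names are mine and only for illustration.

```php
<?php
// Assumes the mbstring extension is loaded. The \u{...} escapes (PHP 7+)
// produce UTF-8 bytes regardless of how this source file is saved.
mb_internal_encoding("UTF-8");

$text = "\u{4F60}\u{597D}";          // two Chinese characters

$byteCount = strlen($text);          // counts bytes, not characters
$charCount = mb_strlen($text);       // counts characters in the internal encoding
$firstChar = mb_substr($text, 0, 1); // character-aware substring

print("bytes: $byteCount, characters: $charCount\n");
print("first character as hex: " . bin2hex($firstChar) . "\n");
```

Each of these two characters takes 3 bytes in UTF-8, so the byte count and the character count differ. Functions like strlen() and substr() work on bytes and can cut a character in half.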
Approach #2 is useful if you want your Web page to be GB2312 encoded while using UTF-8 as your script internal encoding, and you want your script to control the HTTP input and output conversion process. Here are the php.ini settings, which are the same as for approach #1 because the PHP engine must not convert anything itself:
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = pass
mbstring.http_output = pass
mbstring.encoding_translation = Off
Approach #3 is useful if you want your Web page to be GB2312 encoded while using UTF-8 as your script internal encoding, and you trust the PHP engine to do HTTP input and output encoding conversion. Here are the php.ini settings:
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = GB2312
mbstring.http_output = GB2312
mbstring.encoding_translation = On
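Before relying on any of the 3 approaches, it may help to confirm which settings the running PHP engine actually uses. The sketch below reads them with ini_get(); the function name mbstring_http_settings() is a hypothetical helper of mine, not part of mbstring.

```php
<?php
// Hypothetical helper (not part of the tutorial script): collect the mbstring
// settings discussed above so they can be compared with the intended approach.
function mbstring_http_settings() {
    $names = array(
        "mbstring.language",
        "mbstring.internal_encoding",
        "mbstring.http_input",
        "mbstring.http_output",
        "mbstring.encoding_translation",
    );
    $settings = array();
    foreach ($names as $name) {
        // ini_get() returns false when the setting is unknown,
        // for example when the mbstring extension is not loaded.
        $settings[$name] = ini_get($name);
    }
    return $settings;
}

foreach (mbstring_http_settings() as $name => $value) {
    print($name . " = " . var_export($value, true) . "\n");
}
```

Note that PHP 5.6 and later mark mbstring.internal_encoding, mbstring.http_input and mbstring.http_output as deprecated in favor of default_charset, so a newer engine may report empty values for them.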
Since approach #2 is more challenging than the others, I wrote the following script to give you some ideas:
<?php # MbStringHttp.php
#- Copyright (c) 2007-2019, HerongYang.com, All Rights Reserved.
#
   mb_internal_encoding("UTF-8");

#- Taking care of HTTP input conversion
   $myRequest['English'] = "";
   $myRequest['ChineseUtf8'] = "";
   $myRequest['ChineseGb2312'] = "";
   foreach ($_REQUEST as $k => $v) {
      $myRequest[$k] = mb_convert_encoding($v,"UTF-8", "GB2312");
   }
   $r_English = $myRequest['English'];
   $r_ChineseUtf8 = $myRequest['ChineseUtf8'];
   $r_ChineseGb2312 = $myRequest['ChineseGb2312'];

#- Taking care of HTTP output conversion
   mb_http_output("GB2312");
   ob_start("mb_output_handler");

#- Generating HTML document
   print("<html>");
   print('<meta http-equiv="Content-Type"'
      .' content="text/html; charset=gb2312"/>');
   print("<body>\n");
   print("<form action=MbStringHttp.php method=get>");
   print("English ASCII: <input name=English"
      ." value='$r_English' size=16><br>\n");
   print("Chinese UTF-8: <input name=ChineseUtf8"
      ." value='$r_ChineseUtf8' size=16><br>\n");
   print("Chinese GB2312: <input name=ChineseGb2312"
      ." value='$r_ChineseGb2312' size=16><br>\n");
   print("<input type=submit name=submit value=Submit>\n");
   print("</form>\n");

#- Outputting input strings back to HTML document
   print("<hr>");
   print("<pre>");
   print("{$myRequest['English']}\n");
   print("{$myRequest['ChineseUtf8']}\n");
   print("{$myRequest['ChineseGb2312']}\n");
   print("</pre>");
   print("</body>");
   print("</html>");

#- Dumping input strings to a file
   $file = fopen("\\temp\\MbStringHttp.txt", 'ab');
   $str = "--- Query String ---\n";
   fwrite($file, $str, strlen($str));
   if (array_key_exists('QUERY_STRING',$_SERVER)) {
      $str = $_SERVER['QUERY_STRING'];
   } else {
      # Use an empty string, so strlen() never receives NULL
      $str = "";
   }
   fwrite($file, $str, strlen($str));
   $str = "--- Raw reqeust input ---\n";
   fwrite($file, $str, strlen($str));
   foreach ($_REQUEST as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   $str = "--- Converted reqeust input ---\n";
   fwrite($file, $str, strlen($str));
   foreach ($myRequest as $k => $v) {
      $str = "$k = ($v)\n";
      fwrite($file, $str, strlen($str));
   }
   fclose($file);
?>
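The input conversion loop in the script assumes that every request value is a plain string. As a hedged sketch, the conversion could be wrapped in a small function that also walks nested arrays, such as values submitted from fields named like tags[]. The function to_internal() and its parameters are hypothetical names of mine; only the values are converted, not the array keys.

```php
<?php
// Hypothetical helper, not part of MbStringHttp.php: convert a request value
// (string or nested array) from the page encoding to the internal encoding.
function to_internal($value, $pageEncoding = "GB2312", $internal = "UTF-8") {
    if (is_array($value)) {
        $converted = array();
        foreach ($value as $k => $v) {
            $converted[$k] = to_internal($v, $pageEncoding, $internal);
        }
        return $converted;
    }
    return mb_convert_encoding((string)$value, $internal, $pageEncoding);
}

// "\xC4\xE3\xBA\xC3" is a two-character Chinese greeting in GB2312 bytes.
$sample = array("greeting" => "\xC4\xE3\xBA\xC3", "tags" => array("abc"));
$result = to_internal($sample);

print(bin2hex($result["greeting"]) . "\n");   // UTF-8 bytes of the converted text
print($result["tags"][0] . "\n");             // ASCII passes through unchanged
```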
I tested this script with IE, and entered the following strings:
English ASCII: Hello world!
Spanish UTF-8: ¡Hola mundo!
Korean UTF-8: 여보세요 세계 !
Chinese UTF-8: 你好世界!
Chinese GB2312: ÊÀ½çÄãºÃ£¡
The returned page showed the input strings correctly, but its source code was very interesting:
<html><meta http-equiv="Content-Type" content="text/html; charset=gb2312"/><body>
<form action=MbStringHttp.php method=get>
English ASCII: <input name=English value='Hello world!' size=16><br>
Spanish UTF-8: <input name=Spanish value='¡Hola mundo!' size=16><br>
Korean UTF-8: <input name=Korean value='여보세요 세계 !' size=16><br>
Chinese UTF-8: <input name=ChineseUtf8 value='ÄãºÃÊÀ½ç!' size=16><br>
Chinese GB2312: <input name=ChineseGb2312 value='ÊÀ½çÄãºÃ£¡' size=16><br>
<input type=submit name=submit value=Submit>
</form>
<hr><pre>Hello world!
¡Hola mundo!
여보세요 세계 !
ÄãºÃÊÀ½ç!
ÊÀ½çÄãºÃ£¡
</pre></body></html>
I looked at the dump file, \temp\MbStringHttp.txt:
--- Query String ---
English=Hello+world%21&
Spanish=%26iexcl%3BHola+mundo%21&
Korean=%26%2350668%3B%26%2348372%3B%26%2349464%3B%26%2350836%3B+
   %26%2349464%3B%26%2344228%3B+%21&
ChineseUtf8=%C4%E3%BA%C3%CA%C0%BD%E7%21&
ChineseGb2312=%26Ecirc%3B%26Agrave%3B%26frac12%3B%26ccedil%3B
   %26Auml%3B%26atilde%3B%26ordm%3B%26Atilde%3B%26pound%3B%26iexcl%3B&
submit=Submit
--- Raw reqeust input ---
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (ÄãºÃÊÀ½ç!)
ChineseGb2312 = (ÊÀ½çÄãºÃ£¡)
submit = (Submit)
--- Converted reqeust input ---
English = (Hello world!)
Spanish = (¡Hola mundo!)
Korean = (여보세요 세계 !)
ChineseUtf8 = (ä½ å¥½ä¸–ç•Œ!)
ChineseGb2312 = (ÊÀ½çÄãºÃ£¡)
submit = (Submit)
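To see what happened to the ChineseUtf8 field, its value can be traced by hand. The sketch below takes the percent-encoded value from the dumped query string, decodes it, and applies the same GB2312 to UTF-8 conversion that the script uses. It assumes the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// The ChineseUtf8 value exactly as it appears in the dumped query string.
$encoded = "%C4%E3%BA%C3%CA%C0%BD%E7%21";

$raw  = urldecode($encoded);                           // GB2312 bytes sent by the browser
$utf8 = mb_convert_encoding($raw, "UTF-8", "GB2312");  // what the script stores internally
$back = mb_convert_encoding($utf8, "GB2312", "UTF-8"); // what goes back to the GB2312 page

print("raw bytes:   " . bin2hex($raw) . "\n");
print("UTF-8 bytes: " . bin2hex($utf8) . "\n");
print("round trip matches: " . var_export($back === $raw, true) . "\n");
```

The raw bytes are 2 bytes per Chinese character plus one for "!", which is why the raw input looks like ÄãºÃÊÀ½ç! when viewed in a single-byte Western encoding. The converted UTF-8 bytes, viewed the same way, give the ä½ å¥½ä¸–ç•Œ! seen in the converted section of the dump.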
My script handled HTTP input and output encoding correctly when the Web browser submitted the input strings in GB2312. Characters that GB2312 cannot represent, like the Spanish and Korean ones above, were submitted by the browser as HTML entities instead. To avoid them, you need to tell your users to enter only characters that the page encoding supports.
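If telling users is not enough, one possible alternative, which my script does not do, is to decode those entities on the server after the input conversion. The sketch below uses html_entity_decode(); the function name decode_browser_entities() is a hypothetical helper of mine.

```php
<?php
// Hypothetical helper, not part of MbStringHttp.php: turn HTML entities that
// the browser substituted for unsupported characters back into UTF-8 text.
function decode_browser_entities($value) {
    return html_entity_decode($value, ENT_QUOTES | ENT_HTML401, "UTF-8");
}

// The first word of the Korean field, as numeric entities from the query string.
$korean = decode_browser_entities("&#50668;&#48372;&#49464;&#50836;");
print(bin2hex($korean) . "\n");

// The Spanish field arrived with a named entity.
$spanish = decode_browser_entities("&iexcl;Hola mundo!");
print(bin2hex($spanish) . "\n");
```

This has two limits. A user who really typed text like "&amp;" will see it changed, because the server cannot tell typed entities from substituted ones. And decoded characters that GB2312 cannot represent still cannot be sent back on a GB2312 page, so this helps mostly when the decoded text is stored or processed rather than displayed.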
Last update: 2019.