Unicode Tutorials - Herong's Tutorial Examples - v5.32, by Herong Yang
Python Source Code Encoding
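This section provides a tutorial example to demonstrate the default UTF-8 encoding of Python source code files and how to declare a different encoding.
When using Unicode characters in Python source code, you need to remember that Python 3 reads source code files as UTF-8 by default, and that a different encoding can be declared with a comment such as "# coding: latin-1" on the first or second line of the file (see PEP 263).
Here is a Python script that generates 4 Python source code files to test different encodings: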
# Source-Code-Encoding.py
# Copyright 2019 (c) HerongYang.com. All Rights Reserved.
#
import os

fd = os.open('Source-Code-Default.py', os.O_CREAT|os.O_WRONLY)
os.write(fd, b"print('Fran\xc3\xa7ais')")
os.close(fd)

fd = os.open('Source-Code-Latin-1.py', os.O_CREAT|os.O_WRONLY)
os.write(fd, b"# coding: latin-1\n")
os.write(fd, b"print('Fran\xe7ais')")
os.close(fd)

fd = os.open('Source-Code-Wrong-1.py', os.O_CREAT|os.O_WRONLY)
os.write(fd, b"print('Fran\xe7ais')")
os.close(fd)

fd = os.open('Source-Code-Wrong-2.py', os.O_CREAT|os.O_WRONLY)
os.write(fd, b"# coding=iso-8859-10\n")
os.write(fd, b"print('Fran\xe7ais')")
os.close(fd)
Run the above script. It generates 4 Python source code files as shown below:
herong$ python3 --version
Python 3.8.0

herong$ python3 Source-Code-Encoding.py

herong$ ls -l
-rwxr-xr-x 1 herong staff 18 Source-Code-Default.py
-rwxr-xr-x 1 herong staff 35 Source-Code-Latin-1.py
-rwxr-xr-x 1 herong staff 17 Source-Code-Wrong-1.py
-rwxr-xr-x 1 herong staff 38 Source-Code-Wrong-2.py
Source-Code-Default.py uses the default source code encoding and contains the character ç encoded in UTF-8 as the two bytes \xc3\xa7. If you run it, you will see:
herong$ python3 Source-Code-Default.py
Français
Source-Code-Latin-1.py specifies the source encoding as "latin-1" and contains a Latin-1 encoded character of \xe7. If you run it, you will see the same result:
herong$ python3 Source-Code-Latin-1.py
Français
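To check which encoding Python would assume for a source file without running it, one option is tokenize.detect_encoding() from the standard library. The sketch below is an illustration, not part of the original script: it works on in-memory copies of the first two generated files, and the helper name detected_encoding is made up for this example.

```python
import io
import tokenize

# In-memory copies of the bytes written to the first two generated files.
default_source = b"print('Fran\xc3\xa7ais')"
latin1_source = b"# coding: latin-1\nprint('Fran\xe7ais')"


def detected_encoding(source_bytes):
    """Return the encoding name the tokenizer detects for these source bytes.

    Hypothetical helper for this example only.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


default_encoding = detected_encoding(default_source)
latin1_encoding = detected_encoding(latin1_source)

# No declaration: the tokenizer falls back to UTF-8.
print(default_encoding)
# "latin-1" is reported under its normalized name.
print(latin1_encoding)

# Decoding each source with its detected encoding yields the same text.
default_text = default_source.decode(default_encoding)
latin1_text = latin1_source.decode(latin1_encoding).splitlines()[-1]
print(default_text)
print(latin1_text)
```

Both decoded lines should contain the same string, which is why the two scripts print the same result even though their bytes differ.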
Source-Code-Wrong-1.py uses the default source code encoding and contains a Latin-1 encoded character of \xe7. If you run it, you will see an error message, because the byte \xe7 followed by an ASCII letter is not a valid UTF-8 sequence:
herong$ python3 Source-Code-Wrong-1.py
  File "Source-Code-Wrong-1.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xe7' in file Source-Code-Wrong-1.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
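The same failure can be reproduced on a plain byte string, outside of any source file. In UTF-8, \xe7 is the lead byte of a 3-byte sequence and must be followed by two continuation bytes in the range \x80 to \xbf; here it is followed by the ASCII letter "a". The following is a minimal sketch, with variable names chosen for this example only:

```python
# The bytes of "Français" as encoded in Latin-1.
raw_bytes = b"Fran\xe7ais"

try:
    raw_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as error:
    # error.start is the offset of the first byte that could not be decoded.
    decode_error = error
    print("UTF-8 decoding failed at byte offset", error.start)

# The same bytes decode cleanly once the correct encoding is named.
latin1_text = raw_bytes.decode("latin-1")
print(latin1_text)
```

This mirrors what the interpreter does with the source file: without a declaration it tries UTF-8 and gives up at the first invalid byte.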
Source-Code-Wrong-2.py specifies the source encoding as "iso-8859-10" and contains a Latin-1 encoded character of \xe7. If you run it, you will see no errors. But it prints the wrong character, because the byte \xe7 represents a different character in ISO-8859-10 than in Latin-1:
herong$ python3 Source-Code-Wrong-2.py
Franįais
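To see exactly which character each encoding assigns to the byte \xe7, you can decode it with both codecs and look up the result in the Unicode database. This is a small illustrative sketch using only the standard library; the dictionary name decoded is an assumption of this example.

```python
import unicodedata

byte_value = b"\xe7"
decoded = {}

for codec_name in ("latin-1", "iso-8859-10"):
    # Decode the same single byte under each encoding.
    character = byte_value.decode(codec_name)
    decoded[codec_name] = character
    print(codec_name, f"U+{ord(character):04X}", unicodedata.name(character))
```

Under Latin-1 the byte maps to U+00E7 (ç), while under ISO-8859-10 it maps to U+012F (į), which matches the output of Source-Code-Wrong-2.py.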