Viewing Encoded Text Files in Web Browsers

This section provides a tutorial example on how to view text files in different encodings with the Internet Explorer Web browser. Each encoded text file must first be wrapped in proper HTML tags using the sample program EncodingHtml.java.

Now we have our greeting messages saved in many different encodings. The next question is how to display them as glyphs of the corresponding languages on the screen. One way I have used in the past is to run a multi-language enabled Web browser like IE to view the text files. To do this, we have to mark up the text into an HTML file by using a program like this one:

/**
 * EncodingHtml.java
 * Copyright (c) 2009, HerongYang.com, All Rights Reserved.
 * 
 * This program allows you to mark up a text file into html file.
 */
import java.io.*;
class EncodingHtml {
   public static void main(String[] a) {
      if (a.length < 2) {
         System.out.println("Usage: java EncodingHtml file charset");
         return;
      }
      String inFile = a[0];
      String inCharsetName = a[1];
      String outFile = inFile + ".html";
      // try-with-resources closes both streams even if an error occurs
      try (InputStreamReader in = new InputStreamReader(
              new FileInputStream(inFile), inCharsetName);
           OutputStreamWriter out = new OutputStreamWriter(
              new FileOutputStream(outFile), inCharsetName)) {
         writeHead(out, inCharsetName);
         int c = in.read();
         int n = 0;
         // Copy the text one character at a time, counting as we go
         while (c!=-1) {
            writeChar(out, c);
            n++;
            c = in.read();
         }
         writeTail(out);
         System.out.println("Number of characters: "+n);
      } catch (IOException e) {
         // Also covers an unsupported charset name
         System.out.println(e.toString());
      }
   }
   // Writes the HTML header, declaring the same charset as the text
   public static void writeHead(OutputStreamWriter out, String cs)
      throws IOException {
      out.write("<html><head>\n");
      out.write("<meta http-equiv=\"Content-Type\""+
         " content=\"text/html; charset="+cs+"\">\n");
      out.write("</head><body><pre>");
   }
   // Writes one character, escaping those with special meaning in HTML
   public static void writeChar(OutputStreamWriter out, int c)
      throws IOException {
      if (c=='<') out.write("&lt;");
      else if (c=='>') out.write("&gt;");
      else if (c=='&') out.write("&amp;");
      else out.write(c);
   }
   public static void writeTail(OutputStreamWriter out)
      throws IOException {
      out.write("</pre></body></html>\n");
   }
}

Now, let's compile this program and run it with hello.utf-8:

C:\herong>javac EncodingHtml.java

C:\herong>java EncodingHtml hello.utf-8 utf-8
Number of characters: 84

If you have installed IE with the Chinese language support, you should be able to open the output file, hello.utf-8.html, and enjoy reading the messages in English, Simplified Chinese, and Traditional Chinese.

Then run EncodingHtml with the other encodings:

C:\herong>java EncodingHtml hello.gbk gbk
Number of characters: 84

C:\herong>java EncodingHtml hello.big5 big5
Number of characters: 84

C:\herong>java EncodingHtml hello.shift_jis shift_jis
Number of characters: 84
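
The second command-line argument is passed directly to the reader and the writer as a charset name, so a misspelled or unsupported name only shows up as an exception at run time. As a minimal sketch, the name could be checked first with java.nio.charset.Charset. The class name CharsetCheck and the helper canonicalName() are my own illustrations and are not part of EncodingHtml.java. Which names are supported depends on the JVM installation, so no output is shown here.

```java
import java.nio.charset.Charset;

class CharsetCheck {
   // Returns the canonical name of a charset, or null if this JVM
   // does not recognize or support the given name.
   static String canonicalName(String name) {
      try {
         return Charset.isSupported(name)
            ? Charset.forName(name).name() : null;
      } catch (IllegalArgumentException e) {
         // Thrown for syntactically illegal names, like an empty string
         return null;
      }
   }
   public static void main(String[] a) {
      String[] names = {"utf-8", "gbk", "big5", "shift_jis"};
      for (String name : names) {
         System.out.println(name + " -> " + canonicalName(name));
      }
   }
}
```

A null result for one of the names would suggest that the JVM in use lacks that charset, and that EncodingHtml would report an error for it.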

View the output files with IE, and compare the results.

If you manually change the View/Encoding setting to an encoding other than the one the file was saved in, IE will not be able to show the messages with the right glyphs.
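
The effect of picking the wrong encoding can be reproduced in Java without a browser. The sketch below is only an illustration under my own assumptions: the class name WrongCharsetDemo, the helper reDecode() and the two-character test string are not from the sample files, and ISO-8859-1 stands in for whatever wrong encoding might be selected. It encodes a short Chinese string as UTF-8 and then decodes the same bytes with two different charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
   // Encodes text with one charset, then decodes the bytes with another
   static String reDecode(String text, Charset written, Charset assumed) {
      byte[] bytes = text.getBytes(written);
      return new String(bytes, assumed);
   }
   public static void main(String[] a) {
      String hello = "\u4F60\u597D"; // two Chinese characters
      String right = reDecode(hello,
         StandardCharsets.UTF_8, StandardCharsets.UTF_8);
      String wrong = reDecode(hello,
         StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
      System.out.println("Same charset: "+right.equals(hello)
         +", length "+right.length());
      System.out.println("Wrong charset: "+wrong.equals(hello)
         +", length "+wrong.length());
   }
}
```

Decoding with the same charset returns the original 2 characters. Decoding the 6 UTF-8 bytes as ISO-8859-1 produces 6 unrelated Latin characters instead, which is the same kind of garbling you see when the browser is forced to use the wrong encoding.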

Last update: 2009.
