Character Counter Program for Any Given Encoding

This section provides a tutorial example on how to write a simple program to count valid characters in a give encoding character set encoding.

As mentioned in the previous chapter, JDK 1.4.1_01 for Windows 2000 supports 48 build-in character set encodings.

Of course, each encoding is designed for a specific character set only. As a simple exercise, I want to write a sample program that counts the number of characters in the character set of a given encoding.

The sample program, EncodingCounter.java, counts the number of code points that are mapped valid byte sequences in the 0x0000 - 0xFFFF range for a given encoding:

/* EncodingCounter.java
 - Copyright (c) 2014, HerongYang.com, All Rights Reserved.
 */
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
class EncodingCounter {
   static char hexDigit[] = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String charset = "CP1252";
      if (a.length>0) charset = a[0];
      System.out.println(charset+" encoding:");
      int lastByte = 0;
      int lastLength = 0;
      byte[] startSequence = null;
      char startChar = 0;
      byte[] endSequence = null;
      char endChar = 0;
      boolean isFirstChar = true;
      int validCount = 0;
      int subCount = 0;
      int totalCount = 0x010000;
      for (int i=0; i<totalCount; i++) {
         subCount++;
         char c = (char) i;
         byte[] b = encodeByEncoder(c,charset);         
         int l = 0;
         int lb = 0;
         if (b!=null) {
            l = b.length;
            lb = ((int) b[l-1]) & 0x00FF;
            validCount++;
         }
         if (isFirstChar==true) {
            isFirstChar = false;
            startSequence = b;
            startChar = c;
            lastByte = lb - 1;
            lastLength = l;
         }
         if (!(l==lastLength)) {
            System.out.print(charToHex(startChar)+" >");
            printBytes(startSequence);
            System.out.print(" - "+charToHex(endChar)+" >");
            printBytes(endSequence);
            System.out.println(" = "+(subCount-1));
            startSequence = b;
            startChar = c;
            subCount = 1;
         }
         endSequence = b;
         endChar = c;
         lastLength = l;
         lastByte = lb;
      }
      System.out.print(charToHex(startChar)+" >");
      printBytes(startSequence);
      System.out.print(" - "+charToHex(endChar)+" >");
      printBytes(endSequence);
      System.out.println(" = "+(subCount));
      System.out.println("Total characters = "+totalCount);
      System.out.println("Valid characters = "+validCount);
      System.out.println("Invalid characters = "
         +(totalCount-validCount));
   }
   public static byte[] encodeByEncoder(char c, String cs) {
      Charset cso = null;
      byte[] b = null;
      try {   	
         cso = Charset.forName(cs);
         CharsetEncoder e =  cso.newEncoder();
         e.reset();
         ByteBuffer bb = e.encode(CharBuffer.wrap(new char[] {c}));
         if (bb.limit()>0) b = copyBytes(bb.array(),bb.limit());
      } catch (IllegalCharsetNameException e) {
         System.out.println(e.toString());
      } catch (CharacterCodingException e) {
         // invalid character, return null
      }      	
      return b;
   }
   public static void printBytes(byte[] b) {
      if (b!=null) {
         for (int j=0; j<b.length; j++)
            System.out.print(" "+byteToHex(b[j]));
      } else {
         System.out.print(" XX");
      }
   }   
   public static byte[] copyBytes(byte[] a, int l) {
      byte[] b = new byte[l];
      for (int i=0; i<Math.min(l,a.length); i++) b[i] = a[i];
      return b;
   }
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
   public static String charToHex(char c) {
      byte hi = (byte) (c >>> 8);
      byte lo = (byte) (c & 0xff);
      return byteToHex(hi) + byteToHex(lo);
   }
}

Note that:

The output of this program will be discussed in the sections bellow.

Last update: 2014.

Table of Contents

 About This JDK Tutorial Book

 Downloading and Installing JDK 1.8.0 on Windows

 Downloading and Installing JDK 1.7.0 on Windows

 Downloading and Installing JDK 1.6.2 on Windows

 Java Date-Time API

 Date, Time and Calendar Classes

 Date and Time Object and String Conversion

 Number Object and Numeric String Conversion

 Locales, Localization Methods and Resource Bundles

 Calling and Importing Classes Defined in Unnamed Packages

 HashSet, Vector, HashMap and Collection Classes

 Character Set Encoding Classes and Methods

Character Set Encoding Maps

 Character Set Encoding Map Analyzer

 Character Set Encoding Maps - US-ASCII and ISO-8859-1/Latin 1

 Character Set Encoding Maps - CP1252/Windows-1252

 Character Set Encoding Maps - Unicode UTF-8

 Character Set Encoding Maps - Unicode UTF-16, UTF-16LE, UTF-16BE

Character Counter Program for Any Given Encoding

 Character Set Encoding Comparison

 Encoding Conversion Programs for Encoded Text Files

 Socket Network Communication

 Datagram Network Communication

 DOM (Document Object Model) - API for XML Files

 SAX (Simple API for XML)

 DTD (Document Type Definition) - XML Validation

 XSD (XML Schema Definition) - XML Validation

 XSL (Extensible Stylesheet Language)

 Message Digest Algorithm Implementations in JDK

 Private key and Public Key Pair Generation

 PKCS#8/X.509 Private/Public Encoding Standards

 Digital Signature Algorithm and Sample Program

 "keytool" Commands and "keystore" Files

 KeyStore and Certificate Classes

 Secret Key Generation and Management

 Cipher - Secret Key Encryption and Decryption

 The SSL (Secure Socket Layer) Protocol

 SSL Socket Communication Testing Programs

 SSL Client Authentication

 HTTPS (Hypertext Transfer Protocol Secure)

 Outdated Tutorials

 References

 PDF Printing Version