**C# Tutorials - Herong's Tutorial Examples** - Version 3.30, by Dr. Herong Yang

IEEE 754 Standards - "float" and "double"

This section describes IEEE 754 standards on how floating-point values of 'float' and 'double' are represented in binary format.

In order to answer some of the questions raised in the previous sections, let's take a close look at how real numbers are represented in a computer system. In C#, the IEEE 754 single-precision and double-precision standards are used to represent "float" and "double" type values respectively. We will discuss the storage representation of "decimal" later in this book.

Since the IEEE 754 standards are widely used in the computer industry, there are a lots of publications available talking about these standards. In this section, I will explain only some basic rules of the standards.

**Rule 1**: The binary representation is divided into 3 components
with different number of bits assigned to each components:

Sign Exponent Fraction Total Single-Precision 1 8 23 32 Double-Precision 1 11 52 64

Since the difference between the single-precision and double-precision is only the number of bits used in the exponent and fraction components, we will only use the single-precision in the discussion from now on.

**Rule 2**: The single bit in the sign component represents a sign value (s)
through the binary sign convention:

Sign (s) Bit Pattern Value 0 1 1 -1

**Rule 3**: The 8-bit combination in the exponent component represents
an exponent value (e) through the binary integral number convention:

Exponent (e) Bit Pattern Value 00000001 1 <-- 2^0 00000010 2 <-- 2^1 00000011 3 <-- 2^1 + 2^0 ... ... 11111110 254

Note that the bit patterns of all 0s and all 1s are not list here, because then are reserved for special purposes, which will be discussed later.

**Rule 4**: The 23-bit combination in the fraction component represents
a fraction value through the binary fractional number convention:

Fraction (f) Bit Pattern Value 0000000 00000000 00000000 .0 1000000 00000000 00000000 .5 <-- 2^(-1) 0100000 00000000 00000000 .25 <-- 2^(-2) 1100000 00000000 00000000 .75 <-- 2^(-1) + 2^(-2) ... ... 1111111 11111111 11111111 .?

**Rule 5**: Putting all 3 components together, the single-precision 32-bit pattern
represents a real number (r) through the following expression:

r = s * 1.f * 2^(e-127)

Note that a numbers expressed in this form of expression is called a normalized number, because its binary point has been normalized to right after the leading 1. Also note that the leading 1 is not stored any where.

Now let's look at the range of positive normalized numbers:

s e f r 0 00000001 0000000 00000000 00000000 1 1 .0 -> 1 * 1.0 * 2^(1-127) 0 00000001 1000000 00000000 00000000 1 1 .5 -> 1 * 1.5 * 2^(1-127) ... 0 01111111 0000000 00000000 00000000 1 127 .0 -> 1 * 1.0 * 2^(127-127) 0 01111111 1000000 00000000 00000000 1 127 .5 -> 1 * 1.5 * 2^(127-127) ... 0 11111110 0000000 00000000 00000000 1 254 .0 -> 1 * 1.0 * 2^(254-127) 0 11111110 1000000 00000000 00000000 1 254 .0 -> 1 * 1.5 * 2^(254-127) ... 0 11111110 1111111 11111111 11111111 1 254 .? -> 1 * 1.? * 2^(254-127)

**Rule 6**: When the exponent component stores all 0s, the 3 components are
put together to represent a denormalized number through the following expression:

r = s * 0.f * 2^(1-127)

This expression is designed to extend the range of the normalized numbers a little bit on the 0 side:

s e f r 0 00000000 0000000 00000000 00000000 1 0 .0 -> 1 * 0.0 * 2^(1-127) 0 00000000 1000000 00000000 00000000 1 0 .5 -> 1 * 0.5 * 2^(1-127) ... 0 00000000 1111111 11111111 11111111 1 0 .? -> 1 * 0.? * 2^(1-127)

**Rule 7**: When the exponent component store all 1s, the fraction component
is reserved for arithmetic operation uses.

For negative values, everything is identical with the positive numbers, except the sign bit.

The constant, 127, used in exponent part of the expression is called the bias. For double-precision standard, the bias is 1023.

The IEEE 754 standards are also called binary floating point number standards, because:

- The number is expressed in binary format.
- The binary point is floated around.

*Last update: 2010.*

Table of Contents

Logical Expressions and Conditional Statements

Precision of Floating-Point Data Types

Precision of Floating-Point Data Types - Test

Performance of Floating-Point Data Types

Performance of Floating-Point Data Types - Test

►IEEE 754 Standards - "float" and "double"

IEEE 754 Standards - "float" and "double" - Test

Binary Representation of "decimal"

Accuracy of "decimal" Data Type

Visual C# 2010 Express Edition

C# Compiler and Intermediate Language

Compiling C# Source Code Files

MSBuild - Microsoft Build Engine

System.Diagnostics.FileVersionInfo Class

WPF - Windows Presentation Foundation