MAD 3401 IEEE Notes - Sect. 5

Double precision (64-bit) representation

The 64-bit (double precision) IEEE floating-point representation is laid out as

s eeeeeeeeeee ffff ffffffff....ffffffff

in the IEEE 754 standard. There are Ne = 11 bits in the exponent field and Nf = 52 bits for the fractional part of the mantissa.
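The layout above can be inspected directly. The following is a minimal sketch, assuming Python's float is an IEEE 754 double (true on essentially all current platforms); the helper name split_double is made up for this illustration and is not part of any library.

```python
import struct

def split_double(x):
    """Return (sign, exponent_field, fraction_field) for a Python float."""
    # Reinterpret the 8 bytes of the double as one 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 sign bit
    exponent = (bits >> 52) & 0x7FF      # Ne = 11 exponent bits
    fraction = bits & ((1 << 52) - 1)    # Nf = 52 fraction bits
    return sign, exponent, fraction

s, e, f = split_double(-6.5)
# Print the three fields in the same s / e...e / f...f order as the diagram.
print(f"{s:01b} {e:011b} {f:052b}")
```

Since -6.5 = -1.625 x 2^2, the sign bit is 1 and the fraction field begins 101 (0.625 in binary) followed by zeros.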

The exponent field stores bias + p, where p is the power of 2; the bias is 01111111111 (binary) = 3FF (hex) = 1023 (decimal).

To allow for the representation of the special values (0, Inf, NaN) described in section 4, two bit patterns of the exponent field (all zeros and all ones) are reserved, which limits the power p to the range [-1022, 1023].
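As a rough check of the bias and the reserved patterns, the sketch below recovers p from the stored exponent field. It assumes Python floats are IEEE 754 doubles; the names exponent_field and power are hypothetical helpers written for this note.

```python
import math
import struct
import sys

BIAS = 1023  # 0x3FF

def exponent_field(x):
    """Return the raw 11-bit exponent field of a Python float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

def power(x):
    """Return p for a normalized double, or None for a reserved pattern."""
    e = exponent_field(x)
    # All zeros is used for 0 (and, in IEEE 754, subnormal numbers);
    # all ones is used for Inf and NaN.
    if e == 0 or e == 0x7FF:
        return None
    return e - BIAS

for x in (1.0, sys.float_info.max, 2.0**-1022, 0.0, math.inf, math.nan):
    print(x, exponent_field(x), power(x))
```

The usable field values 1 through 2046 correspond to p = -1022 through p = 1023, which is the range quoted above.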

Since the mantissa has a total of 53 bits (when you count the hidden bit) and is rounded, the magnitude of the relative error in a number is bounded by 2^{-53} = 1.11... x 10^{-16}.
This means we get almost 16 decimal digits of precision.
(The largest possible mantissa, viewed as a 53-bit integer, is M = 2^{53} - 1 = 9.007...x10^15, which has 15+ digits of precision.)
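The error bound can be checked with exact rational arithmetic. This is a small sketch, assuming Python floats are IEEE 754 doubles with round-to-nearest; the function name relative_rounding_error is introduced here only for illustration.

```python
from fractions import Fraction
import sys

U = Fraction(1, 2**53)  # the bound 2^{-53} on the relative rounding error

def relative_rounding_error(text):
    """Exact relative error made when the decimal string is stored as a double."""
    exact = Fraction(text)           # e.g. "0.1" becomes exactly 1/10
    stored = Fraction(float(text))   # exact value of the nearest double
    return abs(stored - exact) / abs(exact)

for text in ("0.1", "0.3", "2.718281828"):
    err = relative_rounding_error(text)
    print(text, float(err), err <= U)

# The mantissa length, counting the hidden bit.
print(sys.float_info.mant_dig)
# Adding 2^{-53} to 1 falls exactly halfway to the next double and rounds back to 1.
print(1.0 + 2.0**-53 == 1.0)
```

For 0.1 the relative error works out to exactly 2^{-54}, comfortably inside the bound.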

The largest positive number that can be stored is
1.11111....11111 x 2^{1023} = 1.797693... x 10^{308}.
Notice that 1.11111....11111 = 2 - 2^{-52}.
Also note that log_{10}(largest) = 308.2547...
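The largest value can be rebuilt from the formula above and compared with what the language reports. This is a short sketch, assuming Python floats are IEEE 754 doubles; the variable name largest is chosen here for illustration.

```python
import math
import sys

# (2 - 2^{-52}) x 2^{1023}: every factor and the product are exactly representable.
largest = (2.0 - 2.0**-52) * 2.0**1023

print(largest)                        # about 1.797693 x 10^308
print(largest == sys.float_info.max)  # compare with the value Python reports
print(math.log10(largest))            # about 308.2547
print(largest * 2.0)                  # the product overflows to inf
```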

The smallest positive normalized number is
1.00000...00000 x 2^{-1022} = 2.225074... x 10^{-308}.
Note that log_{10}(smallest) = -307.6526...
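The same kind of check works at the small end. This is a sketch assuming Python floats are IEEE 754 doubles; note that sys.float_info.min is the smallest positive number with a full 53-bit mantissa, which is the value quoted above.

```python
import math
import sys

# 1.000...0 x 2^{-1022}: a power of two, so it is exactly representable.
smallest = 2.0**-1022

print(smallest)                        # about 2.225074 x 10^-308
print(smallest == sys.float_info.min)  # compare with the value Python reports
print(math.log10(smallest))            # about -307.6527
```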

Notice:
When we go to double precision using the IEEE 754 floating-point standard, we gain more than a factor of 2 in the precision of the mantissa and we gain a huge factor in the size of numbers we can work with before encountering an overflow condition.

This material is © Copyright 1996, by James Carr. FSU students enrolled in MAD-3401 have permission to make personal copies of this document for use when studying. Other academic users may link to this page but may not copy or redistribute the material without the author's permission.