These notes are an electronic web textbook reviewing the material covered in my class lectures. The Chapter 2 problems I assign, particularly the additional problems I have created, illustrate applications of this material. © Copyright 1996.
Another suitable reference on the storage of floating-point numbers according to the IEEE standard is Section 4.4 [page 103] of A Programmer's View of Computer Architecture by Goodman and Miller, the current textbook for COP-3400.
We will work with normalized floating-point numbers of the form x = sign * 1.fffff * 2^p, where the mantissa 1.fffff lies in [1, 2) and p is the (unbiased) exponent.
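Since Python floats are IEEE doubles, this decomposition can be sketched with the standard library's math.frexp. The helper normalize below is my own, for illustration only:

```python
import math

def normalize(x):
    """Decompose nonzero x into (sign, mantissa, p) with mantissa in [1, 2),
    so that x = sign * mantissa * 2**p.  (Hypothetical helper, sketch only.)"""
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))    # abs(x) = m * 2**e with m in [0.5, 1)
    return sign, 2.0 * m, e - 1  # shift so the mantissa has the 1.fffff form

sign, mantissa, p = normalize(-6.25)
# -6.25 = -1 * 1.5625 * 2**2
```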
Most examples and problems will use a generalization of the formal IEEE 754 specification that can be applied to an arbitrary (but small) word size. It uses a sign bit, a biased exponent stored in Ne bits, and Nf bits for the fractional part of a normalized and rounded mantissa.
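A minimal sketch of this generalized encoding, assuming round-to-nearest and ignoring overflow, underflow, zero, and special values; the function encode and its interface are my own, not part of the standard:

```python
import math

def encode(x, Ne, Nf):
    """Encode nonzero x as (sign, biased exponent, fraction) fields for a
    format with Ne exponent bits and Nf fraction bits.  Sketch only."""
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))        # abs(x) = m * 2**e with m in [0.5, 1)
    mant, p = 2.0 * m, e - 1         # rewrite as 1.fff... * 2**p
    bias = 2**(Ne - 1) - 1           # IEEE-style exponent bias
    f = round((mant - 1.0) * 2**Nf)  # fractional part rounded to Nf bits
    if f == 2**Nf:                   # rounding carried into the hidden 1
        f, p = 0, p + 1
    return s, p + bias, f
```

For example, 6.25 = 1.5625 * 2^2 encodes with Ne = 8 and Nf = 23 as sign 0, biased exponent 2 + 127 = 129, and fraction 0.5625 * 2^23 = 4718592.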
Single precision (32-bit) floating-point numbers use Ne = 8 and Nf = 23, which gives about 7 decimal digits of precision and allows storage of numbers whose magnitudes range from roughly 10^{-38} to about 10^{38}.
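These field widths can be checked directly: packing a value with Python's struct module and unpacking the raw 32 bits as an integer exposes the three fields (big-endian byte order is used here so the sign bit comes first):

```python
import struct

# Pack 6.25 as an IEEE 754 single and view the raw 32 bits as an integer.
bits, = struct.unpack(">I", struct.pack(">f", 6.25))
sign = bits >> 31                # 1 sign bit
exponent = (bits >> 23) & 0xFF   # 8 biased-exponent bits (bias 127)
fraction = bits & 0x7FFFFF       # 23 fraction bits
# 6.25 = 1.5625 * 2**2, so the biased exponent is 2 + 127 = 129
```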
The IEEE 754 standard uses a special representation for zero. It also has representations for +Inf and -Inf, used when numbers overflow, and one for NaN, used when illegal operations (like 0/0) are encountered.
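Python floats follow the standard, so these special values can be observed directly. (In Python, 0.0 / 0.0 itself raises ZeroDivisionError rather than returning NaN, so float("nan") stands in for it here.)

```python
import math

inf = float("inf")
nan = float("nan")   # what an illegal operation such as 0/0 produces

# Overflow past the largest representable double gives infinity:
assert 1e308 * 10 == inf
# NaN is unordered: it compares unequal even to itself.
assert nan != nan
assert math.isnan(nan)
```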
Double precision (64-bit) floating-point numbers use Ne = 11 and Nf = 52, which gives about 15 decimal digits of precision and allows storage of numbers whose magnitudes range from roughly 10^{-308} to about 10^{308}.
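Python's sys.float_info reports exactly these double-precision limits:

```python
import sys

# Python floats are IEEE 754 doubles; sys.float_info gives their limits.
print(sys.float_info.max)       # largest double, about 1.8e308
print(sys.float_info.min)       # smallest normalized double, about 2.2e-308
print(sys.float_info.dig)       # 15 decimal digits of precision
print(sys.float_info.mant_dig)  # 53 significand bits: 1 hidden + 52 stored
```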
This material is © Copyright 1996, by James Carr. FSU students enrolled in MAD-3401 have permission to make personal copies of this document for use when studying. Other academic users may link to this page but may not copy or redistribute the material without the author's permission.