Next: About this document Up: Floating Point Arithmetic Previous: Division

Rounding Errors

In integer arithmetic, the result of an operation is well-defined: either the exact result is obtained or overflow occurs and the result cannot be represented.
In floating point arithmetic, rounding errors occur because the mantissa has limited precision. For example, consider the average of two floating point numbers with identical exponents whose mantissas differ by 1 in the least significant bit. The average should lie midway between the two numbers, but it cannot be represented without adding a bit to the mantissa. Although the mathematical operation is well-defined and the result is within the range of representable numbers, the average of two adjacent floating point values cannot be represented exactly.
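The midpoint problem can be observed directly. The following is a minimal Python sketch, not part of the original notes. It assumes Python's built-in float is an IEEE double (not the single precision format discussed here) and uses the fractions module only to obtain the exact average for comparison.

```python
import sys
from fractions import Fraction

# Two adjacent floating point values: same exponent, mantissas differing
# by one unit in the last place. Python floats are assumed to be IEEE doubles.
a = 1.0
b = 1.0 + sys.float_info.epsilon   # the next representable value above 1.0

exact_avg = (Fraction(a) + Fraction(b)) / 2   # exact rational midpoint
float_avg = (a + b) / 2                       # result after rounding

print("midpoint is representable:", Fraction(float_avg) == exact_avg)
print("computed average equals a:", float_avg == a)
print("computed average equals b:", float_avg == b)
```

The computed average necessarily collapses onto one of the two neighbours, since no representable value lies between them.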
The IEEE FPS defines four rounding rules for choosing which neighboring floating point number to use when the exact result cannot be represented:

RN
Round to Nearest. Break ties by choosing the value whose least significant bit is 0.
RZ
Round toward Zero. Same as truncation in sign-magnitude.
RP
Round toward Positive infinity.
RM
Round toward Minus infinity. Same as truncation in 2's complement.

RN is generally preferred and introduces less systematic error than the other rules.
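The four rules can be compared side by side with a small sketch. This is only an analogy: it rounds decimal values to integers using Python's decimal module, not binary mantissas, and the mapping from RN, RZ, RP, and RM to the decimal rounding constants is an assumption made for illustration. The helper name round_to_integer is hypothetical.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_DOWN,
                     ROUND_CEILING, ROUND_FLOOR)

# Decimal analogues of the four rules (assumed mapping, illustration only).
RULES = {
    "RN": ROUND_HALF_EVEN,  # nearest, ties go to the even neighbour
    "RZ": ROUND_DOWN,       # toward zero (truncation of the magnitude)
    "RP": ROUND_CEILING,    # toward positive infinity
    "RM": ROUND_FLOOR,      # toward minus infinity
}

def round_to_integer(value, rule):
    """Round a decimal string to an integer under the named rule."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=RULES[rule]))

for value in ("2.5", "3.5", "-2.5", "2.3", "-2.3"):
    results = {rule: round_to_integer(value, rule) for rule in RULES}
    print(value, results)
```

Note how RN sends 2.5 down and 3.5 up, so tie cases do not all drift in one direction, while RZ, RP, and RM each push every inexact value the same way.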
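The absolute error introduced by rounding is the difference between the exact value and its floating point representation. The bound on the absolute error scales with the magnitude of the number. For a normalized number in IEEE FPS single precision format with true (unbiased) exponent $e$, the absolute error under RN is at most

$2^{-24} \times 2^{e}$

which is half the spacing between adjacent values; the other three rules can err by nearly twice this amount. The largest absolute rounding error occurs when the exponent is 127 and is approximately $10^{31}$ since

$2^{-24} \times 2^{127} = 2^{103} \approx 1.0 \times 10^{31}$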

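The relative error is the absolute error divided by the magnitude of the number being approximated. For a normalized floating point number of magnitude $1.f \times 2^{e}$, the relative error under RN is at most $2^{-24} \approx 6 \times 10^{-8}$ since

$\frac{2^{-24} \times 2^{e}}{1.f \times 2^{e}} \le 2^{-24}$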
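The single precision bounds on absolute and relative error can be checked numerically. The sketch below is an illustration under stated assumptions: it treats a Python double as the "exact" value, rounds it to single precision with the struct module (assumed to use round to nearest), and measures the error exactly with fractions. The helper names to_single and rounding_errors are hypothetical.

```python
import math
import struct
from fractions import Fraction

def to_single(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def rounding_errors(x):
    """Return (absolute error, relative error, true exponent e) for x != 0."""
    exact = Fraction(x)               # the double x, taken as the exact value
    stored = Fraction(to_single(x))   # what single precision can hold
    abs_err = abs(stored - exact)
    rel_err = abs_err / abs(exact)
    e = math.frexp(x)[1] - 1          # |x| = 1.f * 2**e
    return abs_err, rel_err, e

for x in (0.1, 3.14159, 123456.789, 1.0e10):
    abs_err, rel_err, e = rounding_errors(x)
    abs_bound = Fraction(2) ** (e - 24)
    print(f"x={x!r}  e={e}  abs_err={float(abs_err):.3e}  "
          f"bound={float(abs_bound):.3e}  rel_err={float(rel_err):.3e}")
```

The absolute error grows with the exponent of the input, while the relative error stays below about 6e-8 for every normalized value.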

For denormalized numbers (E = 0), relative errors increase as the magnitude of the number decreases toward zero. However, the absolute error of a denormalized number is less than $2^{-149} \approx 1.4 \times 10^{-45}$ since the truncation error in a denormalized number is less than

$2^{-23} \times 2^{-126} = 2^{-149}$
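The behaviour of denormalized numbers can be illustrated the same way. This sketch assumes the platform keeps single precision denormals (no flush to zero) and that the struct module rounds to nearest. The names to_single and denormal_errors are hypothetical helpers, and the sample inputs are arbitrary values below the smallest normalized single precision number.

```python
import struct
from fractions import Fraction

def to_single(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

SPACING = Fraction(2) ** -149   # gap between adjacent single precision denormals

def denormal_errors(x):
    """Return (absolute error, relative error) of storing x in single precision."""
    exact = Fraction(x)
    stored = Fraction(to_single(x))
    abs_err = abs(stored - exact)
    return abs_err, abs_err / abs(exact)

for x in (1e-40, 1e-42, 1e-44):
    abs_err, rel_err = denormal_errors(x)
    print(f"x={x!r}  abs_err={float(abs_err):.3e}  rel_err={float(rel_err):.3e}")
```

The absolute error stays below the fixed spacing, but because that spacing no longer shrinks with the number, the relative error becomes far larger than the normalized figure of about 6e-8.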

Rounding errors affect the outcome of floating point computations in several ways:

Exact comparison of floating point variables often produces unexpected results, because two mathematically equal quantities may be rounded differently. Floating point variables should not be used as loop counters or loop increments. Convergence tests for iterative algorithms are limited by the precision of the floating point computations.
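A short sketch of the comparison and loop counter problems follows. It assumes Python floats are IEEE doubles. The tolerance passed to math.isclose is an arbitrary choice for illustration, not a recommended value.

```python
import math

# Adding 0.1 ten times does not land exactly on 1.0.
total = 0.0
for _ in range(10):
    total += 0.1

print("total == 1.0:", total == 1.0)
print("close to 1.0:", math.isclose(total, 1.0, rel_tol=1e-9))

# A loop such as `while x != 1.0: x += 0.1` would therefore never stop.
# Counting with an integer and deriving the float avoids the problem.
n_steps = 10
xs = [i * 0.1 for i in range(n_steps + 1)]
print("number of points:", len(xs))
```

Comparing against a tolerance and counting with integers sidestep the exact-equality test, though the right tolerance depends on the computation at hand.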
Operations performed in different orders may give different results. On machines that conform to the IEEE FPS, a single addition is commutative, so a+b equals b+a, but addition is not associative: (a+b)+c may differ from a+(b+c).
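The effect of ordering can be shown with three values of very different magnitude. This is a minimal sketch assuming IEEE double precision Python floats. The particular values are chosen only to make the loss obvious.

```python
a, b, c = 1.0e20, -1.0e20, 1.0

left = (a + b) + c    # a and b cancel exactly first, then c is added
right = a + (b + c)   # c is too small to change b, so it is lost

print("(a+b)+c =", left)
print("a+(b+c) =", right)
print("a+c == c+a:", (a + c) == (c + a))
```

The two groupings differ because c falls below the spacing between representable values near b, so adding it to b has no effect.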
Errors accumulate over time. While the relative error for a single operation in single precision floating point is about $2^{-24}$ (roughly $6 \times 10^{-8}$), algorithms which iterate many times may experience an accumulation of errors which is much larger.
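Accumulation can be demonstrated by simulating a single precision running sum. The sketch below is illustrative only: it emulates single precision by rounding after every addition with the struct module (assumed to round to nearest). The helper name to_single, the step 0.1, and the iteration count are arbitrary choices.

```python
import struct

def to_single(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

STEP = to_single(0.1)
N = 100_000
EXPECTED = 10_000.0   # the mathematically exact value of N * 0.1

total = 0.0
for _ in range(N):
    total = to_single(total + STEP)   # round after every addition

rel_err = abs(total - EXPECTED) / EXPECTED
print("single precision sum:", total)
print("relative error:", rel_err)
print("single-operation bound:", 2.0 ** -24)
```

After many additions the relative error of the sum is orders of magnitude larger than the bound for one operation, because each step rounds against an ever larger running total.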

CS 301 Class Account
Mon Oct 20 22:47:46 ADT 1997