In floating point arithmetic, rounding errors occur as a result of the limited precision of the mantissa. For example, consider the average of two floating point numbers with identical exponents, but mantissas which differ by 1. The average should be a number midway between the original numbers, but the average cannot be represented without increasing the size of the mantissa. Although the mathematical operation is well-defined and the result is within the range of representable numbers, the average of two adjacent floating point values cannot be represented exactly.
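The averaging example can be checked directly. The sketch below assumes the 32-bit single precision format and uses Python's `struct` module to move values in and out of it; the helper names `float32_from_bits` and `round_to_float32` are illustrative, not part of any standard API.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def round_to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Two adjacent single precision values: same exponent, mantissas differ by 1.
a = float32_from_bits(0x3F800000)   # 1.0
b = float32_from_bits(0x3F800001)   # 1.0 + 2**-23

# The exact midpoint fits in a double, which has a longer mantissa.
midpoint = (a + b) / 2

# Storing the midpoint back into single precision forces a rounding.
stored = round_to_float32(midpoint)

print(a < midpoint < b)     # True: the midpoint lies strictly between a and b
print(stored in (a, b))     # True: it is stored as one of its two neighbours
print(stored == midpoint)   # False: the stored value is not the exact average
```

The midpoint needs one more mantissa bit than single precision provides, so it survives in a double but not in the narrower format.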
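The IEEE FPS defines four rounding rules for choosing the floating point value that stands in for a result which cannot be represented exactly: round to nearest (RN), round toward zero, round toward positive infinity, and round toward negative infinity. Under RN, a result that lies exactly midway between two representable values is rounded to the one whose least significant mantissa bit is zero.
RN is generally preferred and introduces less systematic error than the other rules.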
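As a rough illustration of how the four IEEE rounding rules (to nearest, toward zero, toward positive infinity, toward negative infinity) differ, the sketch below rounds an exact rational to a 23-bit mantissa. The function `round_exact` and its rule names are hypothetical helpers written for this example; they ignore exponent limits, infinities and NaNs, and are not how hardware implements rounding.

```python
import math
from fractions import Fraction

def round_exact(x: Fraction, rule: str, frac_bits: int = 23) -> Fraction:
    """Round an exact rational to a binary value with `frac_bits` fraction bits."""
    if x == 0:
        return Fraction(0)
    ax = abs(x)
    # Find e such that 2**e <= |x| < 2**(e + 1).
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:
        e -= 1
    ulp = Fraction(2) ** (e - frac_bits)   # spacing of representable values
    q = x / ulp                            # x measured in units of ulp
    below, above = math.floor(q), math.ceil(q)
    if rule == "nearest":
        if q - below < above - q:
            n = below
        elif q - below > above - q:
            n = above
        else:                              # tie: pick the even neighbour
            n = below if below % 2 == 0 else above
    elif rule == "toward_zero":
        n = math.trunc(q)
    elif rule == "toward_plus_inf":
        n = above
    elif rule == "toward_minus_inf":
        n = below
    else:
        raise ValueError(f"unknown rule: {rule}")
    return n * ulp

# A value exactly midway between 1 and its successor 1 + 2**-23.
x = 1 + Fraction(1, 2**24)
for rule in ("nearest", "toward_zero", "toward_plus_inf", "toward_minus_inf"):
    print(rule, round_exact(x, rule), round_exact(-x, rule))
```

Rounding both `x` and `-x` shows that the two directed rules toward infinity are not symmetric about zero, which is one source of the systematic error that round to nearest avoids.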
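The absolute error introduced by rounding is the difference between the exact value and its floating point representation. The size of the absolute error grows with the magnitude of the number, because the spacing between adjacent representable values is proportional to $2^{e}$, where $e$ is the unbiased exponent. For numbers in IEEE FPS format (23-bit mantissa) rounded with RN, the absolute error is no more than $2^{-24} \cdot 2^{e}$, half the spacing between adjacent values.
The largest absolute rounding error occurs
when the exponent is 127 and is approximately $10^{31}$, since $2^{-24} \cdot 2^{127} = 2^{103} \approx 10^{31}$.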
The relative error is the absolute error divided by the magnitude
of the number which is approximated. For normalized floating point
numbers with a 23-bit mantissa, the relative error is approximately $10^{-7}$, since under RN it is bounded by $2^{-24} \approx 6 \times 10^{-8}$.
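The contrast between absolute and relative error can be seen numerically. This sketch assumes single precision with round to nearest, and measures the error of storing a few doubles as 32-bit floats using exact rational arithmetic; `rounding_errors` is an illustrative helper, and the sample values are arbitrary.

```python
import math
import struct
from fractions import Fraction

def round_to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def rounding_errors(x: float):
    """Return (absolute error, relative error, half-spacing bound) as Fractions."""
    exact = Fraction(x)
    stored = Fraction(round_to_float32(x))
    abs_err = abs(stored - exact)
    rel_err = abs_err / abs(exact)
    e = math.frexp(x)[1] - 1            # unbiased exponent: 2**e <= |x| < 2**(e+1)
    bound = Fraction(2) ** (e - 24)     # half the spacing at this exponent
    return abs_err, rel_err, bound

for x in (0.1, 1000.1, 1.0e30):
    abs_err, rel_err, bound = rounding_errors(x)
    print(f"x={x:g}  abs={float(abs_err):.3e}  "
          f"bound={float(bound):.3e}  rel={float(rel_err):.3e}")

# Bound on the absolute error at the largest exponent, 127.
largest = 2.0 ** (127 - 24)
print(f"largest bound = {largest:.3e}")
```

The absolute error and its bound scale with the magnitude of `x`, while the relative error stays below `2**-24` for every normalized value.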
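For denormalized numbers (E = 0), relative errors increase as the magnitude
of the number decreases toward zero. However, the absolute error of a
denormalized number is less than $2^{-149} \approx 1.4 \times 10^{-45}$, since the truncation
error in a denormalized number is bounded by the fixed spacing between adjacent denormalized values, $2^{-23} \cdot 2^{-126} = 2^{-149}$.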
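The behaviour of denormalized numbers can be sketched the same way. The example assumes single precision, where values below $2^{-126}$ are stored as denormals with a fixed spacing of $2^{-149}$, and assumes the platform rounds to nearest when converting a double to a 32-bit float. The sample values are arbitrary.

```python
import struct
from fractions import Fraction

def round_to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

SPACING = Fraction(1, 2**149)   # gap between adjacent single precision denormals

results = []
for exponent in (-130, -140, -148):
    x = 1.3 * 2.0 ** exponent            # below 2**-126, so stored as a denormal
    exact = Fraction(x)
    stored = Fraction(round_to_float32(x))
    abs_err = abs(stored - exact)
    rel_err = abs_err / exact
    results.append((abs_err, rel_err))
    print(f"x=1.3*2**{exponent}  abs={float(abs_err):.3e}  rel={float(rel_err):.3e}")
```

The absolute error stays below the fixed spacing at every step, but the relative error climbs toward order one as the value approaches the smallest denormal.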
Rounding errors affect the outcome of floating point computations in several ways: