Once the decimal points are aligned, the addition can be performed by ignoring the decimal point and using integer addition.
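The addition of two IEEE FPS numbers is performed in a similar manner. The number 2.25, which is $10.01_2 = 1.001_2 \times 2^{1}$, in IEEE FPS is:
S = 0, E = 128 = 10000000, M = 00100000000000000000000 (hex 40100000)
The number 134.0625, which is $10000110.0001_2 = 1.00001100001_2 \times 2^{7}$, in IEEE FPS is:
S = 0, E = 134 = 10000110, M = 00001100001000000000000 (hex 43061000)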
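The fields can be checked mechanically. The following is a minimal Python sketch, not part of the original text. It uses the standard `struct` module to reinterpret a value rounded to single precision as a 32-bit integer, then splits that integer into its sign, biased exponent and stored mantissa. The helper name `fields` is hypothetical.

```python
import struct

def fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, M) for x rounded to IEEE single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31               # 1 sign bit
    e = (bits >> 23) & 0xFF         # 8-bit biased exponent
    m = bits & 0x7FFFFF             # 23 stored mantissa bits (hidden 1 omitted)
    return sign, e, m

for value in (2.25, 134.0625):
    s, e, m = fields(value)
    print(f"{value}: S={s} E={e:08b} ({e - 127:+d}) M={m:023b}")
```

For 2.25 this prints E=10000000 (+1) and M=00100000000000000000000. For 134.0625 it prints E=10000110 (+7) and M=00001100001000000000000.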
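The exponents can be positive or negative with no change in the algorithm; a smaller exponent is simply a more negative one. In the bias-127 representation, the smaller exponent also has the smaller value of E interpreted as an unsigned integer. Exponents can therefore be compared, and their difference computed, directly on the biased fields, because the bias cancels in the subtraction.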
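A small Python sketch (an illustration, with the hypothetical helper name `biased_exponent`) shows two things. Negative exponents produce E values below 127, and the order of the unsigned E fields matches the order of the true exponents. The difference of two E fields is the alignment shift, because the bias cancels.

```python
import struct

def biased_exponent(x: float) -> int:
    """Return the 8-bit biased exponent field E of x stored as a single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF

values = [0.1, 0.25, 2.25, 134.0625]
for v in values:
    e = biased_exponent(v)
    # The true exponent is E - 127; negative exponents simply give E < 127.
    print(f"{v:>9}: E = {e:3d}, exponent = {e - 127:+d}")

# The bias cancels in a difference, so the alignment shift comes straight from E.
shift = biased_exponent(134.0625) - biased_exponent(2.25)
print("alignment shift:", shift)   # 6
```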
An important case occurs when the numbers differ widely in magnitude. If the exponents differ by 24 or more, the smaller number's significand, including its hidden bit, is shifted right entirely out of the 24-bit field, producing a zero mantissa. The sum then equals the larger number. Such truncation errors occur when the numbers differ by a factor of more than $2^{24}$, which is approximately $1.7 \times 10^{7}$. Since $\log_{10} 2^{24} \approx 7.2$, the precision of IEEE single precision floating point arithmetic is approximately 7 decimal digits.
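The align-and-add procedure for two positive numbers can be sketched in Python as follows. This sketch makes several assumptions. It accepts only positive, normalized inputs. It truncates the bits shifted out during alignment and does not round. It does not handle overflow. The names `unpack`, `pack` and `add_positive` are hypothetical.

```python
import struct

FRAC_BITS = 23                      # stored mantissa bits
HIDDEN = 1 << FRAC_BITS             # weight of the hidden leading 1

def unpack(x: float) -> tuple[int, int]:
    """Return (E, M) of a positive single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF, bits & (HIDDEN - 1)

def pack(e: int, m: int) -> float:
    """Build a positive single-precision value from E and M."""
    (x,) = struct.unpack(">f", struct.pack(">I", (e << 23) | m))
    return x

def add_positive(a: float, b: float) -> float:
    """Align-and-add for positive, normalized inputs; shifted-out bits are truncated."""
    ea, ma = unpack(a)
    eb, mb = unpack(b)
    if ea < eb:                     # let a be the operand with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    sa = ma | HIDDEN                # restore the hidden bit: 24-bit significands
    sb = (mb | HIDDEN) >> (ea - eb) # align; a gap of 24 or more leaves sb == 0
    s = sa + sb                     # plain integer addition
    if s >= 2 * HIDDEN:             # carry past the hidden bit: renormalize
        s >>= 1
        ea += 1
    return pack(ea, s - HIDDEN)     # drop the hidden bit again

print(add_positive(2.25, 134.0625))    # 136.3125
print(add_positive(2.0**24, 1.0))      # 16777216.0: the 1.0 is shifted out entirely
```

For 2.25 + 134.0625 the smaller significand is shifted right by 134 - 128 = 6 places before the integer addition. For $2^{24} + 1$ the shift is 24, so the smaller operand vanishes.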
Negative mantissas are handled by first converting to 2's complement and then performing the addition. After the addition is performed, the result is converted back to sign-magnitude form.
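The conversion can be illustrated on significands that are already aligned. The Python sketch below assumes a 26-bit adder: 24 significand bits, one carry bit and one sign bit. That width and the helper names `to_twos`, `from_twos` and `add_signed` are hypothetical. Each significand is encoded in two's complement, the patterns are added modulo $2^{26}$, and the result is decoded back to sign-magnitude form.

```python
WIDTH = 26  # assumed adder width: 24-bit significand, 1 carry bit, 1 sign bit
MASK = (1 << WIDTH) - 1

def to_twos(sign: int, mag: int) -> int:
    """Sign-magnitude significand -> WIDTH-bit two's-complement bit pattern."""
    return (-mag) & MASK if sign else mag

def from_twos(pattern: int) -> tuple[int, int]:
    """WIDTH-bit two's-complement bit pattern -> (sign, magnitude)."""
    if pattern >> (WIDTH - 1):          # top bit set: negative
        return 1, (-pattern) & MASK
    return 0, pattern

def add_signed(sign_a: int, mag_a: int, sign_b: int, mag_b: int) -> tuple[int, int]:
    """Add two aligned sign-magnitude significands via two's complement."""
    total = (to_twos(sign_a, mag_a) + to_twos(sign_b, mag_b)) & MASK
    return from_twos(total)

# 134.0625 + (-2.25), both significands aligned to exponent 7:
sign, mag = add_signed(0, 0x861000, 1, 0x024000)
print(sign, hex(mag))   # 0 0x83d000
```

The magnitude 0x83d000 with exponent 7 is 131.8125. Swapping the signs of the two operands gives the same magnitude with sign 1.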
When adding numbers of opposite sign, cancellation may occur, producing a sum that is arbitrarily small, or zero if the numbers are equal in magnitude. Normalization in this case may require shifting left by up to the total number of bits in the mantissa. The bits shifted in are zeros and carry no information, so the result can have far fewer significant bits than its operands, a large loss of relative accuracy.
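The effect of cancellation is visible even with ordinary values. In this Python sketch, `f32` and `biased_exponent` are hypothetical helpers built on `struct`. The sketch rounds 1.0000001 and 1.0 to single precision and subtracts them. The subtraction itself is exact. However, the leading bits cancel, normalization needs 23 left shifts, and the rounding error already present in the first operand now dominates the result.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def biased_exponent(x: float) -> int:
    """Return the biased exponent field E of x as a single-precision float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF

a = f32(1.0000001)           # rounds to 1 + 2**-23, the nearest float32
b = f32(1.0)
diff = f32(a - b)            # exact: the leading bits cancel
shift = biased_exponent(a) - biased_exponent(diff)
rel_err = abs(diff - 1e-7) / 1e-7

print(diff == 2.0**-23)      # True
print(shift)                 # 23 normalization shifts
print(f"{rel_err:.1%}")      # 19.2%, relative to the true difference 1e-7
```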
When the mantissa of the sum is zero, no amount of shifting will produce a 1 in the hidden bit. This case must be detected in the normalization step and the result set to the representation for 0, E = M = 0. A zero result means the two floating point operands are equal in magnitude. It does not mean that the real values they were rounded from are equal, only that those values differ by less than the precision of the floating point representation.
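The following Python sketch combines the steps for signed operands: alignment, signed addition, conversion back to sign-magnitude form, normalization in either direction, and zero detection. It is a sketch under stated assumptions. Inputs must be nonzero and normalized. Alignment truncates. Overflow, underflow, infinities and NaNs are not handled. Python's signed integers stand in for a two's-complement adder. The names `unpack`, `pack` and `fp_add` are hypothetical.

```python
import struct

FRAC_BITS = 23
HIDDEN = 1 << FRAC_BITS

def unpack(x: float) -> tuple[int, int, int]:
    """Return (S, E, M) of a single-precision value."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & (HIDDEN - 1)

def pack(s: int, e: int, m: int) -> float:
    """Build a single-precision value from S, E, and M."""
    (x,) = struct.unpack(">f", struct.pack(">I", (s << 31) | (e << 23) | m))
    return x

def fp_add(a: float, b: float) -> float:
    """Signed align-and-add with normalization and zero detection.

    Assumes nonzero, normalized inputs; truncates on alignment and does not
    handle overflow, underflow, infinities, or NaNs.
    """
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    if ea < eb:                                   # larger exponent first
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    va = ma | HIDDEN
    vb = (mb | HIDDEN) >> (ea - eb)               # align the smaller operand
    # Python's signed integers stand in for the two's-complement adder.
    total = (-va if sa else va) + (-vb if sb else vb)
    if total == 0:
        return pack(0, 0, 0)                      # no 1 can reach the hidden bit: E = M = 0
    sign, mag, e = int(total < 0), abs(total), ea # back to sign-magnitude
    if mag >= 2 * HIDDEN:                         # carry out: shift right once
        mag >>= 1
        e += 1
    while mag < HIDDEN:                           # cancellation: shift left until normalized
        mag <<= 1
        e -= 1
    return pack(sign, e, mag - HIDDEN)

print(fp_add(134.0625, -2.25))            # 131.8125
print(fp_add(2.25, -2.25))                # 0.0
```

For example, `fp_add(1.0 + 2**-23, -1.0)` leaves a raw sum of 1. The normalization loop then shifts it left 23 times to give $2^{-23}$.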
Floating point subtraction is achieved simply by inverting the sign bit of the subtrahend and performing the addition of signed mantissas as outlined above.
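Only the sign-flip step is specific to subtraction. The Python sketch below shows that step on the raw bit pattern. As a stand-in for a single-precision adder, it delegates the addition itself to Python arithmetic rounded back to single precision. The helper names `flip_sign`, `f32` and `fp_sub` are hypothetical.

```python
import struct

def flip_sign(x: float) -> float:
    """Negate a single-precision value by inverting bit 31 (the sign bit)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", bits ^ 0x8000_0000))[0]

def f32(x: float) -> float:
    """Round to the nearest single-precision value (stand-in for a float32 adder)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fp_sub(a: float, b: float) -> float:
    # a - b is computed as a + (-b): only the sign bit of b changes.
    return f32(a + flip_sign(b))

print(fp_sub(134.0625, 2.25))   # 131.8125
print(fp_sub(2.25, 134.0625))   # -131.8125
```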