Floating Point Operations - Part II

Multiplication
• Do unsigned multiplication on the mantissas, including the hidden bits
• Add the true exponents, or unbias one of the exponents (subtract 127 from it) and then perform 2’s complement addition
• Normalize the result
• Set the sign bit of the result

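Setting the Sign bit
The following table gives the sign bit of the result (the exclusive OR of the two operand sign bits):

Sign of A   Sign of B   Sign of result
    0           0             0
    0           1             1
    1           0             1
    1           1             0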

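Example 12.5: 18.0 x 9.5

Operands and result, shown as sign, biased exponent, and mantissa with the hidden bit in parentheses (the least 16 bits of each mantissa are all zero and are eliminated):
  18.0  = 0 1000 0011 (1)001 0000
  9.5   = 0 1000 0010 (1)001 1000
  171.0 = 0 1000 0110 (1)010 1011

Mantissas (already normalized):
  10010000 x 10011000 = 101010110000000
  The product has 14 bits after the binary point, 1.01010110000000, so it needs no further normalization.

Exponents: unbias (subtract 127 from) one of the exponents, then perform 2’s complement addition:
  1000 0011 + 0000 0011 = 1000 0110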
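The multiplication steps can be sketched in Python as follows. This is a minimal illustration, not a full IEEE implementation: the names float_to_bits, bits_to_float and fp_multiply are made up for this sketch, the result is truncated rather than rounded, and zeros, infinities, NaNs, denormalized numbers and overflow are assumed not to occur.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit integer as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_multiply(a: float, b: float) -> float:
    """Multiply two normalized single-precision values field by field.

    Hypothetical helper: truncates instead of rounding and ignores
    special values, denormalized numbers and overflow.
    """
    ba, bb = float_to_bits(a), float_to_bits(b)
    sign = (ba >> 31) ^ (bb >> 31)            # sign bit: XOR of operand signs
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF
    ma = (ba & 0x7FFFFF) | 0x800000           # restore the hidden bit
    mb = (bb & 0x7FFFFF) | 0x800000
    exp = ea + (eb - 127)                     # unbias one exponent, then add
    prod = ma * mb                            # up to 48 bits, 46 fraction bits
    if prod & (1 << 47):                      # product in [2, 4): normalize
        prod >>= 1
        exp += 1
    mant = (prod >> 23) & 0x7FFFFF            # keep 23 fraction bits, drop hidden bit
    return bits_to_float((sign << 31) | (exp << 23) | mant)

print(fp_multiply(18.0, 9.5))  # 171.0, as in Example 12.5
```

For the operands of Example 12.5 the exponent field comes out as 131 + (130 - 127) = 134 = 1000 0110, matching the slide.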

Division
• Do unsigned division of the mantissas
• Subtract the exponent of the divisor from the exponent of the dividend
• Normalize the result
• Set the sign bit of the result

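Setting the sign bit of the quotient
The sign bit of the quotient is set using the following table (the exclusive OR of the dividend and divisor sign bits):

Sign of dividend   Sign of divisor   Sign of quotient
       0                  0                 0
       0                  1                 1
       1                  0                 1
       1                  1                 0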
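The division steps can be sketched the same way. Again this is only an illustration under simplifying assumptions: fp_divide and its helpers are hypothetical names, the quotient is truncated rather than rounded, and special values, denormalized numbers, overflow and underflow are not handled. Because both exponents are biased, 127 is added back after the subtraction.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit integer as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_divide(a: float, b: float) -> float:
    """Divide two normalized single-precision values field by field.

    Hypothetical helper: truncates instead of rounding and ignores
    special values, denormalized numbers, overflow and underflow.
    """
    ba, bb = float_to_bits(a), float_to_bits(b)
    sign = (ba >> 31) ^ (bb >> 31)            # sign bit: XOR of operand signs
    ea, eb = (ba >> 23) & 0xFF, (bb >> 23) & 0xFF
    ma = (ba & 0x7FFFFF) | 0x800000           # restore the hidden bit
    mb = (bb & 0x7FFFFF) | 0x800000
    exp = ea - eb + 127                       # subtract exponents, restore the bias
    q = (ma << 24) // mb                      # unsigned quotient with 24 fraction bits
    if q & (1 << 24):                         # quotient in [1, 2): drop the extra low bit
        q >>= 1
    else:                                     # quotient in (0.5, 1): normalize
        exp -= 1
    mant = q & 0x7FFFFF                       # drop the hidden bit
    return bits_to_float((sign << 31) | (exp << 23) | mant)

print(fp_divide(171.0, 9.5))  # 18.0
```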

Rounding
• In floating point operations, some results may not be exactly representable.
• Rounding such a result always incurs a small amount of error.
• Errors tend to accumulate over time.
• Operations performed in a different order might give different results.
• Exact equality comparison of two floating point variables is unreliable.
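A short illustration of accumulated rounding error and of why exact comparison is unreliable. It uses Python floats, which are double precision rather than the single precision used elsewhere in these slides; the tolerance passed to math.isclose is an arbitrary choice for this sketch.

```python
import math

# 0.1 is not exactly representable in binary, so each addition rounds.
total = 0.0
for _ in range(10):
    total += 0.1

exactly_equal = (total == 1.0)                          # exact comparison fails
close_enough = math.isclose(total, 1.0, rel_tol=1e-9)   # tolerance-based comparison

print(exactly_equal, close_enough)  # False True
```

Comparing against a tolerance, as math.isclose does, is the usual alternative to testing for exact equality.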

Floating point addition is not associative.
Example 13.1: Suppose x = -1.5 x 10^38, y = 1.5 x 10^38 and z = 1.0, and suppose these are single-precision numbers.
x + (y + z) = -1.5 x 10^38 + (1.5 x 10^38 + 1.0) = -1.5 x 10^38 + 1.5 x 10^38 = 0.0
(x + y) + z = (-1.5 x 10^38 + 1.5 x 10^38) + 1.0 = 0.0 + 1.0 = 1.0
In the first case 1.0 is far too small relative to 1.5 x 10^38 to survive rounding to 24 significant bits, so y + z = y.
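Example 13.1 can be reproduced with a small sketch. Python floats are double precision, so the hypothetical helper f32 below rounds every intermediate result to single precision by packing it into 4 bytes with the struct module.

```python
import struct

def f32(v: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]

x, y, z = f32(-1.5e38), f32(1.5e38), f32(1.0)

left = f32(x + f32(y + z))    # x + (y + z): z is lost when added to y
right = f32(f32(x + y) + z)   # (x + y) + z: x and y cancel first

print(left, right)  # 0.0 1.0
```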

Rounding Rules
• Round to nearest. Same as taught in school. In case of a tie, if the lsb is 1, add 1; if the lsb is 0, truncate. After a tie, the lsb of the result is always 0.
• Round toward zero. Truncate the magnitude to the correct number of bits.
• Round toward positive infinity. The smallest representable value that is not arithmetically less than the unrounded value is chosen.
• Round toward negative infinity. The largest representable value that is not arithmetically greater than the unrounded value is chosen.
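The four rules can be compared side by side with exact rational arithmetic. This is a sketch only: round_to_bits and its mode names are invented for the illustration, and it rounds to a fixed number of fraction bits rather than to a full floating point format.

```python
from fractions import Fraction
import math

def round_to_bits(value, frac_bits: int, mode: str) -> Fraction:
    """Round value to a multiple of 2**-frac_bits under one of four modes.

    Hypothetical helper; mode is "nearest", "zero", "up" or "down".
    """
    scaled = Fraction(value) * 2**frac_bits   # the kept bits become the integer part
    lo = math.floor(scaled)                   # neighbour toward negative infinity
    hi = math.ceil(scaled)                    # neighbour toward positive infinity
    if mode == "nearest":
        diff = scaled - lo
        if diff > Fraction(1, 2):
            n = hi
        elif diff < Fraction(1, 2):
            n = lo
        else:                                 # tie: choose the neighbour whose lsb is 0
            n = lo if lo % 2 == 0 else hi
    elif mode == "zero":
        n = lo if scaled >= 0 else hi         # truncate the magnitude
    elif mode == "up":
        n = hi
    elif mode == "down":
        n = lo
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return Fraction(n, 2**frac_bits)

# 0.101 (binary) = 5/8 rounded to two fraction bits is a tie between 0.10 and 0.11.
for mode in ("nearest", "zero", "up", "down"):
    print(mode, round_to_bits(Fraction(5, 8), 2, mode), round_to_bits(Fraction(-5, 8), 2, mode))
```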

Overflow
• Overflow occurs when the exponent of the normalized result is outside the range of representable values
• The smallest normalized number has an exponent of e = -126, i.e. E = 1 = 0000 0001, and the largest has an exponent of e = 127, i.e. E = 254 = 1111 1110

The IEEE FPS assigns special meanings to the extreme values of the exponent
• -∞ (S=1, E=255, F=0)
• +∞ (S=0, E=255, F=0)
• NaN (E=255, F ≠ 0)
• 0 (E=0, F=0)
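The exponent ranges and special encodings above can be checked by decoding the S, E and F fields of a 32-bit pattern. The function name classify and its return strings are made up for this sketch; the "denormalized" case (E=0, F ≠ 0) is not listed on the slide and is included only so that every pattern gets a label.

```python
def classify(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its S, E and F fields."""
    s = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 255:
        if f != 0:
            return "NaN"
        return "-inf" if s else "+inf"
    if e == 0:
        # E=0 with F != 0 is an assumption added here, not from the slide.
        return "zero" if f == 0 else "denormalized"
    return "normalized"                       # E in 1..254, i.e. e in -126..127

print(classify(0x7F7FFFFF))  # largest normalized number, E = 254
print(classify(0x00800000))  # smallest normalized number, E = 1
print(classify(0x7F800000))  # +inf
print(classify(0xFF800000))  # -inf
print(classify(0x7FC00000))  # NaN
```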

Underflow
• Underflow occurs when the result is too close to zero to be represented
• Repeatedly dividing a number by a positive constant greater than 1 gives values that approach zero but, mathematically, never reach it, e.g. dividing 1 by 10 repeatedly
• In these cases, floating point operations will eventually return zero after enough iterations

• Until underflow occurs, the computation is reversible up to rounding error, i.e. if we multiply the current result by the constant the same number of times we have divided by it, we get back (approximately) the original number
• Once underflow occurs, any number of multiplications will still produce zero
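The underflow behaviour can be observed with a short experiment. It runs on Python floats, which are double precision, so the number of divisions before reaching zero is larger than it would be in single precision; the helper names are invented for this sketch.

```python
import math

def divide_repeatedly(value: float, divisor: float, times: int) -> float:
    """Divide value by divisor the given number of times."""
    for _ in range(times):
        value /= divisor
    return value

def multiply_repeatedly(value: float, factor: float, times: int) -> float:
    """Multiply value by factor the given number of times."""
    for _ in range(times):
        value *= factor
    return value

def steps_until_zero(start: float, divisor: float) -> int:
    """Count the divisions needed before the result underflows to zero."""
    value, steps = start, 0
    while value != 0.0:
        value /= divisor
        steps += 1
    return steps

# Before underflow: dividing and multiplying back recovers 1.0 up to rounding error.
small = divide_repeatedly(1.0, 10.0, 5)
print(math.isclose(multiply_repeatedly(small, 10.0, 5), 1.0))  # True

# After underflow: the value is zero and no amount of multiplication restores it.
n = steps_until_zero(1.0, 10.0)
print(multiply_repeatedly(divide_repeatedly(1.0, 10.0, n), 10.0, n))  # 0.0
```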
