Floating Point
A floating-point number system corresponds to the decimal notation $1.837 \times 10^{e}$, where $1.837$ is the significand and $e$ is the exponent.
Many binary floating-point formats of this kind exist. The most common standard is IEEE 754-1985 (IEC 559).
IEEE 754-1985
• Number representations:
  • Single precision (32 bits): sign 1 bit, exponent 8 bits, fraction 23 bits
  • Double precision (64 bits): sign 1 bit, exponent 11 bits, fraction 52 bits
Single Precision Format
• Layout: S (sign, 1 bit) | E (exponent, 8 bits) | M (mantissa, 23 bits)
• Exponent E: excess-127 binary integer; the actual exponent is $e = E - 127$
• Mantissa M: normalized binary significand with a hidden integer bit, i.e. the significand is $(1.M)_2$
• Value: $N = (-1)^S \times (1.M)_2 \times 2^{e}$
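A minimal Python sketch of this layout, assuming Python's standard `struct` module to reach the raw bits. The helper names `fields` and `value` are hypothetical, chosen for illustration; `value` applies $N = (-1)^S \times (1.M)_2 \times 2^{E-127}$ and is only meant for normal numbers ($0 < E < 255$).

```python
import struct

def fields(x: float):
    """Return (S, E, M) of x after rounding it to IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    S = bits >> 31                 # 1 sign bit
    E = (bits >> 23) & 0xFF        # 8-bit excess-127 exponent
    M = bits & 0x7FFFFF            # 23 stored mantissa bits
    return S, E, M

def value(S: int, E: int, M: int) -> float:
    """N = (-1)^S * 1.M * 2^(E - 127); valid for normal numbers only."""
    e = E - 127                    # remove the bias
    significand = 1 + M / 2**23    # restore the hidden integer bit
    return (-1) ** S * significand * 2.0 ** e
```

For example, since $6.5 = (1.101)_2 \times 2^2$, `fields(6.5)` gives `S = 0`, `E = 129` and `M = 0x500000` (binary 101 followed by 20 zeros), and `value` maps these back to 6.5.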
Example 1
• S = 1, E = 01111110, M = 10000000000000000000000
• $e = E - 127 = 126 - 127 = -1$
• $N = (-1)^1 \times (1.1)_2 \times 2^{-1} = -1 \times (0.11)_2$
• $N = -1 \times (1 \cdot 2^{-1} + 1 \cdot 2^{-2}) = -1 \times (0.5 + 0.25) = -0.75$
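Example 1 can also be checked mechanically. This sketch assumes Python and its `struct` module: it concatenates the three bit fields into one 32-bit word and lets the platform's IEEE 754 support decode it.

```python
import struct

# Bit fields of Example 1: sign, excess-127 exponent, 23-bit mantissa.
S, E, M = "1", "01111110", "1" + "0" * 22
word = int(S + E + M, 2)          # the whole 32-bit pattern as an integer
print(hex(word))                  # 0xbf400000

# Reinterpret the same 4 bytes as a single-precision float.
N = struct.unpack(">f", word.to_bytes(4, "big"))[0]
print(N)                          # -0.75
```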
Single Precision Range
The magnitude of normal numbers that can be represented lies in the range $2^{-126} \times 1.0$ to $2^{127} \times (2 - 2^{-23})$, which is approximately $1.2 \times 10^{-38}$ to $3.4 \times 10^{38}$.
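The two range limits can be evaluated directly. In this Python sketch (the helper `f32_from_bits` is a hypothetical name), the formulas are cross-checked against the bit patterns with E = 1, M = 0 (smallest normal) and E = 254, M = all ones (largest finite value).

```python
import struct

def f32_from_bits(word: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

smallest_normal = 2.0 ** -126 * 1.0        # E = 1, M = 0
largest = 2.0 ** 127 * (2 - 2.0 ** -23)    # E = 254, M = all ones

# The formulas agree exactly with the corresponding bit patterns.
assert f32_from_bits(0x00800000) == smallest_normal
assert f32_from_bits(0x7F7FFFFF) == largest

print(f"{smallest_normal:.2e} to {largest:.2e}")  # 1.18e-38 to 3.40e+38
```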
IEEE 754-1985
• Single precision (32 bits)
  • Fraction part: 23 bits; $0 \le x < 1$
  • Significand: 1 + fraction part. The “1” is not stored (the “hidden bit”). This corresponds to about 7 decimal digits.
  • Exponent: a bias of 127 is added to the exponent. This corresponds to a range of roughly $10^{-38}$ to $10^{38}$.
• Double precision (64 bits)
  • Fraction part: 52 bits; $0 \le x < 1$
  • Significand: 1 + fraction part. The “1” is again the hidden bit. This corresponds to about 16 decimal digits.
  • Exponent: a bias of 1023 is added to the exponent. This corresponds to a range of roughly $10^{-308}$ to $10^{308}$.
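The biases can be observed by storing $1.0 = 1.0 \times 2^0$, whose stored exponent field is just the bias. This Python sketch uses `struct` for both widths; the helper names are hypothetical, and it assumes (as on mainstream platforms) that Python's `float` is an IEEE 754 double, whose parameters `sys.float_info` reports.

```python
import struct
import sys

def biased_exponent_single(x: float) -> int:
    """8-bit exponent field of x stored in single precision."""
    return (struct.unpack(">I", struct.pack(">f", x))[0] >> 23) & 0xFF

def biased_exponent_double(x: float) -> int:
    """11-bit exponent field of x stored in double precision."""
    return (struct.unpack(">Q", struct.pack(">d", x))[0] >> 52) & 0x7FF

# 1.0 has actual exponent 0, so the stored field equals the bias.
print(biased_exponent_single(1.0), biased_exponent_double(1.0))  # 127 1023

print(sys.float_info.mant_dig)  # 53: 52 stored fraction bits + hidden bit
print(sys.float_info.max)       # 1.7976931348623157e+308
print(sys.float_info.min)       # 2.2250738585072014e-308 (smallest normal double)
```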
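IEEE 754-1985
• Special features:
  • Correct rounding of “halfway” results (to the even neighbor).
  • Special values:
    • NaN (Not a Number)
    • +Infinity
    • -Infinity
  • Denormal (subnormal) numbers represent magnitudes smaller than the smallest normal number, $2^{e_{\min}}$ ($e_{\min} = -126$ in single precision, $-1022$ in double precision).
  • Rounds to nearest by default; three other rounding modes exist (toward zero, toward $+\infty$, toward $-\infty$).
  • Exception handling: five exceptions are defined (invalid operation, division by zero, overflow, underflow, inexact).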
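The special values, denormals, and ties-to-even rounding can all be observed from Python. This is a small sketch assuming IEEE 754 hardware floats; note that Python raises `ZeroDivisionError` for `1.0 / 0.0` instead of returning infinity, so the special values are created directly here.

```python
import math
import struct
import sys

inf, nan = math.inf, math.nan
print(inf > sys.float_info.max)   # True: infinity exceeds every finite value
print(nan == nan)                 # False: NaN compares unequal even to itself
print(math.isnan(inf - inf))      # True: an invalid operation yields NaN

# Smallest positive single-precision denormal: E = 0, M = 1, value 2^-23 * 2^-126.
tiny = struct.unpack(">f", struct.pack(">I", 0x00000001))[0]
print(tiny == 2.0 ** -149)        # True

# Round to nearest, ties to even (double precision, 52 fraction bits).
print(1.0 + 2.0 ** -53 == 1.0)                   # True: tie, 1.0 has the even last bit
print(1.0 + 3 * 2.0 ** -53 == 1.0 + 2.0 ** -51)  # True: tie, rounds up to the even neighbor
```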
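Add / Sub
$(s_1 \times 2^{e_1}) \pm (s_2 \times 2^{e_2}) = (s_1 \pm s_2) \times 2^{e_3} = s_3 \times 2^{e_3}$, once both operands have been brought to the common exponent $e_3$.
• Each significand is $1.s$: the hidden bit is made explicit during the operation.
1: Shift the summands so they have the same exponent:
  • e.g., if $e_2 < e_1$: shift $s_2$ right and increment $e_2$ until $e_1 = e_2$.
2: Add/subtract the significands, taking the sign bits of $s_1$ and $s_2$ into account.
  • Set the sign bit of the result accordingly.
3: Normalize the result (sign bit kept separate):
  • if the addition carried out of the leading bit position, shift $s_3$ right once and increment $e_3$;
  • otherwise shift $s_3$ left and decrement $e_3$ until MSB = 1.
4: Round $s_3$ correctly.
  • More than 23/52 bits are used internally for the addition, so that the bits shifted out in step 1 can still influence the rounding.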
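The four steps can be sketched in Python for single-precision operands. This is a teaching sketch under stated assumptions: inputs and results must be normal, finite numbers (no zero inputs, denormals, infinities or NaN), rounding is to nearest with ties to even, and three extra low-order bits (guard, round, sticky) are kept during the operation. `decode`, `encode` and `fp_add` are hypothetical names; `struct` is used only to move between floats and bit patterns.

```python
import struct

GRS = 3  # extra guard, round and sticky bits kept below the 24-bit significand

def decode(x):
    """Split a float32 value into (sign, biased exponent E, 24-bit significand 1.M)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, E, M = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normal numbers only")
    return sign, E, (1 << 23) | M  # make the hidden bit explicit

def encode(sign, E, sig):
    """Pack fields into a float32; the hidden bit of sig is dropped."""
    bits = (sign << 31) | (E << 23) | (sig & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_add(a, b):
    s1, e1, m1 = decode(a)
    s2, e2, m2 = decode(b)
    # Let operand 1 have the larger magnitude so that m1 - m2 is never negative.
    if (e1, m1) < (e2, m2):
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    m1, m2 = m1 << GRS, m2 << GRS
    # Step 1: shift m2 right until the exponents match; OR lost bits into a sticky bit.
    d = e1 - e2
    lost = m2 & ((1 << d) - 1)
    m2 = (m2 >> d) | (1 if lost else 0)
    e3 = e1
    # Step 2: add or subtract the magnitudes; the result takes the sign of operand 1.
    m3 = m1 + m2 if s1 == s2 else m1 - m2
    if m3 == 0:
        return 0.0
    # Step 3: normalize so that the hidden bit sits at position 23 + GRS.
    if m3 >> (24 + GRS):          # carry out of the addition: shift right once
        m3 = (m3 >> 1) | (m3 & 1)
        e3 += 1
    while not m3 >> (23 + GRS):   # cancellation: shift left until MSB = 1
        m3 <<= 1
        e3 -= 1
    # Step 4: round to nearest, ties to even, using the GRS bits.
    low, m3 = m3 & ((1 << GRS) - 1), m3 >> GRS
    half = 1 << (GRS - 1)
    if low > half or (low == half and m3 & 1):
        m3 += 1
        if m3 >> 24:              # rounding carried into a new bit
            m3 >>= 1
            e3 += 1
    if not 0 < e3 < 255:
        raise OverflowError("result is not a normal float32")
    return encode(s1, e3, m3)
```

Subtraction is `fp_add(a, -b)`. For example, `fp_add(1.0, -0.75)` aligns $0.75$ by one position, subtracts, and then needs two left shifts in step 3 to produce 0.25.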
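Multiplication
$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$
• So multiply the significands and add the exponents.
• Problem: the significand is coded in sign-magnitude form; use unsigned multiplication of the magnitudes and take care of the sign separately (the result sign is the XOR of the operand signs).
• Round the 2n-bit significand product to an n-bit significand.
• Normalize the result and compute the new exponent with respect to the bias: adding two biased exponents counts the bias twice, so it must be subtracted once ($E_3 = E_1 + E_2 - 127$ in single precision).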
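A Python sketch of single-precision multiplication along these lines, under the same assumptions as a teaching example: normal, finite operands and results only, round to nearest with ties to even. The names `decode`, `encode` and `fp_mul` are hypothetical. Since Python integers are exact, the sketch rounds using all low-order bits of the 48-bit product rather than a fixed number of guard bits.

```python
import struct

def decode(x):
    """Split a float32 value into (sign, biased exponent E, 24-bit significand 1.M)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, E, M = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normal numbers only")
    return sign, E, (1 << 23) | M  # make the hidden bit explicit

def encode(sign, E, sig):
    """Pack fields into a float32; the hidden bit of sig is dropped."""
    bits = (sign << 31) | (E << 23) | (sig & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_mul(a, b):
    s1, e1, m1 = decode(a)
    s2, e2, m2 = decode(b)
    sign = s1 ^ s2                  # sign-magnitude: handle the sign separately
    p = m1 * m2                     # unsigned 48-bit product, in [2^46, 2^48)
    E = e1 + e2 - 127               # both exponents carry the bias; remove one copy
    # Normalize: the product of two significands in [1, 2) lies in [1, 4).
    shift = 23
    if p >> 47:                     # product >= 2: drop one more bit
        shift += 1
        E += 1
    # Round the 2n-bit product to n = 24 bits, to nearest with ties to even.
    q, rest = p >> shift, p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rest > half or (rest == half and q & 1):
        q += 1
        if q >> 24:                 # rounding carried into a new bit
            q >>= 1
            E += 1
    if not 0 < E < 255:
        raise OverflowError("result is not a normal float32")
    return encode(sign, E, q)
```

For example, `fp_mul(1.5, -2.5)` multiplies $(1.1)_2$ by $(1.01)_2$, adds the biased exponents 127 and 128, subtracts the bias once, and returns -3.75.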
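Division
$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$
• So divide the significands and subtract the exponents.
• Problem: the significand is coded in sign-magnitude form; use unsigned division of the magnitudes (different algorithms exist) and take care of the sign separately.
• Round the (n + 2)-bit quotient (n bits plus guard and round bits) to an n-bit significand.
• Compute the new exponent with respect to the bias: subtracting two biased exponents cancels the bias, so it must be added back once ($E_3 = E_1 - E_2 + 127$ in single precision).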
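A Python sketch of single-precision division following these steps, again assuming normal, finite operands and results and round to nearest; `decode`, `encode` and `fp_div` are hypothetical names, and plain integer division stands in for a hardware division algorithm. The quotient is computed with n + 2 bits; when $s_1 < s_2$, normalization consumes one of them. For round to nearest, a quotient of two 24-bit significands can never fall exactly halfway between two representable values, so in this sketch the remaining guard bit alone decides the rounding.

```python
import struct

def decode(x):
    """Split a float32 value into (sign, biased exponent E, 24-bit significand 1.M)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, E, M = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < E < 255:
        raise ValueError("sketch handles normal numbers only")
    return sign, E, (1 << 23) | M  # make the hidden bit explicit

def encode(sign, E, sig):
    """Pack fields into a float32; the hidden bit of sig is dropped."""
    bits = (sign << 31) | (E << 23) | (sig & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_div(a, b):
    s1, e1, m1 = decode(a)
    s2, e2, m2 = decode(b)
    sign = s1 ^ s2                  # sign-magnitude: handle the sign separately
    E = e1 - e2 + 127               # the bias cancels in e1 - e2; add it back once
    # Unsigned division: m1/m2 lies in (0.5, 2), so q lies in (2^24, 2^26).
    q = (m1 << 25) // m2
    if q >> 25:                     # m1 >= m2: 26 bits = 24 + guard + round
        extra = 2
    else:                           # m1 < m2: normalize, leaving 24 + guard bits
        extra = 1
        E -= 1
    guard = (q >> (extra - 1)) & 1
    q >>= extra
    # Round to nearest: no exact halfway case exists, so the guard bit decides.
    if guard:
        q += 1
        if q >> 24:                 # rounding carried into a new bit
            q >>= 1
            E += 1
    if not 0 < E < 255:
        raise OverflowError("result is not a normal float32")
    return encode(sign, E, q)
```

For example, `fp_div(1.0, 3.0)` takes the normalization branch (since $1 < (1.1)_2$), has a guard bit of 1, rounds up, and produces the single-precision value nearest to 1/3.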