Learn about IEEE 754 standard for floating point arithmetic, including single and double precision, addition rules, normalization, and examples. Explore accurate arithmetic with guard, round, and sticky bits.
CSCI206 - Computer Organization & Programming Floating Point Arithmetic Revised by Alexander Fuchsberger and Xiannong Meng in spring 2019 based on the notes by other instructors. zyBook: 10.9, 10.10
Review IEEE754 • Special values, else normalized numbers
IEEE 754 Standard (1985) • One bit for Sign • Single precision float (32 bits) • 8 bit Exponent • 23 bit Mantissa • Double precision float (64 bits) • 11 bit Exponent • 52 bit Mantissa
IEEE 754 Standard (1985) • Mantissa is normalized, meaning it is a fixed point number in the form 1.xxxxxx • to save one bit, the 1. is implicit (not represented) • Exponent is represented in biased form • B = 127 for single • B = 1023 for double
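The field layout above can be checked with a short sketch. It assumes Python's standard `struct` module for access to the single-precision bit pattern; the helper names `decode_single` and `value_of_normalized` are made up for this illustration, and only normalized numbers are handled (special values are ignored).

```python
import struct

def decode_single(x):
    """Split x, stored as an IEEE 754 single, into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    biased_exp = (bits >> 23) & 0xFF   # 8 bits, stored with bias B = 127
    fraction = bits & 0x7FFFFF         # 23 bits, the implicit leading 1 is not stored
    return sign, biased_exp, fraction

def value_of_normalized(sign, biased_exp, fraction, bias=127):
    """Rebuild a normalized value: (-1)^sign * 1.fraction * 2^(biased_exp - bias)."""
    significand = 1 + fraction / 2**23   # put the implicit 1. back
    return (-1) ** sign * significand * 2.0 ** (biased_exp - bias)

sign, biased_exp, fraction = decode_single(-6.5)   # -6.5 = -1.101 (binary) x 2^2
print(sign, biased_exp, format(fraction, "023b"))
print(value_of_normalized(sign, biased_exp, fraction))
```

Here -6.5 is used only as a convenient test value: its exponent field should come out as 2 + 127 = 129 and its fraction should start with 101.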
Floating Point Addition • Match exponents • Add the significands • Normalize the sum • Check overflow/underflow • Round • Done
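The steps above can be sketched for positive operands as follows. This is a toy model, not how hardware does it: significands are held as exact `Fraction` values so that no bits are lost before the rounding step, the precision `P = 4` and the exponent limits are assumptions chosen for illustration, and the name `fp_add` is hypothetical.

```python
from fractions import Fraction

P = 4                      # bits of precision: significand looks like 1.xxx
E_MIN, E_MAX = -126, 127   # exponent range of normalized single-precision numbers

def fp_add(a, b):
    """Add two positive toy floats, each a (significand, exponent) pair
    with 1 <= significand < 2 and value = significand * 2**exponent."""
    (sa, ea), (sb, eb) = a, b
    # 1. Match exponents: shift the operand with the smaller exponent right.
    if ea < eb:
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    sb = Fraction(sb) / 2 ** (ea - eb)
    # 2. Add the significands.
    total, exp = Fraction(sa) + sb, ea
    # 3. Normalize the sum back into the form 1.xxx.
    while total >= 2:
        total, exp = total / 2, exp + 1
    # 4. Check overflow/underflow of the exponent.
    if exp > E_MAX:
        raise OverflowError("exponent overflow")
    if exp < E_MIN:
        raise ArithmeticError("exponent underflow")
    # 5. Round to P bits (round() on a Fraction rounds ties to even).
    units = round(total * 2 ** (P - 1))
    if units == 2 ** P:    # rounding carried out, so renormalize
        units, exp = units // 2, exp + 1   # a full version would re-check the exponent
    return Fraction(units, 2 ** (P - 1)), exp

# 1.100 x 2^0 (1.5) plus 1.100 x 2^-1 (0.75)
sig, exp = fp_add((Fraction(3, 2), 0), (Fraction(3, 2), -1))
print(sig, exp, float(sig) * 2 ** exp)
```

Keeping the significand exact sidesteps the question of how many extra bits to carry during alignment; real hardware uses a small fixed number of extra bits instead.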
FP Addition Example Show the FP addition of: 0.5 + 0.4375 using 4 bits of precision
FP Addition Example Show the FP addition of: 0.5 + 0.4375 • 0.5 = 0.1 (binary) = 1.000 × 2^-1 • 0.4375 = 0.0111 (binary) = 1.110 × 2^-2
0. IEEE 754 Representation Single precision: B = 127 • 0.5: Sign = 0, Exponent = -1 + 127 = 126 = 0111 1110, Mantissa = 0000 0000 0000 0000 0000 000, or 0x3F000000 • 0.4375: Sign = 0, Exponent = -2 + 127 = 125 = 0111 1101, Mantissa = 1100 0000 0000 0000 0000 000, or 0x3EE00000
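1. Match Exponents • To raise an exponent: shift number right, increment exponent • To lower an exponent: shift number left, decrement exponent • Larger exponent is -1, so rewrite 1.110 × 2^-2 as 0.111 × 2^-1
2. Add the significands • 1.000 × 2^-1 + 0.111 × 2^-1 = 1.111 × 2^-1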
3. Renormalize (nothing to do in this example)
4. Check overflow/underflow (nothing to do in this example)
5. Round (nothing to do in this example)
6. Done (check) 0.5 + 0.4375 = 0.9375
6. IEEE 754 Representation Single precision: B = 127 • 0.9375 = 1.111 × 2^-1: Sign = 0, Exponent = -1 + 127 = 126 = 0111 1110, Mantissa = 1110 0000 0000 0000 0000 000, or 0x3F700000 Do Activity 25
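The three bit patterns in this example can be double-checked with a few lines of code. The sketch assumes Python's standard `struct` module; `single_hex` is a helper name invented here.

```python
import struct

def single_hex(x):
    """Return the IEEE 754 single-precision bit pattern of x as a hex string."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"0x{bits:08X}"

# The two operands and their sum from the worked example.
for value in (0.5, 0.4375, 0.5 + 0.4375):
    print(value, single_hex(value))
# prints:
# 0.5 0x3F000000
# 0.4375 0x3EE00000
# 0.9375 0x3F700000
```

All three values are exactly representable, so the sum involves no rounding, which matches the "nothing to do" remarks in steps 3 to 5.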
Accurate Arithmetic • IEEE 754 guarantees that each basic operation is accurate to within 0.5 ulp (units in the last place) • the error is no more than ½ of the least significant bit of the result • To achieve this, 3 extra bits are kept to the right of the fractional part during arithmetic; they are called the guard, round, and sticky bits (from most significant to least).
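The 0.5 ulp bound can be observed on an ordinary machine. The sketch below assumes Python 3.9 or later (for `math.ulp`) and works in double precision, since that is what Python's `float` uses; `rounding_error_in_ulps` is a hypothetical helper that compares the computed sum with the exact sum of the two stored operands.

```python
import math
from fractions import Fraction

def rounding_error_in_ulps(a, b):
    """Error of the floating point sum a + b, measured in ulps of the result."""
    result = a + b                         # rounded by the hardware
    exact = Fraction(a) + Fraction(b)      # exact sum of the stored operands
    error = abs(Fraction(result) - exact)
    return error / Fraction(math.ulp(result))

for a, b in ((0.5, 0.4375), (0.1, 0.2), (1e16, 1.0)):
    err = rounding_error_in_ulps(a, b)
    print(a, b, float(err), err <= Fraction(1, 2))
```

Note that the bound applies to each single operation on the values as stored; it says nothing about the error already present in an operand such as 0.1, which is itself a rounded value.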
Guard, Round, Sticky • Guard and round bits act like any other bit to the right of the binary point. • The sticky bit is set if any nonzero value to the right of this place has ever existed. • once set, it stays set (it’s sticky)
Addition Example with GRS • Use 4 bits of precision plus GRS 1. Normalize: a nonzero value falls to the right of the round bit, so SET the sticky bit 2. Add, carrying the GRS bits along 5. Round: GRS = 111 rounds up
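A minimal sketch of how the guard, round, and sticky bits could be tracked in software is shown below. It assumes 4-bit significands stored as integers (1.xxx becomes 0b1xxx) and round-to-nearest with ties to even; the function names and the operands in the usage lines are made up for illustration and are not the operands of the example above.

```python
P = 4  # bits of precision

def align_with_grs(sig, shift):
    """Shift a P-bit significand right by `shift` places.
    Returns a (P + 3)-bit integer: the shifted significand followed by G, R, S."""
    extended = sig << 3                     # append G = R = S = 0
    lost = extended & ((1 << shift) - 1)    # bits pushed past the sticky position
    extended >>= shift
    if lost:
        extended |= 1                       # sticky: any nonzero bit lost sets it
    return extended

def round_grs(extended):
    """Round to nearest, ties to even, using the three low GRS bits."""
    sig, grs = extended >> 3, extended & 0b111
    if grs > 0b100 or (grs == 0b100 and sig & 1):
        sig += 1                            # caller must renormalize if this carries out
    return sig

# 1.000 x 2^0 + 1.111 x 2^-4: align the smaller operand, add, then round.
small = align_with_grs(0b1111, 4)           # 0.000 with GRS = 111
total = (0b1000 << 3) + small               # 1.000 with GRS = 111
print(format(total, "07b"), format(round_grs(total), "04b"))
```

With GRS = 111 the discarded part is more than half a unit in the last place, so the sum rounds up from 1.000 to 1.001.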
Multiplication Similar to addition • Add exponents • Multiply significands • Normalize • Check/round • Set sign
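The multiplication steps can be sketched in the same toy style. The assumptions here are mine: 4 bits of precision, single-precision bias 127, exact `Fraction` significands, and a hypothetical function name `fp_mul`. One detail worth noting is that adding two biased exponents counts the bias twice, so it is subtracted once.

```python
from fractions import Fraction

P = 4        # bits of precision: significand looks like 1.xxx
BIAS = 127   # single-precision exponent bias

def fp_mul(a, b):
    """Multiply toy floats given as (sign, biased_exponent, significand),
    with 1 <= significand < 2."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Add exponents; both are biased, so remove one copy of the bias.
    exp = ea + eb - BIAS
    # Multiply significands: the product lies in [1, 4).
    sig = Fraction(ma) * Fraction(mb)
    # Normalize back to the form 1.xxx.
    if sig >= 2:
        sig, exp = sig / 2, exp + 1
    # Round to P bits (ties to even), renormalizing if rounding carries out.
    units = round(sig * 2 ** (P - 1))
    if units == 2 ** P:
        units, exp = units // 2, exp + 1
    # Check: normalized singles use biased exponents 1 through 254.
    if exp > 254:
        raise OverflowError("exponent overflow")
    if exp < 1:
        raise ArithmeticError("exponent underflow")
    # Set sign: negative exactly when the operand signs differ.
    return sa ^ sb, exp, Fraction(units, 2 ** (P - 1))

# 0.5 x (-0.4375): (0, 126, 1.000) times (1, 125, 1.110)
print(fp_mul((0, 126, Fraction(1)), (1, 125, Fraction(7, 4))))
```

The result for this usage line is sign 1, biased exponent 124, significand 1.110, that is -1.75 × 2^-3 = -0.21875.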