Understanding IEEE 754 Floating Point Arithmetic

Learn about the IEEE 754 standard for floating-point arithmetic, including single and double precision, the steps of floating-point addition, normalization, and a worked example. Explore accurate arithmetic with guard, round, and sticky bits.

andreav

Presentation Transcript


  1. CSCI206 - Computer Organization & Programming Floating Point Arithmetic Revised by Alexander Fuchsberger and Xiannong Meng in spring 2019 based on the notes by other instructors. zyBook: 10.9, 10.10

  2. Review IEEE754 • Special values, else normalized numbers

  3. IEEE 754 Standard (1985) • One bit for Sign • Single precision float (32 bits) • 8 bit Exponent • 23 bit Mantissa • Double precision float (64 bits) • 11 bit Exponent • 52 bit Mantissa

  4. IEEE 754 Standard (1985) • Mantissa is normalized, meaning it is a fixed point number in the form 1.xxxxxx • to save one bit, the 1. is implicit (not represented) • Exponent is represented in biased form • B = 127 for single • B = 1023 for double
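
The field layout described above can be inspected directly. The following is a minimal sketch, assuming Python's standard `struct` module is used to obtain the 32-bit pattern; the helper name `decode_float32` is hypothetical and not part of the slides.

```python
import struct

def decode_float32(x):
    """Split x (rounded to single precision) into its three IEEE 754 fields."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, stored with bias B = 127
    fraction = bits & 0x7FFFFF             # 23 bits, the implicit leading 1 is not stored
    return sign, biased_exponent, fraction

sign, exponent, fraction = decode_float32(0.4375)
# Subtract the bias to recover the true exponent, and show the stored fraction bits.
print(sign, exponent - 127, format(fraction, "023b"))
```

For double precision the same idea applies with the `">d"` and `">Q"` format codes, an 11-bit exponent mask, and B = 1023.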

  5. Floating Point Addition • Match exponents • Add the significands • Normalize the sum • Check overflow/underflow • Round • Done
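
The steps above can be sketched on a toy format. This is an illustration under stated assumptions, not how hardware does it: both operands are positive and normalized, a number is held as an integer significand `sig` with `p` bits plus an unbiased exponent `exp` (value = sig × 2^(exp − (p − 1))), bits shifted out are simply truncated, and the exponent range is unbounded. The names `fp_add` and `to_value` are hypothetical.

```python
def fp_add(sig_a, exp_a, sig_b, exp_b, p=4):
    """Add two positive normalized toy floats with p-bit significands.

    A pair (sig, exp) stands for sig * 2**(exp - (p - 1)),
    with 2**(p - 1) <= sig < 2**p (a leading 1 followed by p - 1 fraction bits).
    """
    # 1. Match exponents: shift the number with the smaller exponent right.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b >>= exp_a - exp_b  # assumption: shifted-out bits are truncated
    # 2. Add the significands.
    total = sig_a + sig_b
    exp = exp_a
    # 3. Normalize: a carry out of the top bit means shift right, increment exponent.
    #    (With two positive operands the sum never needs a left shift.)
    if total >= 1 << p:
        total >>= 1
        exp += 1
    # 4. Overflow/underflow check is omitted because this sketch has no exponent limits.
    # 5. Rounding is omitted because this sketch truncates.
    return total, exp

def to_value(sig, exp, p=4):
    """Convert a toy (sig, exp) pair back to a Python float."""
    return sig * 2.0 ** (exp - (p - 1))

# 0.5 = 1.000 x 2^-1 and 0.4375 = 1.110 x 2^-2
result = fp_add(0b1000, -1, 0b1110, -2)
print(result, to_value(*result))  # (15, -1) 0.9375
```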

  6. FP Addition Example Show the FP addition of: 0.5 + 0.4375 using 4 bits of precision

  7. FP Addition Example Show the FP addition of: 0.5 + 0.4375

  8. 0. IEEE 754 Representation (single precision, B = 127) • 0.5 = 1.000 × 2^-1: Sign = 0, Exponent = -1 + 127 = 126 = 0111 1110, Mantissa = 0000 0000 0000 0000 0000 000, or 0x3F000000 • 0.4375 = 1.110 × 2^-2: Sign = 0, Exponent = -2 + 127 = 125 = 0111 1101, Mantissa = 1100 0000 0000 0000 0000 000, or 0x3EE00000

  9. 1. Match Exponents • Shifting a number right increments its exponent; shifting it left decrements its exponent • The larger exponent is -1, so rewrite 1.110 × 2^-2 as 0.111 × 2^-1

  10. 2. Add • 1.000 × 2^-1 + 0.111 × 2^-1 = 1.111 × 2^-1

  11. 3. Renormalize (nothing to do in this example)

  12. 4. Check overflow/underflow (nothing to do in this example)

  13. 5. Round (nothing to do in this example) 6. Done (check): 0.5 + 0.4375 = 0.9375

  14. 6. IEEE 754 Representation Single precision: B = 127 Sign = 0, Exponent = -1 + 127 = 126 = 0111 1110, Mantissa = 1110 0000 0000 0000 0000 000 Or 0x3F700000 Do Activity 25
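
The hexadecimal encodings quoted in this example can be cross-checked mechanically. This is a small sketch assuming Python's `struct` module; the helper name `float32_hex` is hypothetical.

```python
import struct

def float32_hex(x):
    """Return the single-precision bit pattern of x as a hex string."""
    # Pack as a big-endian 32-bit float, then reread the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"0x{bits:08X}"

# The two operands and their sum from the worked example.
for value in (0.5, 0.4375, 0.5 + 0.4375):
    print(value, float32_hex(value))
```

All three values are exactly representable in single precision, so packing them introduces no rounding.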

  15. Accurate Arithmetic • IEEE 754 guarantees that the result of a basic operation is accurate to within 0.5 ulp (unit in the last place) • that is, the error is no more than ½ of the least significant bit of the result • To achieve this, 3 extra bits are appended to the fractional part during arithmetic; they are called the guard, round, and sticky bits (from most significant to least).
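
The 0.5 ulp bound can be observed from software. The sketch below assumes Python 3.9 or later (for `math.ulp`) and relies on Python floats being IEEE 754 double precision on the host, which holds on common platforms; `fractions.Fraction` supplies exact rational arithmetic for comparison. The function name `rounding_error_in_ulps` is hypothetical.

```python
import math
from fractions import Fraction

def rounding_error_in_ulps(a, b):
    """Error of the floating-point sum a + b, measured in ulps of the result."""
    computed = a + b                    # rounded by the hardware
    exact = Fraction(a) + Fraction(b)   # exact sum of the two stored operands
    # math.ulp gives the spacing of doubles at the magnitude of the computed result.
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

print(float(rounding_error_in_ulps(0.1, 0.2)))
print(float(rounding_error_in_ulps(0.5, 0.4375)))
```

Note that this measures only the rounding of the addition itself; the error from converting decimal literals such as 0.1 to binary is a separate matter.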

  16. Guard, Round, Sticky • Guard and round bits act like any other bit to the right of the binary point. • The sticky bit is set if any nonzero bit has ever existed to the right of the round bit. • once set, it stays set (it’s sticky)
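
One way to picture how the three bits arise is the right shift used when matching exponents. The following is a minimal sketch, assuming an integer significand and a hypothetical helper `shift_right_grs`; real hardware computes the same bits with dedicated logic.

```python
def shift_right_grs(sig, shift):
    """Shift sig right by `shift` bits and return (kept, guard, round, sticky)."""
    extended = sig << 3                      # make room for three extra bits
    shifted = extended >> shift
    lost = extended & ((1 << shift) - 1)     # everything that fell off the end
    guard = (shifted >> 2) & 1               # first bit past the kept bits
    round_bit = (shifted >> 1) & 1           # second bit past the kept bits
    # Sticky is the OR of the third bit and every bit shifted out beyond it.
    sticky = (shifted & 1) | (1 if lost else 0)
    return shifted >> 3, guard, round_bit, sticky

# 1011 shifted right by 3 is 0001.011, so G = 0, R = 1, S = 1
print(shift_right_grs(0b1011, 3))  # (1, 0, 1, 1)
# 1011 shifted right by 4 is 0000.1011, so G = 1, R = 0, and the trailing 11 sets S
print(shift_right_grs(0b1011, 4))  # (0, 1, 0, 1)
```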

  17. Addition Example with GRS • Use 4 bits of precision plus GRS • 1. Normalize: there is a nonzero value to the right of the round bit, so set the sticky bit • 2. Add • 5. Round: GRS = 111 rounds up
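
The rounding decision from the three extra bits can be sketched as follows, assuming the rounding mode is round to nearest, ties to even (the IEEE 754 default). The function name `round_nearest_even` is hypothetical, and the caller is assumed to renormalize if the increment carries out of the top bit.

```python
def round_nearest_even(sig, guard, round_bit, sticky):
    """Round an integer significand using its guard, round, and sticky bits."""
    # Guard is worth exactly half an ulp. If round or sticky is also set,
    # the discarded part is more than half, so round up. If neither is set
    # it is an exact tie, and we round up only when sig is odd (ties to even).
    if guard and (round_bit or sticky or (sig & 1)):
        sig += 1
    return sig

print(format(round_nearest_even(0b1000, 1, 1, 1), "04b"))  # GRS = 111 rounds up: 1001
print(format(round_nearest_even(0b1000, 1, 0, 0), "04b"))  # tie, already even: 1000
print(format(round_nearest_even(0b1001, 1, 0, 0), "04b"))  # tie, odd rounds up: 1010
print(format(round_nearest_even(0b1000, 0, 1, 1), "04b"))  # less than half: 1000
```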

  18. Multiplication Similar to addition • Add exponents • Multiply significands • Normalize • Check/round • Set sign
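
The multiplication steps can be sketched on a toy format in the same spirit. The assumptions are: a number is a sign bit, an integer significand `sig` with `p` bits, and an unbiased exponent `exp` (value = ±sig × 2^(exp − (p − 1))); low product bits are truncated rather than rounded; the exponent range is unbounded. With biased exponents, the bias would have to be subtracted once after adding. The name `fp_mul` is hypothetical.

```python
def fp_mul(sign_a, sig_a, exp_a, sign_b, sig_b, exp_b, p=4):
    """Multiply two normalized toy floats with p-bit significands."""
    # Add exponents (unbiased in this sketch).
    exp = exp_a + exp_b
    # Multiply significands: the product has 2p - 1 or 2p bits.
    product = sig_a * sig_b
    # Normalize back to p bits; a 2p-bit product means the exponent grows by one.
    if product >= 1 << (2 * p - 1):
        exp += 1
        shift = p
    else:
        shift = p - 1
    sig = product >> shift  # assumption: truncate instead of rounding
    # Set sign: negative exactly when the operand signs differ.
    sign = sign_a ^ sign_b
    return sign, sig, exp

# 0.5 x -0.4375: (+1.000 x 2^-1) x (-1.110 x 2^-2) = -1.110 x 2^-3 = -0.21875
print(fp_mul(0, 0b1000, -1, 1, 0b1110, -2))  # (1, 14, -3)
```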
