
COMS 161 Introduction to Computing

COMS 161 Introduction to Computing. Title: Numeric Processing Date: November 08, 2004 Lecture Number: 30. Announcements. Review. Real numbers Representation Limitations. Outline. Real numbers Representation Limitations. IEEE Standard 754. Provides two floating point types Single


Presentation Transcript


  1. COMS 161 Introduction to Computing • Title: Numeric Processing • Date: November 08, 2004 • Lecture Number: 30

  2. Announcements

  3. Review • Real numbers • Representation • Limitations

  4. Outline • Real numbers • Representation • Limitations

  5. IEEE Standard 754 • Provides two floating point types • Single • 24 bits of significand precision • Double • 53 bits of significand precision
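
To make the 24-bit versus 53-bit difference concrete, here is a minimal sketch. It assumes Python's `float` is an IEEE double (true on common platforms) and uses the standard `struct` module to round a value to single precision. The helper name `to_single` is made up for this illustration.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single, then widen it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2**24 + 1 needs 25 significant bits: one too many for a single.
n = 2**24 + 1
single_keeps_it = to_single(float(n)) == n   # lost in single precision
double_keeps_it = float(n) == n              # still exact in double precision

# 2**53 + 1 needs 54 significant bits: one too many even for a double.
big = 2**53 + 1
double_keeps_big = float(big) == big
```

Integers up to 2^24 survive a trip through single precision unchanged; the same holds up to 2^53 for double precision.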

  6. Single Precision • IEEE standard 754 • Floating point number representation • 32-bit layout: s eeeeeeee fffffffffffffffffffffff • Bit 31: sign (s) • Bits 30-23: exponent • Bits 22-0: significand • s: (1) sign bit • 0 means positive, 1 means negative

  7. Single Precision • s eeeeeeee fffffffffffffffffffffff • e: (8) exponent bits [-126 … 127] • A bias of 127 is added to the exponent before it is stored • f: (24) fractional part [23 stored bits + 1 implied bit] • Normalize the fractional part • A 1 will always be on the left side of the binary point, so it does not need to be stored
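
The field layout above can be sketched in a few lines of Python. This is an illustration only, using the standard `struct` module. The names `single_fields` and `single_value` are hypothetical, and the reconstruction formula covers normalized numbers only (not zeros, infinities, or NaN).

```python
import struct

def single_fields(x: float):
    """Split x, rounded to single precision, into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # 32-bit pattern as an int
    sign = bits >> 31                  # bit 31
    biased_exp = (bits >> 23) & 0xFF   # bits 30-23
    fraction = bits & 0x7FFFFF         # bits 22-0 (implied leading 1 is not stored)
    return sign, biased_exp, fraction

def single_value(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild a normalized value: (-1)^s * 1.f * 2^(e - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exp - 127)

fields_of_one = single_fields(1.0)      # 1.0 = 1.0 x 2^0, so the stored exponent is 127
fields_of_small = single_fields(0.15625)  # 0.15625 = 1.25 x 2^-3
```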

  8. Special Single Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s × 0.0 • 0000 0000 0000 0000 0000 0000 0000 0000 • 0x0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 0000 • 0x8000 0000 (-0)

  9. Special Single Cases • Positive infinity • +INF • s = 0, e = 255, f = 0 (all fractional bits are 0) • 0111 1111 1000 0000 0000 0000 0000 0000 • 0x7f80 0000 • Negative infinity • -INF • s = 1, e = 255, f = 0 (all fractional bits are 0) • 1111 1111 1000 0000 0000 0000 0000 0000 • 0xff80 0000

  10. Special Single Cases • Not-A-Number (NaN) • s = 0 | 1, e = 255, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1100 0000 0000 0000 0000 0000 • 0x7fc0 0000
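
The special bit patterns on the last three slides can be checked by reinterpreting each 32-bit pattern as a single. A small sketch, assuming Python's standard `struct` and `math` modules; `single_from_bits` is a hypothetical helper name.

```python
import math
import struct

def single_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer pattern as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_zero = single_from_bits(0x00000000)
neg_zero = single_from_bits(0x80000000)
pos_inf = single_from_bits(0x7F800000)
neg_inf = single_from_bits(0xFF800000)
nan = single_from_bits(0x7FC00000)

# The two zeros compare equal; only the sign bit tells them apart.
zeros_equal = pos_zero == neg_zero
neg_zero_sign = math.copysign(1.0, neg_zero)

# NaN is the only value that is not equal to itself.
nan_equals_itself = nan == nan
```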

  11. Special Single Cases • Maximum single number • 0111 1111 0111 1111 1111 1111 1111 1111 • 0x7f7f ffff • 3.40282347 × 10^38 • Minimum positive normalized single number • 0000 0000 1000 0000 0000 0000 0000 0000 • 0x0080 0000 • 1.17549435 × 10^-38 • To represent larger numbers, use double precision
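
The two limits follow directly from the field widths: the largest significand is 1.111…1 (that is, 2 - 2^-23) with the largest ordinary exponent 127, and the smallest normalized value is 1.0 × 2^-126. A brief sketch comparing those formulas with the bit patterns on the slide; the variable names are made up for the example.

```python
import struct

# Largest single: significand 2 - 2^-23, exponent 127.
max_single = (2 - 2**-23) * 2.0**127
# Smallest positive normalized single: significand 1.0, exponent -126.
min_normal_single = 2.0**-126

# The same two values, read back from the hex patterns on the slide.
max_from_bits = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
min_from_bits = struct.unpack(">f", bytes.fromhex("00800000"))[0]
```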

  12. Double Precision • IEEE standard 754 • Floating point number representation • 64-bit layout: s eeeeeeeeeee ffff…ffff (52 f bits) • Bit 63: sign (s) • Bits 62-52: exponent • Bits 51-0: significand • s: (1) sign bit • 0 means positive, 1 means negative

  13. Double Precision • s eeeeeeeeeee ffff…ffff (52 f bits) • e: (11) exponent bits [-1022 … 1023] • A bias of 1023 is added to the exponent before it is stored • f: (53) fractional part [52 stored bits + 1 implied bit] • Normalize the fractional part • A 1 will always be on the left side of the binary point, so it does not need to be stored

  14. Real (Decimal) Number Storage • Double precision floating point numbers • Byte 0: seeeeeee • Byte 1: eeeeffff • Bytes 2-7: ffffffff each • s: (1) sign bit • e: (11) exponent bits [-1022 … 1023] • f: (53) fractional part [52 stored bits + 1 implied bit]
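
The double layout can be sketched the same way as the single one, just with wider fields. The example below assumes Python's standard `struct` module and packs in big-endian order so that byte 0 holds the sign bit, as on the slide. `double_fields` is a hypothetical helper name.

```python
import struct

def double_fields(x: float):
    """Split a double into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # 64-bit pattern as an int
    sign = bits >> 63                    # bit 63
    biased_exp = (bits >> 52) & 0x7FF    # bits 62-52
    fraction = bits & ((1 << 52) - 1)    # bits 51-0
    return sign, biased_exp, fraction

# -24.125 = -1.1000001 (binary) x 2^4, so the stored exponent is 4 + 1023 = 1027.
fields = double_fields(-24.125)

# Big-endian byte dump: byte 0 is seeeeeee, byte 1 is eeeeffff.
byte_dump = struct.pack(">d", -24.125).hex()
```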

  15. Special Double Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s × 0.0 • 64 bits • 0000 0000 0000 0000 0000 0000 0000 … 0000 • 0x0000 0000 0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 … 0000 • 0x8000 0000 0000 0000 (-0)

  16. Special Double Cases • Positive infinity • +INF • s = 0, e = 2047, f = 0 (all fractional bits are 0) • 0111 1111 1111 0000 0000 0000 0000 … 0000 • 0x7ff0 0000 0000 0000 • Negative infinity • -INF • s = 1, e = 2047, f = 0 (all fractional bits are 0) • 1111 1111 1111 0000 0000 0000 0000 … 0000 • 0xfff0 0000 0000 0000

  17. Special Double Cases • Not-A-Number (NaN) • s = 0 | 1, e = 2047, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1111 1000 0000 0000 0000 … 0000 • 0x7ff8 0000 0000 0000

  18. Special Double Cases • Maximum double number • 0111 1111 1110 1111 1111 1111 1111 … 1111 • 0x7fef ffff ffff ffff • 1.7976931348623157 × 10^308 • Minimum positive normalized double number • 0000 0000 0001 0000 0000 0000 0000 … 0000 • 0x0010 0000 0000 0000 • 2.2250738585072014 × 10^-308 • Don’t forget about the implied 1 bit!!
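
Because Python's `float` is a double on common platforms, the double special cases can be cross-checked against the limits reported by `sys.float_info`. A minimal sketch under that assumption; `double_from_bits` is a hypothetical helper name.

```python
import math
import struct
import sys

def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

neg_zero = double_from_bits(0x8000000000000000)
pos_inf = double_from_bits(0x7FF0000000000000)
neg_inf = double_from_bits(0xFFF0000000000000)
nan = double_from_bits(0x7FF8000000000000)
max_double = double_from_bits(0x7FEFFFFFFFFFFFFF)
min_normal_double = double_from_bits(0x0010000000000000)

# sys.float_info reports the platform's largest and smallest normalized doubles.
matches_platform_max = max_double == sys.float_info.max
matches_platform_min = min_normal_double == sys.float_info.min
```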

  19. Decimal to Float Conversion • Show -24.125₁₀ in IEEE single precision format • First, save the sign (negative, so 1) and convert the magnitude to binary • 24.125₁₀ = 11000.001₂ × 2^0 • Normalize • = 1.1000001₂ × 2^4 • Strip the leading 1 off the mantissa and extend to 23 bits to form the significand • = .10000010000000000000000 • Bias the exponent • Exp + Bias = 4 + 127 = 131 = 10000011₂

  20. Real (Decimal) Number Storage • Bits 31 down to 0: 1 10000011 10000010000000000000000 (sign, exponent, significand) • Grouped in fours: 1100 0001 1100 0001 0000 0000 0000 0000 • Hex value: 0xC1C10000
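
The conversion steps for -24.125 can be mirrored in code. The function below is a teaching sketch, not a general encoder: the name `encode_single` is made up, and it assumes a nonzero, finite input whose exponent falls in the normalized range and whose fraction fits in 23 bits (it truncates rather than rounds). The result is checked against Python's standard `struct` module.

```python
import struct

def encode_single(x: float) -> int:
    """Encode a nonzero, finite, normalized value by hand (truncating, no rounding)."""
    sign = 1 if x < 0 else 0           # step 1: save the sign
    mantissa = abs(x)
    exponent = 0
    while mantissa >= 2.0:             # step 2: normalize to 1.xxx * 2^exponent
        mantissa /= 2.0
        exponent += 1
    while mantissa < 1.0:
        mantissa *= 2.0
        exponent -= 1
    fraction = int((mantissa - 1.0) * 2**23)  # step 3: strip the implied 1, keep 23 bits
    biased = exponent + 127                   # step 4: bias the exponent
    return (sign << 31) | (biased << 23) | fraction

bits = encode_single(-24.125)
hex_by_hand = f"0x{bits:08X}"
hex_by_struct = struct.pack(">f", -24.125).hex()
```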
