
Understanding Floating Point Numbers in Information Technology

Explore the concept of floating point numbers and their calculations, normalization, and IEEE 754 Standard. Learn about Excess-N Notation, Overflow, and Underflow. Gain insights into programming considerations and practical implementations for real numbers.

cellen

Presentation Transcript


  1. ITEC 1000 “Introduction to Information Technology” Lecture 5 Floating Point Numbers

  2. Lecture Template: • Floating Point Numbers • Exponential Notation • Excess-50 Notation • Overflow and Underflow • Floating Point Calculations • Normalization in Floating Point • IEEE 754 Standard • Packed Decimal Format • Programming Considerations

  3. Floating Point Numbers • Real numbers • Used in a computer when the number • is outside the integer range of the computer (too large or too small) • contains a decimal fraction • The range in PCs: [range values missing from the transcript] or more

  4. Exponential Notation • The following are equivalent representations of 1,234: • 123,400.0 x 10^-2 • 12,340.0 x 10^-1 • 1,234.0 x 10^0 • 123.4 x 10^1 • 12.34 x 10^2 • 1.234 x 10^3 • 0.1234 x 10^4 • The representations differ in that the decimal place, the “point”, “floats” to the left or right (with the appropriate adjustment in the exponent).
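
The equivalence of these representations can be checked mechanically. The snippet below is a minimal sketch using Python's standard `decimal` module; the names `forms` and `all_equal` are illustrative and not part of the lecture.

```python
from decimal import Decimal

# The seven representations from the slide, written in E-notation.
forms = [
    "123400.0E-2", "12340.0E-1", "1234.0E0", "123.4E1",
    "12.34E2", "1.234E3", "0.1234E4",
]

# Decimal parses each string exactly, so the comparison is not
# affected by binary rounding.
values = [Decimal(text) for text in forms]
all_equal = all(value == Decimal(1234) for value in values)

print(all_equal)  # True
```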

  5. Exponential Notation • Also called scientific notation • 4 specifications required for a number • Sign (“+” in example) • Magnitude or mantissa (12345) • Sign of the exponent (“+” in 10^5) • Magnitude of the exponent (5) • Plus • Base of the exponent (10) • Location of the decimal point (or, in another base, the radix point)

  6. Parts of a Floating Point Number: -0.9876 x 10^-3 • Sign of mantissa: “-” • Location of decimal point: in front of the mantissa digits • Mantissa: 9876 • Base: 10 • Sign of exponent: “-” • Exponent: 3

  7. Floating Point Format Specification • Integer format (8-digit decimal word) • 7 decimal digits and a sign • Range: -9,999,999 ≤ I ≤ +9,999,999 • Floating point format (same 8-digit word) • 1 sign digit, 2 exponent digits, 5 mantissa digits

  8. Format • Mantissa: stored in sign-magnitude format • Assume decimal point located at the beginning of mantissa • Exponent stored in Excess-N notation: Complementary notation • Pick the middle value of the exponent range as the offset N: for the range 0..99, N = 50 (excess-50)

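  9. Excess-50 notation • Excess-N representation: R = N + EE, where EE is the actual exponent and R is the stored value • Example 1: N = 50, EE = 38, R = 88 • Example 2: N = 50, EE = -38, R = 12 • Excess-50: stored values 00..99 represent exponents -50..+49 • Magnitude range: 0.00001 x 10^-50 to 0.99999 x 10^+49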
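
The rule R = N + EE is simple enough to sketch in a few lines. The helper names `to_excess` and `from_excess` below are hypothetical, chosen only for this illustration.

```python
def to_excess(exponent: int, n: int = 50) -> int:
    """Return the stored value R = N + exponent."""
    return n + exponent


def from_excess(stored: int, n: int = 50) -> int:
    """Recover the actual exponent from the stored value."""
    return stored - n


print(to_excess(38))    # 88, as in Example 1
print(to_excess(-38))   # 12, as in Example 2
print(from_excess(12))  # -38
```

Because the offset is added to every exponent, stored values compare in the same order as the exponents they represent, which is one reason the scheme is convenient.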

  10. Overflow and Underflow • Possible for the number to be too large (overflow) or too small (underflow) for representation • Smallest representable magnitude: 0.00001 x 10^-50 = 10^-55
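
A range check for the decimal format can be sketched as follows. It assumes the five-digit mantissa and excess-50 exponent described in this lecture; the function `classify` and the constants `LARGEST` and `SMALLEST` are illustrative names, and treating zero as "in range" is an assumption, since the slides do not say how zero is stored.

```python
from decimal import Decimal

# Limits of the SEEMMMMM format: five mantissa digits, exponents -50..+49.
LARGEST = Decimal("0.99999E+49")
SMALLEST = Decimal("0.00001E-50")   # equals 10^-55


def classify(value: str) -> str:
    """Report whether a magnitude fits the format's range."""
    magnitude = abs(Decimal(value))
    if magnitude > LARGEST:
        return "overflow"
    if magnitude != 0 and magnitude < SMALLEST:
        return "underflow"
    return "in range"   # in range, though possibly only after rounding


print(classify("1E60"))       # overflow
print(classify("1E-60"))      # underflow
print(classify("-246.8035"))  # in range
```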

  11. Floating Point Format: Excess-50 • First digit represents the sign of mantissa • 0 is used as a “+” sign • 5 is used as a “-” sign (arbitrarily) • Two next digits represent exponent in excess-50 • Five last digits represent mantissa • fixed decimal point located at the beginning

  12. Examples

  13. Normalization • Shift the mantissa left, decreasing the exponent, until leading zeros are eliminated • Converting a decimal number into standard format • Provide number with exponent (0 if not yet specified) • Increase/decrease exponent to shift decimal point to proper position • Decrease exponent to eliminate leading zeros on mantissa • Correct precision by adding 0’s or discarding/rounding least significant digits

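  14. Example 1: 246.8035 = 0.2468035 x 10^3 • Sign: 0 • Excess-50 exponent: 53 • Mantissa: 24680 • Stored as 0 53 24680

  15. Example 2: 1255 x 10^-3 = 0.1255 x 10^1 • Sign: 0 • Excess-50 exponent: 51 • Mantissa: 12550 • Stored as 0 51 12550

  16. Example 3: -0.00000075 = -0.75 x 10^-6 • Sign: 5 • Excess-50 exponent: 44 • Mantissa: 75000 • Stored as 5 44 75000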
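
The normalization steps used in Examples 1 to 3 can be sketched as a small encoder for the sign / excess-50 exponent / five-digit mantissa format. This is only an illustration under stated assumptions: `encode_excess50` is a hypothetical helper, rounding is half-up, and zero is rejected because the slides do not define its representation.

```python
from decimal import Decimal, ROUND_HALF_UP


def encode_excess50(value: str) -> str:
    """Encode a decimal string as SEEMMMMM (sign, exponent, mantissa)."""
    number = Decimal(value)
    if number == 0:
        raise ValueError("zero has no normalized form in this sketch")
    sign_digit = "5" if number < 0 else "0"
    number = abs(number)

    # adjusted() is the exponent of the leading digit in d.ddd form;
    # the format wants 0.ddddd, so the exponent is one larger.
    exponent = number.adjusted() + 1

    # Scale so the five mantissa digits form an integer, then round.
    mantissa = int(
        number.scaleb(5 - exponent).to_integral_value(rounding=ROUND_HALF_UP)
    )
    if mantissa == 100000:          # rounding carried into a sixth digit
        mantissa, exponent = 10000, exponent + 1

    stored = exponent + 50          # excess-50
    if not 0 <= stored <= 99:
        raise OverflowError("exponent outside the excess-50 range")
    return f"{sign_digit}{stored:02d}{mantissa:05d}"


print(encode_excess50("246.8035"))     # 05324680
print(encode_excess50("1255E-3"))      # 05112550
print(encode_excess50("-0.00000075"))  # 54475000
```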

  17. Floating Point Calculations • Addition and subtraction • Exponent and mantissa treated separately • Exponents of numbers must agree • Align decimal points • Least significant digits may be lost • Mantissa overflow requires the mantissa to be shifted right and the exponent increased by one

  18. Example Precision lost
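
The alignment step and the resulting loss of precision can be sketched for two positive numbers in the eight-digit excess-50 format. The function `add_excess50` and the operands are made up for this illustration (they are not necessarily the values on the slide), and digits shifted off the right are simply truncated rather than rounded.

```python
def add_excess50(a: str, b: str) -> str:
    """Add two positive SEEMMMMM words (sketch: positive operands only)."""
    exp_a, man_a = int(a[1:3]), int(a[3:])
    exp_b, man_b = int(b[1:3]), int(b[3:])

    # Make `a` the operand with the larger exponent.
    if exp_a < exp_b:
        exp_a, man_a, exp_b, man_b = exp_b, man_b, exp_a, man_a

    # Align decimal points: shift the smaller number right.
    # Digits shifted off the end are lost.
    man_b //= 10 ** (exp_a - exp_b)

    total = man_a + man_b
    exponent = exp_a
    if total >= 100000:      # mantissa overflow: shift right, bump exponent
        total //= 10
        exponent += 1
    if exponent > 99:
        raise OverflowError("result exponent exceeds the excess-50 range")
    return f"0{exponent:02d}{total:05d}"


# 0.99520 x 10^1 + 0.67850 x 10^-1 = 10.01985, stored as 0.10019 x 10^2
print(add_excess50("05199520", "04967850"))  # 05210019
```

The exact sum is 10.01985, but only 10.019 survives: the trailing digits of the smaller operand are discarded during alignment and again after the mantissa overflow.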

  19. Multiplication and Division • Mantissas: multiplied or divided • Exponents: added or subtracted • Normalization necessary to • Restore location of decimal point • Maintain precision of the result • Adjust excess value since it is added twice • Example: 2 numbers with exponent = 53 represented in excess-50 notation • 53 + 53 = 106 • Since 50 was added twice, subtract it once: 106 - 50 = 56 • Maintaining precision: • Normalizing and rounding after multiplication

  20. Example
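
Multiplication in the same decimal format can be sketched as below. The function `multiply_excess50` and the operands are hypothetical and not necessarily those on the slide; the sketch assumes normalized inputs and rounds the product half-up to five digits.

```python
def multiply_excess50(a: str, b: str) -> str:
    """Multiply two normalized SEEMMMMM words (illustrative sketch)."""
    sign_digit = "0" if a[0] == b[0] else "5"

    # Both stored exponents include the offset 50, so subtract it once.
    exponent = int(a[1:3]) + int(b[1:3]) - 50

    # Two five-digit mantissas give a ten-digit product 0.dddddddddd.
    product = int(a[3:]) * int(b[3:])

    # With normalized inputs there is at most one leading zero.
    if product < 10 ** 9:
        product *= 10
        exponent -= 1

    mantissa = (product + 50000) // 100000   # round to five digits
    if mantissa == 100000:                   # rounding carried over
        mantissa, exponent = 10000, exponent + 1
    if not 0 <= exponent <= 99:
        raise OverflowError("result exponent outside the excess-50 range")
    return f"{sign_digit}{exponent:02d}{mantissa:05d}"


# 0.5 x 10^3 times 0.2 x 10^3 = 0.1 x 10^6; exponents 53 + 53 - 50 = 56
print(multiply_excess50("05350000", "05320000"))  # 05610000
# 0.2 x 10^2 times 0.125 x 10^-3 = 0.25 x 10^-2
print(multiply_excess50("05220000", "04712500"))  # 04825000
```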

  21. Floating Point in the Computer • Replace digits with “0” and “1” bits • Typical floating point format • 32 bits provide range ~10^-38 to 10^+38 • 8-bit exponent = 256 levels • Excess-128 notation • 23 bits of mantissa: approximately 7 decimal digits of precision

  22. IEEE 754 Standard • Most common standard for representing floating point numbers • Single precision: 32 bits, consisting of... • Sign bit (1 bit) • Exponent (8 bits) • Mantissa (23 bits) • Double precision: 64 bits, consisting of… • Sign bit (1 bit) • Exponent (11 bits) • Mantissa (52 bits)

  23. Single Precision Format (32 bits) • Sign of mantissa (1 bit) • Exponent (8 bits) • Mantissa (23 bits)

  24. Double Precision Format (64 bits) • Sign of mantissa (1 bit) • Exponent (11 bits) • Mantissa (52 bits)
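
The field widths of both formats can be inspected with Python's standard `struct` module, which packs a float into its IEEE 754 byte layout. The helper `split_bits` is an illustrative name, not part of any library.

```python
import struct


def split_bits(x: float, double: bool = False) -> tuple:
    """Return (sign, exponent, mantissa) bit strings for x."""
    if double:
        # 64 bits: 1 sign, 11 exponent, 52 mantissa
        raw = struct.unpack(">Q", struct.pack(">d", x))[0]
        bits = f"{raw:064b}"
        return bits[0], bits[1:12], bits[12:]
    # 32 bits: 1 sign, 8 exponent, 23 mantissa
    raw = struct.unpack(">I", struct.pack(">f", x))[0]
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = split_bits(14.0)
print(sign, exponent, mantissa)   # 0 10000010 11000000000000000000000
print([len(field) for field in split_bits(14.0, double=True)])  # [1, 11, 52]
```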

  25. IEEE 754 Standard

  26. IEEE 754 Standard • 32-bit Floating Point Value Definition

  27. Normalization in Floating Point • Mantissa: • Must always start with “1” • Leading bit is not stored • Implied that it is located to the left of the binary point • Normalized Form: 1.MMMMMMM… • E.g.: • Mantissa: 10100000000000000000000 • Actual value: 1.101_2 = 1.625_10 • Exponent • Formatted using Excess-127 notation • Base 2 is implied • Range: 2^-126 to 2^127

  28. Excess Notation: Example • Represent exponent of 14_10 in excess-127 form: • 127_10 = 01111111_2 • + 14_10 = + 00001110_2 • Representation = 10001101_2 = 141_10

  29. Excess Notation: Example • Represent exponent of -8_10 in excess-127 form: • 127_10 = 01111111_2 • - 8_10 = - 00001000_2 • Representation = 01110111_2 = 119_10
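
Both excess-127 examples can be reproduced with a short sketch. The function name `excess127_bits` is hypothetical, and the range check reflects the normalized exponent range 2^-126 to 2^127 given earlier in the lecture.

```python
def excess127_bits(exponent: int) -> str:
    """Return the 8-bit excess-127 field for a normalized exponent."""
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normalized range")
    return format(exponent + 127, "08b")


print(excess127_bits(14))   # 10001101  (141 in decimal)
print(excess127_bits(-8))   # 01110111  (119 in decimal)
```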

  30. Single Precision: Example • 0 10000010 11000000000000000000000 • Sign bit: 0 = positive mantissa • Exponent: 130 - 127 = 3 • Mantissa: 1.11_2 = 1.75_10 • Value: +1.75 x 2^3 = 14.0, or +1.11_2 x 2^3 = +1110.0_2 = 14

  31. Single Precision: Exercise • What decimal value is represented by the following 32-bit floating point number? • 1 10000010 11110110000000000000000

  32. Single Precision: Exercise Answer • What decimal value is represented by the following 32-bit floating point number? • 1 10000010 11110110000000000000000 • Answer: -15.6875

  33. Step by Step Solution • 1 10000010 11110110000000000000000 • To decimal form • Exponent: 130 - 127 = 3 • Mantissa: 1.11110110000000000000000 = 1 + .5 + .25 + .125 + .0625 + 0 + .015625 + .0078125 = 1.9609375 • 1.9609375 x 2^3 = 15.6875 • Sign bit is 1: -15.6875 (negative)

  34. Step by Step Solution: Alternative Method • 1 10000010 11110110000000000000000 • To decimal form • Exponent: 130 - 127 = 3 • Mantissa: 1.11110110000000000000000 • Shift “point” 3 places to the right: 1111.10110000000000000000 • Sign bit is 1: -15.6875 (negative)
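
The decoding procedure above can be written out directly. The sketch below handles normalized numbers only (it does not treat the special all-zero and all-one exponent patterns), and `decode_float32` is an illustrative name.

```python
def decode_float32(bits: str) -> float:
    """Decode a normalized single-precision bit string (spaces allowed)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")

    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127        # remove the excess-127 offset
    fraction = int(bits[9:], 2) / 2 ** 23     # the 23 stored mantissa bits
    return sign * (1 + fraction) * 2 ** exponent   # restore the implied 1


print(decode_float32("0 10000010 11000000000000000000000"))  # 14.0
print(decode_float32("1 10000010 11110110000000000000000"))  # -15.6875
```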

  35. IBM floating point formats

  36. Alpha floating point formats

  37. Exercise: Floating Point Conversion • Express 3.14 as a 32-bit floating point number • (Note: only use 10 significant bits for the mantissa)

  38. Exercise: Floating Point Conversion Answer • Express 3.14 as a 32-bit floating point number • (Note: only use 10 significant bits for the mantissa) • Answer: 0 10000000 10010001111000000000000

  39. Detailed Solution: 3.14 to IEEE single precision • 3.14 to binary (approx.): 11.001000111101 • Normalize: 1.1001000111101 x 2^1 • Delete the implied left-most “1”: 1001000111101 (Prove!) • Exponent = 127 + 1 (the point moved 1 position when normalized) = 128 = 10000000 • Value is positive: Sign bit = 0 • Result: 0 10000000 10010001111010000000000
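
The conversion can be followed step by step with exact rational arithmetic. This is a sketch under stated assumptions: `encode_float32_truncated` is a hypothetical helper, extra bits are truncated rather than rounded (as the slide's answers appear to be), and the `fraction_bits` parameter controls how many stored mantissa bits are computed before zero padding.

```python
from fractions import Fraction


def encode_float32_truncated(value: str, fraction_bits: int = 13) -> str:
    """Build sign, excess-127 exponent and truncated mantissa fields."""
    x = Fraction(value)
    if x == 0:
        raise ValueError("zero is not a normalized number")
    sign = "1" if x < 0 else "0"
    x = abs(x)

    # Normalize to 1.xxx * 2^exponent.
    exponent = 0
    while x >= 2:
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    if not -126 <= exponent <= 127:
        raise OverflowError("exponent outside the normalized range")

    x -= 1                      # drop the implied leading 1
    bits = []
    for _ in range(fraction_bits):
        x *= 2                  # each doubling exposes the next binary digit
        if x >= 1:
            bits.append("1")
            x -= 1
        else:
            bits.append("0")
    mantissa = "".join(bits).ljust(23, "0")
    return f"{sign} {exponent + 127:08b} {mantissa}"


print(encode_float32_truncated("3.14"))
# 0 10000000 10010001111010000000000
```

Keeping all 23 mantissa bits and rounding, as real hardware does, gives a slightly different low-order pattern, so the slide's result is an approximation of the stored value.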

  40. Packed Decimal Format • Limited use, e.g., where precision is particularly important, as in accounting and business functions • Similar to BCD: four-bit representation of each digit • Stores two digits per byte • Supported by business-oriented languages like COBOL • Implemented in IBM System 370/390 and Compaq Alpha

  41. Packed Decimal Format • Each decimal digit is stored in BCD • Two digits in a byte • The most significant digit – stored first, in the high-order bits of the first byte • Can store up to 31 digits in 16 bytes • The sign is stored in the low-order bits of the last byte • Binary 1100 represents “+” • Binary 1101 represents “-” • Binary 1111 represents unsigned number • Decimal point not stored: must be maintained by application software

  42. Packed Decimal Format: Example 1 • Decimal Value: 10357, unsigned • Packed Decimal: 0001 0000 (Byte 1) 0011 0101 (Byte 2) 0111 1111 (Byte 3)

  43. Packed Decimal Format: Example 2 • Decimal Value: -90413 • Packed Decimal: 1001 0000 (Byte 1) 0100 0001 (Byte 2) 0011 1101 (Byte 3)
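
The packing rule in the two examples can be sketched as follows. The function `pack_decimal` and the table `SIGN_NIBBLES` are illustrative names; padding with a leading zero nibble when the digit count is even is an assumption, since both examples have an odd number of digits.

```python
# Sign codes from the lecture: 1100 = "+", 1101 = "-", 1111 = unsigned.
SIGN_NIBBLES = {"+": 0xC, "-": 0xD, "": 0xF}


def pack_decimal(digits: str, sign: str = "") -> bytes:
    """Pack decimal digits two per byte, sign in the last low-order nibble."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 31:
        raise ValueError("expected 1 to 31 decimal digits")
    nibbles = [int(ch) for ch in digits] + [SIGN_NIBBLES[sign]]
    if len(nibbles) % 2:            # assumption: pad to a whole byte
        nibbles.insert(0, 0)
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)


print(pack_decimal("10357").hex())        # 10357f
print(pack_decimal("90413", "-").hex())   # 90413d
```

Written in hexadecimal, the packed bytes read as the decimal digits followed by the sign code, which is what makes the format easy to inspect.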

  44. Integer vs. Floating Point: Programming Considerations • Integer advantages • Easier for computer to perform • Potential for higher precision • Faster to execute • Fewer storage locations to save time and space • Most high-level languages provide 2 or more different integer word sizes/formats: • Short integer (16 bits) • Long integer (64 bits)

  45. Integer vs. Floating Point: Programming Considerations • Use real numbers if: • Variable or constant has fractional part • Numbers take on very large or very small values outside integer range • Program should use least precision sufficient for the task • Higher precision formats require more storage • Packed decimal attractive alternative for business applications
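
A few of these trade-offs can be observed directly. The snippet below is a small sketch in Python, whose `float` is an IEEE 754 double and whose `decimal` module stands in here for decimal arithmetic of the kind packed decimal supports; the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
float_sum_exact = (0.1 + 0.2 == 0.3)

# Decimal arithmetic keeps such values exact, which suits money amounts.
decimal_sum_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

# Integers stay exact where a double has run out of mantissa bits.
big = 2 ** 53 + 1
survives_float = (int(float(big)) == big)

print(float_sum_exact)    # False
print(decimal_sum_exact)  # True
print(survives_float)     # False
```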

  46. Computer humour 
