Explore the concept of floating point numbers and their calculations, normalization, and IEEE 754 Standard. Learn about Excess-N Notation, Overflow, and Underflow. Gain insights into programming considerations and practical implementations for real numbers.
ITEC 1000 “Introduction to Information Technology” Lecture 5 Floating Point Numbers
Lecture Template: • Floating Point Numbers • Exponential Notation • Excess-50 Notation • Overflow and Underflow • Floating Point Calculations • Normalization in Floating Point • IEEE 754 Standard • Packed Decimal Format • Programming Considerations
Floating Point Numbers • Real numbers • Used in computer when the number • is outside the integer range of the computer (too large or too small) • contains a decimal fraction • the range in PC’s: • r • or more
Exponential Notation • The following are equivalent representations of 1,234: • 123,400.0 x 10⁻² • 12,340.0 x 10⁻¹ • 1,234.0 x 10⁰ • 123.4 x 10¹ • 12.34 x 10² • 1.234 x 10³ • 0.1234 x 10⁴ • The representations differ in that the decimal place, the “point”, “floats” to the left or right (with the appropriate adjustment in the exponent).
Exponential Notation • Also called scientific notation • 4 specifications required for a number • Sign (“+” in example) • Magnitude or mantissa (12345) • Sign of the exponent (“+” in 10⁵) • Magnitude of the exponent (5) • Plus • Base of the exponent (10) • Location of the decimal point (or, in another base, the radix point)
Parts of a Floating Point Number: -0.9876 x 10⁻³ • Sign of mantissa: “-” • Location of decimal point: “.” • Mantissa: 9876 • Base: 10 • Sign of exponent: “-” • Exponent: 3
Floating Point Format Specification • Integer format (8-digit word) • 7 decimal digits and a sign • Range: -9,999,999 ≤ I ≤ +9,999,999 • Floating point format (8-digit word)
Format • Mantissa: stored in sign-magnitude format • Assume decimal point located at the beginning of mantissa • Exponent stored in Excess-N notation: Complementary notation • Pick middle value as offset where N is the middle value: 0..99 e.g., excess-50
Excess-50 notation • Excess-N representation: R = N + EE • Example1: N = 50, EE = 38, R = 88 • Example2: N = 50, EE = -38, R = 12 • Excess-50: Magnitude range
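The relation R = N + EE can be checked with a few lines of Python. This is a minimal sketch; the function names `to_excess` and `from_excess` are made up for illustration, and the two-digit stored range 0..99 is taken from the excess-50 description above.

```python
def to_excess(exponent: int, n: int = 50) -> int:
    """Return the stored representation R = N + exponent."""
    stored = n + exponent
    if not 0 <= stored <= 99:  # two decimal digits available for R
        raise ValueError("exponent does not fit in two excess-N digits")
    return stored


def from_excess(stored: int, n: int = 50) -> int:
    """Recover the true exponent from its stored form."""
    return stored - n


print(to_excess(38))    # 88
print(to_excess(-38))   # 12
print(from_excess(88))  # 38
```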
Overflow and Underflow • Possible for the number to be too large or too small for representation • 0.00001 x 10⁻⁵⁰ = 10⁻⁵⁵
Floating Point Format: Excess-50 • First digit represents the sign of mantissa • 0 is used as a “+” sign • 5 is used as a “-” sign (arbitrarily) • Two next digits represent exponent in excess-50 • Five last digits represent mantissa • Fixed decimal point located at the beginning
Normalization • Shift the mantissa left, decreasing the exponent, until leading zeros are eliminated • Converting a decimal number into standard format • Provide number with exponent (0 if not yet specified) • Increase/decrease exponent to shift decimal point to proper position • Decrease exponent to eliminate leading zeros on mantissa • Correct precision by adding 0’s or discarding/rounding least significant digits
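Example 1: 246.8035 • Provide exponent: 246.8035 x 10⁰ • Shift decimal point to the beginning of the mantissa: 0.2468035 x 10³ • Correct precision to 5 digits: 0.24680 x 10³ • Sign: 0 • Excess-50 exponent: 53 • Mantissa: 24680 • Result: 05324680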
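The conversion of a decimal number into the 8-digit sign / excess-50 exponent / mantissa format can be sketched as follows. The function name `encode_excess50` is hypothetical, extra digits are discarded rather than rounded, and the encoding chosen for zero is an assumption that the slides do not specify.

```python
from decimal import Decimal

MANTISSA_DIGITS = 5
OFFSET = 50


def encode_excess50(text: str) -> str:
    """Encode a decimal string as SEEMMMMM (sign, excess-50 exponent, mantissa)."""
    value = Decimal(text)
    sign_digit = "5" if value < 0 else "0"  # 0 means "+", 5 means "-"
    _, digits, exp = value.as_tuple()
    if not any(digits):
        # Assumed convention for zero (not given in the slides).
        return "0" + f"{OFFSET:02d}" + "0" * MANTISSA_DIGITS
    # Value is 0.d1d2...dn x 10^(n + exp): the point sits before the first digit.
    exponent = len(digits) + exp
    stored = OFFSET + exponent
    if not 0 <= stored <= 99:
        raise ValueError("exponent out of range (overflow or underflow)")
    # Keep the 5 most significant digits; pad with zeros if there are fewer.
    mantissa = "".join(map(str, digits))[:MANTISSA_DIGITS].ljust(MANTISSA_DIGITS, "0")
    return f"{sign_digit}{stored:02d}{mantissa}"


print(encode_excess50("246.8035"))    # 05324680
print(encode_excess50("-0.0009876"))  # 54798760
```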
Floating Point Calculations • Addition and subtraction • Exponent and mantissa treated separately • Exponents of numbers must agree • Align decimal points • Least significant digits may be lost • On mantissa overflow, the mantissa is shifted right again and the exponent is increased
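A minimal sketch of the addition steps above, assuming two positive, normalized operands given as (true exponent, 5-digit mantissa) pairs with value 0.MMMMM x 10^E. The function name `add` and the sample operands are illustrative, and digits shifted out are discarded rather than rounded.

```python
DIGITS = 5


def add(a, b):
    """Add two positive numbers given as (exponent, 5-digit mantissa)."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:  # make `a` the operand with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    # Align decimal points: shift the smaller number's mantissa right.
    mb //= 10 ** (ea - eb)  # least significant digits are lost here
    total, exp = ma + mb, ea
    if total >= 10 ** DIGITS:  # mantissa overflow
        total //= 10           # shift mantissa right again
        exp += 1               # and increase the exponent
    return exp, total


# 0.99520 x 10^1 + 0.67850 x 10^-1 = 10.01985, stored as 0.10019 x 10^2
print(add((1, 99520), (-1, 67850)))  # (2, 10019)
```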
Example Precision lost
Multiplication and Division • Mantissas: multiplied or divided • Exponents: added or subtracted • Normalization necessary to • Restore location of decimal point • Maintain precision of the result • Adjust excess value since added twice • Example: 2 numbers with exponent = 53 represented in excess-50 notation • 53 + 53 = 106 • Since 50 added twice, subtract: 106 – 50 = 56 • Maintaining precision: • Normalizing and rounding multiplication
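The multiplication rules can be sketched in the same style. Operands here are (stored excess-50 exponent, 5-digit mantissa) pairs and are assumed positive and normalized; the function name `multiply` and the sample values are illustrative, and the result is truncated rather than rounded.

```python
OFFSET = 50
DIGITS = 5


def multiply(a, b):
    """Multiply two numbers given as (excess-50 exponent, 5-digit mantissa)."""
    (xa, ma), (xb, mb) = a, b
    stored = xa + xb - OFFSET  # the offset was added twice, so subtract it once
    product = ma * mb          # 10-digit mantissa of value 0.MMMMMMMMMM
    # Two normalized mantissas give a product in [0.01, 1):
    # at most one leading zero needs to be removed.
    if product < 10 ** (2 * DIGITS - 1):
        product *= 10
        stored -= 1
    mantissa = product // 10 ** DIGITS  # keep the 5 most significant digits
    return stored, mantissa


# (0.20000 x 10^3) x (0.50000 x 10^3) = 0.10000 x 10^6
print(multiply((53, 20000), (53, 50000)))  # (56, 10000)
# (0.20000 x 10^2) x (0.12500 x 10^-3) = 0.25000 x 10^-2
print(multiply((52, 20000), (47, 12500)))  # (48, 25000)
```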
Floating Point in the Computer • Replace digits with “0” and “1” bits • Typical floating point format • 32 bits provide range ~10⁻³⁸ to 10⁺³⁸ • 8-bit exponent = 256 levels • Excess-128 notation • 23 bits of mantissa: approximately 7 decimal digits of precision
IEEE 754 Standard • Most common standard for representing floating point numbers • Single precision: 32 bits, consisting of... • Sign bit (1 bit) • Exponent (8 bits) • Mantissa (23 bits) • Double precision: 64 bits, consisting of… • Sign bit (1 bit) • Exponent (11 bits) • Mantissa (52 bits)
Single Precision Format (32 bits) • Sign of mantissa (1 bit) • Exponent (8 bits) • Mantissa (23 bits)
Double Precision Format (64 bits) • Sign of mantissa (1 bit) • Exponent (11 bits) • Mantissa (52 bits)
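To see the three single-precision fields on a real machine, the bits of a value can be pulled apart with Python's standard `struct` module. This is a sketch; `float32_fields` is a made-up helper name, and 14.0 is just a convenient sample value.

```python
import struct


def float32_fields(x: float):
    """Return (sign, stored exponent, mantissa) of x as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits
    mantissa = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, mantissa


s, e, m = float32_fields(14.0)
print(s, f"{e:08b}", f"{m:023b}")  # 0 10000010 11000000000000000000000
```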
IEEE 754 Standard • 32-bit Floating Point Value Definition
Normalization in Floating Point • Mantissa: • Must always start with “1” • Leading bit is not stored • Implied that it is located to the left of the binary point • Normalized Form: 1.MMMMMMM… • E.g.: • Mantissa: 10100000000000000000000 • Actual value: 1.101₂ = 1.625₁₀ • Exponent • Formatted using Excess-127 notation • Base 2 is implied • Range: 2⁻¹²⁶ to 2¹²⁷
Excess Notation: Example • Represent exponent of 14₁₀ in excess-127 form: • 127₁₀ = +01111111₂ • 14₁₀ = +00001110₂ • Representation = 10001101₂ = 141₁₀
Excess Notation: Example • Represent exponent of -8₁₀ in excess-127 form: • 127₁₀ = +01111111₂ • -8₁₀ = -00001000₂ • Representation = 01110111₂ = 119₁₀
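Both excess-127 examples can be reproduced by adding 127 and printing the result as 8 bits. A minimal sketch; `excess127_bits` is a hypothetical helper, and it rejects the stored values 0 and 255, which IEEE 754 reserves for special cases.

```python
def excess127_bits(exponent: int) -> str:
    """Return the 8-bit excess-127 representation of a true exponent."""
    stored = exponent + 127
    if not 0 < stored < 255:  # 0 and 255 are reserved in IEEE 754
        raise ValueError("exponent outside the range -126..127")
    return f"{stored:08b}"


print(excess127_bits(14))  # 10001101
print(excess127_bits(-8))  # 01110111
```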
Single Precision: Example • 0 10000010 11000000000000000000000 • Sign: 0 = positive mantissa • Exponent: 10000010₂ = 130; 130 – 127 = 3 • Mantissa: 1.11₂ = 1.75₁₀ • Value: +1.75 x 2³ = 14.0, or +1.11₂ x 2³ = +1110.0₂ = 14
Single Precision: Exercise • What decimal value is represented by the following 32-bit floating point number? • 1 10000010 11110110000000000000000
Single Precision: Exercise Answer • What decimal value is represented by the following 32-bit floating point number? • 1 10000010 11110110000000000000000 • Answer: -15.6875
Step by Step Solution • 1 10000010 11110110000000000000000 to decimal form • Exponent: 10000010₂ = 130; 130 - 127 = 3 • Mantissa: 1.11110110000000000000000₂ = 1 + .5 + .25 + .125 + .0625 + 0 + .015625 + .0078125 = 1.9609375 • 1.9609375 x 2³ = 15.6875 • Sign bit = 1: -15.6875 (negative)
Step by Step Solution: Alternative Method • 1 10000010 11110110000000000000000 to decimal form • Exponent: 130 - 127 = 3 • Mantissa: 1.11110110000000000000000₂ • Shift the “point” 3 places to the right: 1111.10110000000000000000₂ = 15.6875 • Sign bit = 1: -15.6875 (negative)
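The step-by-step decoding can be written out directly. This sketch handles normalized values only (stored exponent 1 to 254); the function name `decode_single` is illustrative.

```python
def decode_single(bit_string: str) -> float:
    """Decode a 32-bit pattern 'S EEEEEEEE MMM...M' (spaces optional)."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127    # remove the excess-127 offset
    fraction = int(bits[9:], 2) / 2**23   # the 23 stored mantissa bits
    return sign * (1 + fraction) * 2.0**exponent  # restore the implied leading 1


print(decode_single("1 10000010 11110110000000000000000"))  # -15.6875
print(decode_single("0 10000010 11000000000000000000000"))  # 14.0
```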
Exercise: Floating Point Conversion • Express 3.14 as a 32-bit floating point number • (Note: only use 10 significant bits for the mantissa)
Exercise: Floating Point Conversion Answer • Express 3.14 as a 32-bit floating point number • (Note: only use 10 significant bits for the mantissa) • Answer: 0 10000000 10010001111000000000000
Detailed Solution: 3.14 to IEEE Single Precision • 3.14 to binary (approx.): 11.001000111101₂ • Normalize: 1.1001000111101₂ x 2¹ • Delete the implied left-most “1”: mantissa = 1001000111101 (prove it!) • Exponent = 127 + 1 (the point moved 1 position when normalized) = 128 = 10000000₂ • Value is positive: sign bit = 0 • Result: 0 10000000 10010001111010000000000
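The manual conversion can be sketched by generating fraction bits with repeated doubling. This assumes a magnitude of at least 1 and truncates after `frac_bits` fraction bits instead of rounding, as the slides do; the function name and its parameter are made up for illustration.

```python
from fractions import Fraction


def to_single_truncated(text: str, frac_bits: int = 12) -> str:
    """Convert a decimal string (|value| >= 1) to 'S EEEEEEEE MMM...M'."""
    value = Fraction(text)
    sign = "1" if value < 0 else "0"
    value = abs(value)
    whole = int(value)
    frac = value - whole
    frac_part = ""
    for _ in range(frac_bits):  # doubling yields one fraction bit per step
        frac *= 2
        bit = int(frac)
        frac_part += str(bit)
        frac -= bit
    whole_part = bin(whole)[2:]      # e.g. "11" for 3
    exponent = len(whole_part) - 1   # places the point moves when normalizing
    # Drop the implied leading 1, then pad the mantissa to 23 bits.
    mantissa = (whole_part + frac_part)[1:][:23].ljust(23, "0")
    return f"{sign} {127 + exponent:08b} {mantissa}"


print(to_single_truncated("3.14"))      # 0 10000000 10010001111010000000000
print(to_single_truncated("3.14", 10))  # 0 10000000 10010001111000000000000
```

With 12 fraction bits this gives the pattern from the detailed solution; with 10 it gives the shorter exercise answer. A real machine keeps all 23 mantissa bits and rounds, so its stored pattern for 3.14 differs in the low-order bits.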
Packed Decimal Format • Limited use, e.g., where precision is particularly important, as in accounting and business functions • Similar to BCD: each digit uses a four-bit representation, so two digits are stored per byte • Supported by business-oriented languages like COBOL • Implemented in IBM System 370/390 and Compaq Alpha
Packed Decimal Format • Each decimal digit is stored in BCD • Two digits in a byte • The most significant digit – stored first, in the high-order bits of the first byte • Can store up to 31 digits in 16 bytes • The sign is stored in the low-order bits of the last byte • Binary 1100 represents “+” • Binary 1101 represents “-” • Binary 1111 represents unsigned number • Decimal point not stored: must be maintained by application software
Packed Decimal Format: Example 1 • Decimal Value: 1 0 3 5 7, unsigned • Packed Decimal: 0001 0000 (Byte 1) 0011 0101 (Byte 2) 0111 1111 (Byte 3)
Packed Decimal Format: Example 2 • Decimal Value: - 9 0 4 1 3 • Packed Decimal: 1001 0000 (Byte 1) 0100 0001 (Byte 2) 0011 1101 (Byte 3)
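Both packed decimal examples can be reproduced with a short routine. This is a sketch under two assumptions: the helper name `pack_decimal` is made up, and a leading zero digit is added when needed to fill whole bytes.

```python
def pack_decimal(value: int, signed: bool = True) -> bytes:
    """Pack an integer as BCD digits followed by a sign code."""
    if signed:
        sign = 0b1101 if value < 0 else 0b1100  # "-" or "+"
    else:
        sign = 0b1111                            # unsigned number
    nibbles = [int(d) for d in str(abs(value))] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad so the digits fill whole bytes
    # Two 4-bit values per byte, most significant digit first.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


print(pack_decimal(10357, signed=False).hex())  # 10357f
print(pack_decimal(-90413).hex())               # 90413d
```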
Integer vs. Floating Point: Programming Considerations • Integer advantages • Easier for computer to perform • Potential for higher precision • Faster to execute • Fewer storage locations to save time and space • Most high-level languages provide 2 or more different integer word sizes/formats: • Short integer (16 bits) • Long integer (64 bits)
Integer vs. Floating Point: Programming Considerations • Real numbers, if: • Variable or constant has fractional part • Numbers take on very large or very small values outside integer range • Program should use least precision sufficient for the task • Higher precision formats require more storage • Packed decimal attractive alternative for business applications
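The appeal of decimal formats for business applications can be illustrated with a small comparison. Python's `decimal` module is used here only as a stand-in for decimal arithmetic; it is not a packed decimal implementation.

```python
from decimal import Decimal

# 0.1, 0.2 and 0.3 have no exact binary floating point representation,
# so the binary sum misses 0.3 by a tiny amount.
float_equal = (0.1 + 0.2 == 0.3)

# Decimal arithmetic keeps the decimal digits exactly.
decimal_equal = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

print(float_equal)    # False
print(decimal_equal)  # True
```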