COE 308: Floating Point
The World is not just Integers
• 3.14159265…_ten (π)
• 2.71828…_ten (e)
• 0.000000001_ten or 1.0_ten x 10^-9
• 3,155,760,000_ten or 3.15576_ten x 10^9
Scientific Notation: A.AAAAA x 10^yyyy
Correct (normalized) notation: 1.0_ten x 10^-9 and 3.15576_ten x 10^9
Incorrect (un-normalized) notation: 0.1_ten x 10^-8 and 31.5576_ten x 10^8
Floating Point
Scientific notation in binary: 1.XXXXXXXXX x 2^yyyyyyy
• Representation:
  • Sign: S
  • Exponent: E
  • Significand: F, with 0 ≤ F < 1.0
Value: (-1)^S x (1.0 + F) x 2^E
Floating Point Representation
GOAL: quickly compare two FP numbers by considering them as unsigned integers.
• Options:
  • Significand:
    • Sign + Magnitude
    • Two’s Complement
  • Exponent:
    • Sign + Magnitude
    • Two’s Complement
    • Biased
[Figure: field layouts (Sign, Exponent, Significand: F) for the Sign + Magnitude and Two’s Complement options]
Floating Point Representation - Two’s Complement
Example: consider the following two numbers:
A = 1.32 x 2^17
B = -1.22 x 2^17
A = 10101000111101011100001 x 2^17
B = -00111000011010001111010 x 2^17
B = 11000111100101110000110 x 2^17 in 2’s complement representation
0 00010001 10101000111101011100001   A
0 00010001 11000111100101110000110   B
Although A > B (because A > 0 and B < 0), the two bit patterns above give the impression that B > A when they are compared as two 32-bit unsigned integers.
Two’s Complement representation is unsuitable for quickly comparing two FP numbers.
Floating Point Representation - Sign + Magnitude
In Sign + Magnitude representation, the two numbers A = 1.32 x 2^17 and B = -1.22 x 2^17 are represented as follows:
0 00010001 10101000111101011100001   A
1 00010001 00111000011010001111010   B
A > B because of the sign bit: A has sign 0 (positive) and B has sign 1 (negative).
Sign + Magnitude representation is suitable for quickly comparing two FP numbers.
How about the Exponent - Two’s Complement
In the previous examples, the exponents of A and B were positive. What if one of the exponents is negative?
Example: A = 1.32 x 2^-17 and B = 1.22 x 2^17
Let’s represent the exponent in two’s complement:
0 11101111 10101000111101011100001   A
0 00010001 00111000011010001111010   B
Although A < B, the two bit patterns above give the impression that A > B.
A two’s complement exponent is unsuitable for quickly comparing two FP numbers.
How about the Exponent - Biased Representation
In a biased representation, we add an offset to the exponent so that the lowest negative exponent is represented with the value ONE (00000001).
If a number K = k1 x 2^e, the stored exponent field is E = e + bias.
For Single Precision, bias = 127; for Double Precision, bias = 1023.
Example: A = 1.32 x 2^-17 and B = 1.22 x 2^17
0 01101110 10101000111101011100001   A   E = -17 + 127 = 110
0 10010000 00111000011010001111010   B   E = 17 + 127 = 144
A < B, and the biased exponent fields now show it directly (110 < 144).
Biased representation is suitable for quickly comparing two FP numbers.
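The effect of the exponent encoding on an unsigned-integer comparison can be checked with a small sketch. It packs the slide's example fields into 32-bit integers twice, once with a two’s complement exponent and once with a biased exponent. The helper `pack_fields` and all variable names are hypothetical, introduced only for this illustration; the significand bit strings are copied from the slides as given.

```python
BIAS = 127  # single-precision bias, as stated on the slide


def pack_fields(sign, exponent_field, significand_field):
    """Pack a 1-bit sign, 8-bit exponent and 23-bit significand into one 32-bit integer."""
    return (sign << 31) | ((exponent_field & 0xFF) << 23) | (significand_field & 0x7FFFFF)


# Significand bit strings copied from the slide example (A = 1.32 x 2^-17, B = 1.22 x 2^17)
sig_a = 0b10101000111101011100001
sig_b = 0b00111000011010001111010

# Exponent stored in 8-bit two's complement: -17 becomes 11101111, 17 stays 00010001
a_twos = pack_fields(0, -17 & 0xFF, sig_a)
b_twos = pack_fields(0, 17, sig_b)

# Exponent stored with the bias added: 110 and 144
a_biased = pack_fields(0, -17 + BIAS, sig_a)
b_biased = pack_fields(0, 17 + BIAS, sig_b)

print(a_twos > b_twos)      # True: the integer comparison wrongly suggests A > B
print(a_biased < b_biased)  # True: the integer comparison agrees with A < B
```

The sketch only covers two positive numbers; the sign bit still has to be handled separately, as in the Sign + Magnitude slide.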
Floating Point Representation - Summary
Fields: Sign: S | Exponent: E | Significand: F
• Sign + Magnitude is better than 2’s complement
• Exponent is represented in a biased notation
Value: (-1)^S x (1.0 + F) x 2^(E - bias)
This is the IEEE-754 Standard.
MIPS Floating Point Formats
MIPS follows the IEEE-754 floating point standard.
Single Precision: S: 1 bit | E: 8 bits | F: 23 bits
Double Precision: S: 1 bit | E: 11 bits | F: 20 bits + 32 bits (52 bits in total, split across two 32-bit words)
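As an illustration of the single-precision layout, the sketch below splits a 32-bit pattern into S, E and F and rebuilds the value with (-1)^S x (1.0 + F) x 2^(E - bias). It uses Python's standard `struct` module to obtain the bit pattern; the function names `decode_single` and `value_from_fields` are hypothetical, and the reconstruction is only valid for normalized numbers (exponent field between 1 and 254).

```python
import struct


def decode_single(x):
    """Return (sign, exponent_field, fraction_field) of x stored as IEEE-754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                      # 1 bit
    exponent_field = (bits >> 23) & 0xFF   # 8 bits, biased
    fraction_field = bits & 0x7FFFFF       # 23 bits
    return sign, exponent_field, fraction_field


def value_from_fields(sign, exponent_field, fraction_field, bias=127):
    """Rebuild the value of a normalized number from its three fields."""
    fraction = fraction_field / 2**23      # F, with 0 <= F < 1.0
    return (-1) ** sign * (1.0 + fraction) * 2.0 ** (exponent_field - bias)


fields = decode_single(-0.75)      # -0.75 = -1.5 x 2^-1
print(fields)                      # (1, 126, 4194304)
print(value_from_fields(*fields))  # -0.75
```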
Conversion
How to convert from decimal to FP?
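One possible way to do the conversion by hand is sketched below: take the sign, scale the magnitude by powers of two until it lies in [1, 2), then keep 23 fraction bits and add the bias. This is a sketch under assumptions, not the method from the slides: the function name `to_single_fields` is hypothetical, it relies on Python's `round` (round half to even), and it ignores overflow, underflow and denormalized numbers.

```python
def to_single_fields(x, bias=127, fraction_bits=23):
    """Convert a finite Python float to (sign, exponent_field, fraction_field).

    Only normalized results are handled; zero is special-cased to all-zero fields.
    """
    if x == 0:
        return 0, 0, 0
    sign = 1 if x < 0 else 0
    m = abs(x)
    e = 0
    # Normalize: bring the magnitude into the range 1.0 <= m < 2.0
    while m >= 2.0:
        m /= 2.0
        e += 1
    while m < 1.0:
        m *= 2.0
        e -= 1
    # Drop the hidden 1 and keep fraction_bits bits of the fraction
    fraction = round((m - 1.0) * 2**fraction_bits)
    if fraction == 2**fraction_bits:   # rounding carried into the hidden 1
        fraction = 0
        e += 1
    return sign, e + bias, fraction


print(to_single_fields(-0.75))  # (1, 126, 4194304), i.e. -1.5 x 2^-1
print(to_single_fields(0.1))    # (0, 123, 5033165), i.e. fraction bits 0x4CCCCD
```

For example, 0.1 is scaled four times to 1.6, so e = -4 and E = -4 + 127 = 123.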
Representation of Zero
• Zero is represented with:
  • Sign bit at 0
  • Exponent field with all bits at 0
  • Significand bits also all at 0
0 00000000 00000000000000000000000
Two Issues with FP
• Overflow occurs when the exponent of the result is too large to fit in the exponent field.
• Underflow occurs when the magnitude of a nonzero result is smaller than the smallest number that can be represented, so the result is stored with a significand of 0s.
Representation of Exceptions
• Infinity
  • Represented with an exponent field of 255 (Single Precision) or 2047 (Double Precision) and F = 0
  • The sign bit determines whether it is +∞ or -∞
  +∞: 0 11111111 00000000000000000000000
  -∞: 1 11111111 00000000000000000000000
• NaN: Not a Number. Used to represent errors and exceptions
  • Represented with maximum E and F ≠ 0
  • Result of an invalid operation such as 0/0 or the square root of a negative number
  • An operation on a NaN results in a NaN
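The encodings above can be inspected with a short sketch that looks at the single-precision bit patterns of infinity and NaN. It uses the standard `struct` and `math` modules; the helpers `single_bits` and `classify` are hypothetical names chosen for this illustration.

```python
import math
import struct


def single_bits(x):
    """Return the 32-bit IEEE-754 single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify(bits):
    """Classify a 32-bit single-precision pattern by its exponent and fraction fields."""
    exponent_field = (bits >> 23) & 0xFF
    fraction_field = bits & 0x7FFFFF
    if exponent_field == 0xFF:
        return "infinity" if fraction_field == 0 else "NaN"
    if exponent_field == 0 and fraction_field == 0:
        return "zero"
    return "other"  # normalized or denormalized number


print(f"{single_bits(float('inf')):032b}")   # 01111111100000000000000000000000
print(f"{single_bits(float('-inf')):032b}")  # 11111111100000000000000000000000
print(classify(single_bits(float("nan"))))   # NaN
print(math.isnan(float("nan") + 1.0))        # True: an operation on a NaN gives a NaN
```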
Floating Point Addition
Example: add two numbers A = 1.85 x 10^25 and B = 1.45 x 10^17. How to proceed?
1.85 x 10^25 = 185000000.00 x 10^17
1.45 x 10^17 =         1.45 x 10^17
------------------------------------------------------
The two significands must be aligned so that both numbers have the same exponent; only then does the addition become possible.
• Alignment of significands in binary is performed by shifts
• The number with the smaller exponent is shifted to the right
Addition or Subtraction?
• A and B may not have the same sign
• We also want to support subtraction with the same hardware
• Need to select between A and B to determine which one must be shifted right
• Need to select between A and B to determine which one must be complemented (2’s complement)
• Need also to determine the sign of the result (whether to complement the result or not)
Floating Point Addition
• Compare exponents
• Shift the smaller number to the right (incrementing its exponent) until its exponent matches the larger exponent
• Complement one of the two operands (if needed)
• Add the two significands
• Loop to normalize the result
  • Shift (left/right) to normalize the result
  • Detect overflow/underflow
  • Round the significand to the appropriate number of bits
• Complement the result (if needed)
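The steps above can be sketched in a few lines. The sketch makes several simplifying assumptions that are not in the slides: operands are hypothetical tuples (sign, exponent, significand) with an unbiased exponent and a 24-bit integer significand that includes the hidden 1; Python's signed integers stand in for the explicit complement steps; shifted-out bits are truncated instead of rounded; and overflow/underflow detection is left out.

```python
FRACTION_BITS = 23
HIDDEN_ONE = 1 << FRACTION_BITS  # integer weight of the leading 1 of the significand


def fp_add(a, b):
    """Add two numbers given as (sign, exponent, significand) tuples.

    The significand is an integer in [2^23, 2^24); the value of a tuple is
    (-1)^sign x significand x 2^(exponent - 23).
    """
    (sa, ea, ma), (sb, eb, mb) = a, b
    # Compare exponents and shift the smaller-exponent significand to the right
    if ea < eb:
        ma >>= eb - ea
        e = eb
    else:
        mb >>= ea - eb
        e = ea
    # Apply the signs and add (stands in for complement / add / complement)
    total = (-ma if sa else ma) + (-mb if sb else mb)
    if total == 0:
        return (0, 0, 0)
    sign = 1 if total < 0 else 0
    m = abs(total)
    # Normalize: shift right or left until the hidden 1 is back in place
    while m >= 2 * HIDDEN_ONE:
        m >>= 1
        e += 1
    while m < HIDDEN_ONE:
        m <<= 1
        e -= 1
    return (sign, e, m)


def to_float(x):
    """Convert a (sign, exponent, significand) tuple to a Python float for checking."""
    sign, e, m = x
    return (-1) ** sign * m * 2.0 ** (e - FRACTION_BITS)


a = (0, 3, 0b11 << 22)    # 1.5 x 2^3 = 12.0
b = (1, 1, HIDDEN_ONE)    # -1.0 x 2^1 = -2.0
print(to_float(fp_add(a, b)))  # 10.0
```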
Floating Point Addition Circuit
[Figure: block diagram with an Exponent Compare Unit (subtractor), a shifter, complement units on the operands and on the result, a Sign Determination Unit, an adder, and a Normalization and Rounding stage]
Floating Point Multiplication
• Add the two biased exponents
• Subtract the bias once to get a correctly biased exponent
  • Because E1 = e1 + bias and E2 = e2 + bias
  • E1 + E2 = e1 + e2 + 2 x bias
• Multiply the significands
• Loop to normalize the result
  • Shift (left/right) to normalize the result
  • Detect overflow/underflow
  • Round the significand to the appropriate number of bits
• Set the sign of the product
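A minimal sketch of these steps follows, assuming single precision. The operands are hypothetical tuples (sign, biased exponent, significand) where the significand is a 24-bit integer including the hidden 1. The sketch truncates instead of rounding, and its overflow/underflow check is only a simple range test on the exponent field.

```python
BIAS = 127
FRACTION_BITS = 23
HIDDEN_ONE = 1 << FRACTION_BITS  # integer weight of the leading 1 of the significand


def fp_mul(a, b):
    """Multiply two numbers given as (sign, biased_exponent, significand) tuples."""
    (sa, exp_a, ma), (sb, exp_b, mb) = a, b
    # Add the biased exponents, then subtract the bias once:
    # (e1 + bias) + (e2 + bias) - bias = (e1 + e2) + bias
    exponent = exp_a + exp_b - BIAS
    # Multiply the significands; the 48-bit product is scaled by 2^46,
    # so shifting right by 23 restores the 2^23 scale (low bits are truncated)
    m = (ma * mb) >> FRACTION_BITS
    # Normalize: the product of two values in [1, 2) lies in [1, 4),
    # so at most one right shift is needed
    if m >= 2 * HIDDEN_ONE:
        m >>= 1
        exponent += 1
    # Detect overflow/underflow of the 8-bit exponent field (0 and 255 are reserved)
    if exponent >= 255:
        raise OverflowError("exponent too large for the exponent field")
    if exponent <= 0:
        raise OverflowError("underflow: exponent too small for a normalized number")
    # Set the sign of the product
    sign = sa ^ sb
    return (sign, exponent, m)


a = (0, 129, 0b11 << 22)   # 1.5 x 2^2 = 6.0
b = (1, 126, 0b11 << 22)   # -1.5 x 2^-1 = -0.75
print(fp_mul(a, b))        # (1, 129, 9437184), i.e. -1.125 x 2^2 = -4.5
```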
Floating Point Multiplication Circuit
[Figure: block diagram with an Exponent Addition Unit, a Bias subtraction, a Sign Determination Unit, an Integer Multiplication Circuit, and a Normalization and Rounding stage]
Floating Point Instructions in MIPS
• 32 Floating Point Registers: $f0, …, $f31
• add.s, sub.s, mul.s and div.s: single precision
• add.d, sub.d, mul.d and div.d: double precision
• lwc1, swc1: load an FP value from memory / store an FP value to memory
• bc1t, bc1f: branch if the FP condition is true/false
• c.lt.s, c.lt.d: compare (less than) in single/double precision