Learn about the representation and limitations of real numbers in floating point format, using IEEE Standard 754. Explore single and double precision, special cases, and exception handling.
COMS 161 Introduction to Computing • Title: Numeric Processing • Date: November 05, 2004 • Lecture Number: 29
Announcements • Homework 7 • Due November 8, 2004 • Research paper proposal 2 due • November 8, 2004
Review • Integers • Big-endian • Little-endian • Overflow
Outline • Real numbers • Representation • Limitations
Real (Decimal) Number Storage • Real numbers are stored in floating point representation • IEEE Standard 754 • Allows data to be exchanged between different machines • A sign • An exponent • A mantissa, also called a significand (a normalized binary fraction) • A single nonzero digit to the left of the binary point
IEEE Standard 754 • Provides two floating point types • Single • 24 bits of significand precision • Double • 53 bits of significand precision • Five exceptions • Invalid operation • Division by zero • Overflow • Underflow • Inexact
IEEE Standard 754 • Four rounding directions • Toward the nearest representable value • "even" values preferred whenever there are two nearest representable values • Toward negative infinity (down) • Toward positive infinity (up) • Toward 0 (chop)
Single Precision • IEEE standard 754 • Floating point number representation • 32-bit: s eeeeeeee fffffff ffffffffffffffff • Bit 31: sign; bits 30 to 23: exponent; bits 22 to 0: significand (fraction) • s: (1) sign bit • 0 means positive, 1 means negative
Single Precision s eeeeeeee fffffff ffffffffffffffff • e: (8) exponent bits, actual exponent range [-126 … 127] • A bias of 127 is added to the exponent • An exponent of 0 is stored as 127; a stored exponent of 200 means the actual exponent is (200 - 127) = 73 • Stored exponents of all zeros and all ones are reserved for special numbers • f: (24) significand bits [23 stored fraction bits + 1 implied bit] • In a normalized number the digit to the left of the binary point is not zero, so in binary it must be a one • This saves a bit: the leading one is implied and does not need to be explicitly stored
Special Single Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fraction bits are all 0) • (-1)^s × 0.0 • 0000 0000 0000 0000 0000 0000 0000 0000 • 0x0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 0000 • 0x8000 0000 (-0)
Special Single Cases • Positive infinity • +INF • s = 0, e = 255, f = 0 (all fraction bits are 0) • 0111 1111 1000 0000 0000 0000 0000 0000 • 0x7f80 0000 • Negative infinity • -INF • s = 1, e = 255, f = 0 (all fraction bits are 0) • 1111 1111 1000 0000 0000 0000 0000 0000 • 0xff80 0000
Special Single Cases • Not-A-Number (NaN) • s = 0 or 1, e = 255, f ≠ 0 (at least one fraction bit is not 0) • There are many representations for NaN • Here is one example • 0111 1111 1100 0000 0000 0000 0000 0000 • 0x7fc0 0000
Special Single Cases • Maximum single number • 0111 1111 0111 1111 1111 1111 1111 1111 • 0x7f7f ffff • 3.40282347 × 10^38 • Minimum positive normalized single number • 0000 0000 1000 0000 0000 0000 0000 0000 • 0x0080 0000 • 1.17549435 × 10^-38 • To represent larger numbers, use double precision
Double Precision • IEEE standard 754 • Floating point number representation • 64-bit: s eeeeeeeeeee ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff • Bit 63: sign; bits 62 to 52: exponent; bits 51 to 0: significand (fraction) • s: (1) sign bit • 0 means positive, 1 means negative
Double Precision s eeeeeeeeeee ffff … ffff (52 f bits) • e: (11) exponent bits, actual exponent range [-1022 … 1023] • A bias of 1023 is added to the exponent • An exponent of 0 is stored as 1023; a stored exponent of 2000 means the actual exponent is (2000 - 1023) = 977 • Stored exponents of all zeros and all ones are reserved for special numbers • f: (53) significand bits [52 stored fraction bits + 1 implied bit] • In a normalized number the digit to the left of the binary point is not zero, so in binary it must be a one • This saves a bit: the leading one is implied and does not need to be explicitly stored
Real (Decimal) Number Storage • Double precision floating point numbers • Bytes 0 to 7 (most significant byte first): seeeeeee eeeeffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff • s: (1) sign bit • e: (11) exponent bits [-1022 … 1023] • f: (53) fractional part [52 bits + 1 implied bit]
Special Double Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fraction bits are all 0) • (-1)^s × 0.0 • 64 bits • 0000 0000 0000 0000 0000 0000 0000 … 0000 • 0x0000 0000 0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 … 0000 • 0x8000 0000 0000 0000 (-0)
Special Double Cases • Positive infinity • +INF • s = 0, e = 2047, f = 0 (all fraction bits are 0) • 0111 1111 1111 0000 0000 0000 0000 … 0000 • 0x7ff0 0000 0000 0000 • Negative infinity • -INF • s = 1, e = 2047, f = 0 (all fraction bits are 0) • 1111 1111 1111 0000 0000 0000 0000 … 0000 • 0xfff0 0000 0000 0000
Special Double Cases • Not-A-Number (NaN) • s = 0 or 1, e = 2047, f ≠ 0 (at least one fraction bit is not 0) • There are many representations for NaN • Here is one example • 0111 1111 1111 1000 0000 0000 0000 … 0000 • 0x7ff8 0000 0000 0000
Special Double Cases • Maximum double number • 0111 1111 1110 1111 1111 1111 1111 … 1111 • 0x7fef ffff ffff ffff • 1.7976931348623157 × 10^308 • Minimum positive normalized double number • 0000 0000 0001 0000 0000 0000 0000 … 0000 • 0x0010 0000 0000 0000 • 2.2250738585072014 × 10^-308
Decimal to Float Conversion • Show -24.125₁₀ in IEEE single precision format • First, save the sign (negative, so 1) and convert the magnitude to binary… • 24.125₁₀ = 11000.001₂ × 2^0 • Normalize… • = 1.1000001₂ × 2^4 • Strip the leading 1 off the mantissa and pad with zeros to 23 bits to form the stored fraction • = .10000010000000000000000 • Bias the exponent… • Exp + Bias = 4 + 127 = 131 = 10000011₂
Real (Decimal) Number Storage • Sign | exponent | fraction: 1 | 10000011 | 10000010000000000000000 • Bits 31 to 0: 1100 0001 1100 0001 0000 0000 0000 0000 • Hex value: 0xC1C10000
Real (Decimal) Number Storage • Numbers have limited precision
Real (Decimal) Number Storage

```cpp
#include <iostream>
using namespace std;

int main()
{
    cout << "precision example" << endl;
    cout << "Number of bytes in a float: " << sizeof(float) << endl;

    // Halve epsilon until adding it to 1 no longer changes the result
    float epsilon = 1.0f, value = 0.0f;
    int iteration = 0;
    int maxIteration = 100;
    while (iteration < maxIteration) {
        epsilon /= 2.0f;
        value = 1.0f + epsilon;
        if (value == 1.0f) break;
        iteration++;
    } // end while(...)
    cout << "Iteration: " << iteration << " Epsilon: " << epsilon
         << " Value: " << value << endl << endl;

    // Repeat the experiment in double precision
    iteration = 0;
    double epsilonD = 1.0, valueD = 0.0;
    cout << "Number of bytes in a double: " << sizeof(double) << endl;
    while (iteration < maxIteration) {
        epsilonD /= 2.0;
        valueD = 1.0 + epsilonD;
        if (valueD == 1.0) break;
        iteration++;
    } // end while(...)
    cout << "Iteration: " << iteration << " Epsilon: " << epsilonD
         << " Value: " << valueD << endl;
    return 0;
}
```
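Real (Decimal) Number Storage • Numbers have limited precision • Most real numbers have an infinite expansion; even a short decimal such as 0.1₁₀ = 0.000110011…₂ has an infinite binary expansion, so it must be rounded to fit the available bits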
Real Number Storage: Limited Range and Precision • There are three categories of numbers left out when floating point representation is used • Numbers out of range because their absolute value is too large (similar to integer overflow) • Numbers out of range because their absolute value is too small (numbers too near zero to be stored given the precision available) • Numbers whose binary representations require either an infinite number of binary digits or more binary digits than the bits available
Real Number Storage: Limited Range and Precision Illustrated • With one bit to the right of the binary point, the only nonzero fraction that can be represented is 0.5
Real Number Storage: Limited Range and Precision Illustrated • Fractions that can be represented with two bits: 0.25, 0.5, 0.75 • Fractions that can be represented with three bits: 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875 • The holes correspond to all the unrepresented numbers: 0.126, 0.255, 0.3, …
Limited Range and Precision: Some Consequences • Limited range will invalidate certain calculations • If integers are involved, overflow can often be avoided by switching to real numbers • For real number calculations, this problem arises infrequently, and when it does it can sometimes be handled by special methods • It is not a common occurrence in non-scientific work
Limited Range and Precision: Some Consequences • Limited precision for real numbers is very pervasive • Assume that most calculations involving decimal fractions will, in fact, carry some roundoff error! • Evaluate and use computer calculations with this in mind
Social Themes: Risks in Numerical Computing • Almost all computer calculations involve roundoff error (limited precision error) • If not monitored and planned for carefully, such errors can lead to unexpected and catastrophic results • Ariane 5 rocket failure • Patriot missile failure during the Gulf War
Software for Numerical Work • Software Libraries • Spreadsheets • Mathematical Software • symbolic manipulation • data analysis • data visualization