Floating Point Computation Jyun-Ming Chen Spring 2013
Contents • Sources of Computational Error • Computer Representation of (floating-point) Numbers • Efficiency Issues
Sources of Computational Error • Converting a mathematical problem into a numerical problem introduces errors due to limited computational resources: • round-off error (limited precision of representation) • truncation error (limited time for computation) • Misc. • error in the original data • blunder: a mistake made through stupidity, ignorance, or carelessness; a programming or data-input error • propagated error
Supplement: Error Classification (Hildebrand) • Gross error: caused by human or mechanical mistakes. • Roundoff error: the consequence of using a number specified by n correct digits to approximate a number which requires more than n digits (generally infinitely many digits) for its exact specification. • Truncation error: any error which is neither a gross error nor a roundoff error. Frequently, a truncation error corresponds to the fact that, whereas an exact result would be afforded (in the limit) by an infinite sequence of steps, the process is truncated after a certain finite number of steps.
Common Measures of Error • Definitions • total error = round-off error + truncation error • absolute error = | numerical – exact | • relative error = absolute error / | exact | • If the exact value is zero, the relative error is not defined (see the sketch below).
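A minimal C sketch of these two measures (the values here are chosen only for illustration; any computed/exact pair works the same way):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double exact     = 2.0 / 3.0;   /* the known answer                       */
    double numerical = 0.666667;    /* an approximation rounded to 6 digits   */

    double abs_err = fabs(numerical - exact);
    double rel_err = abs_err / fabs(exact);   /* undefined when exact == 0    */

    printf("absolute error = %e\n", abs_err);
    printf("relative error = %e\n", rel_err);
    return 0;
}
```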
Ex: Round-off Error • The representation consists of a finite number of digits. • The approximation of the real numbers on the number line is therefore discrete!
Watch out for printf !! • By default, “%f” prints only 6 digits behind the decimal point (see the sketch below).
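A small sketch of the pitfall: "%f" silently limits the display to 6 fractional digits, so ask for more digits (or use "%e"/"%g") when you need to see them:

```c
#include <stdio.h>

int main(void)
{
    float x = 1.0f / 3.0f;

    printf("%f\n",    x);   /* 0.333333     : %f shows only 6 digits after the point   */
    printf("%.10f\n", x);   /* 0.3333333433 : more digits reveal the float's real value */
    printf("%e\n",    x);   /* 3.333333e-01 : scientific notation                       */
    return 0;
}
```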
Ex: Numerical Differentiation • Evaluating the first derivative of f(x) with the forward difference f'(x) ≈ [f(x+h) – f(x)] / h • Truncation error: O(h), from dropping the higher-order terms of the Taylor expansion.
Numerical Differentiation (cont) • Select a problem with a known answer so that we can evaluate the error!
Numerical Differentiation (cont) • Error analysis: plot the (truncation) error against h. • What happened at h = 0.00001?! (see the sketch below)
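The slide's plot is not reproduced in the text; the sketch below (assuming f(x) = sin x, evaluated in single precision so the effect is easy to see) tabulates the error of the forward difference as h shrinks. The error first decreases with h (truncation), then grows again near h = 1e-5 as the round-off of the subtraction takes over:

```c
#include <math.h>
#include <stdio.h>

/* Forward difference (sin(x+h) - sin(x)) / h vs. the exact derivative cos(x). */
int main(void)
{
    float x     = 1.0f;
    float exact = cosf(x);

    for (int k = 1; k <= 7; k++) {
        float h      = powf(10.0f, -(float)k);        /* h = 1e-1 ... 1e-7   */
        float approx = (sinf(x + h) - sinf(x)) / h;   /* forward difference  */
        printf("h = %e   error = %e\n", h, fabsf(approx - exact));
    }
    return 0;
}
```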
Ex: Polynomial Deflation • F(x) is a polynomial with 20 real roots • Use any method to numerically solve for a root, then deflate the polynomial to 19th degree • Solve for another root, deflate again, and again, … • The accuracy of the roots obtained gets worse each time due to error propagation
Computer Representation of Floating Point Numbers • Decimal-binary conversion • Floating point vs. fixed point • Standard: IEEE 754 (1985)
Decimal-Binary Conversion • Ex: 29 (base 10) — divide by 2 repeatedly and keep the remainders: 29/2 = 14 r 1, 14/2 = 7 r 0, 7/2 = 3 r 1, 3/2 = 1 r 1, 1/2 = 0 r 1 • Reading the remainders from last to first: 29 (base 10) = 11101 (base 2) (see the sketch below)
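The same repeated-division recipe written as a small C sketch (29 is just the example from this slide):

```c
#include <stdio.h>

/* Repeated division by 2: the remainders, read last to first, are the binary digits. */
int main(void)
{
    unsigned n = 29;
    char bits[32];
    int  k = 0;

    while (n > 0) {
        bits[k++] = '0' + (n % 2);   /* remainder = next binary digit (LSB first) */
        n /= 2;
    }
    printf("29 (base 10) = ");
    while (k > 0)
        putchar(bits[--k]);          /* print remainders in reverse order */
    printf(" (base 2)\n");           /* -> 11101 */
    return 0;
}
```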
Fraction Binary Conversion • Ex: 0.625 (base 10) • Multiply the fraction by 2 repeatedly; the integer parts produced are the binary digits: 0.625 × 2 = 1.25 → a1 = 1; 0.25 × 2 = 0.5 → a2 = 0; 0.5 × 2 = 1.0 → a3 = 1; a4 = a5 = … = 0
Computing: How about 0.1 (base 10)? • For 0.625: 0.625 × 2 = 1.250, 0.250 × 2 = 0.500, 0.500 × 2 = 1.000, so 0.625 (base 10) = 0.101 (base 2) • 0.1 (base 10) = 0.000110011… (base 2) — the pattern repeats and never terminates (see the sketch below)
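A sketch of the repeated-multiplication recipe; run it with 0.625 to reproduce 0.101 (base 2), or with 0.1 to watch the digits 000110011… go on forever:

```c
#include <stdio.h>

/* Repeated multiplication by 2: the integer parts produced are the binary
   digits after the point.  0.625 terminates; 0.1 repeats forever, which is
   why it cannot be stored exactly in binary floating point. */
int main(void)
{
    double f = 0.625;                /* try 0.1 to see a non-terminating pattern */
    printf("0.");
    for (int i = 0; i < 12 && f != 0.0; i++) {
        f *= 2.0;
        int bit = (int)f;            /* integer part is the next binary digit */
        putchar('0' + bit);
        f -= bit;
    }
    printf("  (base 2)\n");          /* 0.625 -> 0.101 */
    return 0;
}
```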
Floating vs. Fixed Point • Decimal, 6 digits (positive numbers) • Fixed point: 5 digits after the decimal point • range: 0.00001, … , 9.99999 • Floating point: 2 digits for the (base-10) exponent; 4 digits for the mantissa (accuracy) • range: 0.001×10^00, … , 9.999×10^99 • Comparison: • Fixed point: fixed accuracy; simple math for computation (used in systems w/o an FPU) • Floating point: trades accuracy for a much larger range of representation
Floating Point Representation • A number is stored as ±f × b^e • Fraction (mantissa), f: usually normalized so that 1/b ≤ f < 1 (leading digit nonzero) • Base, b: 2 for personal computers; 16 for mainframes; … • Exponent, e
IEEE 754-1985 • Purpose: make floating-point systems portable • Defines: the number representation, how calculations are performed, exceptions, … • Single precision (32-bit) • Double precision (64-bit)
Number Representation • S: sign of the mantissa • Range (roughly) • single: 10^-38 to 10^38 • double: 10^-308 to 10^308 • Precision (roughly) • single: 7-8 significant decimal digits • double: 15 significant decimal digits
Significant Digits • In the binary sense, 24 bits are significant (with the implicit one – next page); the last stored mantissa bit has weight 2^-23. • In the decimal sense, roughly 7-8 significant decimal digits. • When you write your program, make sure the results you print carry the meaningful significant digits.
Implicit One • A normalized mantissa always has the form 1.xxx… (base 2) • Only the fractional part is stored, gaining one extra bit of precision • Ex: 3.5 = 1.75 × 2^1 = (1.11)_2 × 2^1 → only the fraction .11000…0 is stored
Exponent Bias • Ex: in single precision, the exponent has 8 bits • 0000 0000 (0) to 1111 1111 (255) • Add an offset to represent both + and – exponents • effective exponent = biased exponent – bias • Bias value: 32-bit (127); 64-bit (1023) • Ex (32-bit): 1000 0000 (128): effective exponent = 128 – 127 = 1
Ex: Convert –3.5 to a 32-bit FP Number • 3.5 = (1.11)_2 × 2^1 → sign = 1, biased exponent = 1 + 127 = 128 = 1000 0000, stored fraction = 1100…0 • Result: 1 10000000 11000000000000000000000 (hex C0600000)
Examine Bits of FP Numbers • Explain how this program works (a sketch of such a program is shown below).
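The slide's own program is not reproduced in the extracted text; below is a sketch with the same purpose, using memcpy to reinterpret the 32 bits of a float (the function name show_bits is mine, not the original's):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Print the sign, exponent, and mantissa bits of a 32-bit float.
   memcpy is used instead of pointer casting to avoid aliasing problems. */
static void show_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);

    printf("%14g = ", x);
    for (int i = 31; i >= 0; i--) {
        putchar(((u >> i) & 1) ? '1' : '0');
        if (i == 31 || i == 23)      /* separate sign | exponent | mantissa */
            putchar(' ');
    }
    putchar('\n');
}

int main(void)
{
    show_bits(3.5f);     /* 0 10000000 11000000000000000000000 */
    show_bits(-3.5f);    /* 1 10000000 11000000000000000000000 */
    show_bits(1.0f);     /* 0 01111111 00000000000000000000000 */
    return 0;
}
```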
The “Examiner” • Use the previous program to: • observe how ME works • test subnormal behavior on your computer/compiler • convince yourself why the subtraction of two nearly equal numbers produces lots of error • NaN: Not-a-Number!?
Design Philosophy of IEEE 754 • Layout: [s | e | m] • Sign first: whether the number is +/– can be tested easily • Exponent before mantissa: simplifies sorting • Exponent represented with a bias (not 2’s complement) for ease of sorting • [biased rep] –1, 0, 1 → 126, 127, 128 • [2’s compl.] –1, 0, 1 → 0xFF, 0x00, 0x01 — would need more complicated logic for sorting and increment/decrement
Exceptions • Overflow: ±INF — when a number exceeds the range of representation • Underflow — when numbers are too close to zero, they are treated as zero • Dwarf — the smallest representable number in the FP system • Machine Epsilon (ME) — a number with computational significance (more later)
Extremities • E = (1…1): • M = (0…0): infinity • M not all zeros: NaN (Not a Number) — more later • E = (0…0): • M = (0…0): clean zero • M not all zeros: dirty zero (see next page)
Not-a-Number • Numerical exceptions: • sqrt of a negative number • invalid domain of trigonometric functions • … • Often causes the program to stop running
Extremities (32-bit) • Max: (1.111…1)_2 × 2^(254-127) = (2 – 2^-23) × 2^127 ≈ 2^128 • Min (w/o stepping into dirty zero): (1.000…0)_2 × 2^(1-127) = 2^-126
Dirty-Zero (a.k.a. denormals) • No “implicit one” • IEEE 754 did not specify compatibility for denormals • If you are not sure how to handle them, stay away from them — scale your problem properly • “Many problems can be solved by pretending they do not exist”
Dirty-Zero (cont) • The denormals fill the gap between 0 and the smallest normalized number 2^-126 on the number line; the smallest of them is the dwarf. • Bit patterns: • 2^-126 = 00000000 10000000 00000000 00000000 (smallest normalized) • 2^-127 = 00000000 01000000 00000000 00000000 • 2^-128 = 00000000 00100000 00000000 00000000 • 2^-129 = 00000000 00010000 00000000 00000000 • (Dwarf: the smallest representable)
Dwarf (32-bit) • All exponent bits 0, only the last mantissa bit set • Value: 2^-149 (see the sketch below)
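A small sketch for poking at the dwarf and the denormal range on your own machine. It assumes round-to-nearest and that the compiler does not flush denormals to zero; ldexpf(1.0f, -149) builds 2^-149 directly:

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    float min_normal = FLT_MIN;              /* 2^-126, smallest normalized  */
    float dwarf      = ldexpf(1.0f, -149);   /* 2^-149, smallest denormal    */

    printf("FLT_MIN      = %e\n", min_normal);
    printf("dwarf        = %e\n", dwarf);
    printf("dwarf / 2    = %e\n", dwarf / 2.0f);       /* underflows to 0    */
    printf("FLT_MIN / 8  = %e\n", min_normal / 8.0f);  /* a denormal: 2^-129 */
    return 0;
}
```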
Machine Epsilon (ME) • Definition: the smallest non-zero number that makes a difference when added to 1.0 on your working platform • This is not the same as the dwarf
Computing ME (32-bit) • Keep halving eps: 1 + eps gets closer and closer to 1.0 • ME: (00111111 10000000 00000000 00000001) – 1.0 = 2^-23 ≈ 1.19 × 10^-7 (see the sketch below)
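A sketch of the usual halving loop; the volatile store forces a genuine 32-bit result, since some compilers keep intermediates in higher precision:

```c
#include <stdio.h>

int main(void)
{
    volatile float one_plus;          /* volatile: force a true 32-bit store   */
    float eps = 1.0f;

    for (;;) {
        one_plus = 1.0f + eps / 2.0f;
        if (one_plus == 1.0f)         /* halving once more makes no difference */
            break;
        eps /= 2.0f;
    }
    printf("machine epsilon (float) = %e\n", eps);   /* 2^-23 ~ 1.19e-07 */
    return 0;
}
```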
Effect of ME
Significance of ME • Never terminate an iteration by testing whether two FP numbers are equal. • Instead, test whether |x – y| < ME (see the sketch below).
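A sketch of the idea in C. FLT_EPSILON is the machine epsilon of float; the factor 4 and the scaling by the magnitudes are a common practical variant of the slide's |x – y| < ME test, not a universal constant:

```c
#include <math.h>
#include <float.h>
#include <stdio.h>

/* Never test two computed floats with ==; stop when they are close enough. */
static int close_enough(float x, float y)
{
    return fabsf(x - y) <= 4.0f * FLT_EPSILON * fmaxf(fabsf(x), fabsf(y));
}

int main(void)
{
    float sum = 0.0f;
    for (int i = 0; i < 10; i++)
        sum += 0.1f;                 /* ten rounded additions */

    printf("sum == 1.0f  : %d\n", sum == 1.0f);            /* typically 0 */
    printf("close_enough : %d\n", close_enough(sum, 1.0f)); /* 1          */
    return 0;
}
```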
Machine Epsilon (Wikipedia) • Machine epsilon gives an upper bound on the relative error due to rounding in floating point arithmetic.
Numerical Scaling • Number density: there are as many IEEE 754 numbers in [1.0, 2.0] as there are in [256, 512] • Revisit: “roundoff” error • ME: a measure of the density of representable numbers near 1.0 • Implication: scale your problem so that intermediate results lie between 1.0 and 2.0 (where the numbers are dense and the roundoff error is smallest)
Scaling (cont) • Performing computation on denser portions of the real line minimizes the roundoff error • But don’t overdo it; switching to double precision will easily increase the precision • The densest part is near the subnormals, if density is defined as numbers per unit length (see the sketch below)
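nextafterf from <math.h> makes the density argument concrete: the sketch below prints the gap to the next representable float at a few points on the number line (the sample points are arbitrary). The gap near 1.0 is 2^-23, near 256 it is already 2^-15:

```c
#include <math.h>
#include <stdio.h>

/* nextafterf(x, +INF) - x is the gap between x and the next representable
   float, i.e. the local "grid spacing" of the floating-point number line. */
int main(void)
{
    float points[] = { 1.0f, 2.0f, 256.0f, 1.0e-3f, 1.0e6f };

    for (int i = 0; i < 5; i++) {
        float x   = points[i];
        float gap = nextafterf(x, INFINITY) - x;
        printf("spacing just above %12g  =  %g\n", x, gap);
    }
    return 0;
}
```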
How Subtraction is Performed on Your PC • Steps: • convert both operands to base 2 • equalize the exponents by adjusting the mantissa values; truncate the bits that do not fit • subtract the mantissas • normalize the result
Subtraction of Nearly Equal Numbers • Base 10: 1.24446 – 1.24445 = 0.00001 — only one significant digit survives • The same happens in base 2: after the subtraction and renormalization, most of the remaining mantissa bits are unreliable → significant loss of accuracy
Theorem on Loss of Precision • Let x, y be normalized floating-point machine numbers with x > y > 0. • If 2^-p ≤ 1 – y/x ≤ 2^-q, then at most p and at least q significant binary bits are lost in the subtraction x – y. • Interpretation: “When two numbers are very close, their subtraction introduces a lot of numerical error.”
Implications • Every FP operation introduces error, but the subtraction of nearly equal numbers is the worst and should be avoided whenever possible. • When you program, rewrite such expressions so that the cancellation is removed algebraically (an example is sketched below).
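The slide's own before/after code is not reproduced in the text; a common illustration of the same idea is f(x) = sqrt(x² + 1) − 1 for small x, which can be rewritten so the two nearly equal numbers are never subtracted:

```c
#include <math.h>
#include <stdio.h>

/* For small x the direct form subtracts two nearly equal numbers;
   the algebraically equivalent form does not. */
int main(void)
{
    float x = 1.0e-3f;

    float direct    = sqrtf(x * x + 1.0f) - 1.0f;              /* cancellation   */
    float rewritten = (x * x) / (sqrtf(x * x + 1.0f) + 1.0f);   /* no cancellation */

    printf("direct    = %.10e\n", direct);      /* noticeably off               */
    printf("rewritten = %.10e\n", rewritten);   /* close to the true 5.0e-07    */
    return 0;
}
```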
Efficiency Issues • Horner scheme • program examples
Horner Scheme • For polynomial evaluation: rewrite p(x) = a_n x^n + … + a_1 x + a_0 as p(x) = (…((a_n x + a_(n-1)) x + a_(n-2)) x + …) x + a_0 • Compare efficiency (see the sketch below)
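A sketch comparing naive evaluation with Horner's scheme for a degree-4 example (the coefficients are chosen arbitrarily): the naive form costs about n(n+1)/2 multiplications, Horner only n:

```c
#include <stdio.h>

#define N 4   /* degree: p(x) = a[0] + a[1]*x + ... + a[N]*x^N */

/* Naive evaluation: about N additions and N*(N+1)/2 multiplications. */
double eval_naive(const double a[], double x)
{
    double sum = 0.0;
    for (int i = 0; i <= N; i++) {
        double term = a[i];
        for (int j = 0; j < i; j++)
            term *= x;               /* build x^i by repeated multiplication */
        sum += term;
    }
    return sum;
}

/* Horner scheme: N additions and N multiplications. */
double eval_horner(const double a[], double x)
{
    double sum = a[N];
    for (int i = N - 1; i >= 0; i--)
        sum = sum * x + a[i];
    return sum;
}

int main(void)
{
    double a[N + 1] = { 1.0, -2.0, 3.0, -4.0, 5.0 };
    double x = 1.5;
    printf("naive  : %g\n", eval_naive(a, x));
    printf("horner : %g\n", eval_horner(a, x));
    return 0;
}
```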
Accuracy vs. Efficiency
Good Coding Practice
Storing a Multidimensional Array in Linear Memory • C and others: row-major order (a row is stored contiguously) • Fortran, MATLAB: column-major order (a column is stored contiguously)
On Accessing Arrays … • Which loop order is more efficient? The one whose inner loop follows the storage order (see the sketch below).
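A sketch of the two loop orders in C (the array size is chosen arbitrarily); because C is row-major, the first version walks memory contiguously and is typically much faster once the array no longer fits in cache:

```c
#include <stdio.h>

#define ROWS 1024
#define COLS 1024

static double a[ROWS][COLS];   /* static: zero-initialized, kept off the stack */

int main(void)
{
    double sum = 0.0;

    /* cache-friendly: inner loop walks along a row (contiguous in memory) */
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];

    /* cache-unfriendly: inner loop jumps COLS*sizeof(double) bytes each step */
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];

    printf("%g\n", sum);
    return 0;
}
```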