390 likes | 406 Views
Explore MATLAB number formats, floating-point numbers, complex numbers, errors in numerical computation, and defensive coding techniques to address precision issues. Understand machine epsilon, roundoff errors, and truncation errors.
E N D
Math.375 Fall 2005I - Numbers and Formats Vageli Coutsias
Introduction • 1 + 1 = 0 or “machine epsilon”? • » eps = 2.220446049250313e-016 How does matlab produce its numbers? • Where we learn about number formats, truncation errors and roundoff
Matlab real number formats » format long %(default for ) pi = 3.14159265358979 » format short pi = 3.1416 » format short e pi = 3.1416e+000 » format long e pi = 3.141592653589793e+000
Floating-point numbers • Base or radix • t Precision • [L,U] Exponent range
The set F of f.p. numbers • Basis b • Significant digits t • Range (U, L) • F(b, t, U, L): F(2, 53, -1021, 1024) is the IEEE standard
sign bit mantissa (52 bits) (it is assumed d1= 1, so only need d2,…d53) exponent (11 bits)
IEEE double precision standard If E=2047 and F is nonzero, then V=NaN ("Not a number") If E=2047 and F=0 and S=0,(1) then V=Inf, (-Inf) If E=0 and F=0 and S=0,(1), then V=0,(-0) If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F) where "1.F" denotes the binary number created by prefixing F with an implicit leading 1 and a binary point. If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F) These are "unnormalized" values.
contains “unnormalized” values, as allowed, e.g., in IEEE standard
Floating point numbers The machine precision is the smallest nuber such that:
Machine epsilon • The distance from 1 to the next larger float • Gives the relative error in representing a real number in the system F: (RELATIVE) ROUNDOFF ERROR
Machine epsilon computed a = 1;b = 1; while a+b ~= a b = b/2; end b % b = 1.110223024625157e-016 % shows that a+b = a is satisfied by % numbers b not equal to 0 % here b = eps/2 is the largest such % number for a = 1
Overflow does not only cause programs to crash! Arianne V’s short maiden flight on 7/4/96 was due to a floating exception.
FLOAT INTEGER • During the conversion of a 64-bit floating-point number to a 16-bit signed integer • Caused by the float being outside the range representable by such integers • The programming philosophy employed did not guard against software errors-a fatal assumption!
COMPLEX NUMBERS • z = x + i*y • x = Re(z) is the real part • y = Im(z) is the imaginary part • i^2 = -1 is the imaginary unit • polar form • complex conjugate
Matlab commands: >> z = 3+i*4 >>% Cartesian form: x = real(z); y = imag(z) >>% Polar form: theta = angle(z); rho = abs(z) So: z = abs(z)*(cos(angle(z)) + i*sin(angle(z))) x-i*y = conj(z)
The complex plane z=[1+2*i, 3,-1+i, -i]; compass(z,’r’) defines an array of complex numbers which are plotted as vectors in the x-y plane
Roots of complex numbers » x = -1 ; x^(1/3) ans = 0.5000 + 0.8660i Matlab assumes complex arithmetic, and returns automatically the root with the smallest phase angle ( the other two roots are -1 and 0.5000 - 0.8660i
Types of errors in numerical computation • Roundoff errors Pi = 3.14159 Pi = 3.1415926535897932384626 • Truncation errors Cos x = 1 – x^2/2 Cos x = 1 – x^2/2 + x^4/4! Errors usually accumulate randomly (random walk) But they can also be systematic, and the reasons may be subtle!
x = linspace(1-2*10^-8,1+2*10^-8,401); f = x.^7-7*x.^6+21*x.^5-35*x.^4+35*x.^3-21*x.^2+7*x-1; plot(x,f) CANCELLATION ERRORS
g = -1+x.*(7+x.*(-21+x.*(35+x.*(-35+x.*(21+x.*(-7+x)))))); plot(x,g)
h = (x-1).^7; plot(x,h)
z(1) = 0; z(2) = 2; for k = 2:29 z(k+1) = 2^(k-1/2)*(1-(1-4^(1-k)*z(k)^2)^(1/2))^(1/2); end semilogy(1:30,abs(z-pi)/pi)
Instability is inescapable; But one can learn to ride it!
Due to limitations in computer arithmetic, need to practice “defensive coding”. • Mathematicsnumerics + implementation • The sequence of computations carried out for the solution of a mathematical problem can be so complex as to require a mathematical analysis of its own correctness. • How can we determine efficiency? Text
Vocabulary • Consistency – correctness - … • Convergence (rate) • Efficiency (operation counts) • Numerical Stability (error amplification) • Accuracy (relative vs. absolute error) • Roundoff vs. Truncation
for k=1:50, term = x*term/k; s = s+term;err(k) = abs(f(k) - s);end relerr = err/exp(x); semilogy(1:nTerms,relerr)% semilogy(1:nTerms,err) ylabel('Relative Error in Partial Sum.') xlabel('Order of Partial Sum.') title(sprintf('x = %5.2f',x)) figure semilogy(1:nTerms,err)%end term = 1; s = 1; f = exp(x)*ones(nTerms,1); for k=1:50, term = x*term/k; s = s+term; err(k) = abs(f(k) - s); end relerr = err/exp(x); semilogy(1:nTerms,relerr)% semilogy(1:nTerms,err) ylabel('Relative Error in Partial Sum.') xlabel('Order of Partial Sum.') title(sprintf('x = %5.2f',x)) figure semilogy(1:nTerms,err)%end The first monster: T(x) = e^x * log(1+e^-1) For x>30 a dramatic deviation from the theoretical value 1 is found. For x>36.7 the plotted value drops (incorrectly) to zero Here: x = linspace(0,40,1000); y = exp(x).*log(1+exp(-x));
Summary • Roundoff and other errors • Multiple plots • Formats and floating point numbers
References • C. Essex, M.Davidson, C. Schulzky, “Numerical Monsters”, preprint. • Higham & Higham, Matlab Guide, SIAM • SIAM News, 29(8), 10/98 (Arianne V failure) • B5 Trailer; http://www.scifi.com/b5rangers/ • As usual, find m-files at aix: ~coutsias/375/IV