Foundations of Computer Arithmetic

Understand general number representation and introduction to bases 2, 8, and 16 used in computer arithmetic. Learn about decimal notation, floating-point representations, and error analysis in computation methods.

wilsong

Presentation Transcript


  1. CSE 551 Computational Methods 2019/2020 Fall Chapter 2 Error Analysis and Computer Arithmetic

  2. Outline • Base Changes • Introduction to Error Analysis • Floating-Point Representations

  3. References • W. Cheney, D. Kincaid, Numerical Mathematics and Computing, 6th ed. • Chapter 1 • Chapter 2 • Appendix B

  4. Introduction • We begin with a discussion of general number representation but move quickly to bases 2, 8, and 16, as they are the bases primarily used in computer arithmetic. • The familiar decimal notation for numbers uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. • When we write a whole number such as 37294, the individual digits represent coefficients of powers of 10 as follows: • $37294 = 4 + 90 + 200 + 7000 + 30000 = 4 \times 10^0 + 9 \times 10^1 + 2 \times 10^2 + 7 \times 10^3 + 3 \times 10^4$

  5. In general, a string of digits represents a number according to the formula $a_n a_{n-1} \ldots a_2 a_1 a_0 = a_0 \times 10^0 + a_1 \times 10^1 + \cdots + a_{n-1} \times 10^{n-1} + a_n \times 10^n$ • This takes care of only the positive whole numbers. • A number between 0 and 1 is represented by a string of digits to the right of a decimal point. For example, $0.7215 = 7 \times 10^{-1} + 2 \times 10^{-2} + 1 \times 10^{-3} + 5 \times 10^{-4}$

  6. In general, we have the formula $0.b_1 b_2 b_3 \ldots = b_1 \times 10^{-1} + b_2 \times 10^{-2} + b_3 \times 10^{-3} + \cdots$ • There can be an infinite string of digits to the right of the decimal point; indeed, there must be an infinite string to represent some numbers. For example, we note that

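  7. For a real number of the form $(a_n a_{n-1} \ldots a_1 a_0 . b_1 b_2 b_3 \ldots)_{10} = \sum_{k=0}^{n} a_k 10^k + \sum_{k=1}^{\infty} b_k 10^{-k}$ • the integer part is the first summation • the fractional part is the second summation • A number represented in base β is signified by enclosing it in parentheses and adding a subscript β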

  8. Base β Numbers • Other bases are used, especially in computers, e.g., • the binary system uses 2 as the base • the octal system uses 8 • the hexadecimal system uses 16 • The octal representation of a number uses the digits 0, 1, 2, 3, 4, 5, 6, 7 • e.g., $(21467)_8 = 7 + 6 \times 8 + 4 \times 8^2 + 1 \times 8^3 + 2 \times 8^4 = 7 + 8(6 + 8(4 + 8(1 + 8(2)))) = 9015$ • A number between 0 and 1, expressed in octal, is represented with combinations of $8^{-1}$, $8^{-2}$, and so on: $(0.36207)_8 = 3 \times 8^{-1} + 6 \times 8^{-2} + 2 \times 8^{-3} + 0 \times 8^{-4} + 7 \times 8^{-5} = 8^{-5}(3 \times 8^4 + 6 \times 8^3 + 2 \times 8^2 + 7) = 8^{-5}(7 + 8^2(2 + 8(6 + 8(3)))) = 15495/32768 = 0.47286\,987\ldots$
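The nested forms on this slide can be checked with a short Python sketch. The function names `horner_int` and `horner_frac` are illustrative choices, not names from the slides, and the fractional part is nested one digit at a time rather than in the $8^2$-grouped form shown above; `Fraction` is used so the fractional value stays exact.

```python
from fractions import Fraction

def horner_int(digits, base):
    """Value of an integer digit string (most significant digit first)
    using nested multiplication."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

def horner_frac(digits, base):
    """Exact value of (0.d1 d2 d3 ...) in the given base, nesting from
    the last digit inward."""
    value = Fraction(0)
    for d in reversed(digits):
        value = (value + d) / base
    return value

octal_int = horner_int([2, 1, 4, 6, 7], 8)      # (21467)_8
octal_frac = horner_frac([3, 6, 2, 0, 7], 8)    # (0.36207)_8
print(octal_int)
print(octal_frac, float(octal_frac))
```

Nested multiplication needs only one multiplication and one addition per digit, which is why the slides prefer it to summing explicit powers.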

  9. If we use another base, say, β, then numbers represented in the β-system look like this: $(a_n a_{n-1} \ldots a_1 a_0 . b_1 b_2 b_3 \ldots)_\beta = \sum_{k=0}^{n} a_k \beta^k + \sum_{k=1}^{\infty} b_k \beta^{-k}$ • The digits are 0, 1, . . . , β − 2, β − 1 • If β > 10, it is necessary to introduce symbols for 10, 11, . . . , β − 1 • The separator between the integer and fractional parts is called the radix point; the term decimal point applies to base-10 numbers

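  10. Conversion of Integer Parts • We now formalize the process of converting a number from one base to another • It is convenient to consider separately the integer and fractional parts of a number • Consider a positive integer N in base γ: $N = (a_n a_{n-1} \ldots a_1 a_0)_\gamma = \sum_{k=0}^{n} a_k \gamma^k$ • To convert this to the number system with base β, write N in its nested form: $N = a_0 + \gamma(a_1 + \gamma(a_2 + \cdots + \gamma(a_{n-1} + \gamma(a_n)) \cdots))$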

  11. Then replace each of the numbers on the right by its representation in base β • Next, carry out the calculations in β-arithmetic • The replacement of the $a_k$'s and γ by equivalent base-β numbers requires a table showing how each of the numbers 0, 1, . . . , γ − 1 appears in the β-system • A base-β multiplication table may also be required.

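  12. As an illustration, consider converting the decimal number 3781 to binary form • Using the decimal-binary equivalences and longhand multiplication in base 2: $(3781)_{10} = 1 + 10(8 + 10(7 + 10(3))) = (1)_2 + (1\,010)_2((1\,000)_2 + (1\,010)_2((111)_2 + (1\,010)_2(11)_2)) = (111\,011\,000\,101)_2$ • For hand calculations, another procedure is preferable: • Write down an equation containing the unknown digits $c_0, c_1, \ldots, c_m$: $N = (c_m c_{m-1} \ldots c_1 c_0)_\beta = c_0 + \beta(c_1 + \beta(c_2 + \cdots + \beta(c_m) \cdots))$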

  13. If N is divided by β, then the remainder in this division is $c_0$, and the quotient is $c_1 + \beta(c_2 + \cdots + \beta(c_m) \cdots)$ • If this number is divided by β, the remainder is $c_1$, and so on • Thus, we divide repeatedly by β, saving the remainders $c_0, c_1, \ldots, c_m$ and the quotients.

  14. Example • Convert the decimal number 3781 to binary form using the division algorithm. • Solution: divide repeatedly by 2, saving the remainders

  15. Here, the symbol ˙↓ is used to remind us that the digits $c_i$ are obtained beginning with the digit next to the binary point. Thus, we have • $(3781.)_{10} = (111\,011\,000\,101.)_2$ • and not the other way around: $(101\,000\,110\,111.)_2 = (2615)_{10}$
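The division algorithm from the preceding slides can be sketched in Python as follows. The function name `to_base` is an assumption made for illustration; the remainders are collected in the order $c_0, c_1, \ldots$ and then reversed, which is exactly the ordering pitfall this slide warns about.

```python
def to_base(n, base):
    """Digits of a nonnegative integer n in the given base,
    most significant digit first."""
    if n == 0:
        return [0]
    remainders = []
    while n > 0:
        n, r = divmod(n, base)  # quotient continues, remainder is the next digit
        remainders.append(r)    # collected as c0, c1, c2, ...
    return remainders[::-1]     # reversed to read cm ... c1 c0

bits = "".join(str(d) for d in to_base(3781, 2))
print(bits)
```

Reading the remainders in the order they are produced, without the final reversal, would give the digits of 2615 instead.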

  16. Example • Convert the number $N = (111\,011\,000\,101)_2$ to decimal form by nested multiplication. • Solution: $N = 1 + 2(0 + 2(1 + 2(0 + 2(0 + 2(0 + 2(1 + 2(1 + 2(0 + 2(1 + 2(1 + 2(1))))))))))) = 3781$

  17. Another conversion problem exists in going from an integer in base γ to an integer in base β when using calculations in base γ • The unknown coefficients are determined by a process of successive division, and this arithmetic is carried out in the γ-system • At the end, the numbers $c_k$ are in base γ, and a table of γ-β equivalents is used

  18. e.g., convert a binary integer into decimal form by repeated division by $(1\,010)_2$, which equals $(10)_{10}$, carrying out the operations in binary • A table of binary-decimal equivalents is used at the final step • Since binary division is easy only for computers, we develop alternative procedures

  19. Conversion of Fractional Parts • To convert a fractional number such as $(0.372)_{10}$ to binary, a direct yet naive approach is:

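  20. Dividing in binary arithmetic is not straightforward, so we look for easier ways of doing this conversion • Suppose that x is in the range 0 < x < 1 and that the digits $c_k$ in the representation $x = \sum_{k=1}^{\infty} c_k \beta^{-k} = (0.c_1 c_2 c_3 \ldots)_\beta$ are to be determined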

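  21. Observe that $\beta x = (c_1 . c_2 c_3 c_4 \ldots)_\beta$, because it is necessary to shift the radix point only when multiplying by base β • The unknown digit $c_1$ can be described as the integer part of βx, denoted by I(βx) • The fractional part, $(0.c_2 c_3 c_4 \ldots)_\beta$, is denoted by F(βx) • The process is repeated in the following pattern: $d_0 = x$; $d_1 = F(\beta d_0)$, $c_1 = I(\beta d_0)$; $d_2 = F(\beta d_1)$, $c_2 = I(\beta d_1)$; and so on • In this algorithm, the arithmetic is carried out in the decimal system.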

  22. Example • Use the preceding algorithm to convert the decimal number $x = (0.372)_{10}$ to binary form.

  23. Solution: repeatedly multiply by 2 and remove the integer parts: • $(0.372)_{10} = (0.010\,111\ldots)_2$
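A minimal Python sketch of this multiply-and-strip procedure is given below. The name `frac_digits` and the use of `Fraction` (to keep the decimal arithmetic exact rather than relying on binary floating point) are choices made here for illustration, not details from the slides.

```python
from fractions import Fraction

def frac_digits(x, base, count):
    """First `count` digits of the fraction x (0 < x < 1) in the given base."""
    d = Fraction(x)          # accepts a decimal string such as "0.372"
    digits = []
    for _ in range(count):
        d *= base
        c = int(d)           # I(beta * d): the integer part is the next digit
        digits.append(c)
        d -= c               # F(beta * d): keep only the fractional part
    return digits

binary_digits = frac_digits("0.372", 2, 6)
print(binary_digits)
```

The loop stops after a fixed number of digits because many decimal fractions, 0.372 among them, have no terminating binary expansion.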

  24. Base Conversion 10 ↔ 8 ↔ 2 • Most computers use the binary system for their internal representation of numbers • The octal system (base 8) is useful in converting from the decimal system to the binary system and vice versa • With base 8, the positional values of the numbers are $8^0 = 1$, $8^1 = 8$, $8^2 = 64$, $8^3 = 512$, $8^4 = 4096$, ...

  25. Example

  26. When converting between decimal and binary form, it is convenient to use the octal representation as an intermediate step • Conversion between octal and decimal proceeds as before • Conversion between octal and binary is simple: each group of three binary digits corresponds to one octal digit • The grouping starts at the binary point and proceeds in both directions: $(101\,101\,001.110\,010\,100)_2 = (551.624)_8$ • Conversion of an octal number to binary can be done in a similar manner but in reverse order: $(5362.74)_8 = (101\,011\,110\,010.111\,100)_2$
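The grouping rule can be sketched in Python for both directions. The function names are hypothetical, and the inputs are assumed to be plain digit strings with at most one radix point and no spaces.

```python
def binary_to_octal(bits):
    """Group binary digits in threes, working outward from the binary point."""
    whole, _, frac = bits.partition(".")
    # Pad with zeros so each side splits into complete groups of three
    whole = whole.zfill(-(-len(whole) // 3) * 3)
    frac = frac.ljust(-(-len(frac) // 3) * 3, "0")

    def groups(s):
        return "".join(str(int(s[i:i + 3], 2)) for i in range(0, len(s), 3))

    return groups(whole) + ("." + groups(frac) if frac else "")

def octal_to_binary(octal):
    """Expand each octal digit into its three-bit group."""
    whole, _, frac = octal.partition(".")

    def expand(s):
        return " ".join(format(int(d, 8), "03b") for d in s)

    return expand(whole) + ("." + expand(frac) if frac else "")

print(binary_to_octal("101101001.110010100"))
print(octal_to_binary("5362.74"))
```

Padding the integer part on the left and the fractional part on the right leaves the value unchanged, which is why the grouping must start at the radix point.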

  27. Example • What is $(2576.35546\,875)_{10}$ in octal and binary forms? • Solution: convert the decimal number first to octal and then to binary • For the integer part, repeatedly divide by 8: $2576. = (5020.)_8 = (101\,000\,010\,000.)_2$

  28. For the fractional part, repeatedly multiply by 8: $0.35546\,875 = (0.266)_8 = (0.010\,110\,110)_2$ • The result is $2576.35546\,875 = (101\,000\,010\,000.010\,110\,110)_2$

  29. Base 16 • hexadecimal system (base 16) • A, B, C, D, E, and F represent 10, 11, 12, 13, 14, and 15, respectively • table of equivalences:

  30. Conversion between binary numbers and hexadecimal numbers • Regroup the binary digits into groups of four: $(010\,101\,110\,101\,101)_2 = (0010\,1011\,1010\,1101)_2 = (\mathrm{2BAD})_{16}$ • and $(111\,101\,011\,110\,010.110\,010\,011\,110)_2 = (0111\,1010\,1111\,0010.1100\,1001\,1110)_2 = (\mathrm{7AF2.C9E})_{16}$
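Both regroupings can be cross-checked with Python's built-in `int(text, 2)` and `format(value, "X")`. This is a verification shortcut rather than the hand method on the slide, and it assumes the fractional part has a multiple of four bits so that it maps onto whole hexadecimal digits.

```python
# Integer case: parse the bit string, then print it in uppercase hexadecimal
hex1 = format(int("010101110101101", 2), "X")
print(hex1)

# Mixed case: treat integer and fraction bits as one integer, then
# reinsert the radix point (assumes len(frac_bits) is a multiple of 4)
int_bits, frac_bits = "111101011110010", "110010011110"
digits = format(int(int_bits + frac_bits, 2), "X")
k = len(frac_bits) // 4            # number of hexadecimal fraction digits
hex2 = digits[:-k] + "." + digits[-k:]
print(hex2)
```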

  31. More Examples • Convert $(0.276)_8$, $(0.\mathrm{C8})_{16}$, and $(492)_{10}$ into different number systems

  32. Significant Digits • The significant digits of a number are the digits beginning with the leftmost nonzero digit and ending with the rightmost correct digit, including final zeros that are exact.

  33. Example • Solve for the variable y in this linear system of two equations in two variables: $0.1036\,x + 0.2122\,y = 0.7381$ and $0.2081\,x + 0.4247\,y = 0.9327$ • First, carry only three significant digits of precision in the calculations • Second, repeat with four significant digits throughout • Finally, use ten significant digits.

  34. Solution • For the first task, round all numbers in the original problem to three digits and round all the calculations, keeping only three significant digits • Take a multiple α of the first equation and subtract it from the second equation to eliminate the x-term in the second equation • The multiplier is α = 0.208/0.104 ≈ 2.00 • In the second equation, the new coefficient of the x-term is 0.208 − (2.00)(0.104) ≈ 0.208 − 0.208 = 0 • the new y-term coefficient is 0.425 − (2.00)(0.212) ≈ 0.425 − 0.424 = 0.001 • the new right-hand side is 0.933 − (2.00)(0.738) ≈ 0.933 − 1.48 = −0.547 • Hence y = −0.547/(0.001) ≈ −547.

  35. Keep four significant digits: • the multiplier is α = 0.2081/0.1036 ≈ 2.009 • In the second equation, the new coefficient of the x-term is 0.2081 − (2.009)(0.1036) ≈ 0.2081 − 0.2081 = 0 • the new coefficient of the y-term is 0.4247 − (2.009)(0.2122) ≈ 0.4247 − 0.4263 = −0.001600 • the new right-hand side is 0.9327 − (2.009)(0.7381) ≈ 0.9327 − 1.483 ≈ −0.5503 • Hence y = −0.5503/(−0.001600) ≈ 343.9 • We are shocked to find that the answer has changed from −547 to 343.9, which is a huge difference!

  36. Carrying ten significant decimal digits, we find that even 343.9 is not accurate: we obtain y = 356.29071 99 • The lesson learned: data thought to be accurate should be carried with full precision and not be rounded off prior to each of the calculations
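The three runs of this example can be reproduced approximately with Python's `decimal` module, which rounds every operation to a chosen number of significant digits. This is a sketch under assumptions: `solve_y` is a name introduced here, and `decimal` uses round-half-even by default, which may differ from the rounding used on the slides in borderline cases.

```python
from decimal import Decimal, localcontext

def solve_y(digits):
    """Eliminate x and solve for y, rounding every input and every
    intermediate result to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        # Unary plus rounds each input to the working precision
        a11, a12, b1 = (+Decimal(s) for s in ("0.1036", "0.2122", "0.7381"))
        a21, a22, b2 = (+Decimal(s) for s in ("0.2081", "0.4247", "0.9327"))
        alpha = a21 / a11                # multiplier for the first equation
        y_coeff = a22 - alpha * a12      # new coefficient of y
        rhs = b2 - alpha * b1            # new right-hand side
        return rhs / y_coeff

for digits in (3, 4, 10):
    print(digits, solve_y(digits))
```

The sign flip between the three-digit and four-digit runs comes from the y-coefficient, which is the difference of two nearly equal numbers.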

  37. In most computers, the arithmetic operations are carried out in a double-length accumulator that has twice the precision of the stored quantities • Even this may not avoid a loss of accuracy! • Loss of accuracy can occur through roundoff errors and through subtracting nearly equal numbers

  38. Figure: geometric illustration of what can happen in solving two equations in two unknowns • The point of intersection of the two lines is the exact solution • The dotted lines show a degree of uncertainty from errors in the measurements or roundoff errors • Instead of a sharply defined point, there is a small trapezoidal area containing many possible solutions • If the two lines are nearly parallel, this area of possible solutions can increase dramatically! • This distinguishes well-conditioned from ill-conditioned systems of linear equations

  39. In 2D, well-conditioned and ill-conditioned linear systems

  40. Errors: Absolute and Relative • Suppose that α and β are two numbers, of which one is regarded as an approximation to the other • The error of β as an approximation to α is α − β; that is, the error equals the exact value minus the approximate value • The absolute error of β as an approximation to α is |α − β| • The relative error of β as an approximation to α is |α − β|/|α| • In computing the absolute error, the roles of α and β are the same, whereas in computing the relative error, one of the two numbers must be singled out as the exact value • The relative error is undefined in the case α = 0.

  41. The relative error is usually more meaningful than the absolute error • e.g., $\alpha_1 = 1.333$, $\beta_1 = 1.334$ and $\alpha_2 = 0.001$, $\beta_2 = 0.002$ • The absolute error of $\beta_i$ as an approximation to $\alpha_i$ is the same in both cases: $10^{-3}$ • However, the relative errors are $(3/4) \times 10^{-3}$ and 1, respectively • The relative error clearly indicates that $\beta_1$ is a good approximation to $\alpha_1$ but that $\beta_2$ is a poor approximation to $\alpha_2$
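The two error measures can be sketched in Python directly from their definitions. The function names are illustrative, and `Fraction` is used so that the small differences are computed exactly instead of picking up floating-point noise.

```python
from fractions import Fraction

def abs_error(exact, approx):
    """Absolute error |exact - approx|."""
    return abs(exact - approx)

def rel_error(exact, approx):
    """Relative error |exact - approx| / |exact|; undefined for exact == 0."""
    if exact == 0:
        raise ValueError("relative error is undefined when the exact value is 0")
    return abs(exact - approx) / abs(exact)

pairs = [
    (Fraction("1.333"), Fraction("1.334")),   # alpha_1, beta_1
    (Fraction("0.001"), Fraction("0.002")),   # alpha_2, beta_2
]
for exact, approx in pairs:
    print(float(abs_error(exact, approx)), float(rel_error(exact, approx)))
```

Both pairs share the same absolute error, while the relative errors differ by three orders of magnitude, matching the comparison on this slide.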

  42. In summary • The exact value is the true value • A useful way to express the absolute error and relative error is to drop the absolute values: (relative error)(exact value) = exact value − approximate value, so that approximate value = (exact value)[1 − (relative error)] • In practice, the relative error is related to the approximate value rather than to the exact value, because the true value may not be known

  43. Example • Consider x = 0.00347 rounded to x_head = 0.0035 and y = 30.158 rounded to y_head = 30.16 • What are the number of significant digits, absolute errors, and relative errors? • Interpret the results.

  44. Solution • Case 1. x_head = $0.35 \times 10^{-2}$ has two significant digits • absolute error: $0.3 \times 10^{-4}$ • relative error: $0.865 \times 10^{-2}$ • Case 2. y_head = $0.3016 \times 10^{2}$ has four significant digits • absolute error: $0.2 \times 10^{-2}$ • relative error: $0.66 \times 10^{-4}$ • The relative error is a better indication of the number of significant digits than the absolute error

  45. Accuracy and Precision • Accurate to n decimal places means that you can trust n digits to the right of the decimal point • Accurate to n significant digits means that you can trust a total of n digits as being meaningful, beginning with the leftmost nonzero digit.

  46. Suppose you use a ruler graduated in millimeters to measure lengths • The measurements will be accurate to one millimeter, or 0.001 m, which is three decimal places written in meters • A measurement such as 12.345 m would be accurate to three decimal places • A measurement such as 12.3456789 m would be meaningless, since the ruler produces only three decimal places, and it should be 12.345 m or 12.346 m • If the measurement 12.345 m has five dependable digits, then it is accurate to five significant figures • On the other hand, a measurement such as 0.076 m has only two significant figures.

  47. When using a calculator or computer in a laboratory experiment, one may get a false sense of having higher precision than is warranted by the data • e.g., the result (1.2) + (3.45) = 4.65 actually has only two significant digits of accuracy, because the second digit in 1.2 may be the effect of rounding 1.24 down or rounding 1.16 up to two significant figures • Then the left-hand side could be as large as (1.249) + (3.454) = (4.703) or as small as (1.16) + (3.449) = (4.609)

  48. In Addition and Subtraction • In adding and subtracting numbers, the result is accurate only to the smallest number of significant digits used in any step of the calculation • In the above example, the term 1.2 has two significant digits; therefore, the final calculation has an uncertainty in the third digit

  49. Rule of Thumb • In multiplication and division of numbers, the results may be even more misleading • e.g., computations on a calculator: (1.23)(4.5) = 5.535 and (1.23)/(4.5) = 0.27333 3333 • There appear to be four and nine significant digits in the results, but there are really only two • As a rule of thumb, keep as many significant digits in a sequence of calculations as there are in the least accurate number involved in the computations.
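As a small illustration of this rule of thumb, the sketch below uses Python's `decimal` module to report the calculator results with only two significant digits, the count carried by the least accurate operand 4.5. The helper name `with_sig_digits` is an assumption made for this example.

```python
import operator
from decimal import Decimal, localcontext

def with_sig_digits(op, a, b, digits):
    """Apply op to two decimal strings, rounding the result to
    `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return op(Decimal(a), Decimal(b))

product = with_sig_digits(operator.mul, "1.23", "4.5", 2)
quotient = with_sig_digits(operator.truediv, "1.23", "4.5", 2)
print(product, quotient)
```

Reporting 5.5 and 0.27 instead of 5.535 and 0.27333 3333 avoids suggesting precision that the input data cannot support.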
