Representing fractions – Fixed point • The problem: • How to represent fractions with a finite number of bits?
Representing fractions – Fixed point A number with 10 bits: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}$
Representing fractions – Fixed point A number with 10 bits: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 a_{10}$ Fixing the point: $a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 . a_9 a_{10}$
Representing fractions – Fixed point Range of representation:
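The 10-bit layout with the point fixed before the last two bits can be sketched in Python. This is a minimal illustration that assumes the bits are unsigned (the slides do not say whether a sign is included); the function name `fixed_point_value` and the parameter `frac_bits` are made up for this example.

```python
def fixed_point_value(bits: str, frac_bits: int = 2) -> float:
    """Interpret an unsigned bit string whose point is fixed
    `frac_bits` positions from the right (assumption: no sign bit)."""
    return int(bits, 2) / (1 << frac_bits)

# Under the unsigned assumption, the 10-bit format a1..a8.a9a10 spans:
smallest_step = fixed_point_value("0000000001")  # 0.25
largest = fixed_point_value("1111111111")        # 255.75
```

Under these assumptions every representable value is a multiple of 0.25 between 0 and 255.75, which is why the range is narrow.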
Fixed point: the problem • Cannot represent a wide range of numbers, as is needed in scientific applications.
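Representing Fractions – Floating point $10 = 1 \times 10^{1}$, $-0.123 = -1.23 \times 10^{-1}$ Base (radix) r: the 10 that is raised to a power
Representing Fractions – Floating point $10 = 1 \times 10^{1}$, $-0.123 = -1.23 \times 10^{-1}$ Exponent: the power of the base ($1$ and $-1$)
Representing Fractions – Floating point $10 = 1 \times 10^{1}$, $-0.123 = -1.23 \times 10^{-1}$ Number (mantissa): the digits that multiply the power ($1$ and $1.23$)
Representing Fractions – Floating point $10 = (-1)^{0} \times 1 \times 10^{1}$, $-0.123 = (-1)^{1} \times 1.23 \times 10^{-1}$ Sign bit: the exponent of $-1$ ($0$ and $1$)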
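Problem of uniqueness $0.1 = 100 \times 10^{-3} = 0.001 \times 10^{2}$ The representation is not unique
Problem of uniqueness - Normalization $0.61 = 610 \times 10^{-3} = 6.1 \times 10^{-1} = 0.0061 \times 10^{2}$ Standardization: one nonzero digit to the left of the point, so $6.1 \times 10^{-1}$ is the normalized form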
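Normalization can be sketched with Python's standard `decimal` module. The helper name `normalize` is hypothetical; it only illustrates that different spellings of the same value collapse to one mantissa/exponent pair with a single nonzero digit left of the point.

```python
from decimal import Decimal

def normalize(text: str):
    """Return (mantissa, exponent) in base 10 such that
    value = mantissa * 10**exponent and 1 <= |mantissa| < 10."""
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0          # zero has no normalized form
    exponent = value.adjusted()       # position of the leading digit
    mantissa = value.scaleb(-exponent)
    return mantissa, exponent

# Two spellings of 0.61 normalize to the same pair (6.1, -1).
a = normalize("610E-3")
b = normalize("0.0061E2")
```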
Normalized Binary Floating point $D = (-1)^{a_0} \times (1.a_1 a_2 a_3 \ldots)_2 \times 2^{\,b_1 b_2 b_3 \ldots}$ String of bits: $a_0 \; b_1 b_2 \ldots b_n \; a_1 a_2 a_3 \ldots a_m$
Floating point - Questions • Representing the (signed) exponent • How to represent zero? • And NaN, infinity? • How to add, subtract and multiply? • Rounding errors.
Floating point – Representing the exponent How to represent a signed number? Sign bit? Two's complement?
Floating point – Representing the exponent How to represent a signed number? Sign bit? Two's complement? Neither.
Floating point – Representing the exponent • We want the exponent to be binary ordered: 0000 < 0001 < … < 1000 < … < 1111
Floating point – Representing the exponent Exponent = stored number − B, where B is the bias. Usually $B = 2^{n-1} - 1$ for an n-bit exponent field. The extreme exponents are stored as: $e_{min}$: 000…0001, $e_{max}$: 111…1110
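The biased exponent can be sketched as follows. The function names are hypothetical, and the field width n = 8 used in the usage lines is just an example value; the all-zeros and all-ones patterns are treated as reserved, in line with the 000…0001 and 111…1110 limits on the slide.

```python
def exponent_bias(n: int) -> int:
    """Bias B = 2**(n-1) - 1 for an n-bit exponent field."""
    return 2 ** (n - 1) - 1

def encode_exponent(e: int, n: int) -> str:
    """Store exponent e as the n-bit pattern of e + B."""
    stored = e + exponent_bias(n)
    # Patterns 00...0 and 11...1 are reserved, so they are rejected here.
    if not 1 <= stored <= 2 ** n - 2:
        raise ValueError("exponent out of range for normalized numbers")
    return format(stored, f"0{n}b")

def decode_exponent(bits: str) -> int:
    """Recover the exponent: stored number minus the bias."""
    return int(bits, 2) - exponent_bias(len(bits))

# With n = 8: B = 127, e_min = -126, e_max = 127.
e_min = decode_exponent("00000001")
e_max = decode_exponent("11111110")
```

Because a constant is added, larger exponents always map to larger bit patterns, which gives the binary ordering asked for above.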
Floating point – Representing zero, NaN, ±∞ IEEE 754 special values Denormalized number Normalized number
IEEE 754 (Including the sign Bit)
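As an illustration, the fields of an IEEE 754 double (1 sign bit, 11 exponent bits, 52 fraction bits) can be pulled apart with Python's standard `struct` module. The helper names `decode_double` and `classify` are made up for this sketch; the classification follows the usual rule that an all-zeros or all-ones exponent field marks a special value.

```python
import struct

def decode_double(x: float):
    """Split a 64-bit double into (sign, stored exponent, fraction)."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = raw & ((1 << 52) - 1)      # 52-bit fraction field
    return sign, exponent, fraction

def classify(x: float) -> str:
    """Name the kind of value encoded by the exponent/fraction fields."""
    _, exponent, fraction = decode_double(x)
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 0x7FF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"

fields_of_one = decode_double(1.0)   # (0, 1023, 0): exponent 0 plus bias 1023
```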
What is NaN (not a number) Partial list
Infinity • Provides a safe way to continue a calculation when overflow is encountered.
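A short Python sketch of this behaviour, using ordinary Python floats (IEEE 754 doubles on common platforms). The variable names are illustrative; the two NaN-producing operations shown are only examples, not a complete list.

```python
import math

big = 1.0e308
overflowed = big * 10.0          # too large for a double: becomes +infinity

# The calculation can continue after the overflow.
reciprocal = 1.0 / overflowed    # 0.0

# Operations with no meaningful result produce NaN instead.
undefined = overflowed - overflowed   # infinity - infinity
also_undefined = 0.0 * overflowed     # 0 * infinity

# NaN compares unequal to everything, including itself.
nan_equals_itself = undefined == undefined   # False
```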
Calculations with Floating Point numbers • Addition: • Equalize the exponents (smaller → larger exponent) • Sum the mantissas • Renormalize if necessary
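Calculations with Floating Point numbers • Example (in base 10): |E| = 1, |M| = 3 $91 = 9.10 \times 10^{1}$, $9.7 = 9.70 \times 10^{0}$
Calculations with Floating Point numbers $9.10 \times 10^{1} + 9.70 \times 10^{0}$ Not the same order: the exponents differ.
Calculations with Floating Point numbers $9.10 \times 10^{1} + 9.70 \times 10^{0} = 9.10 \times 10^{1} + 0.97 \times 10^{1} = 10.07 \times 10^{1}$ Renormalize: $1.007 \times 10^{2}$, which becomes $1.01 \times 10^{2}$ when rounded to the nearest 3-digit mantissa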
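Calculations with Floating Point numbers • Example II (in base 10): |E| = 1, |M| = 3 $91 = 9.10 \times 10^{1}$, $9.75 = 9.75 \times 10^{0}$
Calculations with Floating Point numbers $9.10 \times 10^{1} + 9.75 \times 10^{0}$ Not the same order: the exponents differ.
Calculations with Floating Point numbers $9.10 \times 10^{1} + 9.75 \times 10^{0} = 9.10 \times 10^{1} + 0.975 \times 10^{1} = 10.075 \times 10^{1}$ Renormalize: $1.0075 \times 10^{2}$ The trailing digits 75 do not fit in a 3-digit mantissa (rounding error): $1.01 \times 10^{2}$ after rounding to nearest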
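The three addition steps (equalize exponents, sum the mantissas, renormalize) can be sketched in base 10 with Python's `decimal` module. This is an illustration only: the name `fp_add`, the constant `P`, and the choice of round-half-even for the final rounding are assumptions of this sketch, not something the slides specify.

```python
from decimal import Decimal, ROUND_HALF_EVEN

P = 3                                    # mantissa digits, |M| = 3
QUANTUM = Decimal(1).scaleb(-(P - 1))    # 0.01: keeps P - 1 digits after the point

def fp_add(m1: Decimal, e1: int, m2: Decimal, e2: int):
    """Add m1*10**e1 and m2*10**e2; return a normalized (mantissa, exponent)."""
    # Step 1: equalize the exponents (shift the smaller one to the larger).
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    aligned = m2.scaleb(e2 - e1)
    # Step 2: sum the mantissas.
    total = m1 + aligned
    if total == 0:
        return Decimal(0), 0
    # Step 3: renormalize to one digit left of the point, then round to P digits.
    shift = total.adjusted()
    mantissa = total.scaleb(-shift).quantize(QUANTUM, rounding=ROUND_HALF_EVEN)
    exponent = e1 + shift
    if abs(mantissa) >= 10:              # rounding carried into a new digit
        mantissa = mantissa.scaleb(-1).quantize(QUANTUM, rounding=ROUND_HALF_EVEN)
        exponent += 1
    return mantissa, exponent

example_1 = fp_add(Decimal("9.10"), 1, Decimal("9.70"), 0)   # 91 + 9.7
example_2 = fp_add(Decimal("9.10"), 1, Decimal("9.75"), 0)   # 91 + 9.75
```

Both sums (100.7 and 100.75) come out as mantissa 1.01 with exponent 2, so the digits that do not fit in the mantissa are lost in each case.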
Rounding Errors The Problem: Squeezing infinitely many real numbers into a finite number of bits
Measuring Rounding Errors • Units in the last place (ULPs) • Relative error
Measuring Rounding Errors – ULP If $d.dd\ldots d \times r^{e}$ (p digits) represents z, then error $= |d.dd\ldots d - z/r^{e}| \times r^{p-1}$ ULPs
Measuring Rounding Errors – ULP Example I: r = 10, p = 3 The number $3.14 \times 10^{-2}$ represents 0.0314159 Error $= |3.14 - 3.14159| \times 10^{2} = 0.159$ ULPs
Measuring Rounding Errors – ULP • What is the maximum error in ULPs if rounding is toward the nearest representable number? 0.5 ULP
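The ULP error formula can be checked with a short sketch based on Python's `decimal` module, which keeps the base-10 arithmetic exact. The function name `ulp_error` and its parameters are hypothetical names chosen for this example.

```python
from decimal import Decimal

def ulp_error(mantissa: str, exponent: int, z: str,
              radix: int = 10, p: int = 3) -> Decimal:
    """Error in ULPs when mantissa * radix**exponent (p digits) represents z:
    |mantissa - z / radix**exponent| * radix**(p - 1)."""
    scaled = Decimal(z) / (Decimal(radix) ** exponent)
    return abs(Decimal(mantissa) - scaled) * Decimal(radix) ** (p - 1)

# 3.14 * 10**-2 standing for 0.0314159 is off by 0.159 ULPs.
example = ulp_error("3.14", -2, "0.0314159")

# A value exactly halfway between two neighbours is off by 0.5 ULP.
halfway = ulp_error("3.14", 0, "3.145")
```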
Measuring Rounding Errors – Relative Error If $d.dd\ldots d \times r^{e}$ (p digits) represents z, then relative error $= |d.dd\ldots d \times r^{e} - z| \, / \, |z|$
Measuring Rounding Errors – Relative errors Example I: r = 10, p = 3 The number $3.14 \times 10^{-2}$ represents 0.0314159 Relative error $= 0.0000159 / 0.0314159 \approx 0.0005$
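The relative error of the same example can be computed in the same style. As before, this is a sketch: `relative_error` is a hypothetical helper, and the absolute value in the denominator is an assumption added so that negative values of z also give a non-negative error.

```python
from decimal import Decimal

def relative_error(mantissa: str, exponent: int, z: str,
                   radix: int = 10) -> Decimal:
    """|mantissa * radix**exponent - z| / |z|, computed in exact decimals."""
    approx = Decimal(mantissa) * Decimal(radix) ** exponent
    exact = Decimal(z)
    return abs(approx - exact) / abs(exact)

# 3.14 * 10**-2 standing for 0.0314159: roughly 0.0005.
example = relative_error("3.14", -2, "0.0314159")
```

Unlike the ULP measure, this value does not depend on the number of mantissa digits p, only on the two numbers being compared.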