Floating Point in Computers • Complying with the standards: IEEE 754, ISO/IEC 559
Timeline • Introduction quite short • Binary review not so long • Integer Arithmetic 1/3 • Floating Point 1/3 • Floating Point Arithmetic 1/3 • Other issues extra short
Introduction • Who does computer arithmetic? • Intel’s spare money • How is it done in hardware? • How Integer relates to Floating point • Now, we go back to “computer structure”
Binary Numbers • What is 1001011.00101? • Integer part: 64 + 8 + 2 + 1 = 75 • Fraction part: 1/8 + 1/32 = 0.15625
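A minimal Python sketch of the positional evaluation above. The helper name `binary_value` is hypothetical. Exact fractions are used so the fractional bits are not rounded:

```python
from fractions import Fraction


def binary_value(bits: str) -> Fraction:
    """Exact value of an unsigned binary numeral with an optional binary point."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(0)
    # Integer digits: the i-th digit from the right weighs 2**i.
    for i, digit in enumerate(reversed(int_part)):
        value += int(digit) * 2**i
    # Fraction digits: the j-th digit after the point weighs 2**-j.
    for j, digit in enumerate(frac_part, start=1):
        value += int(digit) * Fraction(1, 2**j)
    return value


print(binary_value("1001011.00101"))         # 2405/32
print(float(binary_value("1001011.00101")))  # 75.15625
```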
Signed Binary Integers • Sign-magnitude • 2’s complement • 1’s complement • biased
Sign-Magnitude • High-order bit = sign • 0101 = 5 • 1101 = -5 • Two zeros: 0000 and 1000
2’s Complement • Number + Negative = $2^n$ • 0101 = 5 • 1011 = -5 • Easy addition (drop the carry out) • Formula: $-a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \dots + a_1 2^1 + a_0$
1’s Complement • Negative: complement every bit • 0101 = 5 • 1010 = -5 • Two zeros: 0000 and 1111 • Number + Negative = $2^n - 1$
Biased • Stored binary = Number + Bias • Bias = 5: 1010 = 5 (5 + 5 = 10), 0000 = -5 (-5 + 5 = 0) • Relative order is preserved
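The four encodings can be compared side by side with a short Python sketch. The function names are hypothetical, and each function assumes x already fits in n bits (there is no range check):

```python
def sign_magnitude(x: int, n: int) -> str:
    # High-order bit holds the sign, the other n-1 bits hold |x|.
    sign = 1 << (n - 1) if x < 0 else 0
    return format(sign | abs(x), f"0{n}b")


def ones_complement(x: int, n: int) -> str:
    # A negative number is |x| with every bit inverted, i.e. (2**n - 1) - |x|.
    value = x if x >= 0 else ((1 << n) - 1) - abs(x)
    return format(value, f"0{n}b")


def twos_complement(x: int, n: int) -> str:
    # A negative number is stored as 2**n + x; masking to n bits does exactly that.
    return format(x & ((1 << n) - 1), f"0{n}b")


def biased(x: int, n: int, bias: int) -> str:
    # The stored pattern is simply x + bias.
    return format(x + bias, f"0{n}b")


for x in (5, -5):
    print(x, sign_magnitude(x, 4), ones_complement(x, 4),
          twos_complement(x, 4), biased(x, 4, bias=5))
# 5 0101 0101 0101 1010
# -5 1101 1010 1011 0000
```

Only the biased patterns sort in the same order as the values they encode, which is the "relative order is preserved" property.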
Adding (Unsigned) Integers • Elementary-school method: 11001101 + 10000110 = 101010011 (205 + 134 = 339) • The result has n+1 bits!
Adding Integers - Hardware • [Figure: Half Adder (inputs a, b; outputs s, Cout) and Full Adder (inputs a, b, Cin; outputs s, Cout)] • 2 logic levels
Ripple-Carry Adder • [Figure: n full adders in a chain; stage i takes $a_i$, $b_i$ and the carry from stage i-1 (Cin for stage 0) and outputs $s_i$; the last stage outputs Cout] • Slow: 2n logic levels • Small constant (CMOS) • Other ways exist
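A bit-level Python sketch of the ripple-carry adder (the names `full_adder` and `ripple_carry_add` are hypothetical). It models the logic only, not gate delays; a half adder is the full adder with Cin = 0:

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned numbers with a chain of n full adders.

    Returns (n-bit sum, carry out). The carry from stage i feeds stage i+1,
    which is why the delay grows linearly with n.
    """
    total, carry = 0, cin
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


s, cout = ripple_carry_add(0b11001101, 0b10000110, 8)
print(format(s, "08b"), cout)  # 01010011 1, i.e. 101010011 = 339
```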
Adding Signed Integers • In 2’s complement: $b + (-a) = b + (2^n - a) = 2^n + (b - a)$ and $(-b) + (-a) = (2^n - b) + (2^n - a) = (2^n - (b + a)) + 2^n$ • Hence: add as unsigned integers and discard the carry out (the extra $2^n$) • Example: 0011 + 1100 = ?
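A 4-bit Python sketch of "add as integers, discard the carry out" (the names `to_signed` and `add_twos_complement` are hypothetical). Overflow, i.e. a true sum outside -8..7, is not detected here:

```python
N = 4                      # word size used on the slide
MASK = (1 << N) - 1


def to_signed(pattern: int) -> int:
    """Read an N-bit pattern as a 2's complement integer."""
    return pattern - (1 << N) if pattern & (1 << (N - 1)) else pattern


def add_twos_complement(a: int, b: int) -> int:
    """Add two N-bit patterns as unsigned integers and discard the carry out."""
    return (a + b) & MASK


r = add_twos_complement(0b0011, 0b1100)
print(format(r, "04b"), to_signed(r))   # 1111 -1   (3 + (-4))
```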
Subtracting Integers • Add the negation • Negating in 2’s complement: keep the lowest-order 1 and the zeros below it, invert every bit above it: 11010100101011000110000 → 00101011010100111010000
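Both ways of negating can be checked against each other in Python. This is a sketch; the function names are hypothetical:

```python
def negate_invert_add_one(x: int, n: int) -> int:
    """2's complement negation: invert all n bits, then add 1."""
    mask = (1 << n) - 1
    return (~x + 1) & mask


def negate_shortcut(x: int, n: int) -> int:
    """Keep the lowest-order 1 and the zeros below it; invert every bit above it."""
    mask = (1 << n) - 1
    if x == 0:
        return 0
    low = x & -x           # isolates the lowest-order 1 bit
    keep = (low << 1) - 1  # that bit and all bits below it
    return ((x & keep) | (~x & ~keep)) & mask


word = int("11010100101011000110000", 2)
print(format(negate_shortcut(word, 23), "023b"))  # 00101011010100111010000
```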
Integer (Unsigned) Multiplication • Elementary-school method: 1101 × 1001: the partial products 1101, 0000, 0000, 1101 (each shifted one place further left) add up to 01110101 (13 × 9 = 117) • The result has 2n bits!
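Hardware Multiplier • [Figure: n-bit registers P, A and B, plus a carry bit; (Carry, P, A) shift right together] • P = 0 • Repeat n times: (i) if $A_0 = 1$, add B to P (ii) right-shift (Carry, P, A) by one bit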
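A bit-level Python sketch of the loop above (the function name `hw_multiply` is hypothetical). A Python integer `p` stands for the Carry bit plus the n-bit P register; A holds the multiplier and B the multiplicand:

```python
def hw_multiply(a: int, b: int, n: int) -> int:
    """Simulate the n-bit shift-and-add multiplier; returns the 2n-bit product."""
    p = 0                                    # Carry + P register
    for _ in range(n):
        if a & 1:                            # (i) if A0 = 1, add B to P
            p += b                           # may set the carry (bit n of p)
        # (ii) right-shift (Carry, P, A): P's low bit moves into A's top bit
        a = (a >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | a                      # high half in P, low half in A


print(hw_multiply(0b1001, 0b1101, 4))        # 117
```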
Integer (Unsigned) Division • Elementary-school long division: 1101 ÷ 11 • Result: 0100, remainder 1 • Decimal: 13 / 3 = 4, remainder 1
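Hardware Divider • [Figure: (n+1)-bit registers P and B (B's top bit is 0), n-bit register A; P and A shift left together] • P = 0 • Repeat n times: (i) left-shift (P, A) by one bit (ii) subtract B from P: if the result is non-negative, set $a_0 = 1$; if it is negative, set $a_0 = 0$ and restore P (add B back)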
Example • 13 / 3 = 4, remainder 1 • n = 4 • A = 1101, B = 00011, P = 00000
Start: P = 00000, A = 1101, B = 00011
End: P = 00001 (Remainder), A = 0100 (Quotient), B = 00011
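A Python sketch of the restoring divider, reproducing the example (the function name `restoring_divide` is hypothetical). A Python integer `p` models the (n+1)-bit P register:

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """A holds the n-bit dividend, B the divisor; returns (quotient, remainder)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << n) - 1
    p = 0
    for _ in range(n):
        # (i) left-shift the (P, A) pair: A's top bit moves into P
        p = (p << 1) | ((a >> (n - 1)) & 1)
        a = (a << 1) & mask
        # (ii) subtract B from P
        p -= b
        if p >= 0:
            a |= 1          # non-negative: quotient bit a0 = 1
        else:
            p += b          # negative: a0 stays 0, restore P
    return a, p             # quotient in A, remainder in P


q, r = restoring_divide(0b1101, 0b00011, 4)
print(format(q, "04b"), format(r, "05b"))   # 0100 00001
```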
Division - Remarks • Non-restoring algorithm • Load P only if the difference is positive • Check for division by 0 • The (total) result is 2n bits: n-bit quotient + n-bit remainder!
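A sketch of one common formulation of non-restoring division, in Python (the function name `nonrestoring_divide` is hypothetical). Instead of adding B back immediately, a negative partial remainder is corrected by adding B in the next step, and a single fix-up is applied at the end:

```python
def nonrestoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """A holds the n-bit dividend, B the divisor; returns (quotient, remainder)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    mask = (1 << n) - 1
    p = 0                                   # signed partial remainder
    for _ in range(n):
        msb = (a >> (n - 1)) & 1
        a = (a << 1) & mask                 # left-shift (P, A)
        if p >= 0:
            p = 2 * p + msb - b             # P was non-negative: subtract B
        else:
            p = 2 * p + msb + b             # P was negative: add B instead of restoring
        if p >= 0:
            a |= 1                          # quotient bit a0 = 1
    if p < 0:
        p += b                              # one final correction of the remainder
    return a, p


print(nonrestoring_divide(0b1101, 0b11, 4))   # (4, 1)
```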
Integer arithmetic - remarks • Signed Multiply and Division • Algorithms exist • We will not use them • What to do with extra bits? • Faster methods
Non-Integers - Other Methods • Fixed Point • Example: ###.# • Binary point shifted • Integer arithmetic (extra shifting) • Small range of magnitudes • Rational • a/b ($a, b \in \mathbb{Z}$)
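A minimal fixed-point sketch in Python, assuming a hypothetical format with 4 bits after the binary point (`FRAC_BITS`, `to_fixed`, `fx_add` and `fx_mul` are illustrative names). Numbers are plain integers scaled by $2^4$; multiplication needs the extra shift:

```python
FRAC_BITS = 4            # hypothetical: 4 bits after the binary point
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Store x as the integer round(x * 2**FRAC_BITS)."""
    return round(x * SCALE)


def from_fixed(v: int) -> float:
    return v / SCALE


def fx_add(a: int, b: int) -> int:
    # Same scale on both sides: ordinary integer addition.
    return a + b


def fx_mul(a: int, b: int) -> int:
    # The raw product carries 2*FRAC_BITS fraction bits; shift back
    # (Python's right shift rounds toward minus infinity).
    return (a * b) >> FRAC_BITS


a, b = to_fixed(2.5), to_fixed(1.25)
print(from_fixed(fx_add(a, b)))        # 3.75
print(from_fixed(fx_mul(a, b)))        # 3.125
print(from_fixed(to_fixed(0.03)))      # 0.0: too small for 4 fraction bits
```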
Floating Point • Exponent + Significand (= Mantissa) • $x = s \cdot 2^e$ • Example: s = 101, e = 011: $x = 101_2 \cdot 2^{11_2} = 5 \cdot 2^3 = 40 = 101000_2$
Uniqueness • Unnormalized numbers: $123.456 \cdot 10^7$, $0.123 \cdot 10^4$ • Normalized: $\#.\#\#\# \cdot 10^{\#}$, e.g. $1.123 \cdot 10^4$ • What about 0?
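In Python, `math.frexp` returns a significand in [0.5, 1), so one extra step gives the normalized $1.f \cdot 2^e$ form. A sketch, with the helper name `normalize` hypothetical; note that 0 has no normalized form:

```python
import math


def normalize(x: float) -> tuple[float, int]:
    """Return (s, e) with x == s * 2**e and 1 <= |s| < 2."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("0, inf and NaN have no normalized form")
    m, e = math.frexp(x)  # frexp returns 0.5 <= |m| < 1
    return m * 2, e - 1


print(normalize(40.0))   # (1.25, 5): 101000 = 1.01 * 2**5
```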
Floating Point Standard • Why standardize? • Hardware accelerators • Software compatibility • Building software libraries • etc. • IEEE 754-1985, ISO/IEC 559 • Specifies: the formats (structure) and the results of arithmetic operations
Float Types • 4 Precision Types: • Single • Single extended • Double • Double extended
Single Precision • 32 bits: • Exponent (e): biased (+127) • Significand (f): fixed-point fraction 0.### … • Number: $(-1)^{\text{sign}} \cdot 1.f \cdot 2^{e-127}$ • Layout: Sign (1 bit) | Exponent (8 bits) | Significand (23 bits)
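A Python sketch that decodes a normalized single-precision pattern by hand with the formula above and compares it with the machine's own decoding via the standard `struct` module (`decode_single` is a hypothetical name; zeros, denormals, ∞ and NaN are rejected):

```python
import struct


def decode_single(word: int) -> float:
    """Decode a normalized single-precision bit pattern (1 <= e <= 254) by hand."""
    sign = word >> 31
    e = (word >> 23) & 0xFF          # 8-bit biased exponent
    f = word & 0x7FFFFF              # 23-bit fraction
    if not 1 <= e <= 254:
        raise ValueError("not a normalized number")
    return (-1) ** sign * (1 + f / 2**23) * 2.0 ** (e - 127)


word = 0b1_10000001_01000000000000000000000   # sign | exponent | fraction
print(decode_single(word))                                 # -5.0
print(struct.unpack(">f", word.to_bytes(4, "big"))[0])     # -5.0
```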
Single Precision - Example • 1 10000001 01000000000000000000000 • Exponent: 10000001 = 129, 129 - 127 = 2 • Significand: 01000… = 0.01000…, so 1.f = $1.01_2$ = 1.25 • Sign bit 1: $X = -1.25 \cdot 2^2 = -5$
Single Precision - Range • Emax = 127 (e = 254) • Emin = -126 (e = 1) • Why |Emin| < |Emax|? • So that $1/2^{\mathrm{Emin}}$ does not overflow • Why biased notation? • What about e = 0 and e = 255?
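The two ends of the normal range can be read straight from their bit patterns with Python's `struct` module. A sketch; `single_from_bits` is a hypothetical helper:

```python
import struct


def single_from_bits(word: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single (returned as a Python float)."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


largest = single_from_bits(0x7F7FFFFF)          # e = 254 (Emax = 127), f = all ones
smallest_normal = single_from_bits(0x00800000)  # e = 1 (Emin = -126), f = 0

print(largest)          # 3.4028234663852886e+38
print(smallest_normal)  # 1.1754943508222875e-38
# 1 / 2**Emin = 2**126 is still below the largest single, so it does not overflow.
print(1 / smallest_normal < largest)  # True
```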
Examples • We shall sometimes use base 10: • f will have 3 digits • Emax will be 98 • Emin will be -97 • Ex: $5.34 \cdot 10^{70}$
NaN • Not a Number • Result of an illegal (invalid) computation, e.g. 0/0, ∞ - ∞, sqrt(-1) • Result of any computation involving a NaN • Encoding: e = Emax + 1 and f ≠ 0 • # 11111111 ####################### • Many NaNs (different f’s)
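A short Python illustration of NaN behavior. Python raises exceptions for `0.0 / 0.0` and `math.sqrt(-1)` instead of returning NaN, so this sketch produces NaN with ∞ - ∞, which Python does evaluate per IEEE:

```python
import math
import struct

inf = float("inf")
nan = inf - inf                    # invalid operation: inf - inf gives NaN
print(math.isnan(nan))             # True
print(math.isnan(nan * 0 + 1))     # True: NaN propagates through arithmetic
print(nan == nan)                  # False: NaN compares unequal even to itself

# Two different single-precision NaNs: e = 255 (all ones) with different f != 0.
quiet_a = struct.unpack(">f", bytes.fromhex("7fc00000"))[0]
quiet_b = struct.unpack(">f", bytes.fromhex("7fc00001"))[0]
print(math.isnan(quiet_a), math.isnan(quiet_b))   # True True
```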
NaNs in Use • A zero finder may evaluate f outside its domain • e.g. f(x) = sqrt(x) - 1 at x < 0 • This works since every computation there yields NaN • No exception is raised!
Zeros • 0 00000000 00000000000000000000000 = ? • This is NOT $1.0 \cdot 2^{\mathrm{Emin}}$ • 1 00000000 00000000000000000000000 = ? • Zero is signed: both +0 and -0 exist! • What is the difference?
Signed Zeros • +0 = -0, BUT: • Multiplication and division keep the sign rules • Motivation: • Using ∞ correctly (described later) • log(x): log(0) = -∞, log(negative) = NaN; what is log(x) if x = -0?
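The sign rules for zero can be observed in Python with `math.copysign`, which reads the sign bit. Python raises exceptions for `x / 0.0` and `math.log(0.0)`, so this sketch shows only the sign rules, not the ∞ results:

```python
import math
import struct

pz, nz = 0.0, -0.0
print(pz == nz)                        # True: +0 and -0 compare equal
print(struct.pack(">f", nz).hex())     # 80000000: only the sign bit differs
print(math.copysign(1.0, -3.0 * pz))   # -1.0: (-) * (+0) = -0
print(math.copysign(1.0, nz * nz))     # 1.0:  (-0) * (-0) = +0
print(math.copysign(1.0, nz / 5.0))    # -1.0: (-0) / (+) = -0
```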
±∞ • More logic: • Encoding: e = Emax + 1 and f = 0 • # 11111111 00000000000000000000000
∞ Usage Example (if $\tan^{-1}$ is defined properly)
More on 0’s and ∞’s • General rule for 0/∞ arithmetic: • Take the appropriate limit, e.g. 1/(1/x) where x = 0 or ∞ • Why not use the largest representable number instead?
Zeros and ∞’s - Yet Again • $x/(x^2 + 1)$ is bad! Why? • $1/(x + x^{-1})$ is better • Do we need to check for x = 0? • Having two zeros and ∞’s saves some special-case checks.
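A Python sketch comparing the two formulas for large x (the names `bad` and `better` are hypothetical). Python float multiplication overflows to ∞ without raising, so the failure is visible. However, `better(0.0)` raises `ZeroDivisionError` in Python, because Python traps division by zero; under IEEE default behavior it would give $1/(0 + \infty) = 0$ with no special-case check:

```python
def bad(x: float) -> float:
    # x * x overflows to inf once |x| exceeds about 1.34e154, and x / inf = 0.
    return x / (x * x + 1)


def better(x: float) -> float:
    # Algebraically the same for x != 0, but no intermediate result overflows.
    return 1 / (x + 1 / x)


print(bad(1e200), better(1e200))   # bad gives 0.0; better gives a value close to 1e-200
```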
Denormalized Numbers • Example: • $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$ • $x - y = 1.20 \cdot 10^{-99}$, which flushes to 0 • So x - y = 0 but x ≠ y • Think of: if (x ≠ y) then z = 1/(x - y) • Solution: • Use denormalized numbers!
Denormal Numbers • Smallest normal: $1.0 \cdot 2^{\mathrm{Emin}}$ • Below it, use denormals: $0.f \cdot 2^{\mathrm{Emin}}$ • Encoding: e = Emin - 1 and f ≠ 0 • # 00000000 ####################### • Gradual underflow: $1.23 \cdot 10^{-4}$ (/10) → $0.12 \cdot 10^{-4}$ (/10) → $0.01 \cdot 10^{-4}$ (/10) → 0
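Gradual underflow can be watched in IEEE double precision from Python. This sketch repeatedly halves the smallest normal double until it reaches 0:

```python
import math
import sys

smallest_normal = sys.float_info.min           # 2**-1022 for IEEE double
smallest_denormal = math.ldexp(1.0, -1074)     # 0.000...01 (52 fraction bits) * 2**-1022

x, halvings = smallest_normal, 0
while x > 0:
    x /= 2          # results below smallest_normal are denormals until the last step
    halvings += 1
print(halvings)     # 53: 52 denormal steps, then 2**-1075 rounds to 0
print(smallest_denormal)   # 5e-324
```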
Denormal Numbers • Back to our example: • $x = 1.23 \cdot 10^{-98}$, $y = 1.11 \cdot 10^{-98}$ • $x - y = 0.12 \cdot 10^{-98}$ • and this is not 0!
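Python's standard `decimal` module can emulate the base-10 toy format: a `Context` sets the precision and exponent limits, and results below the normal range become subnormal instead of 0. A sketch, assuming 3 digits and a smallest normal exponent of -98 (so that x and y are normal):

```python
from decimal import Context, Decimal

# Toy base-10 format: 3 significant digits, exponents from -98 to 98 (assumed).
ctx = Context(prec=3, Emin=-98, Emax=98)

x = ctx.create_decimal("1.23E-98")
y = ctx.create_decimal("1.11E-98")
diff = ctx.subtract(x, y)
print(diff)                            # 1.2E-99: a denormal result, not 0
print(ctx.divide(Decimal(1), diff))    # 8.33E+98: 1/(x - y) is finite
```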
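Flush to 0 vs. Gradual Underflow • [Figure: two number lines marked 0, $2^{-4}$, $2^{-3}$, $2^{-2}$, $2^{-1}$, comparing the representable numbers near 0 with flush to 0 and with gradual underflow]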
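The difference between the two number lines can be reproduced by listing every value of a tiny binary format. The parameters here (2 fraction bits, exponents -3 to -1) and the name `toy_values` are hypothetical, chosen only to keep the list short:

```python
from fractions import Fraction


def toy_values(frac_bits: int, emin: int, emax: int, gradual: bool) -> list[Fraction]:
    """All non-negative values of a tiny binary float format."""
    values = {Fraction(0)}
    for e in range(emin, emax + 1):                # normals: 1.f * 2**e
        for f in range(2**frac_bits):
            values.add((1 + Fraction(f, 2**frac_bits)) * Fraction(2) ** e)
    if gradual:                                    # denormals: 0.f * 2**emin
        for f in range(1, 2**frac_bits):
            values.add(Fraction(f, 2**frac_bits) * Fraction(2) ** emin)
    return sorted(values)


flushed = toy_values(2, -3, -1, gradual=False)
with_denormals = toy_values(2, -3, -1, gradual=True)
print([str(v) for v in flushed[:3]])         # ['0', '1/8', '5/32']
print([str(v) for v in with_denormals[:5]])  # ['0', '1/32', '1/16', '3/32', '1/8']
```

With flush to 0 there is a gap of 1/8 between 0 and the smallest normal, while the spacing just above it is 1/32; the denormals fill that gap with the same 1/32 spacing.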
Special Values - Summary
Exponent | Fraction | Represents
Emin - 1 | f = 0 | ±0
Emin - 1 | f ≠ 0 | $0.f \cdot 2^{\mathrm{Emin}}$
Emin ≤ e ≤ Emax | any | $1.f \cdot 2^{e}$
Emax + 1 | f = 0 | ±∞
Emax + 1 | f ≠ 0 | NaN
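The table translates directly into a classifier for single-precision values. A Python sketch (`classify_single` is a hypothetical name); `struct.pack` rounds the Python float to single precision first and raises `OverflowError` for finite values too large for a single:

```python
import struct


def classify_single(x: float) -> str:
    """Classify x, rounded to single precision, using the exponent and fraction fields."""
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    e = (word >> 23) & 0xFF        # biased exponent: 0 means Emin - 1, 255 means Emax + 1
    f = word & 0x7FFFFF
    if e == 0:
        return "zero" if f == 0 else "denormal"
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    return "normal"


for v in (0.0, -0.0, 1.0, 1e-40, float("inf"), float("nan")):
    print(v, classify_single(v))
```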
Rounding • Why is rounding needed? • Infinitely many numbers, finite representation • Integers only overflow • Almost all operations need rounding • IEEE specifies the results of arithmetic operations, including how they are rounded
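Numbers That Need Rounding • Out of range: • $x > 2 \cdot 2^{\mathrm{Emax}}$, $x < 1 \cdot 2^{\mathrm{Emin}}$ • Between two floats: • $0.1_{10} = 0.000110011001100\ldots_2 = 1.100110011\ldots_2 \cdot 2^{-4}$ • $\approx 1.1001_2 \cdot 2^{-4}$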
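Python makes the rounding of 0.1 visible: `Fraction` shows the exact value of the stored double, and `struct` shows the rounded single-precision pattern:

```python
import math
import struct
from fractions import Fraction

# The double nearest 0.1 is not exactly 1/10.
print(Fraction(0.1) == Fraction(1, 10))   # False
# frexp gives 0.1 ~= 0.8 * 2**-3, i.e. 1.6 * 2**-4 in the slides' 1.f form.
print(math.frexp(0.1))                    # (0.8, -3)
# Single precision keeps 23 fraction bits of 1.100110011..., rounded up: 3dcccccd.
print(struct.pack(">f", 0.1).hex())       # 3dcccccd
```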