Floating Point Representations

CDA 3101 Discussion Session 02

Question 1 • Convert the binary number 1010 0100 1001 0010 0100 1001 0010 0100 (base 2) to decimal, interpreting it as (a) an unsigned integer, (b) a 2's complement integer, and (c) a single-precision floating-point number.

Question 1.1 • Converting binary (unsigned) to decimal
1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
$1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2} = 2761050404$

Question 1.2 • Converting binary (2's complement) to decimal
1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
$-1 \cdot 2^{31} + 1 \cdot 2^{29} + \dots + 1 \cdot 2^{8} + 1 \cdot 2^{5} + 1 \cdot 2^{2} = -1533916892$
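The two integer interpretations in Questions 1.1 and 1.2 differ only in the weight of the most significant bit. A minimal Python sketch of that idea follows; the function names `to_unsigned` and `to_twos_complement` are illustrative and not part of the original slides.

```python
BITS = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")


def to_unsigned(bits: str) -> int:
    """Plain positional value: every bit i contributes bit * 2**i."""
    return int(bits, 2)


def to_twos_complement(bits: str) -> int:
    """Same sum, except the top bit carries weight -2**(n-1) instead of +2**(n-1)."""
    value = int(bits, 2)
    if bits[0] == "1":
        # Swapping +2**(n-1) for -2**(n-1) is the same as subtracting 2**n.
        value -= 1 << len(bits)
    return value


print(to_unsigned(BITS))         # 2761050404
print(to_twos_complement(BITS))  # -1533916892
```

The two results differ by exactly 2^32, which is the effect of flipping the sign of the 2^31 term.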

Question 1.3 • Converting binary (single-precision FP) to decimal
Layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
1010 0100 1001 0010 0100 1001 0010 0100 (base 2)
Sign bit: 1
Exponent: 01001001 = 73
Fraction: 00100100100100100100100
$= 1 \cdot 2^{-3} + 1 \cdot 2^{-6} + \dots + 1 \cdot 2^{-15} + 1 \cdot 2^{-18} + 1 \cdot 2^{-21} \approx 0.142857074$
$(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 127}$
$= (-1)^{1} \times 1.142857074 \times 2^{73 - 127}$
$= -1.142857074 \times 2^{-54}$
$\approx -6.344131 \times 10^{-17}$
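The field-by-field decoding in Question 1.3 can be checked mechanically. The sketch below splits the word into sign, exponent, and fraction, applies the normalized-number formula, and compares the result with what Python's standard `struct` module produces for the same four bytes. It handles only the normalized case (biased exponent strictly between 0 and 255), and the variable names are illustrative.

```python
import struct

BITS = "1010 0100 1001 0010 0100 1001 0010 0100".replace(" ", "")

# Split the 32-bit word into its three fields.
sign = int(BITS[0], 2)        # 1 bit
exponent = int(BITS[1:9], 2)  # 8 bits, biased by 127
fraction = int(BITS[9:], 2)   # 23 bits, read as an integer

# Normalized case only: the significand has an implicit leading 1.
assert 0 < exponent < 255
manual = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

# Cross-check: reinterpret the same 4 bytes as a big-endian float.
reference = struct.unpack(">f", int(BITS, 2).to_bytes(4, "big"))[0]

print(sign, exponent)       # 1 73
print(manual)               # about -6.344131e-17
print(manual == reference)  # True
```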

Question 2 • Show the IEEE 754 binary representation of the floating-point number 0.1 (base 10) in single precision and in double precision.

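Question 2.1 • Converting 0.1 (base 10) to single-precision FP
Step 1: Convert the fraction 0.1 to binary by repeated multiplication by 2, taking the integer part of each product as the next bit:
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
0.1 (base 10) = 0.000110011… (base 2) $= 1.10011\ldots \times 2^{-4}$
Step 2: Express in single-precision format. The stored (biased) exponent is the true exponent plus 127:
$(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 127}$, with Exponent $= -4 + 127 = 123 = 01111011_2$
$= (-1)^{0} \times 1.10011001100110011001100_2 \times 2^{123 - 127}$
Result: 0 01111011 10011001100110011001100
Note: this fraction is truncated to 23 bits. The discarded bits (1100…) amount to more than half a unit in the last place, so IEEE 754 round-to-nearest stores 10011001100110011001101 instead.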
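The repeated-doubling procedure from Step 1 is easy to mechanize. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that the doubling itself introduces no rounding, then compares the truncated 23-bit fraction with the bit pattern that `struct` produces for 0.1 under round-to-nearest. The helper name `fraction_bits` is illustrative.

```python
import struct
from fractions import Fraction


def fraction_bits(x: Fraction, count: int) -> str:
    """Binary digits of 0 <= x < 1, found by repeated multiplication by 2."""
    digits = []
    for _ in range(count):
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1
        else:
            digits.append("0")
    return "".join(digits)


digits = fraction_bits(Fraction(1, 10), 32)  # 0.00011001100110011...
first_one = digits.index("1")                # position of the leading 1
exponent = -(first_one + 1)                  # true exponent after normalizing
truncated = digits[first_one + 1 : first_one + 1 + 23]
biased = format(exponent + 127, "08b")

# Bit pattern actually stored for 0.1 when rounded to the nearest float32.
stored = format(struct.unpack(">I", struct.pack(">f", 0.1))[0], "032b")

print(exponent, biased)  # -4 01111011
print(truncated)         # 10011001100110011001100
print(stored[9:])        # 10011001100110011001101
```

The last two lines differ only in the final bit, which is the effect of rounding instead of truncating.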

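Question 2.2 • Converting 0.1 (base 10) to double-precision FP
Step 1: Convert the fraction 0.1 to binary by repeated multiplication by 2, taking the integer part of each product as the next bit:
0.1*2 = 0.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, 0.2*2 = 0.4, 0.4*2 = 0.8, 0.8*2 = 1.6, 0.6*2 = 1.2, …
0.1 (base 10) = 0.000110011… (base 2) $= 1.10011\ldots \times 2^{-4}$
Step 2: Express in double-precision format (1 sign bit, 11 exponent bits, 52 fraction bits). The stored (biased) exponent is the true exponent plus 1023:
$(-1)^{S} \times (1.\text{Fraction}) \times 2^{\text{Exponent} - 1023}$, with Exponent $= -4 + 1023 = 1019 = 01111111011_2$
$= (-1)^{0} \times 1.1001100110011001100110011001100110011001100110011001_2 \times 2^{1019 - 1023}$
Result: 0 01111111011 1001100110011001100110011001100110011001100110011001
Note: this fraction is truncated to 52 bits. The discarded bits (1001…) amount to more than half a unit in the last place, so IEEE 754 round-to-nearest stores a fraction ending in …1010 instead of …1001.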
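For the double-precision case, one quick way to inspect the stored pattern is to pack 0.1 as an 8-byte big-endian double and slice the resulting 64-bit string into its fields. The sketch below does that and compares the stored fraction with the truncated repeating pattern 1001 (13 times, 52 bits). Variable names are illustrative.

```python
import struct

# Reinterpret the 8 bytes of the double 0.1 as a 64-bit unsigned integer.
word = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
bits = format(word, "064b")

sign = bits[0]        # 1 bit
exponent = bits[1:12]  # 11 bits, biased by 1023
fraction = bits[12:]   # 52 bits

truncated = "1001" * 13  # the repeating pattern cut off after 52 bits

print(sign, exponent)           # 0 01111111011
print(int(exponent, 2) - 1023)  # -4
print(fraction)
print(int(fraction, 2) - int(truncated, 2))  # 1
```

The difference of 1 in the last line shows that the stored fraction is the truncated one rounded up by a single unit in the last place.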

Question 3 • Convert the following single-precision numbers to decimal:
a. 0 11111111 00000000000000000000000
b. 0 00000000 00000000000000000000010

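Question 3.1 • Converting binary (single-precision FP) to decimal
Layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
0 11111111 00000000000000000000000 (base 2)
Sign bit: 0
Exponent: 11111111 = 255 (all ones, reserved for infinity and NaN)
Fraction: 00000000000000000000000 = 0
An all-ones exponent with a zero fraction encodes infinity, and the sign bit is 0, so the value is +Infinity.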

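Question 3.2 • Converting binary (single-precision FP) to decimal
Layout: S (1 bit) | Biased Exponent (8 bits) | Fraction (23 bits)
0 00000000 00000000000000000000010 (base 2)
Sign bit: 0
Exponent: 00000000 = 0 (with a nonzero fraction, this marks a denormalized number)
Fraction: 00000000000000000000010 $= 1 \cdot 2^{-22} \approx 0.000000238$
$(-1)^{S} \times (0.\text{Fraction}) \times 2^{-126}$
$= (-1)^{0} \times 2^{-22} \times 2^{-126}$
$= 2^{-148} \approx 2.802597 \times 10^{-45}$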
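The two special cases in Question 3 (all-ones exponent and all-zeros exponent) can be folded into one small decoder. The following is a sketch, not a full IEEE 754 implementation: the function name `decode_single` is illustrative, and the NaN branch does not distinguish quiet from signaling NaNs.

```python
import math
import struct


def decode_single(bits: str) -> float:
    """Decode a 32-bit single-precision pattern given as a string of 0s and 1s."""
    bits = bits.replace(" ", "")
    assert len(bits) == 32
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)  # 23-bit fraction as an integer
    if exponent == 255:
        # All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Denormalized: (0.Fraction) * 2**-126 == fraction * 2**-149.
        return sign * fraction * 2.0**-149
    # Normalized: (1.Fraction) * 2**(exponent - 127).
    return sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


a = decode_single("0 11111111 00000000000000000000000")
b = decode_single("0 00000000 00000000000000000000010")

print(a)                # inf
print(b == 2.0**-148)   # True
# Cross-check b against struct's interpretation of the same bytes.
print(b == struct.unpack(">f", bytes.fromhex("00000002"))[0])  # True
```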

Question 4 • Consider the 80-bit extended-precision IEEE 754 floating point standard that uses 1 bit for the sign, 16 bits for the biased exponent, and 63 bits for the fraction (f). Write (i) the 80-bit extended-precision floating-point representation in binary and (ii) the corresponding value in the base-10 positional (decimal) system of
• the third smallest positive normalized number
• the largest (farthest from zero) negative normalized number
• the third smallest positive denormalized number
that can be represented.

Question 4.1 • The third smallest positive normalized number
Bias: $2^{15} - 1 = 32767$
Sign: 0
Biased Exponent: 0000 0000 0000 0001
Fraction (f): 61 zeros followed by 10
Decimal Value: $(-1)^{0} \times 2^{1 - 32767} \times (1 + 2^{-62}) = 2^{-32766} + 2^{-32828}$

Question 4.2 • The largest (farthest from zero) negative normalized number
Sign: 1
Biased Exponent: 1111 1111 1111 1110
Fraction: 63 ones
Decimal Value: $(-1)^{1} \times 2^{65534 - 32767} \times (1 + 2^{-1} + 2^{-2} + \dots + 2^{-63}) = -2^{32767} \times (2^{64} - 1) \times 2^{-63} \approx -2^{32768}$

Question 4.3 • The third smallest positive denormalized number
Sign: 0
Biased Exponent: 0000 0000 0000 0000
Fraction: 61 zeros followed by 11
Decimal Value: $(-1)^{0} \times 2^{-32766} \times (2^{-62} + 2^{-63}) = 3 \times 2^{-32829}$
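The three answers to Question 4 can be verified with exact rational arithmetic, since values such as 2^-32829 are far outside the range of a Python float. The sketch below models the format exactly as the question defines it (1 sign bit, 16 exponent bits, 63 fraction bits, implicit leading 1 for normalized numbers). The helper names `decode` and `encode_bits` are illustrative, and only finite values are handled.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 16, 63
BIAS = 2 ** (EXP_BITS - 1) - 1  # 32767


def decode(sign: int, biased_exp: int, frac: int) -> Fraction:
    """Exact value of a finite number in the format described in Question 4."""
    assert 0 <= biased_exp < 2**EXP_BITS - 1  # all-ones exponent is reserved
    assert 0 <= frac < 2**FRAC_BITS
    if biased_exp == 0:
        # Denormalized: no implicit 1, exponent fixed at 1 - BIAS.
        significand = Fraction(frac, 2**FRAC_BITS)
        exponent = 1 - BIAS
    else:
        # Normalized: implicit leading 1.
        significand = 1 + Fraction(frac, 2**FRAC_BITS)
        exponent = biased_exp - BIAS
    return (-1) ** sign * significand * Fraction(2) ** exponent


def encode_bits(sign: int, biased_exp: int, frac: int) -> str:
    """80-character bit string: sign, then exponent, then fraction."""
    return f"{sign:01b}{biased_exp:016b}{frac:063b}"


third_normal = decode(0, 1, 2)                     # fraction = ...010
largest_negative = decode(1, 2**16 - 2, 2**63 - 1)  # fraction = all ones
third_denormal = decode(0, 0, 3)                   # fraction = ...011

print(third_normal == Fraction(1, 2**32766) + Fraction(1, 2**32828))  # True
print(largest_negative == -(2**64 - 1) * 2 ** (32767 - 63))           # True
print(third_denormal == Fraction(3, 2**32829))                        # True
print(encode_bits(0, 0, 3)[-4:])                                      # 0011
```

Counting from the smallest value, fraction fields 0, 1, and 2 give the first three normalized numbers at biased exponent 1, while fraction fields 1, 2, and 3 give the first three positive denormalized numbers, because a zero fraction with a zero exponent encodes zero itself.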
