Floating Point

Floating Point Representation and Arithmetic (see Patterson Chapter 4)

Outline • Review of floating point scientific notation • Floating point binary • IEEE Floating Point Standard • Addition in Floating Point • Remarks about multiplication

Floating Point Notation • Decimal • 12.4568ten (decimal notation) means • 10*1 + 2 + 4/10 + 5/100 + 6/1000 + 8/10000 • In scientific notation • 12.4568 = • 124568 * 10^-4 = 1245680 * 10^-5 = • 12456.8 * 10^-3 = 1245.68 * 10^-2 = • 124.568 * 10^-1 = 12.4568 * 10^0 = • 1.24568 * 10^1 • 1.24568 * 10^1 is an example of normalised scientific notation.

Floating Point in Binary • Binary • 0.010011two = (0/2) + (1/2^2) + (0/2^3) + (0/2^4) + (1/2^5) + (1/2^6) • = 0 + 1/4 + 0 + 0 + 1/32 + 1/64 • = (0.25 + 0.03125 + 0.015625)ten • = 0.296875ten • In scientific notation • 10011 * 2^-6 = 1001.1 * 2^-5 = • 100.11 * 2^-4 = • 1.0011 * 2^-2 (normalised)
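The place-value expansion above can be checked mechanically. The following is a minimal Python sketch; the function name binary_fraction_value is an illustrative choice, not something from the slides, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def binary_fraction_value(bits: str) -> Fraction:
    """Exact value of a binary string such as '0.010011' (hypothetical helper)."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    # The bit at position k after the binary point contributes 1/2**k.
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value

value = binary_fraction_value("0.010011")
print(value, float(value))  # 19/64 0.296875
```

The same helper also handles a whole part, for example binary_fraction_value("10.11") gives 11/4, which is 2.75.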

Normalised Notation • In normalised binary scientific notation • unless the number is 0 • always have 1.sssssss...sss * 2^E • sss...sss is the significand • E is the exponent • The significand s1s2...sn represents • s1/2 + s2/2^2 + ... + sn/2^n

Representation • Note that it is impossible to exactly represent all decimal numbers in this way (eg 0.3) • Problem of representation of floating point numbers in fixed word length • need to represent • sign • significand • exponent • in one word (32 bits).
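The claim about 0.3 can be observed directly. This short Python sketch assumes a standard Python interpreter, whose float type is a 64-bit double rather than the 32-bit word discussed in these slides; the effect is the same, only the number of significand bits differs.

```python
from fractions import Fraction

# Fraction(x) recovers the exact binary value that is actually stored for x.
stored = Fraction(0.3)
print(stored == Fraction(3, 10))            # False: 0.3 is only approximated
print(Fraction(0.3125) == Fraction(5, 16))  # True: 5/16 is a sum of powers of 2

# The stored value always has a power-of-two denominator.
denominator = stored.denominator
print(denominator & (denominator - 1) == 0)  # True
```

Any decimal fraction whose denominator is not a power of two is stored as the nearest representable binary value.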

Representation • Word layout (bit 31 down to bit 0): • bit 31: sign bit S • bits 30 to 23: exponent E (8 bits) • bits 22 to 0: significand F (23 bits) • Represents floating point number: • (-1)^S * (1.0 + F) * 2^E • S is 1 bit (if S=1 then negative) • F is 23 bits • E is 8 bits

Squeezing out More from the Bits • Since every non-zero binary f.p. number (normalised) is of the form: • 1.sss...sss * 2^E • We do not have to represent explicitly the 1 in the word, and can therefore interpret the bit-pattern as: • (-1)^S * (1 + significand) * 2^E • thus ‘reclaiming’ an extra bit! • E = 0000 0000 is reserved for zero.

Requirements • As far as possible the ALU should be able to reuse integer machinery in implementation of f.p. • Eg, comparison with zero • easy because of sign bit • fp numbers can be easily classified as negative, zero or positive without additional hardware. • Comparison of two fp numbers x<y not so straightforward - • how are negative exponents to be formed?

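Bad Example: (1/2) > 2 ??? • Representation of 1/2 is • 0.1two = 1.0 * 2^-1 (normalised) • with a two’s complement exponent: 0 1111 1111 0000....0000 (S, E, significand) • Representation of 2 is • 10two = 1.0 * 2^1 (normalised) • with a two’s complement exponent: 0 0000 0001 0000....0000 (S, E, significand) • Compared as integer bit patterns, 1/2 would appear larger than 2.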

Representation of Exponent • Inappropriate to use two’s complement for the exponent • Ideally want 0000 0000 to represent most negative number, 1111 1111 most positive. • Number range: • 1111 1111, 1111 1110, ... : positive • 0111 1111 = 127ten: use this for 2^0 • 0111 1110, ..., 0000 0000: negative

Biased Representation (IEEE FP Standard) • The ‘bias’ 127 represents 0 • 128 to 254 represent positive exponents (255 is reserved for infinity and NaN) • 1 to 126 represent negative exponents • (remember 0 is reserved for the entire number being zero). • The actual exponent is therefore: • E - bias • (-1)^S * (1 + significand) * 2^(E-bias)
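The motivation for the bias can be illustrated by comparing the 32-bit words for 1/2 and 2 as plain unsigned integers, first with a two's complement exponent field and then with a biased one. This is a sketch in Python; pack_word is a hypothetical helper that only assembles the three fields, and it covers positive numbers only.

```python
BIAS = 127

def pack_word(sign: int, exponent_field: int, fraction: int = 0) -> int:
    """Assemble S (1 bit), E (8 bits) and F (23 bits) into one 32-bit integer."""
    return (sign << 31) | ((exponent_field & 0xFF) << 23) | (fraction & 0x7FFFFF)

# Two's complement exponent: -1 is stored as 1111 1111, +1 as 0000 0001.
half_twos = pack_word(0, -1 & 0xFF)
two_twos = pack_word(0, 1 & 0xFF)
print(half_twos > two_twos)      # True: integer comparison says 1/2 > 2 (wrong)

# Biased exponent: -1 is stored as 126, +1 as 128.
half_biased = pack_word(0, -1 + BIAS)
two_biased = pack_word(0, 1 + BIAS)
print(half_biased < two_biased)  # True: integer comparison now agrees with the values
```

With the biased field, larger exponents always give larger bit patterns, so positive floating point numbers can be ordered with the integer comparator.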

Example 1 • Represent 0.3125ten = 5/16 • 5/16 = 1/4 + 1/16 = 0.0101two = 1.01 * 2^-2 • S = 0 • E = ??? • -2 = E-bias = E-127 • E = 125ten = 0111 1101two • Significand = 010...000 • 0 0111 1101 010000...000
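The hand encoding in Example 1 can be reproduced in a few lines. The sketch below is Python; encode_single is a hypothetical name, and it assumes a non-zero value that is normalised and fits exactly in 23 fraction bits (no rounding, no special cases). The last lines cross-check against the bytes that Python's struct module produces for a 32-bit float.

```python
import math
import struct

BIAS = 127

def encode_single(x: float) -> str:
    """Return 'S EEEEEEEE F...F' for a non-zero, exactly representable value."""
    sign = 1 if x < 0 else 0
    mantissa, exp = math.frexp(abs(x))    # abs(x) == mantissa * 2**exp, 0.5 <= mantissa < 1
    significand = mantissa * 2 - 1        # rewrite as (1 + significand) * 2**(exp - 1)
    e_field = (exp - 1) + BIAS            # stored exponent = actual exponent + bias
    f_field = int(significand * 2 ** 23)  # 23 fraction bits (truncated, not rounded)
    return f"{sign} {e_field:08b} {f_field:023b}"

pattern = encode_single(0.3125)
print(pattern)  # 0 01111101 01000000000000000000000

# Cross-check against the machine's own single-precision encoding.
(word,) = struct.unpack(">I", struct.pack(">f", 0.3125))
print(pattern.replace(" ", "") == f"{word:032b}")  # True
```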

Example 2 • What does • 0 0111 1101 010000...000 • represent? • S = 0 • E = 0111 1101two = 125ten • Exponent = E-bias = 125-127 = -2 • Significand = 0.01two = 1/4 • (-1)^S * (1 + significand) * 2^(E-bias) = (1 + 1/4) * 2^-2 = (1 + 1/4)*(1/4) = 5/16
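Decoding follows the formula term by term. The following Python sketch uses a hypothetical function decode_single; it follows the convention used in these slides that E = 0 means the whole number is zero, and it does not treat E = 255 specially.

```python
BIAS = 127

def decode_single(pattern: str) -> float:
    """Decode a 32-bit pattern written as 'S EEEEEEEE F...F' (spaces optional)."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0], 2)
    e_field = int(bits[1:9], 2)
    f_field = int(bits[9:], 2)
    if e_field == 0:
        return 0.0                      # slides' convention: E = 0 is reserved for zero
    significand = f_field / 2 ** 23     # fraction bits as a value in [0, 1)
    return (-1) ** sign * (1 + significand) * 2.0 ** (e_field - BIAS)

result = decode_single("0 01111101 010" + "0" * 20)
print(result)  # 0.3125
```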

Addition of FP Numbers • Given two numbers: • normalise them both • shift the binary point of the number with the smaller exponent so that its exponent matches the larger one • add the significands • renormalise the result • check for underflow/overflow of the exponent • if so, stop and signal an exception • round the significand to the required number of bits • rounding might require renormalising again (eg, 1.1111 rounded to 4 bits becomes 10.00).

Addition Example • 0.5 + 2.75 = 3.25 • 0.1two + 10.11two • 1.0 * 2^-1 + 1.011 * 2^1 • 0.010 * 2^1 + 1.011 * 2^1 • 1.101 * 2^1 (already normalised) • (1 + (1/2) + (1/8)) * 2 • 3.25
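The addition steps can be sketched in software to make the alignment and renormalisation visible. This Python sketch is an illustration under simplifying assumptions, not a description of real hardware: both operands are positive, a number is a (significand, exponent) pair with 1 <= significand < 2, the significand is an exact Fraction, and the names fp_add and frac_bits are hypothetical.

```python
from fractions import Fraction

def fp_add(a, b, frac_bits=23):
    """Add two positive numbers given as (significand, exponent) pairs."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Align: make (sig_a, exp_a) the operand with the larger exponent,
    # then shift the other significand right by the exponent difference.
    if exp_a < exp_b:
        (sig_a, exp_a), (sig_b, exp_b) = (sig_b, exp_b), (sig_a, exp_a)
    sig_b = sig_b / 2 ** (exp_a - exp_b)
    # Add the aligned significands.
    total, exp = sig_a + sig_b, exp_a
    # Renormalise: the sum of two positive operands lies in [1, 4).
    if total >= 2:
        total, exp = total / 2, exp + 1
    # Round to frac_bits fraction bits; rounding can carry up to 2.0.
    scale = 2 ** frac_bits
    total = Fraction(round(total * scale), scale)
    if total >= 2:
        total, exp = total / 2, exp + 1
    # Check the exponent range of a normalised single (biased field 1..254).
    if not -126 <= exp <= 127:
        raise OverflowError("exponent out of range")
    return total, exp

# 0.5 + 2.75: 1.0 * 2^-1 + 1.011two * 2^1
sig, exp = fp_add((Fraction(1), -1), (Fraction(11, 8), 1))
print(sig, exp, float(sig) * 2 ** exp)  # 13/8 1 3.25
```

The result 13/8 is 1.101two, matching the slide. Calling fp_add((Fraction(31, 16), 0), (Fraction(1), -10), frac_bits=3) shows the case where rounding forces a second renormalisation.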

Remarks • The IEEE FP standard represents floats in 32 bits; higher precision numbers (doubles) are represented across two words. • Multiplication is relatively easy, since the exponents add, and the significands can be multiplied using integer multiplication. • There can be huge pitfalls in reliably transferring floating point code to different hardware!
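The remark about multiplication can be sketched as follows. This Python sketch assumes a number is held as (sign, exponent, significand), where the significand is an integer equal to 1.F scaled by 2^23 with the hidden 1 included; the names fp_multiply and to_value are hypothetical, the product is truncated rather than rounded, and exponent range checks are omitted.

```python
FRAC_BITS = 23

def fp_multiply(a, b):
    """Multiply two numbers given as (sign, exponent, integer significand)."""
    sign = a[0] ^ b[0]                 # signs combine with exclusive-or
    exp = a[1] + b[1]                  # exponents add
    product = a[2] * b[2]              # integer multiply, scaled by 2**(2 * FRAC_BITS)
    product >>= FRAC_BITS              # back to the 2**FRAC_BITS scale (truncating)
    if product >= (2 << FRAC_BITS):    # product of two values in [1, 2) lies in [1, 4)
        product >>= 1
        exp += 1
    return sign, exp, product

def to_value(number) -> float:
    sign, exp, sig = number
    return (-1) ** sign * (sig / 2 ** FRAC_BITS) * 2.0 ** exp

three = (0, 1, 3 << (FRAC_BITS - 1))        # 1.5  * 2^1
five_16ths = (0, -2, 5 << (FRAC_BITS - 2))  # 1.25 * 2^-2
print(to_value(fp_multiply(three, five_16ths)))  # 0.9375
```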

Summary • FP scientific notation • normalised representation in binary • Bias to represent -ve to +ve range in exponent • Addition • Notice how a 32-bit binary string can represent many different entities in memory. • Memory architectures NEXT.
