Learn about the binary and decimal representation of floating point numbers, how to convert between the two, and the IEEE floating point representation. Understand fixed point notation and the use of exponent, significand, and bias. Discover special values, denormalized numbers, and the challenges of numerical analysis in working with floating point numbers.
CS 301 Fall 2002 • Floating Point • Slide Set 10
The Binary Point • Well, ok, we usually call it a decimal point, but it works in base 2 (binary) also. • Just like 103.57 is really 1*10^2 + 0*10^1 + 3*10^0 + 5*10^-1 + 7*10^-2, 101.101 is really 1*2^2 + 0*2^1 + 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3 = 4 + 1 + 1/2 + 1/8 = 5 5/8 = 5.625 • So converting from binary to decimal works the same way: multiply each digit by its power of the base and add.
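The positional expansion above can be sketched in C. This is a minimal illustration, not part of the original slides; the function name `binary_to_double` is hypothetical, and the input is assumed to contain only '0', '1' and at most one '.'.

```c
#include <stddef.h>

/* Evaluate a binary string such as "101.101" by summing digit * 2^position.
   Assumes the string holds only '0', '1' and at most one '.'. */
double binary_to_double(const char *s)
{
    double value = 0.0;
    double place = 0.5;          /* weight of the first digit after the point */
    int seen_point = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '.') {
            seen_point = 1;
        } else if (!seen_point) {
            value = value * 2.0 + (s[i] - '0');   /* integer part: shift and add */
        } else {
            value += (s[i] - '0') * place;        /* fraction part: 1/2, 1/4, ... */
            place /= 2.0;
        }
    }
    return value;
}
```

Calling `binary_to_double("101.101")` reproduces the 5.625 worked out on the slide.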
Converting Decimals to Binary • Consider the binary number 0.abcde. If we multiply by two, we get a.bcde, which tells us the first digit. If we remove that digit, and multiply by two again, we get b.cde, and so on.
Example: Converting 1/3 to Binary • 1/3 doubled is 2/3: 0.0 • 2/3 doubled is 1 1/3: 0.01 • Removing the 1 leaves 1/3 again, and now it repeats. • Answer is 0.010101... (the 01 repeats forever) • Try finding 2/3. Hint: Convert the binary for 1/3.
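The doubling procedure can be sketched in C as follows. To keep the arithmetic exact, this sketch takes the fraction as num/den rather than as a floating point value; the name `frac_bits` and the packing of digits into an integer are illustrative assumptions, not something from the slides.

```c
/* Return the first n binary digits (n <= 32) of the fraction num/den,
   where 0 <= num < den, packed into an integer. The first digit after
   the binary point ends up as the most significant of the n bits.
   Assumes den is small enough that num * 2 cannot overflow. */
unsigned long frac_bits(unsigned long num, unsigned long den, int n)
{
    unsigned long bits = 0;
    for (int i = 0; i < n; i++) {
        num *= 2;                 /* multiply the fraction by two */
        bits <<= 1;
        if (num >= den) {         /* integer part is 1: that is the next digit */
            bits |= 1;
            num -= den;           /* remove the digit and continue */
        }
    }
    return bits;
}
```

For example, `frac_bits(1, 3, 8)` yields the bit pattern 01010101, the first eight digits of the repeating expansion of 1/3.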
Fixed Point Notation • Similar to how you would handle money if you had to deal with only integer amounts: Count cents. In other words, pretend there is a decimal (or binary) point in your number. • Things to be careful about • Make sure you remember the hidden point. If you are thinking of two places after the binary point, then you need to write 4 (100 in binary) when you mean 1. • After multiplication you need to shift right to get the (pretend) point back where it belongs. Similarly, you need to shift left after division.
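A minimal C sketch of the shifting rules above, assuming non-negative values and a hypothetical format with 8 binary places (`FRAC_BITS` and the `fx_` names are illustrative, not from the slides):

```c
#include <stdint.h>

#define FRAC_BITS 8   /* hypothetical choice: 8 binary places after the point */

/* Convert a non-negative whole number to fixed point. */
int32_t fx_from_int(int32_t n)
{
    return n << FRAC_BITS;
}

/* The raw product carries 2*FRAC_BITS fractional bits, so shift right. */
int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

/* The raw quotient would carry no fractional bits, so a left shift is needed.
   Shifting the dividend before dividing keeps the low bits that shifting
   the quotient afterwards would lose. Assumes b != 0. */
int32_t fx_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a << FRAC_BITS) / b);
}
```

With 8 binary places, 1.5 is stored as 384 and 2.5 as 640; `fx_mul(384, 640)` gives 960, which is 3.75 in the same format.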
Floating Point Notation • Like scientific notation, but in binary. • 1.01010101010101010101 * 2^100 • The exponent 100 is in binary also (4 in decimal). This is really 10101.0101010101010101, which is about 21 1/3 (but not exactly!) • A normalized floating point number has the form 1.ssssssssssssss * 2^eeeeee, where 1.ssssssssssssss is the significand and eeeeee is the exponent.
IEEE floating point representation • Used on most (but not all) computers as the hardware representation. Intel’s math coprocessors (built into all CPUs since the Pentium) use it. IEEE defines single precision (used by float in C) and double precision (used by double in C). • Intel’s math coprocessor also uses a third, higher precision called extended precision. In fact, everything the math coprocessor does is done with extended precision.
IEEE single precision • Accurate to around 7 decimal digits. • s is the sign bit: 0 for positive, 1 for negative. • e (8-bit) is the biased exponent = true exponent + 7Fh (127). The values 00h and FFh have special meaning. • f (23-bit) is the fraction: the first 23 bits after the 1 in the significand.
Example: 5.8 is what in IEEE single precision floating point? • 5.8 = 101.1100 1100 1100... in binary (the 1100 repeats) • To 23 digits after the binary point, we have 1.0111 0011 0011 0011 0011 001 * 2^10 (the exponent 10 is binary, i.e. 2) • Sign bit is positive = 0. • Exponent is 7Fh + 2h = 81h • Putting sign, exponent, and fraction together, we get 0100 0000 1011 1001 1001 1001 1001 1001 or 40B99999h • Note: In C, 5.8 as a float is represented as 40B9999Ah, because the leftmost dropped bit was a 1 and the fraction is rounded up. This is closer to 5.8.
Example: What is -0.5? • 0.5 = 0.1 in binary • To 23 digits after the binary point, we have 1.00000000000000000000000 * 2^-1 • Sign bit is negative = 1. • Exponent is 7Fh + (-1h) = 7Eh • Putting sign, exponent, and fraction together, we get 1011 1111 0000 0000 0000 0000 0000 0000 or BF000000h
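The two encodings above can be checked on a real machine. The sketch below copies a float's bytes into an integer and splits out the three fields; the function names are illustrative, and it assumes that float is IEEE single precision, which the C standard does not guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw 32 bits of a float into an integer.
   Assumes float is stored as IEEE single precision. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* s: the top bit. */
uint32_t sign_bit(uint32_t bits)   { return bits >> 31; }

/* e: the next 8 bits, still biased by 7Fh. */
uint32_t biased_exp(uint32_t bits) { return (bits >> 23) & 0xFFu; }

/* f: the low 23 bits. */
uint32_t fraction(uint32_t bits)   { return bits & 0x7FFFFFu; }
```

On such a platform, `float_bits(5.8f)` returns 40B9999Ah (the rounded value mentioned in the note) and `float_bits(-0.5f)` returns BF000000h.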
Example: CACE0000h (IEEE floating point) is what number? • 1100 1010 1100 1110 0000 0000 0000 0000 • Sign is negative • e is 95h. True exponent is 95h - 7Fh = 16h. • f is 100111 (followed by zeros), so significand is 1.100111 • -1.100111 * 2^16h • -110 0111 0000 0000 0000 0000 (binary) = -670000h • -(6*16^5 + 7*16^4) = -6,750,208.
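The decoding steps in this example can be sketched in C. This is an illustrative helper (the name `decode_normalized` is hypothetical) that handles only normalized values, i.e. it assumes 0 < e < FFh.

```c
#include <stdint.h>

/* Decode a normalized IEEE single precision bit pattern by hand.
   Assumes 0 < e < FFh; zero, denormals, infinity and NaN are not handled. */
double decode_normalized(uint32_t bits)
{
    int sign = (int)(bits >> 31);
    int e = (int)((bits >> 23) & 0xFFu);
    uint32_t f = bits & 0x7FFFFFu;

    double value = 1.0 + (double)f / 8388608.0;  /* 1.f, since 2^23 = 8388608 */
    int exponent = e - 0x7F;                      /* remove the bias */

    /* Scale by 2^exponent one step at a time; each step is exact. */
    for (; exponent > 0; exponent--) value *= 2.0;
    for (; exponent < 0; exponent++) value /= 2.0;

    return sign ? -value : value;
}
```

`decode_normalized(0xCACE0000u)` gives -6750208.0, matching the hand calculation.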
Special Values for e and f • e=0 and f=0 represents 0. Note that there is a +0 (00000000h) and a –0 (80000000h). • e=0 and f != 0 denotes a denormalized number (discussed on next slide) • e = FFh and f = 0 denotes infinity. There are positive and negative infinities. • e = FFh and f != 0 denotes an undefined result known as NaN (Not a Number).
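The four special cases can be told apart from the e and f fields alone. A minimal sketch, with hypothetical enum and function names:

```c
#include <stdint.h>

enum kind { KIND_ZERO, KIND_DENORMAL, KIND_NORMAL, KIND_INFINITY, KIND_NAN };

/* Classify a single precision bit pattern using only e and f.
   The sign bit is ignored, so +0 and -0 both report KIND_ZERO. */
enum kind classify_single(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;
    uint32_t f = bits & 0x7FFFFFu;

    if (e == 0)
        return f == 0 ? KIND_ZERO : KIND_DENORMAL;
    if (e == 0xFFu)
        return f == 0 ? KIND_INFINITY : KIND_NAN;
    return KIND_NORMAL;
}
```

For instance, both 00000000h and 80000000h classify as zero, and 7F800000h (e = FFh, f = 0) classifies as infinity.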
Denormalized numbers • To represent numbers with magnitudes too small to normalize (i.e., below 1.0 * 2^-126) • Consider 1.001 (binary) * 2^-129 (~ 1.6530 * 10^-39). Normalized, its exponent is too small. But it is also 0.01001 (binary) * 2^-127. To store this number, the biased exponent e is set to 0 (see previous slide) and the fraction is the entire significand of the number written as a product with 2^-127 (including the bit to the left of the binary point). This gives us 00120000h: 0000 0000 0001 0010 0000 0000 0000 0000
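The storage rule above can be turned around to decode a denormalized pattern. In this sketch (the name `decode_denormal` is hypothetical) the 23 fraction bits are read as an integer f. Placing the binary point after the first of those bits and multiplying by 2^-127 is the same as computing f * 2^-149.

```c
#include <stdint.h>

/* Value of a denormalized single precision pattern (e == 0, f != 0).
   Reading the 23 fraction bits as b.bbbb... times 2^-127 equals
   f * 2^-149 when f is taken as a 23-bit integer. */
double decode_denormal(uint32_t bits)
{
    uint32_t f = bits & 0x7FFFFFu;
    double value = (double)f;

    for (int i = 0; i < 149; i++)
        value /= 2.0;              /* exact: the result is a normal double */

    return (bits >> 31) ? -value : value;
}
```

Applied to 00120000h, this returns the value of 1.001 (binary) * 2^-129 from the slide.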
IEEE double precision • Accurate to around 15 decimal digits. • 11 bits for exponent, 52 bits for fraction. • e is true exponent + 3FFh (1023) • Denormalized numbers use 2^-1023 instead of 2^-127.
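The double precision layout can be inspected the same way as single precision. A brief sketch with illustrative function names, assuming that double is IEEE double precision on the target platform:

```c
#include <stdint.h>
#include <string.h>

/* Copy the raw 64 bits of a double into an integer.
   Assumes double is stored as IEEE double precision. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* e: the 11 bits below the sign bit, still biased by 3FFh. */
unsigned double_biased_exp(uint64_t bits)
{
    return (unsigned)((bits >> 52) & 0x7FFu);
}

/* f: the low 52 bits. */
uint64_t double_fraction(uint64_t bits)
{
    return bits & 0xFFFFFFFFFFFFFull;
}
```

For 1.0 the true exponent is 0, so the biased exponent field is exactly 3FFh and the fraction is 0.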
Numerical Analysis • Numerical Analysis is roughly “the study of doing math on a computer” • Meaning “be careful with floating point numbers, because they aren’t exact”
Adding Floating Point Numbers • Shift the number with the smaller exponent right until the exponents are the same, then add the significands and keep the (larger) exponent. • Subtraction is similar
Example: Addition • For the following examples, we’ll use a fake floating point format that has four bits for the fraction (no hidden one) and four bits for the exponent. • What is 2.5 + 9? • Can lose accuracy when adding. • Can even have (a+b)+c != a+(b+c)
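The last point, that grouping can change a sum, is easy to reproduce with real single precision numbers instead of the fake format. A minimal sketch; the function name is hypothetical, and volatile is used only to force each intermediate result to be rounded to float.

```c
/* Returns 1 if (a + b) + c differs from a + (b + c) in single precision.
   volatile forces every intermediate sum to be stored as a float. */
int sum_order_matters(float a, float b, float c)
{
    volatile float left  = a + b;
    volatile float right = b + c;
    volatile float sum1  = left + c;    /* (a + b) + c */
    volatile float sum2  = a + right;   /* a + (b + c) */
    return sum1 != sum2;
}
```

With a = 1.0e20f, b = -1.0e20f and c = 1.0f, the first grouping gives 1 while the second gives 0: c is shifted out entirely when it is added to b, whose exponent is far larger.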
Example: Subtraction • What is 9 - 2.5?
Multiplying floating point numbers • The significands are multiplied, and the exponents are added. • What is 1.25 * 9? • Can lose accuracy when multiplying (round off error). But worse is the fact that multiplying multiplies your existing errors. • Or, can overflow (actually, adding can overflow also, but not as likely…) • Dividing is similar, and has similar problems.
Comparing Floating Point Numbers • Imagine you have a function f, and you want to see if f(x) is 0. In C, would if (f(x) == 0.0) work? (No! Round off error might have made your result slightly off.) • Use something like if (fabs(f(x)) < EPS) where EPS is defined to be suitably small (say, 10^-10) • In general, to compare x to y, use if (fabs(x-y)/fabs(y) < EPS), which assumes y is not 0.
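The two tests from this slide, wrapped as small C helpers. The function names are illustrative, and the EPS value is just the slide's suggestion; a suitable tolerance depends on the problem.

```c
#include <math.h>

#define EPS 1e-10   /* the slide's suggested tolerance; adjust per problem */

/* Absolute test: is x within EPS of zero? */
int near_zero(double x)
{
    return fabs(x) < EPS;
}

/* Relative test: is x within a fraction EPS of y?
   y must be nonzero, otherwise the division is meaningless. */
int nearly_equal(double x, double y)
{
    return fabs(x - y) / fabs(y) < EPS;
}
```

A typical use is `nearly_equal(0.1 + 0.2, 0.3)`, which holds even where the two sides are not bit-for-bit equal.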
The Numeric Coprocessor • Added 13 registers: 8 data registers, a control register, a status register, a tag register, an instruction pointer, and a data pointer. • Data registers hold floating point values (in the 80-bit internal format) • Control register controls how certain degenerate cases are handled, rounding, precision, etc. • Tag register describes the state of the values in the data registers.
The Numeric Coprocessor 2 • Status register holds status flags (like the FLAGS register) • Instruction and Data pointers hold certain state information about the last FP instruction executed. • Internally, the numeric coprocessor uses extended precision (80-bit) FP format
The Data Registers • Named ST0 through ST7. Many assemblers accept ST as a synonym for ST0. [MASM calls them ST(0), ST(1), …] • Unlike the general purpose registers in the CPU, the floating point data registers are actually a stack. It takes some getting used to when registers change under you…
Floating Point Instructions • To make it easy, all FP instructions start with F • The FP instructions can only access FP registers and memory (no immediate mode, and no general purpose registers) • FP instructions usually operate on ST0 and memory or one other FP register
FLDx • FLD source; load (push) a floating point number from memory onto the stack. source can be dword (single precision), qword (double precision), or tword (extended precision) • FILD source; load (push) an integer (word, dword, or qword) from memory • FLDZ; push 0.0 • FLD1; push 1.0 • FLDL2E, FLDL2T, FLDLN2, FLDLG2, FLDPI; push log2(e), log2(10), ln 2, log10(2), and pi, respectively
FST • FST dest; Store ST0 into single or double precision memory or coprocessor register (no pop) • FSTP dest; FST with pop • FIST dest; store ST0 converted to integer into memory (word or dword) (method depends on control register) • FISTP dest; FIST with pop
FXCH and FFREE • FXCH STn; Exchange values in ST0 and STn • FFREE STn; Mark STn as unused (empty)
Addition • FADD src; ST0 += src • FADD dest, st0; dest += ST0 • FADDP dest; FADD with pop (or FADDP dest, st0) • FIADD src; ST0 += (int word or dword from mem) src
Subtraction • FSUB src; ST0 -= src • FSUBR src; ST0 = src - ST0 • FSUB dest, st0; dest -= ST0 • FSUBR dest, st0; dest = ST0 - dest • FSUBP dest; dest -= ST0, then pop • FSUBRP dest; dest = ST0 - dest, then pop • FISUB src; ST0 -= integer src • FISUBR src; ST0 = integer src - ST0
Multiplication • Exactly analogous to addition • FMUL src; ST0 *= src • FMUL dest, st0; dest *= ST0 • FMULP dest; FMUL with pop • FIMUL src; ST0 *= (int word or dword from mem) src
Division • Analogous to subtraction • FDIV src; ST0 /= src • FDIVR src; ST0 = src / ST0 • FDIV dest, st0; dest /= ST0 • FDIVR dest, st0; dest = ST0 / dest • FDIVP dest; dest /= st0, then pop • FDIVRP dest; dest = st0/dest, then pop • FIDIV src; ST0 /= integer src • FIDIVR src; ST0 = integer src / ST0
Comparisons • FCOM src; compares ST0 and src (register or single or double precision from memory) • FCOMP src; FCOM with pop • FCOMPP; compare ST0 and ST1, pop both • FICOM src; compare ST0 and integer (word or dword) from memory • FICOMP src; FICOM with pop • FTST; compares ST0 and 0
Transferring flags • Compare instructions affect the C0, C1, C2, C3 bits of the coprocessor status register, but the main processor's conditional instructions act on the CPU FLAGS bits. • FSTSW dest; stores coprocessor status word into a word in memory or AX • SAHF; stores AH into FLAGS. After FSTSW AX and SAHF, the flags are set as if unsigned integers were compared, so use the unsigned conditional jumps (JA, JB, JNA, ...) • LAHF; loads FLAGS into AH
Example of using FCOMP
; if (x > y)
;
      fld   qword [x]      ; ST0 = x
      fcomp qword [y]      ; compare ST0 to y, then pop
      fstsw ax             ; copy the status word into AX
      sahf                 ; copy AH into FLAGS
      jna   else_part      ; jump if not (x > y)
then_part:
      ; code for then part
      jmp   short end_if
else_part:
      ; code for else part
end_if:
Other Comparisons • Pentium Pro processors and later have two floating point compare instructions that affect main processor FLAGS directly: • FCOMI src; compares ST0 and src, src must be a coprocessor register • FCOMIP src; FCOMI with pop
Example of FCOMIP
; C prototype: double dmax(double d1, double d2)
; find the larger of two doubles
_dmax:
      enter  0,0
      fld    qword [ebp+16]   ; d2
      fld    qword [ebp+8]    ; d1: now ST0 = d1, ST1 = d2
      fcomip st1              ; compare d1 with d2 and pop: ST0 = d2
      jna    short d2_bigger
      fstp   st0              ; d1 was bigger: discard d2
      fld    qword [ebp+8]    ; ST0 = d1
      jmp    short exit
d2_bigger:                    ; d2 is still at the top of the stack
exit:
      leave
      ret
Miscellaneous instructions • FCHS; Changes sign of ST0 • FABS; ST0 = |ST0| • FSQRT; ST0 = sqrt(ST0) • FSCALE; ST0 *= 2^[ST1], where [ST1] is ST1 truncated to an integer
Other Examples • quadt.c and quad.asm • readt.c and read.asm • fprime.c and prime2.asm