12.1 Rounding Modes
Rounding: the process of obtaining the best possible floating-point representation for a given real value. The ANSI/IEEE standard default is to round to the nearest floating-point number; when a value lies exactly midway between two adjacent floating-point numbers, it is rounded to the one whose significand has an LSB of 0 (of two adjacent floating-point numbers, the significand of one must end in 0 and the other in 1). This is called round-to-nearest-even. For example, when rounding to an integer, 3.5 and 4.5 are both rounded to 4, the closest even number.
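The tie-breaking rule can be observed on short (32-bit) values. The sketch below assumes an IEEE-compliant platform in its default rounding mode and uses Python's `struct` module to convert a double to the short format; the helper name `to_f32` is made up for this illustration.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest short (32-bit) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

ulp = 2.0 ** -23                 # spacing of short FP numbers in [1, 2)

# Exactly midway between 1 (significand ends in 0) and 1 + ulp (ends in 1).
tie_low = 1.0 + ulp / 2
# Exactly midway between 1 + ulp (ends in 1) and 1 + 2*ulp (ends in 0).
tie_high = 1.0 + 3 * ulp / 2

# Both ties go to the neighbor whose significand has an LSB of 0.
print(to_f32(tie_low) == 1.0)
print(to_f32(tie_high) == 1.0 + 2 * ulp)
```

The first tie rounds down and the second rounds up, so ties do not drift consistently in one direction.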
Other rounding methods
Round inward (toward 0): choose the nearest value in the same direction as 0.
Round upward (toward +∞): choose the larger of the two possible values.
Round downward (toward -∞): choose the smaller of the two possible values.
Example 12.1 Rounding to the nearest integer
a. Consider the round-to-nearest-even integer corresponding to a real signed-magnitude number x as a function rtnei(x). Plot this round-to-nearest-even-integer function for x in the range [-4, 4].
b. Repeat part a for the function rtni(x), that is, the round-to-nearest-integer function, where the midway values are always rounded up.
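The two functions of this example can be tabulated before plotting. This is a minimal sketch, not the textbook's solution: it relies on Python's built-in `round`, which breaks ties toward the even integer, and it assumes that "rounded up" means toward +∞ (so -2.5 goes to -2).

```python
import math

def rtnei(x):
    """Round to nearest integer, ties to the even integer."""
    return round(x)              # Python's round() uses ties-to-even

def rtni(x):
    """Round to nearest integer, ties upward (assumed: toward +infinity)."""
    return math.floor(x + 0.5)

# Sample the range [-4, 4] in steps of 0.5 so every midway point is hit.
for k in range(-8, 9):
    x = k / 2
    print(f"{x:5.1f}  rtnei={rtnei(x):3d}  rtni={rtni(x):3d}")
```

The two columns differ only at midway points such as 0.5, 2.5, and -1.5, which is where the plotted staircases differ.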
Example 12.2 Directed rounding
a. Consider the inward-directed rounding of a real signed-magnitude number x as a function ritni(x). Plot this round-inward-to-nearest-integer function for x in the range [-4, 4].
b. Repeat part a for the round-upward-to-nearest-integer function rutni(x).
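Both directed functions map directly onto standard library calls, which makes them easy to tabulate. The sketch below assumes ritni(x) is truncation toward 0 and rutni(x) is the ceiling function, matching the definitions of inward and upward rounding given earlier.

```python
import math

def ritni(x):
    """Round inward: drop the fraction, moving toward 0."""
    return math.trunc(x)

def rutni(x):
    """Round upward: smallest integer not less than x."""
    return math.ceil(x)

# Sample [-4, 4] in steps of 0.25 to see each step of the staircase.
for k in range(-16, 17):
    x = k / 4
    print(f"{x:6.2f}  ritni={ritni(x):3d}  rutni={rutni(x):3d}")
```

For negative x the two functions agree, because moving toward 0 and moving toward +∞ are the same direction there.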
Figure 12.3 Two directed round-to-nearest-integer functions for x in [-4, 4].
12.2 Special Values and Exceptions
Five special values in the ANSI/IEEE floating-point standard:
±0    Biased exponent = 0, significand = 0 (no hidden 1)
±∞    Biased exponent = 255 (short) or 2047 (long), significand = 0
NaN   Biased exponent = 255 (short) or 2047 (long), significand ≠ 0
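The encodings above can be checked by pulling apart the bit fields of a short (32-bit) value. This is an illustrative sketch; the function name `classify_f32` and the label "other" (used for everything that is not one of the five special values, including denormals) are choices made here, not part of the standard.

```python
import struct

def classify_f32(x):
    """Classify x by the exponent and significand fields of its short format."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23-bit significand field
    if exponent == 0 and fraction == 0:
        return sign + "0"
    if exponent == 255:
        return (sign + "inf") if fraction == 0 else "NaN"
    return "other"                      # ordinary numbers and denormals

for value in (0.0, -0.0, float("inf"), float("-inf"), float("nan"), 1.5):
    print(value, classify_f32(value))
```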
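12.3 Floating-Point Addition
Consider the addition of \(\pm 2^{e_1} s_1\) and \(\pm 2^{e_2} s_2\), where \(e_1 > e_2\). The operand with the smaller exponent is aligned by shifting its significand right by \(e_1 - e_2\) bits, after which the significands are added or subtracted:
\[ (\pm 2^{e_1} s_1) + (\pm 2^{e_2} s_2) = \pm 2^{e_1} \left( s_1 \pm \frac{s_2}{2^{e_1 - e_2}} \right) \]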
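The alignment step in this formula can be traced with exact rational arithmetic. The following is a sketch under simplifying assumptions: the sign is folded into the significand, no rounding is performed, and the function name `fp_add` and the (exponent, significand) pair representation are invented for the illustration.

```python
from fractions import Fraction

def fp_add(e1, s1, e2, s2):
    """Add 2**e1 * s1 and 2**e2 * s2; return a normalized (exponent, significand)."""
    s1, s2 = Fraction(s1), Fraction(s2)
    if e1 < e2:                              # make operand 1 the larger exponent
        e1, s1, e2, s2 = e2, s2, e1, s1
    # Alignment: divide the smaller operand's significand by 2**(e1 - e2).
    s = s1 + s2 / Fraction(2) ** (e1 - e2)
    e = e1
    if s == 0:
        return 0, Fraction(0)
    while abs(s) >= 2:                       # post-normalize: shift right
        s /= 2
        e += 1
    while abs(s) < 1:                        # post-normalize: shift left
        s *= 2
        e -= 1
    return e, s

# 2**3 * 1.5 + 2**1 * 1.0 = 12 + 2 = 14 = 2**3 * 1.75
print(fp_add(3, Fraction(3, 2), 1, 1))
# 2**1 * 1.0 - 2**0 * 1.5 = 0.5 = 2**-1 * 1.0 (cancellation forces a left shift)
print(fp_add(1, 1, 0, Fraction(-3, 2)))
```

A hardware adder would also round the aligned sum to the available significand width, which this sketch deliberately leaves out.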
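12.4 Other Floating-Point Operations
Multiplication of \(\pm 2^{e_1} s_1\) and \(\pm 2^{e_2} s_2\): the exponents are added and the significands multiplied.
\[ (\pm 2^{e_1} s_1) \times (\pm 2^{e_2} s_2) = \pm 2^{e_1 + e_2} (s_1 \times s_2) \]
Division of \(\pm 2^{e_1} s_1\) by \(\pm 2^{e_2} s_2\): the exponents are subtracted and the significands divided.
\[ (\pm 2^{e_1} s_1) / (\pm 2^{e_2} s_2) = \pm 2^{e_1 - e_2} (s_1 / s_2) \]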
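Multiplication and division need no alignment step, only exponent arithmetic and a possible one-position normalization. The sketch below uses exact rationals and omits rounding; `normalize`, `fp_mul`, and `fp_div` are hypothetical helper names, and the sign is carried inside the significand.

```python
from fractions import Fraction

def normalize(e, s):
    """Rescale so that 1 <= |s| < 2, adjusting the exponent to compensate."""
    if s == 0:
        return 0, Fraction(0)
    while abs(s) >= 2:
        s /= 2
        e += 1
    while abs(s) < 1:
        s *= 2
        e -= 1
    return e, s

def fp_mul(e1, s1, e2, s2):
    """(2**e1 * s1) * (2**e2 * s2): add exponents, multiply significands."""
    return normalize(e1 + e2, Fraction(s1) * Fraction(s2))

def fp_div(e1, s1, e2, s2):
    """(2**e1 * s1) / (2**e2 * s2): subtract exponents, divide significands."""
    return normalize(e1 - e2, Fraction(s1) / Fraction(s2))

# 12 * 3 = 36 = 2**5 * 1.125: the significand product 2.25 needs one right shift.
print(fp_mul(3, Fraction(3, 2), 1, Fraction(3, 2)))
# 2 / 1.5 = 2**0 * 4/3: the significand quotient 2/3 needs one left shift.
print(fp_div(1, 1, 0, Fraction(3, 2)))
```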
Figure 12.6 Simplified schematic of a floating-point multiply/divide unit.
12.5 Floating-Point Instructions
10 floating-point arithmetic instructions (5 different operations: add, subtract, multiply, divide, negate; each for single and double operands)
    add.s $f0,$f8,$f10    # set $f0 to ($f8)+($f10)
    add.d $f0,$f8,$f10    # set $f0,$f1 to ($f8,$f9)+($f10,$f11)
Single operands can be in any of the floating-point registers. Double operands must be specified in even-numbered registers; each double occupies an even-odd register pair.
Figure 12.7 The common floating-point instruction format for MiniMIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands.
6 format conversion instructions: integer to single/double, single to double, double to single, and single/double to integer
    cvt.s.w $f0,$f8    # set $f0 to single(integer $f8)
    cvt.d.w $f0,$f8    # set $f0,$f1 to double(integer $f8)
    cvt.d.s $f0,$f8    # set $f0,$f1 to double($f8)
    cvt.s.d $f0,$f8    # set $f0 to single($f8,$f9)
    cvt.w.s $f0,$f8    # set $f0 to integer($f8)
    cvt.w.d $f0,$f8    # set $f0 to integer($f8,$f9)
Figure 12.8 Floating-point instructions for format conversion in MiniMIPS.
6 data transfer instructions: load/store word to/from coprocessor 1, move single/double from one FP register to another, move (copy) between FP registers and CPU general registers.
    lwc1  $f8,40($s3)    # load mem[40+($s3)] into $f8
    swc1  $f8,A($s3)     # store ($f8) into mem[A+($s3)]
    mov.s $f0,$f8        # load $f0 with ($f8)
    mov.d $f0,$f8        # load $f0,$f1 with ($f8,$f9)
    mfc1  $t0,$f12       # load $t0 with ($f12)
    mtc1  $f8,$t4        # load $f8 with ($t4)
Figure 12.9 Instructions for floating-point data movement in MiniMIPS.
2 branch and 6 comparison instructions. The FP unit has a flag that is set to true or false by one of 6 comparisons (equal, less than, or less than or equal, each for the single or double data type).
    bc1t   L          # branch on FP flag true
    bc1f   L          # branch on FP flag false
    c.eq.* $f0,$f8    # if ($f0)=($f8), set flag to true
    c.lt.* $f0,$f8    # if ($f0)<($f8), set flag to true
    c.le.* $f0,$f8    # if ($f0)≤($f8), set flag to true
Figure 12.10 Floating-point branch and comparison instructions in MiniMIPS.
Table 12.1 The 30 MiniMIPS floating-point instructions: because the op field contains 17 for all but two of the instructions (49 for lwc1 and 50 for swc1), it is not shown.
12.6 Result Precision and Errors
FP arithmetic can be quite dangerous and must be used with proper care, because the results of FP computations are inexact. Why?
Many real numbers do not have an exact binary representation within a finite word format. This is referred to as representation error.
Even for values that are exactly representable, FP arithmetic produces inexact results. For example, the product of two short FP numbers has a 48-bit significand that must be rounded to 23 bits (plus the hidden 1). This is called computation error.
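Both kinds of error can be measured exactly by converting floating-point values to rationals. The sketch below uses Python floats, which are long (64-bit) rather than short values, so the product here has up to 106 significand bits rounded to 53; the variable names are chosen for the illustration.

```python
from fractions import Fraction

# Representation error: 0.1 has no finite binary expansion, so the stored
# value differs slightly from the decimal constant 1/10.
stored = Fraction(0.1)                    # exact value of the stored double
representation_error = stored - Fraction(1, 10)

# Computation error: both operands are exactly representable (they are the
# stored doubles), yet their exact product does not fit in the format.
exact_product = Fraction(0.1) * Fraction(0.1)
computed_product = Fraction(0.1 * 0.1)    # rounded by the FP multiplier
computation_error = computed_product - exact_product

print(float(representation_error))
print(float(computation_error))
```

Both printed values are tiny but nonzero, which is why equality tests on FP results are fragile.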
Example 12.4 The associative law of addition does not hold in general in FP arithmetic. For example, let
\[ a = -2^{5} \times (1.10101011)_2, \quad b = 2^{5} \times (1.10101110)_2, \quad c = -2^{-2} \times (1.01100101)_2 \]
Is \((a + b) + c = a + (b + c)\)?
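The question can be settled by simulating a format with 8 fractional significand bits, as the operands suggest. This sketch assumes round-to-nearest-even after every addition and an unbounded exponent range; `round_to_sig` and `fl_add` are hypothetical helpers written for this example.

```python
from fractions import Fraction

def round_to_sig(x, frac_bits=8):
    """Round x to a significand with 1 integer bit and frac_bits fraction bits."""
    if x == 0:
        return Fraction(0)
    e, m = 0, abs(x)
    while m >= 2:                 # normalize so that 1 <= m < 2
        m /= 2
        e += 1
    while m < 1:
        m *= 2
        e -= 1
    q = round(m * 2 ** frac_bits)             # ties go to the even integer
    result = Fraction(q, 2 ** frac_bits) * Fraction(2) ** e
    return result if x > 0 else -result

def fl_add(x, y):
    """Exact sum followed by one rounding, as an FP adder would deliver."""
    return round_to_sig(x + y)

a = -Fraction(0b110101011, 2 ** 8) * 2 ** 5
b = Fraction(0b110101110, 2 ** 8) * 2 ** 5
c = -Fraction(0b101100101, 2 ** 8) / 2 ** 2

left = fl_add(fl_add(a, b), c)    # (a + b) + c
right = fl_add(a, fl_add(b, c))   # a + (b + c)
print(left, right)
```

Under these assumptions the first ordering keeps the small contribution of c, while in the second ordering c is absorbed when b + c is rounded, so the two results differ.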
Figure 12.11 Algebraically equivalent computations may yield different results with floating-point arithmetic.
Using guard digits to avoid excessive error. For example, in a 10-digit calculator, 1/3 is represented as 0.333 333 333 3; multiplying by 3 yields 0.999 999 999 9 rather than 1. In a calculator with 2 guard digits, however, 1/3 is held internally as 0.333 333 333 333 but still displayed as 0.333 333 333 3; multiplying by 3 yields 0.999 999 999 999, which is displayed as 1 after rounding to 10 digits.
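The calculator behavior described here can be reproduced with Python's `decimal` module, which allows the working precision to be set in decimal digits. This is a sketch that assumes round-half-even in both the internal arithmetic and the final display rounding; `third_times_three` is a name made up for the example.

```python
from decimal import Decimal, Context

def third_times_three(digits):
    """Compute (1/3) * 3 with the given number of internal decimal digits."""
    ctx = Context(prec=digits)
    third = ctx.divide(Decimal(1), Decimal(3))
    return ctx.multiply(third, Decimal(3))

display = Context(prec=10)        # the calculator shows 10 digits

# plus() rounds its argument to the display precision.
no_guard = display.plus(third_times_three(10))     # 10 internal digits
with_guard = display.plus(third_times_three(12))   # 10 digits + 2 guard digits

print(no_guard)
print(with_guard)
```

The guard digits do not make the internal result exact; they push the error below the displayed precision so that the final rounding recovers 1.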
Figure 12.12 Function evaluation by table lookup and linear interpolation.