
Binary FLP Number System: $X = (S, E, M)_2 = (-1)^S \cdot M \cdot \beta^E$, S: Sign, 0 = (+) & 1 = (−)


Presentation Transcript


  1. Binary FLP Number Systems • Binary FLP Number System • $X = (S, E, M)_2 = (-1)^S \cdot M \cdot \beta^E$ • S: Sign, 0 = (+) & 1 = (−) • M: Mantissa or Significand, • $1 > M \ge 0.5$ or • $2 > M \ge 1$ • E: Exponent or Characteristic; biased • Larger Range & Less Precision (vs. Fixed #)

  2. IEEE Standard 754 • Single-precision Format for FLP • 32-bit: e = 8, f = 23 mantissa • $F = (-1)^s (1.f) \cdot 2^{E-127}$ if $254 \ge E \ge 1$ • $F = (-1)^s (0.f) \cdot 2^{-126}$ if $E = 0$ • $E = 255$: $f = 0$ for $\pm\infty$; $f \ne 0$ for NaN • Ranges ($254 \ge E \ge 1$) • $(2 - 2^{-23}) \cdot 2^{254-127} \ge F^+ \ge 1 \cdot 2^{1-127}$

  3. IEEE Standard 754 (2) • Ranges ($E = 0$) • $(1 - 2^{-23}) \cdot 2^{-126} \ge F^+ \ge 2^{-23} \cdot 2^{-126}$ • Denormalized Number • May be excluded in Arith. Unit • Hidden bit: (1.f) for $E \ge 1$ • $\pm 0$: $E = 0$ & $f = 0$
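The single-precision cases on slides 2 and 3 can be checked by unpacking the bit fields directly. This is a minimal sketch using Python's standard `struct` module; the function name `decode_single` is only illustrative.

```python
import struct

def decode_single(x: float):
    """Round x to IEEE 754 single precision and return (s, E, f, value).

    value is rebuilt from the fields using the three cases on the slides:
    normalized (1 <= E <= 254), denormalized (E = 0), and E = 255.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31              # 1 sign bit
    E = (bits >> 23) & 0xFF     # 8 exponent bits, bias 127
    f = bits & 0x7FFFFF         # 23 fraction bits
    sign = (-1.0) ** s
    if E == 255:
        value = float("nan") if f else sign * float("inf")
    elif E == 0:
        value = sign * (f / 2**23) * 2.0 ** -126        # (0.f) * 2^-126
    else:
        value = sign * (1 + f / 2**23) * 2.0 ** (E - 127)  # (1.f) * 2^(E-127)
    return s, E, f, value
```

For example, `decode_single(1.0)` yields `E = 127` and `f = 0`, and the smallest positive denormalized number `2.0 ** -149` yields `E = 0` and `f = 1`.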

  4. Double-Precision Format • 64 bits: e = 11, f = 52 mantissa • $F = (-1)^s (1.f) \cdot 2^{E-1023}$ if $2046 \ge E \ge 1$ • Reserved values: $E = 0$ & $E = 2047$ • Comparisons

  5. General Format • $F = (-1)^s \cdot M \cdot \beta^{E-\text{bias}}$ • $1 > M \ge 1/\beta$ • $\beta = 2^k$ • Hidden bit used • $F = (-1)^s (0.1M)_2 \cdot 2^{E-\text{bias}}$ • $\beta = 2$ • Zero? → $E = M = 0$; Smallest number $E = 1$

  6. Operations • $X = (-1)^{S_1} M_1 \beta^{E_1 - b} > Y = (-1)^{S_2} M_2 \beta^{E_2 - b}$ • ADD/SUB • $X \pm Y = \left((-1)^{S_1} M_1 \pm (-1)^{S_2} M_2 \beta^{-(E_1 - E_2)}\right) \beta^{E_1 - b}$ • If $1 \le M_{new} < 2$ → post-normalization • Steps for Add/Sub: • difference $d = |E_1 - E_2|$ • Shift smaller one $d$ base-$\beta$ digits to the right • Add & set $E_{new}$ = larger one • post-normalization & check OV/UV if necessary
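The add/sub steps above can be sketched for a toy binary format ($\beta = 2$). Everything here is an illustrative assumption rather than part of the slides: the tuple layout `(sign, E, M)`, the `m`-bit integer mantissa with $1 > M/2^m \ge 1/2$, an unbiased exponent, chopping of shifted-out bits, and no overflow/underflow check.

```python
def fp_add(x, y, m=4):
    """Add two toy numbers (sign, E, M): value = (-1)^sign * (M / 2^m) * 2^E."""
    (s1, e1, m1), (s2, e2, m2) = x, y
    # Make operand 1 the one with the larger exponent.
    if e1 < e2:
        (s1, e1, m1), (s2, e2, m2) = (s2, e2, m2), (s1, e1, m1)
    d = e1 - e2                 # step 1: exponent difference
    m2 >>= d                    # step 2: alignment shift, lost bits are chopped
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)  # step 3: signed add
    sign, mag, e = int(total < 0), abs(total), e1
    if mag == 0:
        return (0, 0, 0)
    # Step 4: post-normalization.
    while mag >= 1 << m:        # mantissa overflow (1 <= M < 2): shift right
        mag >>= 1
        e += 1
    while mag < 1 << (m - 1):   # leading zeros after cancellation: shift left
        mag <<= 1
        e -= 1
    return (sign, e, mag)

def fp_value(x, m=4):
    """Real value represented by a toy (sign, E, M) tuple."""
    s, e, mant = x
    return (-1) ** s * mant / 2**m * 2.0**e
```

For instance, `fp_add((0, 3, 8), (0, 0, 8))` adds $0.1000 \times 2^3$ and $0.1000 \times 2^0$; the smaller operand is shifted right by three digits before the mantissas are added.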

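  7. Operations(2) • MUL • $X \cdot Y = \left((-1)^{S_1} M_1\right) \cdot \left((-1)^{S_2} M_2\right) \beta^{(E_1 + E_2 - b) - b}$ • $E_{new} = E_1 + E_2 - b$ • If $1/\beta^2 \le M_{new} < 1/\beta$ → post-normalization • DIV • Check $Y = 0$? If Yes → set NaN or $\pm\infty$ • $X / Y = \left((-1)^{S_1} M_1\right) / \left((-1)^{S_2} M_2\right) \beta^{(E_1 - E_2 + b) - b}$ • $E_{new} = E_1 - E_2 + b$ • If $1 \le M_{new} < \beta$ → post-normalization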
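A minimal sketch of the multiplication rule for $\beta = 2$, showing the biased-exponent update $E_{new} = E_1 + E_2 - b$ and the single left shift needed when the product mantissa falls below $1/\beta$. The tuple layout, the mantissa width `m`, the default `bias`, and the final chop of the low product bits are assumptions made for illustration; overflow/underflow checks are left out.

```python
def fp_mul(x, y, m=4, bias=8):
    """Multiply toy numbers (sign, E, M): value = (-1)^sign * (M / 2^m) * 2^(E - bias).

    M is an m-bit integer with 2^(m-1) <= M < 2^m, or 0 for zero.
    """
    (s1, e1, m1), (s2, e2, m2) = x, y
    sign = s1 ^ s2
    e = e1 + e2 - bias          # E_new = E1 + E2 - b keeps one bias in the result
    prod = m1 * m2              # 2m-bit product, value prod / 2^(2m)
    if prod == 0:
        return (0, 0, 0)
    # Both mantissas lie in [1/2, 1), so the product lies in [1/4, 1):
    # at most one left shift is needed for post-normalization.
    if prod < 1 << (2 * m - 1):
        prod <<= 1
        e -= 1
    return (sign, e, prod >> m)  # chop the low m bits
```

With `bias=8`, `fp_mul((0, 9, 12), (0, 9, 8))` multiplies $0.1100 \times 2^1$ by $0.1000 \times 2^1$ and needs the one-digit post-normalization shift.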

  8. Choice • Range • $\beta$ & Speed (alignment shift)

  9. Choice(2) • Max. Relative Rep. Error (MRRE) = $0.5\,(\text{ulp})\,\beta$ • $\max\left[(M(x) - x)/x\right] \le 0.5\,(\text{ulp})\,\beta^E / (M \beta^E) \le 0.5\,(\text{ulp})\,\beta$ • Ave. RRE (ARRE) = $(\text{ulp})(\beta - 1)/(4 \ln \beta)$
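To see how the choice of $\beta$ affects these two error measures, the formulas can be evaluated directly. This is a small sketch; the function names and the 24-bit mantissa used in the loop are illustrative assumptions.

```python
import math

def mrre(beta: int, ulp: float) -> float:
    """Maximum relative representation error: 0.5 * ulp * beta."""
    return 0.5 * ulp * beta

def arre(beta: int, ulp: float) -> float:
    """Average relative representation error: ulp * (beta - 1) / (4 ln beta)."""
    return ulp * (beta - 1) / (4 * math.log(beta))

ulp = 2.0 ** -24   # assumed: 24 mantissa bits for every base
for beta in (2, 4, 16):
    print(beta, mrre(beta, ulp), arre(beta, ulp))
```

For a fixed mantissa width, both measures grow with $\beta$, which is the accuracy side of the range and alignment-speed trade-off.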

  10. Rounding • Trade-offs • Implementation Cost (machine) • Accuracy (Numerical) • Rounding • $M(\cdot)$: Machine; $x$, $y$ real • $M(x) \le M(y)$ if $x \le y$ • If $x \in M(\cdot)$ then $M(x) = x$ • If $M(y) \le x \le M(y) + \text{ulp}$ then $M(x) = M(y)$ or $M(x) = M(y) + \text{ulp}$

  11. Truncation (chopping) • Neglect the extra LSB digit(s) • M(x)=chop(x) • Error=(M(x)-x) • Ex. x=010.1 then M(x)=010

  12. Round-to-the-nearest • Rounding in general • M(x) =chop(x+ulp/2) • Ex. x=010.1 then ulp=(1.0)&M(x)=011
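The two schemes, chopping and $M(x) = \text{chop}(x + \text{ulp}/2)$, can be compared on the slide example $x = 010.1$. In this sketch a non-negative value is held as an integer whose low `d` bits are the extra bits to be discarded; that encoding and the function names are assumptions for illustration.

```python
def chop(x: int, d: int) -> int:
    """Truncation: drop the d extra low-order bits."""
    return x >> d

def round_nearest(x: int, d: int) -> int:
    """Round to nearest: add ulp/2 (a 1 in the top extra bit), then chop."""
    return (x + (1 << (d - 1))) >> d

x = 0b0101                         # 010.1 with d = 1 extra bit
print(format(chop(x, 1), "03b"))           # 010
print(format(round_nearest(x, 1), "03b"))  # 011
```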

  13. Average Error • Ave. Err = $\sum \text{Error} / 2^d$ • $d$: extra bits • Ave. Trunc. Err if fraction is rounded • $= -\sum_{i=0}^{2^d - 1} i / 2^{2d} = -(2^d - 1)/2^{d+1}$ • Ave. rounding Err • $= 0.5/2^d = 1/2^{d+1}$
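The closed forms can be checked by enumerating all $2^d$ patterns of the extra bits. This sketch measures the error as $M(x) - x$ in units of ulp, so the truncation average comes out negative; exact rational arithmetic avoids floating-point noise. The function name is illustrative.

```python
from fractions import Fraction

def average_errors(d: int):
    """Return (average truncation error, average round-to-nearest error) in ulps."""
    n = 1 << d
    trunc_sum = Fraction(0)
    round_sum = Fraction(0)
    for i in range(n):
        frac = Fraction(i, n)                 # value of the d discarded bits
        trunc_sum += 0 - frac                 # chopping always rounds down
        round_sum += (1 if 2 * i >= n else 0) - frac  # half and above rounds up
    return trunc_sum / n, round_sum / n

print(average_errors(3))   # (Fraction(-7, 16), Fraction(1, 16))
```

The small positive bias of plain round-to-nearest comes entirely from the exact-half case, which always rounds up.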

  14. Average Error (2) • Want Ave. Err = 0 → Round-to-nearest-even (odd)

  15. Jamming (von Neumann) Rounding • ROM Implementation • $M(\cdot)$: $X = (x_2 x_1 x_0 . x_{-1} x_{-2} x_{-3} \ldots) \to (y_2 y_1 y_0 .)$ • Input = $(x_2 x_1 x_0 . x_{-1})$ • Output = $(y_2 y_1 y_0 .)$ • $(y_2 y_1 y_0 .) = (x_2 x_1 x_0 .)$ if $x_{-1} = 0$ or $(x_2 x_1 x_0) = (111)$ • Otherwise $(y_2 y_1 y_0 .) = (x_2 x_1 x_0 .) + \text{ulp}$ • In General • Input bits = $c$ bits (including $d$ extra bits)

  16. Jamming (von Neumann) Rounding(2) • $c = 3$, $d = 1$ • Ave. Err. = $0.5\,(1/2)^d - 0.5\,(1/2)^c - 0.5\,(1/2)^c$ • $= 0.5\,(1/2)^d - 0.5\,(1/2)^{c-1}$ • 1st term = Ave. Err if $c \gg 1$
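The table-lookup rule (round up on the top extra bit unless the kept bits are all ones, in which case truncate) and its average error can be checked by enumerating every input. In this sketch `c` is the total number of input bits including the `d` extra bits; using only the most significant extra bit for `d > 1` is an assumption, and the closed form is compared only for `d = 1`.

```python
from fractions import Fraction

def rom_round(x: int, c: int, d: int) -> int:
    """Round a c-bit input whose low d bits are extra bits to c - d bits."""
    kept = x >> d
    top_extra = (x >> (d - 1)) & 1            # x_{-1} on the slide
    all_ones = kept == (1 << (c - d)) - 1     # adding ulp would overflow the table
    return kept if (top_extra == 0 or all_ones) else kept + 1

def rom_average_error(c: int, d: int) -> Fraction:
    """Average of M(x) - x, in ulps, over all 2^c inputs."""
    n = 1 << c
    total = sum(Fraction(rom_round(x, c, d)) - Fraction(x, 1 << d) for x in range(n))
    return total / n

print(rom_average_error(3, 1))   # 1/8, i.e. 0.5*(1/2)^1 - 0.5*(1/2)^2
```

As `c` grows, the all-ones exception becomes rare and the average approaches the plain round-to-nearest value $0.5\,(1/2)^d$.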

  17. Guard Digits • Find the smallest # of digits required • Ex1: m = 4 & No extra bit • $0.1000 \times 2^1 - 0.1111 \times 2^0 = 0.1 \times 2^{-3}$ • Missing information

  18. Guard Digits(2) • Ex2: m = 3 & an extra bit (G) • $0.100 \times 2^1 - 0.111 \times 2^{-1} = 0.1001 \times 2^0 \to 0.101 \times 2^0$ • Rounding error!
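Both examples can be replayed with a small subtraction routine that carries a configurable number of extra bits through the alignment shift. This is a sketch under stated assumptions: operands are `(E, M)` pairs with an `m`-bit integer mantissa, the first operand has the larger exponent, the result is positive, and the extra bits are simply chopped at the end.

```python
def sub_with_guard(x, y, m: int, guard: int):
    """Compute x - y for toy numbers (E, M), value = (M / 2^m) * 2^E.

    `guard` extra low-order bits are kept during alignment and subtraction,
    then dropped after post-normalization. Returns (E, M) of the result.
    """
    (e1, m1), (e2, m2) = x, y
    assert e1 >= e2
    d = e1 - e2
    a = m1 << guard
    b = (m2 << guard) >> d        # bits shifted past the guard positions are lost
    diff = a - b
    assert diff > 0
    e = e1
    width = m + guard
    while diff < 1 << (width - 1):  # post-normalization: shift left
        diff <<= 1
        e -= 1
    return e, diff >> guard

# Ex1: 0.1000*2^1 - 0.1111*2^0 with m = 4
print(sub_with_guard((1, 0b1000), (0, 0b1111), m=4, guard=0))  # (-2, 8): 0.1*2^-2, wrong
print(sub_with_guard((1, 0b1000), (0, 0b1111), m=4, guard=1))  # (-3, 8): 0.1*2^-3, exact
# Ex2: 0.100*2^1 - 0.111*2^-1 with m = 3 and one guard bit
print(sub_with_guard((1, 0b100), (-1, 0b111), m=3, guard=1))   # (0, 5): 0.101*2^0
```

In Ex1 a single guard bit recovers the exact difference, while in Ex2 one guard bit still yields $0.101 \times 2^0$ for an exact result of $0.1001 \times 2^0$.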

  19. Guard Digits(3) • Require two digits at least • Guard digit (G) • Round digit (R) for Round-to-the-nearest scheme • Require two digits and a sticky bit (S) • if Round-to-the-nearest-even (odd) scheme applied • S = logical OR of all shifted-out (lost) bit(s)

  20. Guard Digits(4) • Ex2: m=3 & RGS • LSB=RS+RS’L
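One way to read the expression $RS + RS'L$ is as the condition for adding one ulp under round-to-nearest-even, where $L$ is the current least significant kept bit, $R$ is the first discarded bit, and $S$ is the OR of all remaining discarded bits. That reading is an assumption, as are the integer encoding and the function name in this sketch.

```python
from fractions import Fraction

def round_nearest_even(x: int, d: int) -> int:
    """Round a non-negative integer with d extra low-order bits to nearest, ties to even."""
    kept = x >> d
    L = kept & 1                                  # least significant kept bit
    R = (x >> (d - 1)) & 1                        # round bit: first discarded bit
    S = int((x & ((1 << (d - 1)) - 1)) != 0)      # sticky: OR of the other lost bits
    increment = (R & S) | (R & (1 - S) & L)       # RS + RS'L
    return kept + increment

print(format(round_nearest_even(0b1001, 1), "03b"))  # 100: tie, L = 0, stays even
print(format(round_nearest_even(0b1011, 1), "03b"))  # 110: tie, L = 1, rounds up
```

Because ties alternate between rounding down and rounding up, the errors cancel on average, which is the zero-average-error property this scheme is chosen for.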
