
Lecture 22


marah-avery


  1. Lecture 22
     • Review of floating point representation from last time
     • The IEEE floating point standard (notes)
     • Quit early because half the class was still not back from Thanksgiving

  2. The IEEE floating point standard
     • Published in 1985
     • 3 key requirements:
        • standardized representation formats: single (32 bits), double (64 bits), extended (> 64 bits, in practice 80 bits)
        • correctly rounded arithmetic
        • exceptional operations need not halt the program: they produce infinities and NaNs instead
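To make the single format concrete, here is a minimal Java sketch that splits a 32-bit float into its bit fields. The class and method names are made up for illustration, and the 1/8/23 split into sign, exponent and fraction is the usual layout of the single format, which the slide itself does not spell out.

```java
// Sketch only: FloatBits and its methods are hypothetical names, not from the lecture.
class FloatBits {
    /** Sign bit of a single-precision value: 0 for positive, 1 for negative. */
    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;
    }

    /** Raw 8-bit exponent field (stored with a bias of 127). */
    static int biasedExponent(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    /** Raw 23-bit fraction field (the bits after the implicit leading 1). */
    static int fraction(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        // Widths of the two formats Java exposes.
        System.out.println("float bits:  " + Float.SIZE);   // 32
        System.out.println("double bits: " + Double.SIZE);  // 64

        // -6.5 = -1.625 * 2^2, so the stored exponent is 2 + 127 = 129.
        float x = -6.5f;
        System.out.println("sign     = " + sign(x));            // 1
        System.out.println("exponent = " + biasedExponent(x));  // 129
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(x)));
    }
}
```

The same idea works for doubles with `Double.doubleToLongBits`, using an 11-bit exponent and a 52-bit fraction.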

  3. Correctly rounded arithmetic
     • The arithmetic operations +, -, *, / on 2 floats x, y must return the float that is closest to the exact answer (a tie between two floats goes to the one whose last bit is 0)
     • It is possible to change the rounding mode at the hardware level so that answers round up or down instead of to nearest, but this is not normally used and not easily accessible from Java
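One way to see the rounding rule in action, as a sketch rather than a proof: the product of two floats has at most 48 significant bits, so it can be computed exactly in double and then rounded once to float. A correctly rounded float multiplication must agree with that. The class and method names below are hypothetical.

```java
// Sketch only: CorrectRounding is a made-up name used for illustration.
class CorrectRounding {
    /**
     * Returns true if the float product x*y equals the exact product
     * rounded to the nearest float. Intended for finite, non-NaN inputs.
     */
    static boolean productIsCorrectlyRounded(float x, float y) {
        // 24-bit significand times 24-bit significand fits in the 53 bits of a double,
        // so this double product is the exact mathematical product.
        double exact = (double) x * (double) y;
        // The cast rounds the exact value to the nearest float.
        return x * y == (float) exact;
    }

    public static void main(String[] args) {
        float[] samples = {0.1f, 0.3f, 1.0f / 3.0f, 7.0f, 1e20f, 1e-20f};
        for (float a : samples) {
            for (float b : samples) {
                if (!productIsCorrectlyRounded(a, b)) {
                    System.out.println("mismatch for " + a + " * " + b);
                }
            }
        }
        System.out.println("done");
    }
}
```

The check only covers multiplication; the exact sum of two floats does not always fit in a double, so the same trick does not carry over to + and - unchanged.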

  4. Infinities and NaNs
     • x/0.0 is inf when x > 0, -inf when x < 0, and NaN when x == 0
     • inf/x is inf when x is finite and positive (or +0), -inf when x is finite and negative (or -0); inf/inf is NaN
     • inf and -inf are very different
     • there are different representations for 0 and -0, but they test equal
     • any arithmetic operation with a NaN operand gives NaN
     • any comparison with NaN gives false, even x == x when x is NaN; the one exception is !=, which gives true
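The rules above can be checked directly in Java. This is a small sketch; the class name and the `div` helper are hypothetical and exist only so that the divisions happen at run time.

```java
// Sketch only: SpecialValues and div are made-up names for illustration.
class SpecialValues {
    /** Plain double division; no exception is thrown for a zero divisor. */
    static double div(double x, double y) {
        return x / y;
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        double nan = div(0.0, 0.0);

        System.out.println(div(1.0, 0.0));    // Infinity
        System.out.println(div(-1.0, 0.0));   // -Infinity
        System.out.println(nan);              // NaN

        // 0 and -0 compare equal, but dividing by them gives opposite infinities.
        System.out.println(0.0 == -0.0);      // true
        System.out.println(div(1.0, -0.0));   // -Infinity

        System.out.println(div(inf, inf));    // NaN
        System.out.println(nan + 1.0);        // NaN

        // Comparisons with NaN are false, except !=.
        System.out.println(nan == nan);       // false
        System.out.println(nan < 1.0);        // false
        System.out.println(nan != nan);       // true

        // Because x == x fails for NaN, use Double.isNaN to test for it.
        System.out.println(Double.isNaN(nan)); // true
    }
}
```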

  5. The 80-bit extended format
     • All Pentium PCs have 80-bit floating point registers where the arithmetic operations are executed (whereas Sun Sparc, Apple Power-PC, etc. use 64-bit registers)
     • On a PC, in the C language, "long double" uses the same 80-bit format with many compilers (gcc, for example), though this is up to the compiler
     • However, the Java language does not allow the use of the 80-bit format and, in Java 1.1, insisted that the operations be carried out as if they were being done in a 64-bit register
     • In Java 1.1, the result was that floating point was very slow on a Pentium, as it required software modification of the hardware results
     • In Java 1.2, this requirement was relaxed enough to make floating point fast again
     • The keyword strictfp is used to insist on identical results on all machines, but on a Pentium this makes floating point very slow
