1 / 7

Floating Point Numbers

Muddsar Jamil CS 147. Floating Point Numbers. Introduction & Representation. Provides the ability to represent very large numbers, as well as very small numbers. Example: 1 Trillion = 40 bits to left of radix pt. Retaining as much precision as needed increase calculation efficiency.

Download Presentation

Floating Point Numbers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Muddsar JamilCS 147 Floating Point Numbers

  2. Introduction & Representation • Provides the ability to represent very large numbers, as well as very small numbers.Example: 1 Trillion = 40 bits to left of radix pt. • Retaining as much precision as needed increase calculation efficiency. • A great deal of extra hardware is required in order to store/manipulate numbers with 80 bits or more.

  3. Computer Representation of Floating Point Numbers _ _ _ _ . _ _ _ _ _ _ _ _ _ _ _ _ This format makes for easier comparison: =, =/=, <=, ≥ Example: Convert (358)10 in to the above format to be used as a floating point number.Java: Float x = new Float(358.0f)‏ Sign Bit0 = +1 = - Three base 16 digits 3-bitexponent

  4. Example Continued • First step is to convert 358 from base 10 to 16. • Using Horner's method: • 358/16 = 22 --- R 6 • 22/16 = 1 --- R 6 • 35810 = 16616 • Next, convert to floating-point and Normalize • (166)10 = (166.)16 x 160 • Normalize: ( .166 )16 x 163 • The exponent is 3, but we represent it in excess 4: • 0 1 1 (+3)10 • Excess 4 + 1 0 0 (+4)10 • = 1 1 1 • 0 1 1 1 . 0 0 0 1 0 1 1 0 0 1 1 0 • + 3 1 6 6 Sign Expon. Fraction

  5. Fractional -> Fixed Point Conversion • Convert (XYZ.375)10 to Binary • First, convert XYZ using Horner's method. • Next, Convert the .37510 as following: • .375 x 2 = 0.75 • .75 x 2 = 1.5 • .5 x 2 = 1.0 • So (.375)10 = (.011)2 Most Significant BitLeast Significant Bit

  6. IEEE 754 Floating Point Standard • Created in 1985 to ensure standard representation among different systems. • Most new architectures support IEEE 754. • Two Formats: • Single Precision 1 8 23 • Double Precision =32 bits total Sign Expon. Fraction =64 bits 1 11 52 Sign Expon. Fraction

  7. IEEE 754 Representations • Can Represent (among others)‏ • Non-zero, normalized numbers • Clean zero • All 0s in exponent and fraction • Sign bit can be 0, or 1, to represent +0 or -0 • Infinity / Overflow / NaN • Exponent contains all 1s, Fraction is all 0s • Sign bit can be 0, or 1 • 0 / 0; • Sqrt(-1);

More Related