
Floating Point

Learn about representing non-integers using floating point systems in computer organization. Understand scientific notation, binary conversion, the IEEE 754 standard, and the handling of special values.

Presentation Transcript


  1. CPSC 252 Computer Organization Ellen Walker, Hiram College Floating Point

  2. Representing Non-Integers • Non-integer values are often written in decimal notation • Some require infinitely many digits to represent exactly • With a fixed number of digits (or bits), many numbers can only be approximated • Precision is a measure of the degree of approximation

  3. Scientific Notation (Decimal) • Format: m.mmmm x 10^eeeee • Normalized = exactly one nonzero digit before the decimal point • Mantissa (m) represents the significant digits • Precision limited by number of digits in mantissa • Exponent (e) represents the magnitude • Magnitude limited by number of digits in exponent • Exponent < 0 for numbers between 0 and 1

  4. Scientific Notation (Binary) • Format: 1.mmmm x 2^eeeee • Normalized = 1 before the binary point • Mantissa (m) represents the significant bits • Precision limited by number of bits in mantissa • Exponent (e) represents the magnitude • Magnitude limited by number of bits in exponent • Exponent < 0 for numbers between 0 and 1

  5. Binary Examples • 1/16 = 1.0 x 2^-4 (mantissa 1.0, exponent -4) • 32.5 = 1.000001 x 2^5 (mantissa 1.000001, exponent 5)

  6. Quick Decimal-to-Binary Conversion (Exact) • Multiply the number by a power of 2 big enough to get an integer • Convert this integer to binary • Place the binary point the appropriate number of bits (based on the power of 2 from step 1) from the right of the number

  7. Conversion Example • Convert 32.5 to binary • Multiply 32.5 by 2 (result is 65) • Convert 65 to binary (result is 1000001) • Place the binary point (in this case 1 bit from the right) (result is 100000.1) • Convert to binary scientific notation (result is 1.000001 x 2^5)
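
  The procedure above can be sketched in a few lines of C (an illustrative addition, not part of the original deck); the function name and the 63-step guard are arbitrary choices, and the method only terminates for values whose fraction is exact in binary:

    #include <stdio.h>

    /* Print the binary form of a non-negative value such as 32.5 by
       (1) scaling it by a power of 2 until it is an integer,
       (2) converting that integer to binary, and
       (3) re-inserting the binary point the right number of places
           from the right.  Works only for fractions that terminate in binary. */
    static void print_binary(double x)
    {
        int shift = 0;
        while (x != (double)(unsigned long long)x && shift < 63) {
            x *= 2.0;                        /* step 1: find the scaling power of 2 */
            shift++;
        }
        unsigned long long n = (unsigned long long)x;

        char bits[128];
        int len = 0;
        do {                                 /* step 2: integer to binary, LSB first */
            bits[len++] = '0' + (int)(n & 1);
            n >>= 1;
        } while (n != 0);
        while (len <= shift)                 /* pad so a digit sits left of the point */
            bits[len++] = '0';

        for (int i = len - 1; i >= 0; i--) { /* step 3: print, inserting the point */
            putchar(bits[i]);
            if (i == shift && shift != 0)
                putchar('.');
        }
        putchar('\n');
    }

    int main(void)
    {
        print_binary(32.5);                  /* prints 100000.1 */
        return 0;
    }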

  8. Floating Point Representation • Mantissa - m bits (unsigned) • Exponent - e bits (signed) • Sign (separate) - 1 bit • Total = 1+m+e bits • Tradeoff between precision and magnitude • Total bits fit into 1 or 2 full words

  9. Implicit First Bit • Remember the mantissa must always begin with “1.” • Therefore, we can save a bit by not actually representing the 1 explicitly. • Example: • Mantissa bits 0001 • Mantissa: 1.0001

  10. Offset Exponent • Exponent can be positive or negative, but it's cleaner (for sorting) to use an unsigned representation • Therefore, store the exponent as an unsigned value with a bias of (2^(bits-1))-1 added; subtract the bias to recover the true exponent • Examples: 8-bit exponent (bias = 127) • 00000001 = 1 - 127 = -126 • 10000000 = 128 - 127 = 1

  11. IEEE 754 Floating Point Representation (Single) • Sign (1 bit), Exponent (8 bits), Magnitude (23 bits) • What is the largest value that can be represented? • What is the smallest positive value that can be represented? • How many “significant bits” can be represented? • Values can be sorted using integer comparison • Sign first • Exponent next (sorted as unsigned) • Magnitude last (also unsigned)
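
  As a concrete illustration (not from the slides), a few lines of C can pull the three fields back out of a float and show the biased exponent and the implicit leading 1 at work; the value 32.5f reuses the earlier example:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Decode the sign, biased exponent, and fraction fields of an
       IEEE 754 single-precision value. */
    int main(void)
    {
        float f = 32.5f;                 /* 1.000001 x 2^5 from the earlier example */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bit pattern */

        uint32_t sign     = bits >> 31;          /* 1 bit                        */
        uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127        */
        uint32_t fraction = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1  */

        printf("sign     = %u\n", (unsigned)sign);
        printf("exponent = %u (unbiased: %d)\n", (unsigned)exponent, (int)exponent - 127);
        printf("fraction = 0x%06X (significand = 1.fraction)\n", (unsigned)fraction);
        return 0;
    }

  For 32.5f this prints sign 0, a stored exponent of 132 (5 after removing the bias of 127), and fraction bits 0x020000.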

  12. Double Precision • Floating point number takes 2 words (64 bits) • Sign is 1 bit • Exponent is 11 bits (vs. 8) • Magnitude is 52 bits (vs. 23) • Last 32 bits of the magnitude are in the second word
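
  The range and precision of the two formats can be read straight from the standard C header float.h; a tiny sketch (an illustrative addition):

    #include <stdio.h>
    #include <float.h>

    /* Compare the range and precision of single and double precision. */
    int main(void)
    {
        printf("float : %d significand bits, max %e, min normalized %e\n",
               FLT_MANT_DIG, FLT_MAX, FLT_MIN);
        printf("double: %d significand bits, max %e, min normalized %e\n",
               DBL_MANT_DIG, DBL_MAX, DBL_MIN);
        return 0;
    }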

  13. Floating Point Errors • Overflow • A positive exponent becomes too large for the exponent field • Underflow • A negative exponent becomes too large in magnitude (too negative) for the exponent field • Rounding (not actually an error) • The result of an operation has too many significant bits for the fraction field

  14. Special Values • Infinity • Result of dividing a non-zero value by 0 • Can be positive or negative • Infinity +/- any finite value = Infinity • Not A Number (NaN) • Result of an invalid mathematical operation, e.g. 0/0 or Infinity - Infinity
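
  These rules can be observed directly; a short C sketch (illustrative, assuming IEEE 754 arithmetic) produces and probes the special values:

    #include <stdio.h>
    #include <math.h>

    /* Produce and probe the IEEE special values described above. */
    int main(void)
    {
        double zero    = 0.0;
        double inf     = 1.0 / zero;       /* non-zero / 0 -> infinity */
        double nan_val = zero / zero;      /* invalid operation -> NaN */

        printf("inf          = %f\n", inf);
        printf("inf + 1e10   = %f\n", inf + 1e10);         /* still infinity */
        printf("inf - inf    = %f\n", inf - inf);          /* NaN            */
        printf("nan == nan   : %d\n", nan_val == nan_val); /* 0: NaN never compares equal */
        printf("isinf, isnan : %d %d\n", isinf(inf), isnan(nan_val));
        return 0;
    }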

  15. Representing Special Values in IEEE 754 • Exponent ≠ 00, Exponent ≠ FF • Ordinary floating point number • Exponent = 00, Fraction = 0 • Number is 0 • Exponent = 00, Fraction ≠ 0 • Number is denormalized (leading 0. instead of 1.) • Exponent = FF, Fraction = 0 • Infinity (+ or -, depending on sign) • Exponent = FF, Fraction ≠ 0 • Not a Number (NaN)
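
  A C sketch (an illustrative addition) that classifies a value using exactly the exponent/fraction tests in this table:

    #include <stdio.h>
    #include <float.h>
    #include <stdint.h>
    #include <string.h>

    /* Classify a float from its raw IEEE 754 fields, following the table above. */
    static const char *classify(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        uint32_t exponent = (bits >> 23) & 0xFF;
        uint32_t fraction = bits & 0x7FFFFF;

        if (exponent == 0x00)
            return (fraction == 0) ? "zero" : "denormalized";
        if (exponent == 0xFF)
            return (fraction == 0) ? "infinity" : "NaN";
        return "ordinary floating point number";
    }

    int main(void)
    {
        float zero = 0.0f;
        printf("1.0f         -> %s\n", classify(1.0f));
        printf("0.0f         -> %s\n", classify(zero));
        printf("FLT_MIN / 2  -> %s\n", classify(FLT_MIN / 2.0f));
        printf("1.0f / 0.0f  -> %s\n", classify(1.0f / zero));
        printf("0.0f / 0.0f  -> %s\n", classify(zero / zero));
        return 0;
    }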

  16. Double Precision in MIPS • Each even-numbered register names an even/odd register pair for double precision • High-order bits in the even register • Low-order bits in the odd register

  17. Floating Point Arithmetic in MIPS • add.s, add.d, sub.s, sub.d [rd] [rs] [rt] • Single and double precision addition / subtraction • rd = rs +/- rt • 32 floating point registers $f0 - $f31 • Use in pairs for double precision • Registers for add.d (etc.) must be even numbers

  18. Why Separate Floating Point Registers? • Twice as many registers using the same number of instruction bits • Integer & floating point operations usually on distinct data • Increased parallelism possible • Customized hardware possible

  19. Load/Store Floating Point Numbers • lwc1: 32-bit word to FP register • swc1: FP register to 32-bit word • ldc1: 2 words to FP register pair • sdc1: register pair to 2 words • (Note: the last character is the number 1)

  20. Floating Point Addition • Align the binary points (make exponents equal) • Add the revised mantissas • Normalize the sum

  21. Changing Exponents for Alignment and Normalization • To keep the number the same: • Left shift mantissa by 1 bit and decrement exponent • Right shift mantissa by 1 bit and increment exponent • Align by right-shifting the mantissa of the number with the smaller exponent • Normalize by • Shifting the result to put a 1 before the binary point (adjusting the exponent) • Rounding the result to the correct number of significant bits (renormalizing if the rounding overflows)

  22. Addition Example • Add 1.101 x 2^4 + 1.101 x 2^5 (26 + 52) • Align binary points: 1.101 x 2^4 = 0.1101 x 2^5 • Add mantissas: 0.1101 x 2^5 + 1.1010 x 2^5 = 10.0111 x 2^5

  23. Addition Example (cont.) • Normalize: 10.0111 x 2^5 = 1.00111 x 2^6 (78) • Round to 3-bit mantissa: 1.00111 x 2^6 ~= 1.010 x 2^6 (80)
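
  The same align/add/normalize/round sequence can be followed with plain integers; this C sketch (an illustrative addition) reproduces the 26 + 52 example with a 3-bit mantissa, using round-half-up for brevity rather than the round-to-nearest-even mode described a couple of slides later:

    #include <stdio.h>

    /* Reproduce 1.101 x 2^4 + 1.101 x 2^5 (26 + 52) with a 3-bit mantissa.
       Significands are held as integers with EXTRA low-order working bits
       so nothing is lost until the final rounding step. */
    #define FRAC  3                     /* mantissa bits kept by the format */
    #define EXTRA 4                     /* extra working bits               */

    int main(void)
    {
        long sig_a = 13L << EXTRA, exp_a = 4;   /* 13 = binary 1101, i.e. 1.101 */
        long sig_b = 13L << EXTRA, exp_b = 5;

        /* 1. Align binary points: right-shift the number with the smaller exponent. */
        while (exp_a < exp_b) { sig_a >>= 1; exp_a++; }
        while (exp_b < exp_a) { sig_b >>= 1; exp_b++; }

        /* 2. Add the aligned significands: 0.1101 + 1.1010 = 10.0111 (x 2^5). */
        long sum = sig_a + sig_b;
        long exp = exp_a;

        /* 3. Normalize back to 1.xxx form, incrementing the exponent. */
        while (sum >= (2L << (FRAC + EXTRA))) { sum >>= 1; exp++; }

        /* 4. Round off the EXTRA bits (round half up here, for brevity). */
        sum = (sum + (1L << (EXTRA - 1))) >> EXTRA;
        if (sum >= (2L << FRAC)) { sum >>= 1; exp++; }   /* rounding may renormalize */

        printf("%ld/8 x 2^%ld = %g\n", sum, exp, sum / 8.0 * (1L << exp));
        return 0;
    }

  Run as-is, it prints 10/8 x 2^6 = 80, matching the slide.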

  24. Rounding • At least 1 bit beyond the last bit is needed • Rounding up could require renormalization • Example: 1.1111 -> 10.000 • For multiplication, 2 extra bits are needed in case the product’s first bit is 0 and it must be left shifted (guard, round) • For complete generality, add “sticky bit” that is set whenever additional bits to the right would be >0

  25. Round to Nearest Even • Most common rounding mode • If the actual value is halfway between two values round to an even result • Examples: • 1.0011 -> 1.010 • 1.0101 -> 1.010 • If the sticky bit is set, round up because the value isn’t really halfway between!
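
  A quick C check (illustrative; it assumes the default IEEE round-to-nearest-even mode) showing the tie cases when a double lands exactly halfway between two adjacent floats:

    #include <stdio.h>
    #include <math.h>

    /* Doubles exactly halfway between two adjacent floats round to the
       float whose last mantissa bit is 0 (the "even" neighbour). */
    int main(void)
    {
        double ulp  = ldexp(1.0, -23);        /* spacing of floats just above 1.0 */
        double half = ldexp(1.0, -24);        /* exactly half of that spacing     */

        float a = (float)(1.0 + half);        /* tie: rounds down to 1.0 (even)        */
        float b = (float)(1.0 + ulp + half);  /* tie: rounds up to 1.0 + 2*ulp (even)  */

        printf("a == 1.0f         : %d\n", a == 1.0f);
        printf("b == 1.0 + 2^-22  : %d\n", b == (float)(1.0 + 2.0 * ulp));
        return 0;
    }

  Both comparisons print 1: the first tie rounds down and the second rounds up, each toward the neighbour with an even last bit.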

  26. Floating point addition

  27. Floating Point Multiplication • Calculate new exponent by adding exponents together • Multiply the significands (using shift & add) • Normalize the product • Round • Set the sign

  28. Adding Exponents • Remember that exponents are biased • Adding exponents adds 2 copies of bias! (exp1 + 127) + (exp2 + 127) = (exp1+exp2 + 254) • Therefore, subtract the bias from the sum and the result is a correctly biased value
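
  A small C sketch (illustrative) of the bias bookkeeping, using the single-precision bias of 127:

    #include <stdio.h>

    /* Adding two biased exponents double-counts the bias, so subtract one copy. */
    int main(void)
    {
        int bias = 127;                           /* IEEE 754 single precision */
        int e1 = 5, e2 = -3;                      /* true exponents            */
        int biased_sum = (e1 + bias) + (e2 + bias) - bias;

        printf("biased result = %d, true exponent = %d\n",
               biased_sum, biased_sum - bias);    /* prints 129 and 2 (= 5 - 3) */
        return 0;
    }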

  29. Multiplication Example • Compute 2.25 x 1.5; first convert each operand to binary floating point (1 sign bit, 3-bit exponent, 3-bit mantissa) • 2.25 = 10.01 * 2^0 = 1.001 * 2^1 • Exp = 100 (because bias is 3) • 2.25 = 0 100 001 • 1.5 = 1.100 * 2^0 • Exp = 011, Mantissa: 100 • 1.5 = 0 011 100

  30. 1. Add Exponents • 0 100 001 x 0 011 100 • Add exponents (and subtract the bias): 100 + 011 - 011 = 100

  31. 2. Multiply Significands • 0 100 001 x 0 011 100 • Remember to restore the leading 1 • Remember that the number of binary places doubles
      1.001
    x 1.100
    ---------
     .100100
    1.001000
    ---------
    1.101100  x 2^1

  32. Finish Up • Product is 1.1011 * 2^1 • Already normalized • But, too many bits, so we need to round • Nearest even number (up) is 1.110 • Result: 0 100 110 • Value is 1.75 * 2 = 3.5

  33. Types of Errors • Overflow • Positive exponent is too large for the number of bits allotted • Underflow • Negative exponent is too small (too negative) to fit in the exponent bits • Rounding error • The exact result needs more mantissa bits than are available

  34. Overflow and Underflow • Addition • Overflow is possible when adding two positive or two negative numbers • Multiplication • Overflow is possible when multiplying two large absolute value numbers • Underflow is possible when multiplying two numbers very close to 0
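
  A short C demonstration (illustrative; assumes IEEE single precision) of both failure modes in multiplication:

    #include <stdio.h>
    #include <float.h>

    /* Multiplication overflowing to infinity and underflowing toward zero. */
    int main(void)
    {
        float over    = FLT_MAX * 2.0f;       /* too large: becomes +inf          */
        float under   = FLT_MIN * FLT_MIN;    /* far too small: flushes to 0      */
        float gradual = FLT_MIN * 0.5f;       /* slightly too small: denormalized */

        printf("overflow : %e\n", over);
        printf("underflow: %e\n", under);
        printf("gradual  : %e\n", gradual);
        return 0;
    }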

  35. Limitations of Finite Floating Point Representations • Gap between 0 and the smallest non-zero number • Gaps between values when the last bit of the mantissa changes • Fixed number of values between 0 and 1 • Significant effects of rounding in mathematical operations
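
  The gaps can be measured directly with the C99 nextafterf function; a small sketch (illustrative):

    #include <stdio.h>
    #include <math.h>

    /* The gap between adjacent floats grows with the magnitude of the number. */
    int main(void)
    {
        printf("gap above 1.0f      : %e\n", nextafterf(1.0f, 2.0f) - 1.0f);
        printf("gap above 1000000.0f: %e\n",
               nextafterf(1000000.0f, 2000000.0f) - 1000000.0f);
        printf("smallest positive   : %e\n", nextafterf(0.0f, 1.0f));
        return 0;
    }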

  36. Implications for Programmers • Mathematical rules are not always followed • (a / b) * b does not always equal a • (a + b) + c does not always equal a + (b + c) • Use inequality comparisons instead of directly comparing floating point numbers (with ==) • if ((x > -epsilon) && (x < epsilon)) instead of if (x == 0) • Epsilon can be set based on the problem or on knowledge of the representation (e.g. single vs. double precision)
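
  A short C illustration of these pitfalls (the epsilon value of 1e-9 is an arbitrary choice for this example):

    #include <stdio.h>
    #include <math.h>

    /* Lost associativity, and why == on floating point values is fragile. */
    int main(void)
    {
        double a = 1.0e20, b = -1.0e20, c = 1.0;
        printf("(a + b) + c = %g\n", (a + b) + c);   /* 1: a and b cancel first       */
        printf("a + (b + c) = %g\n", a + (b + c));   /* 0: c is lost when added to b  */

        double x = 0.1 + 0.2 - 0.3;                  /* about 5.6e-17, not exactly 0  */
        double epsilon = 1e-9;                       /* tolerance chosen for this problem */
        printf("x == 0.0          : %d\n", x == 0.0);
        printf("fabs(x) < epsilon : %d\n", fabs(x) < epsilon);
        return 0;
    }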

  37. The Pentium Floating Point Bug • To speed up division, a table was used • It was assumed that 5 elements of the table would never be accessed (and the hardware was optimized to make them 0) • These table elements occasionally caused errors in bits 12 to 52 of floating point significands • (see Section 3.8 for more)

  38. A Marketing Error • July 1994 - Intel discovers the bug, decides not to halt production or recall chips • September 1994 - A professor discovers the bug, posts to Internet (after attempting to inform Intel) • November 1994 - Press articles, Intel says will affect “maybe several dozen people” • December 1994 - IBM disputes claim and halts shipment of Pentium based PCs. • Late December 1994 - Intel apologizes

  39. The “Big Picture” • Bits in memory have no inherent meaning. A given sequence can contain • An instruction • An integer • A string of characters • A floating point number • All number representations are finite • Finite arithmetic requires compromises
