Learn about representing non-integers using floating point systems in computer organization. Understand scientific notations, binary conversion, IEEE 754 standard, and handling special values.
CPSC 252 Computer Organization: Floating Point • Ellen Walker, Hiram College
Representing Non-Integers • Often represented in decimal format • Some require infinite digits to represent exactly • With a fixed number of digits (or bits), many numbers are approximated • Precision is a measure of the degree of approximation
Scientific Notation (Decimal) • Format: m.mmmm x 10^eeeee • Normalized = exactly 1 digit before decimal point • Mantissa (m) represents the significant digits • Precision limited by number of digits in mantissa • Exponent (e) represents the magnitude • Magnitude limited by number of digits in exponent • Exponent < 0 for numbers between 0 and 1
Scientific Notation (Binary) • Format: 1.mmmm x 2^eeeee • Normalized = 1 before the binary point • Mantissa (m) represents the significant bits • Precision limited by number of bits in mantissa • Exponent (e) represents the magnitude • Magnitude limited by number of bits in exponent • Exponent < 0 for numbers between 0 and 1
Binary Examples • 1/16 = 1.0 x 2^-4 (mantissa 1.0, exponent -4) • 32.5 = 1.000001 x 2^5 (mantissa 1.000001, exponent 5)
Quick Decimal-to-Binary Conversion (Exact) • Multiply the number by a power of 2 big enough to get an integer • Convert this integer to binary • Place the binary point the appropriate number of bits (based on the power of 2 from step 1) from the right of the number
Conversion Example • Convert 32.5 to binary • Multiply 32.5 by 2 (result is 65) • Convert 65 to binary (result is 1000001) • Place the binary point (in this case 1 bit from the right) (result is 100000.1) • Convert to binary scientific notation (result is 1.000001 x 2^5)
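As a quick check of this method, here is a minimal C sketch (variable names are mine, not from the slides) that scales 32.5 by 2^1, prints the bits of the resulting integer 65, and notes where the binary point belongs:

```c
#include <stdio.h>

/* Sketch of the "multiply by a power of 2" conversion from the slide.
   Names and structure are illustrative, not from the course materials. */
int main(void) {
    double value = 32.5;
    int shift = 1;                                          /* 32.5 * 2^1 = 65 */
    long long scaled = (long long)(value * (1 << shift));   /* integer 65      */

    /* Print the integer's bits, most significant first (65 fits in 7 bits). */
    printf("%g * 2^%d = %lld = ", value, shift, scaled);
    for (int bit = 6; bit >= 0; bit--)
        putchar((scaled >> bit) & 1 ? '1' : '0');

    /* Undo the scaling by placing the binary point `shift` bits from the right. */
    printf("\nPlace the binary point %d bit(s) from the right: 100000.1\n", shift);
    return 0;
}
```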
Floating Point Representation • Mantissa - m bits (unsigned) • Exponent - e bits (signed) • Sign (separate) - 1 bit • Total = 1+m+e bits • Tradeoff between precision and magnitude • Total bits fit into 1 or 2 full words
Implicit First Bit • Remember the mantissa must always begin with “1.” • Therefore, we can save a bit by not actually representing the 1 explicitly. • Example: • Mantissa bits 0001 • Mantissa: 1.0001
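A small numeric check of the implicit leading bit, as a C sketch (the 4-bit fraction is just the slide's example; variable names are mine): stored bits 0001 stand for the mantissa 1.0001, i.e. 1 + 1/16.

```c
#include <stdio.h>

int main(void) {
    /* Stored mantissa bits from the slide: 0001 (4 fraction bits). */
    unsigned frac_bits = 0x1;
    int frac_width = 4;

    /* Restore the implicit leading 1: mantissa = 1 + fraction / 2^width. */
    double mantissa = 1.0 + (double)frac_bits / (1 << frac_width);
    printf("stored 0001 -> mantissa 1.0001 (binary) = %g\n", mantissa);  /* 1.0625 */
    return 0;
}
```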
Offset Exponent • Exponent can be positive or negative, but it’s cleaner (for sorting) to use an unsigned representation • Therefore, store the exponent plus a bias of (2^(bits-1)) - 1 (127 for 8 bits); subtract the bias to recover the true exponent • Examples: 8 bit exponent • 00000001 = 1 + (-127) = -126 • 10000000 = 128 + (-127) = 1
IEEE 754 Floating Point Representation (Single) • Sign (1 bit), Exponent (8 bits), Magnitude (23 bits) • What is the largest value that can be represented? • What is the smallest positive value that can be represented? • How many “significant bits” can be represented? • Values can be sorted using integer comparison • Sign first • Exponent next (sorted as unsigned) • Magnitude last (also unsigned)
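One way to explore these questions is to copy a float's bits into an integer and mask out the three fields; the C sketch below assumes the standard single-precision layout (the helper name decode is mine):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Pull apart an IEEE 754 single: 1 sign bit, 8 exponent bits, 23 fraction bits. */
static void decode(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the 32 bits */

    uint32_t sign     = bits >> 31;
    uint32_t exponent = (bits >> 23) & 0xFF;
    uint32_t fraction = bits & 0x7FFFFF;

    printf("%g: sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
           x, sign, exponent, (int)exponent - 127, fraction);
}

int main(void) {
    decode(32.5f);     /* expect exponent 132 (= 5 + 127), fraction 0x020000 */
    decode(0.0625f);   /* 1/16: exponent 123 (= -4 + 127), fraction 0        */
    return 0;
}
```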
Double Precision • Floating point number takes 2 words (64 bits) • Sign is 1 bit • Exponent is 11 bits (vs. 8) • Magnitude is 52 bits (vs. 23) • Last 32 bits of the magnitude are in the second word
Floating Point Errors • Overflow • A positive exponent becomes too large for the exponent field • Underflow • A negative exponent becomes too negative for the exponent field • Rounding (not actually an error) • The result of an operation has too many significant bits for the fraction field
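A brief C sketch of overflow and underflow in single precision (assuming IEEE 754 semantics; the constants come from <float.h>):

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    float big = FLT_MAX;
    float tiny = FLT_MIN;

    /* Overflow: the result's exponent is too large for 8 bits, so the
       result becomes infinity. */
    printf("FLT_MAX * 2    = %g\n", big * 2.0f);      /* inf */

    /* Underflow: the exponent is too negative; the value falls below even
       the smallest denormal and flushes to 0. */
    printf("FLT_MIN / 1e30 = %g\n", tiny / 1.0e30f);  /* 0 */
    return 0;
}
```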
Special Values • Infinity • Result of dividing a non-zero value by 0 • Can be positive or negative • Infinity +/- any finite value = Infinity • Not A Number (NaN) • Result of an invalid mathematical operation, e.g. 0/0 or Infinity-Infinity
Representing Special Values in IEEE 754 • Exponent ≠ 00, Exponent ≠ FF • Ordinary floating point number • Exponent = 00, Fraction = 0 • Number is 0 • Exponent = 00, Fraction ≠ 0 • Number is denormalized (leading 0. instead of 1.) • Exponent = FF, Fraction = 0 • Infinity (+ or -, depending on sign) • Exponent = FF, Fraction ≠ 0 • Not a Number (NaN)
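The case analysis above translates directly into a small classifier; a C sketch (the function name classify is mine, and the divisions by zero assume IEEE 754 semantics):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Classify a single-precision value by its exponent and fraction fields,
   following the IEEE 754 cases listed above. */
static const char *classify(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t exponent = (bits >> 23) & 0xFF;
    uint32_t fraction = bits & 0x7FFFFF;

    if (exponent == 0x00) return fraction == 0 ? "zero" : "denormalized";
    if (exponent == 0xFF) return fraction == 0 ? "infinity" : "NaN";
    return "ordinary (normalized)";
}

int main(void) {
    float zero = 0.0f;
    printf("%s\n", classify(zero));          /* zero                  */
    printf("%s\n", classify(1.0f / zero));   /* infinity              */
    printf("%s\n", classify(zero / zero));   /* NaN                   */
    printf("%s\n", classify(1.5f));          /* ordinary (normalized) */
    return 0;
}
```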
Double Precision in MIPS • Each even register can be considered a register pair for double precision • High-order word in the even register • Low-order word in the odd register
Floating Point Arithmetic in MIPS • add.s, add.d, sub.s, sub.d [rd] [rs] [rt] • Single and double precision addition / subtraction • rd = rs +/- rt • 32 floating point registers $f0 - $f31 • Use in pairs for double precision • Registers for add.d (etc.) must be even-numbered
Why Separate Floating Point Registers? • Twice as many registers using the same number of instruction bits • Integer & floating point operations usually on distinct data • Increased parallelism possible • Customized hardware possible
Load/Store Floating Point Number • lwc1 32-bit word to FP register • swc1 FP register to 32-bit word • ldc1 2 words to FP register pair • sdc1 register pair to 2 words • (Note last character is the number 1)
Floating Point Addition • Align the binary points (make exponents equal) • Add the revised mantissas • Normalize the sum
Changing Exponents for Alignment and Normalization • To keep the number the same: • Left shift mantissa by 1 bit and decrement exponent • Right shift mantissa by 1 bit and increment exponent • Align by right-shifting the smaller number • Normalize by • Shifting the result to put 1 before the binary point • Rounding the result to the correct number of significant bits
Addition Example • Add 1.101 x 2^4 + 1.101 x 2^5 (26 + 52) • Align binary points: 1.101 x 2^4 = 0.1101 x 2^5 • Add mantissas: 0.1101 x 2^5 + 1.1010 x 2^5 = 10.0111 x 2^5
Addition Example (cont.) • Normalize: 10.0111 x 2^5 = 1.00111 x 2^6 (78) • Round to 3-bit mantissa: 1.00111 x 2^6 ~= 1.010 x 2^6 (80)
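The whole example can be checked with a short C sketch that rounds the exact sum 78 to a toy 3-fraction-bit mantissa (the helper is mine and only mimics the example, not real FP hardware):

```c
#include <stdio.h>
#include <math.h>

/* Round a positive value to 3 bits after the binary point, mimicking the
   toy format in the example. rint() uses the default round-to-nearest-even
   mode. */
static double round_to_3_fraction_bits(double x) {
    int exp;
    double m = 2.0 * frexp(x, &exp);     /* x = m * 2^(exp-1), 1 <= m < 2 */
    return ldexp(rint(m * 8.0) / 8.0, exp - 1);
}

int main(void) {
    double a = 26.0, b = 52.0;           /* 1.101 x 2^4 and 1.101 x 2^5 */
    double sum = a + b;                  /* 78 = 10.0111 x 2^5          */
    printf("exact sum = %g\n", sum);
    printf("rounded to 3 fraction bits = %g\n",
           round_to_3_fraction_bits(sum));    /* 80 = 1.010 x 2^6 */
    return 0;
}
```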
Rounding • At least 1 bit beyond the last bit is needed • Rounding up could require renormalization • Example: 1.1111 -> 10.000 • For multiplication, 2 extra bits are needed in case the product’s first bit is 0 and it must be left shifted (guard, round) • For complete generality, add “sticky bit” that is set whenever additional bits to the right would be >0
Round to Nearest Even • Most common rounding mode • If the actual value is exactly halfway between two representable values, round to the one whose last bit is even (0) • Examples: • 1.0011 -> 1.010 • 1.0101 -> 1.010 • If the sticky bit is set, round up because the value isn’t really halfway between!
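rint() follows the default IEEE round-to-nearest-even mode, so the two halfway examples above can be reproduced with a C sketch (scaling by 8 keeps 3 fraction bits; the helper name is mine):

```c
#include <stdio.h>
#include <math.h>

/* Round to 3 fraction bits using the default round-to-nearest-even mode. */
static double round_even_3bits(double x) {
    return rint(x * 8.0) / 8.0;
}

int main(void) {
    /* 1.0011 (binary) = 1.1875: halfway case, rounds to 1.010 = 1.25 */
    printf("1.0011 -> %g\n", round_even_3bits(1.1875));
    /* 1.0101 (binary) = 1.3125: halfway case, also rounds to 1.010 = 1.25 */
    printf("1.0101 -> %g\n", round_even_3bits(1.3125));
    return 0;
}
```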
Floating Point Multiplication • Calculate new exponent by adding exponents together • Multiply the significands (using shift & add) • Normalize the product • Round • Set the sign
Adding Exponents • Remember that exponents are biased • Adding exponents adds 2 copies of bias! (exp1 + 127) + (exp2 + 127) = (exp1+exp2 + 254) • Therefore, subtract the bias from the sum and the result is a correctly biased value
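In code, the bias correction is a single subtraction; a C sketch using the single-precision bias of 127 (the example values are chosen only for illustration):

```c
#include <stdio.h>

int main(void) {
    int bias = 127;                    /* single-precision bias   */
    int biased_e1 = 1 + bias;          /* exponent of 2.0  (2^1)  */
    int biased_e2 = 3 + bias;          /* exponent of 8.0  (2^3)  */

    /* Adding biased exponents includes the bias twice, so subtract it once. */
    int product_exp = biased_e1 + biased_e2 - bias;
    printf("biased product exponent = %d (true exponent %d)\n",
           product_exp, product_exp - bias);   /* 131, true exponent 4 */
    return 0;
}
```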
Multiplication Example • Convert 2.25 x 1.5 to binary floating point (3 bits exponent, 3 bits mantissa) • 2.25 = 10.01 * 2^0 = 1.001 * 2^1 • Exp = 100 (because bias is 3) • 2.25 = 0 100 001 • 1.5 = 1.100 * 2^0 • Exp = 011, Mantissa: 100 • 1.5 = 0 011 100
1. Add Exponents • 0 100 001 x 0 011 100 • Add Exponents (and subtract bias) 100 + 011 – 011 = 100
2. Multiply Significands • 0 100 001 x 0 011 100 • Remember to restore the leading 1 • Remember that the number of binary places doubles • 1.001 x 1.100: partial products 0.100100 + 1.001000 = 1.101100, so the product is 1.101100 x 2^1
Finish Up • Product is 1.1011 * 2^1 • Already normalized • But, too many bits, so we need to round • Nearest even number (up) is 1.110 • Result: 0 100 110 • Value is 1.75 * 2 = 3.5
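A short C sketch confirms the worked example: the exact product is 3.375, but rounding the mantissa to 3 fraction bits (ties to even) gives 3.5 (the helper is mine and only mimics the toy format, not real hardware):

```c
#include <stdio.h>
#include <math.h>

/* Round a positive value to 3 bits after the binary point, using the
   default round-to-nearest-even mode. */
static double round_to_3_fraction_bits(double x) {
    int exp;
    double m = 2.0 * frexp(x, &exp);     /* x = m * 2^(exp-1), 1 <= m < 2 */
    return ldexp(rint(m * 8.0) / 8.0, exp - 1);
}

int main(void) {
    double exact = 2.25 * 1.5;
    printf("exact product  = %g\n", exact);                           /* 3.375 */
    printf("3-bit mantissa = %g\n", round_to_3_fraction_bits(exact)); /* 3.5   */
    return 0;
}
```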
Types of Errors • Overflow • Positive exponent too large for the number of bits allotted • Underflow • Negative exponent too negative to fit in the exponent bits • Rounding error • Mantissa has too many bits
Overflow and Underflow • Addition • Overflow is possible when adding two positive or two negative numbers • Multiplication • Overflow is possible when multiplying two numbers with large absolute values • Underflow is possible when multiplying two numbers very close to 0
Limitations of Finite Floating Point Representations • Gap between 0 and the smallest non-zero number • Gaps between values when the last bit of the mantissa changes • Fixed number of values between 0 and 1 • Significant effects of rounding in mathematical operations
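These gaps can be observed directly; a C sketch using <float.h> and nextafterf() from C99's <math.h>:

```c
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    /* Gap between 1.0 and the next representable single-precision value. */
    float next = nextafterf(1.0f, 2.0f);
    printf("gap after 1.0f: %g (FLT_EPSILON = %g)\n", next - 1.0f, FLT_EPSILON);

    /* Gap between 0 and the smallest positive normalized float. */
    printf("smallest positive normalized float: %g\n", FLT_MIN);
    return 0;
}
```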
Implications for Programmers • Mathematical rules are not always followed • (a / b) * b does not always equal a • (a + b) + c does not always equal a + (b + c) • Use inequality comparisons instead of directly comparing floating point numbers (with ==) • if ((x > -epsilon) && (x < epsilon)) instead of if (x == 0) • Epsilon can be set based on the problem or knowledge of the representation (e.g. single vs. double precision)
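Both points can be demonstrated in a few lines of C; this sketch is illustrative only, and the epsilon value is an arbitrary problem-dependent choice:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Associativity can fail: the small term is lost in one grouping. */
    double a = 1.0e16, b = -1.0e16, c = 1.0;
    printf("(a + b) + c = %g\n", (a + b) + c);   /* 1 */
    printf("a + (b + c) = %g\n", a + (b + c));   /* 0 */

    /* Compare against a tolerance instead of using == with 0. */
    double x = 0.1 + 0.2 - 0.3;                  /* not exactly 0 */
    double epsilon = 1.0e-9;                     /* problem-dependent choice */
    if (fabs(x) < epsilon)
        printf("x is close enough to 0 (x = %g)\n", x);
    return 0;
}
```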
The Pentium Floating Point Bug • To speed up division, a table was used • It was assumed that 5 elements of the table would never be accessed (and the hardware was optimized to make them 0) • These table elements occasionally caused errors in bits 12 to 52 of floating point significands • (see Section 3.8 for more)
A Marketing Error • July 1994 - Intel discovers the bug, decides not to halt production or recall chips • September 1994 - A professor discovers the bug, posts to Internet (after attempting to inform Intel) • November 1994 - Press articles, Intel says will affect “maybe several dozen people” • December 1994 - IBM disputes claim and halts shipment of Pentium based PCs. • Late December 1994 - Intel apologizes
The “Big Picture” • Bits in memory have no inherent meaning. A given sequence can contain • An instruction • An integer • A string of characters • A floating point number • All number representations are finite • Finite arithmetic requires compromises