Floating Point Numbers Material on Data Representation can be found in Chapter 2 of Computer Architecture (Nicholas Carter)
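Fractions • Similar to what we’re used to with decimal numbers, except that each place to the right of the point is worth half of the place before it (1/2, 1/4, 1/8, and so on) rather than a tenth • Ex. .101 = 1/2 + 1/8 = 0.625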
Converting decimal to binary II • 98.61 • Integer part • 98 / 2 = 49 remainder 0 • 49 / 2 = 24 remainder 1 • 24 / 2 = 12 remainder 0 • 12 / 2 = 6 remainder 0 • 6 / 2 = 3 remainder 0 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • Reading the remainders from last to first: 1100010
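The repeated-division steps above can be sketched in Python. The function name `int_to_binary` is ours, chosen for illustration; the loop simply records each remainder and reverses the list at the end.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)  # e.g. 98 / 2 = 49 remainder 0
        remainders.append(str(r))
    # The first remainder is the least significant bit, so read them last to first.
    return "".join(reversed(remainders))


print(int_to_binary(98))   # 1100010
print(int_to_binary(123))  # 1111011
```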
Converting decimal to binary III • 98.61 • Fractional part • 0.61 × 2 = 1.22 • 0.22 × 2 = 0.44 • 0.44 × 2 = 0.88 • 0.88 × 2 = 1.76 • 0.76 × 2 = 1.52 • 0.52 × 2 = 1.04 • Reading the whole-number parts from first to last: .100111
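A minimal sketch of the repeated-doubling method, assuming we use Python's `fractions.Fraction` so that the doubling itself is exact (a plain float would already carry binary rounding error). The name `fraction_to_binary` is hypothetical.

```python
from fractions import Fraction


def fraction_to_binary(frac: Fraction, digits: int) -> str:
    """Produce `digits` binary digits of 0 <= frac < 1 by repeated doubling."""
    bits = []
    for _ in range(digits):
        frac *= 2
        if frac >= 1:          # whole-number part is 1: record it, keep only the fraction
            bits.append("1")
            frac -= 1
        else:                  # whole-number part is 0
            bits.append("0")
    return "." + "".join(bits)


print(fraction_to_binary(Fraction("0.61"), 6))   # .100111
print(fraction_to_binary(Fraction("0.456"), 7))  # .0111010
```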
Another Example (Whole number part) • 123.456 • Integer part • 123 / 2 = 61 remainder 1 • 61 / 2 = 30 remainder 1 • 30 / 2 = 15 remainder 0 • 15 / 2 = 7 remainder 1 • 7 / 2 = 3 remainder 1 • 3 / 2 = 1 remainder 1 • 1 / 2 = 0 remainder 1 • 1111011
Using the calculator: enter the number in decimal and read off the binary, or switch to binary mode if you want to use copy and paste
Another Example (fractional part) • 123.456 • Fractional part • 0.456 × 2 = 0.912 • 0.912 × 2 = 1.824 • 0.824 × 2 = 1.648 • 0.648 × 2 = 1.296 • 0.296 × 2 = 0.592 • 0.592 × 2 = 1.184 • 0.184 × 2 = 0.368 • … • .0111010…
Ctrl-C to copy the displayed number. Switch to Scientific View. Ctrl-V to paste
Divide by 2 raised to the number of digits (in this case 7, including the leading zero)
Finally, press the equals key. In most cases the result will not be exact: 0111010 is 58, and 58 / 2^7 = 0.453125 rather than 0.456.
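The calculator check amounts to reading the fraction digits as a whole number and dividing by 2 raised to the number of digits. A small Python sketch of that arithmetic (the helper name `binary_fraction_value` is ours):

```python
def binary_fraction_value(bits: str) -> float:
    """Value of a binary fraction such as '.0111010' (the leading point is optional)."""
    digits = bits.lstrip(".")
    # Read the digits as a whole number, then divide by 2 ** (number of digits),
    # which is what the calculator steps do.
    return int(digits, 2) / 2 ** len(digits)


print(binary_fraction_value(".0111010"))  # 0.453125, close to but not exactly 0.456
```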
Other way around • Multiply the fraction by 2 raised to the desired number of digits in the fractional part. For example • 0.456 × 2^7 = 58.368 • Throw away the fractional part and represent the whole number • 58 → 111010 • But note that we specified 7 digits and the result above uses only 6. Therefore we need to put in the leading 0 • 0111010
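The scale-and-truncate shortcut can be sketched in a few lines of Python. `fraction_bits` is a hypothetical name; `zfill` supplies the leading zeros the slide adds by hand.

```python
import math


def fraction_bits(frac: float, n: int) -> str:
    """Return n binary digits of frac (0 <= frac < 1) by scaling with 2**n."""
    scaled = frac * 2 ** n              # 0.456 * 2**7 = 58.368
    whole = math.floor(scaled)          # throw away the fractional part: 58
    return format(whole, "b").zfill(n)  # '111010' padded to 7 digits: '0111010'


print(fraction_bits(0.456, 7))  # 0111010
```

Because the multiplication is by a power of 2, it does not add floating-point error of its own; the only inexactness comes from `0.456` itself not being exactly representable as a Python float.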
Fixed point • If one has a set number of bits reserved for representing the whole number part and another set number of bits reserved for representing the fractional part of a number, then one is said to be using fixed point representation. • The point dividing whole number from fraction has an unchanging (fixed) place in the number.
Limits of the fixed point approach • Suppose you use 4 bits for the whole number part and 4 bits for the fractional part (ignoring sign for now). • The largest number would be 1111.1111 = 15.9375 • The smallest, non-zero number would be 0000.0001 = .0625
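A short Python sketch, assuming an unsigned 4.4 pattern written with an explicit point, reproduces both limits. The helper `fixed_value` is hypothetical.

```python
def fixed_value(pattern: str) -> float:
    """Value of an unsigned fixed-point pattern written as 'IIII.FFFF'."""
    whole, frac = pattern.split(".")
    # Treat all bits as one integer, then divide by 2 once per fractional bit.
    return int(whole + frac, 2) / 2 ** len(frac)


print(fixed_value("1111.1111"))  # 15.9375 (largest)
print(fixed_value("0000.0001"))  # 0.0625 (smallest non-zero)
```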
Floating point representation • Floating point representation allows one to represent a wider range of numbers using the same number of bits. • It is like scientific notation.
Scientific notation • Used to represent very large and very small numbers. • Ex. Avogadro’s number • 6.0221367 × 10^23 particles • 602213670000000000000000 • Ex. Fundamental charge e • 1.60217733 × 10^(-19) C • 0.000000000000000000160217733 C
Scientific notation: all of these are the same number • 12345.6789 = 12345.6789 × 10^0 • = 1234.56789 × 10 = 1234.56789 × 10^1 • = 123.456789 × 100 = 123.456789 × 10^2 • = 12.3456789 × 10^3 • = 1.23456789 × 10^4 • Rule: shift the point to the left and increment the power of ten.
Small numbers • 0.000001234 • = 0.00001234 × 10^(-1) • = 0.0001234 × 10^(-2) • = 0.001234 × 10^(-3) • = 0.01234 × 10^(-4) • = 0.1234 × 10^(-5) • = 1.234 × 10^(-6) • Rule: shift the point to the right and decrement the power.
IEEE 754 standards • The standards for floating-point numbers are known as IEEE 754. • Starting with the fixed-point binary representation, shift the point and increase the power (of 2, now that we’re in binary). • As in scientific notation, shift so that the number has exactly one non-zero whole-number digit (in binary the only non-zero digit is 1) and the remaining digits are fractional bits.
Floats (98.61) • Shift the expression so it is between 1 and 2, and keep track of the number of shifts • 1100010.10011100001010001 • 1.10001010011100001010001 × 2^6 • Express the number of shifts in binary • 1.10001010011100001010001 × 2^00000110 • We’re not done yet, so this exponent will change.
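Normalizing a binary number is just finding the leading 1 and counting how far the point moves. Here is a Python sketch; `normalize` is a hypothetical helper that works on digit strings and assumes the number contains at least one 1.

```python
from typing import Tuple


def normalize(binary: str) -> Tuple[str, int]:
    """Move the point so a single 1 precedes it; return (normalized digits, power of 2).

    Left shifts give a positive power, right shifts a negative one.
    Assumes the number contains at least one 1.
    """
    whole, _, frac = binary.partition(".")
    digits = whole + frac
    first_one = digits.index("1")        # position of the leading 1
    power = len(whole) - first_one - 1   # how many places the point moved
    significant = digits[first_one:]
    return significant[0] + "." + significant[1:], power


mantissa, power = normalize("1100010.10011100001010001")
print(mantissa, power)       # 1.10001010011100001010001 6
print(format(power, "08b"))  # 00000110
```

The same function handles numbers below 1: `normalize("0.000010101110")` gives `("1.0101110", -5)`, a negative power.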
Mantissa and Exponent and Sign • 1.10001010011100001010001 × 2^00000110 • The (significand) mantissa is 1.10001010011100001010001 • The exponent is 00000110 • +1.10001010011100001010001 × 2^00000110 • The number may be negative, so a bit (the sign bit) is reserved to indicate whether the number is positive or negative
Small numbers • 0.000010101110 • 1.0101110 × 2^(-5) • The power (a.k.a. the exponent) could be negative, so we have to be able to deal with that. • Floating-point numbers use a procedure known as biasing to handle the negative exponent problem.
Biasing • The exponent is not actually stored as shown previously. • Eight bits were used to represent the exponent on the previous slide, which means that 256 different values can be represented. • Since the exponent could be negative (to represent numbers less than 1), roughly half of the range is used for positive exponents and half for negative ones.
Biasing (Cont.) • In biasing, one does not use 2’s complement or a sign bit. • Instead, one adds a bias (roughly equal to the magnitude of the most negative exponent) to the exponent and stores the result of that addition.
Biasing (Cont.) • The exponent of all 1’s is reserved for special purposes, as is the exponent of all 0’s. • Thus with 8 bits, the bias is 127 (= 2^7 - 1, that is, 2 raised to one less than the number of exponent bits, minus one). • In our previous example, we had to shift 6 times to the left, corresponding to an exponent of +6. • We add that shift to the bias: 127 + 6 = 133. • That is the number we put in the exponent portion: 133 → 10000101.
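Biasing and un-biasing are a single addition or subtraction. A Python sketch with an 8-bit exponent field (the names `bias_exponent` and `unbias_exponent` are ours); it refuses powers that would land on the reserved all-0’s or all-1’s patterns.

```python
EXP_BITS = 8
BIAS = 2 ** (EXP_BITS - 1) - 1  # 127 for an 8-bit exponent


def bias_exponent(power: int) -> str:
    """Store a power of 2 as a biased, unsigned EXP_BITS-bit field."""
    stored = power + BIAS
    # All 0's and all 1's are reserved for special values, so normal numbers avoid them.
    if not 0 < stored < 2 ** EXP_BITS - 1:
        raise ValueError(f"power {power} is out of range for a normal number")
    return format(stored, f"0{EXP_BITS}b")


def unbias_exponent(field: str) -> int:
    """Recover the power of 2 from a stored exponent field."""
    return int(field, 2) - BIAS


print(bias_exponent(6))             # 10000101  (127 + 6 = 133)
print(bias_exponent(-5))            # 01111010  (127 - 5 = 122)
print(unbias_exponent("10000101"))  # 6
```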
Big floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the largest float? • Mantissa: 1111, Exponent: 1111 • 0.9375 × 2^7 (the mantissa is read as 0.1111, and the exponent 1111 = 15 is biased by 8, giving 7) • = 120 • (Compare this to the largest fixed-point number using the same amount of space: 15.9375)
Small floats – a quick comparison • Assume we use 8 bits, 4 for the mantissa and 4 for the exponent (neglecting sign). What is the smallest float? • Mantissa: 1000, Exponent: 0000 • 0.5 × 2^(-8) • = 0.001953125 • (Compare this to the smallest fixed-point number using the same amount of space: 0.0625)
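The two comparisons can be checked with exact fractions. This sketch assumes the toy format implied by the slides: the 4 mantissa bits are read as 0.xxxx (no implied 1) and the 4-bit exponent is biased by 8, since 1111 corresponds to 2^7 and 0000 to 2^(-8). The helper `toy_float` is hypothetical and is not an IEEE format.

```python
from fractions import Fraction

EXP_BIAS = 8  # assumption inferred from the slides: 1111 -> 2**7 and 0000 -> 2**-8


def toy_float(mantissa: str, exponent: str) -> Fraction:
    """Value of the slides' 8-bit toy float, 0.<mantissa> x 2**(exponent - 8), no sign."""
    m = Fraction(int(mantissa, 2), 2 ** len(mantissa))  # 1111 -> 15/16 = 0.9375
    return m * Fraction(2) ** (int(exponent, 2) - EXP_BIAS)


largest = toy_float("1111", "1111")
smallest = toy_float("1000", "0000")
print(float(largest), "vs fixed-point 15.9375")   # 120.0 vs fixed-point 15.9375
print(float(smallest), "vs fixed-point 0.0625")   # 0.001953125 vs fixed-point 0.0625
```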
Mantissa Storage • 1.10001010011100001010001 × 2^00000110 • (Significand) Mantissa • Our rules have us starting with 1.something (there are a few exceptions). • The standards come from a time when storage was “expensive”, so why store a digit that is always 1? The standard does not store the 1; it is implied.
The pieces • One bit for a sign • Eight bits for an exponent, biased by 127 • Twenty-three bits for the mantissa, which does not include the implied 1 • +98.61 • Sign: 0 • Exponent: 1000 0101 • Mantissa: 1000 1010 0111 0000 1010 001
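Concatenating the three fields in sign-exponent-mantissa order gives the 32-bit pattern. The Python sketch below does that and, for comparison, asks the standard `struct` module for the bits the machine actually stores. The slides truncate the extra fraction bits, while IEEE 754 rounds to nearest by default, so the two patterns may differ in the last bit.

```python
import struct

sign = "0"
exponent = "10000101"                 # 127 + 6 = 133
mantissa = "10001010011100001010001"  # 23 stored bits; the leading 1 is implied

pattern = int(sign + exponent + mantissa, 2)
print(f"{pattern:032b}")  # 01000010110001010011100001010001
print(hex(pattern))       # 0x42c53851

# For comparison: the bits stored for 98.61 as a single-precision float.
machine = struct.unpack(">I", struct.pack(">f", 98.61))[0]
print(hex(machine))
```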
Adding Floats • Consider adding the following numbers expressed in scientific notation: 3.456789 × 10^3 and 1.212121 × 10^(-2) • The first step is to re-express the number with the smaller magnitude so that it has the same exponent as the other number.
Adding Floats (Cont.) • 1.212121 × 10^(-2) • = 0.1212121 × 10^(-1) • = 0.01212121 × 10^0 • = 0.001212121 × 10^1 • = 0.0001212121 × 10^2 • = 0.00001212121 × 10^3 • The number was shifted 5 times (3 - (-2)).
Adding Floats (Cont.) • When the exponents are equal, the mantissas can be added: 3.456789 × 10^3 + 0.00001212121 × 10^3 • = 3.45680112121 × 10^3
Rounding • In a computer there are a finite number of bits used to represent a number. • When the smaller floating-point number is shifted to make the exponents equal, some of the less significant bits are lost. • This loss of information (precision) is known as rounding.
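The decimal addition example can be replayed with Python’s standard `decimal` module to show the precision loss. Here we assume a 7-digit mantissa (the number of digits in each operand) by setting the context precision to 7. Note that `decimal` rounds (half-even by default) rather than simply discarding digits; for this example both give the same result.

```python
from decimal import Decimal, localcontext

a = Decimal("3.456789E3")   # 3.456789 x 10^3
b = Decimal("1.212121E-2")  # 1.212121 x 10^-2

exact = a + b               # the default context keeps 28 digits, so nothing is lost here

with localcontext() as ctx:
    ctx.prec = 7            # assumption: a 7-digit mantissa, like the operands
    rounded = a + b         # digits beyond the 7th are rounded away

print(exact)    # 3456.80112121
print(rounded)  # 3456.801
```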
One more fine point about floating-point representation • As discussed so far, the mantissa (significand) always starts with a 1. • When storage was expensive, designers opted not to represent this bit, since it is always 1. • It had to be inserted for various operations on the number (adding, multiplying, etc.), but it did not have to be stored.
Still another fine point • When we assume that the mantissa must start with a 1, we lose 0. • Zero is too important a number to lose, so we interpret a mantissa of all zeros together with an exponent of all zeros as zero, even though ordinarily we would assume the mantissa started with a 1 that we didn’t store.
Yet another fine point • In the IEEE 754 format for floats, you bias by one less than 2^7 = 128 (that is, by 127) and reserve the exponents 00000000 and 11111111 for special purposes. • One of these special purposes is “Not a Number” (NaN). • Another is “Infinity”, which is the floating-point version of overflow.
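A Python sketch that pulls the exponent and mantissa fields out of a single-precision pattern (via the standard `struct` module) and classifies the value. The function name `classify` is ours. Besides zero, infinity and NaN, IEEE 754 also uses the all-0’s exponent with a non-zero mantissa for “subnormal” numbers, which this sketch reports as well.

```python
import struct


def classify(x: float) -> str:
    """Classify x after converting it to IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF  # the 8 exponent bits
    mantissa = bits & 0x7FFFFF      # the 23 stored mantissa bits
    if exponent == 0:               # all 0's: zero (or a subnormal number)
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0xFF:            # all 1's: infinity or NaN
        return "infinity" if mantissa == 0 else "NaN"
    return "normal"


for value in (0.0, 98.61, float("inf"), float("nan")):
    print(value, classify(value))
```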
An example • Represent -9087.8735 as a float using 23 bits for the mantissa, 8 for the exponent and one for the sign. • The float stores 23 bits but there is an implied bit, so we will talk about 24. • Convert the whole number magnitude 9087 to binary: 10 0011 0111 1111 • That uses up 14 of the 24 bits for the mantissa (23 stored), leaving 10 for the fractional part.
An example (Cont.) • Multiply the fractional part by 2^10 and convert the whole-number part of the result to binary; make sure it uses 10 bits (add leading 0’s if it doesn’t). • 0.8735 × 2^10 = 894.464 • 894 → 1101111110
An example (Cont.) • 10001101111111.1101111110 • 1.00011011111111101111110 × 2^13 • Mantissa (1)00011011111111101111110 • Exponent 13 + 127 = 140 → 10001100 • Sign bit 1 (because the number was negative) • The actual order is sign-exponent-mantissa
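The steps of this example can be collected into one function. This is a sketch under stated assumptions: `encode_whole_and_fraction` is a hypothetical name, it only handles 1 <= |x| < 2^23, and, like the slides, it truncates the extra fraction bits instead of rounding them as IEEE 754 hardware does.

```python
import math


def encode_whole_and_fraction(x: float) -> str:
    """Encode x (1 <= |x| < 2**23) as sign + exponent + mantissa bits, the slides' way.

    Like the slides, it truncates extra fraction bits instead of rounding.
    """
    sign = "1" if x < 0 else "0"
    x = abs(x)
    if not 1 <= x < 2 ** 23:
        raise ValueError("this sketch only handles 1 <= |x| < 2**23")
    whole = int(x)
    whole_bits = format(whole, "b")             # 9087 -> 10001101111111 (14 bits)
    frac_len = 24 - len(whole_bits)             # bits left for the fraction: 10
    frac_bits = format(math.floor((x - whole) * 2 ** frac_len), "b").zfill(frac_len)
    power = len(whole_bits) - 1                 # shifts needed to reach 1.xxx: 13
    mantissa = (whole_bits + frac_bits)[1:]     # drop the implied leading 1
    return sign + format(power + 127, "08b") + mantissa


bits = encode_whole_and_fraction(-9087.8735)
print(bits[0], bits[1:9], bits[9:])  # 1 10001100 00011011111111101111110
```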
Example 2 • 0.0076534 • No whole number part. Begin by using all 24 mantissa bits (23 stored plus the implied 1) for the fractional part. • 0.0076534 × 2^24 = 128402.7449344 • 128402 → 11111010110010010 • This uses only 17 places, which means that so far the number starts with 7 zeros. But float mantissas are supposed to start with 1. • .000000011111010110010010 • 1.1111010110010010 × 2^(-8) • (But we need more digits for our mantissa)
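Example 2 (Cont.) • Multiply by 2^(24+7) = 2^31 to make up for the 7 leading zeros • 0.0076534 × 2^31 = 16435551.3516032 • 16435551 → 1111 1010 1100 1001 0101 1111 • Above is the mantissa, including the implied leading 1 (which is not stored) • Exponent 127 - 8 = 119 → 01110111 • Sign bit 0 (positive number)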
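The two-pass method for numbers below 1 (scale by 2^24, count the missing leading digits, then rescale) can be sketched in Python. `encode_small` is a hypothetical name; it assumes 2^(-24) <= |x| < 1 and truncates like the slides.

```python
import math


def encode_small(x: float) -> str:
    """Encode x with 2**-24 <= |x| < 1 using the slides' two-pass method (truncating)."""
    sign = "1" if x < 0 else "0"
    x = abs(x)
    if not 2 ** -24 <= x < 1:
        raise ValueError("this sketch only handles 2**-24 <= |x| < 1")
    first = format(math.floor(x * 2 ** 24), "b")            # 128402 -> 17 bits
    zeros = 24 - len(first)                                 # 7 leading zeros
    full = format(math.floor(x * 2 ** (24 + zeros)), "b")   # 24 bits incl. the implied 1
    power = -(zeros + 1)                                    # 1.xxx x 2**-8
    return sign + format(power + 127, "08b") + full[1:]     # drop the implied 1


bits = encode_small(0.0076534)
print(bits[0], bits[1:9], bits[9:])  # 0 01110111 11110101100100101011111
```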
Reverse • 10000111101010001100100111110101 • Split into fields: 1 | 00001111 | 01010001100100111110101 • The sign bit is 1, so the number is negative • Exponent 00001111 → 15; 15 - 127 (unbias) = -112 • Mantissa (restoring the implied 1): 1.01010001100100111110101
Reverse (Cont.) • 1.01010001100100111110101 × 2^23 / 2^23 • = 101010001100100111110101 / 2^23 • = 11061749 / 2^23
Reverse (Cont). • -1.31866323947906494140625*2^(-112)
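The reverse direction can be sketched as a Python function that splits the pattern, un-biases the exponent and restores the implied 1. `decode` is a hypothetical name and only handles normal numbers (exponent field neither all 0’s nor all 1’s). Since every step is a multiplication by a power of 2, the result is exact in a Python float and can be cross-checked against the standard `struct` module.

```python
import struct


def decode(bits: str) -> float:
    """Decode a 32-bit pattern for a normal single-precision number."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127    # 00001111 -> 15 - 127 = -112
    significand = int("1" + bits[9:], 2)  # restore the implied 1: 11061749
    return sign * significand / 2 ** 23 * 2.0 ** exponent


pattern = "10000111101010001100100111110101"
value = decode(pattern)
print(value)

# Cross-check: reinterpret the same 32 bits as a single-precision float.
print(struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0] == value)  # True
```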