IEEE Floating Point Revision Guide for Phase Test

Week 5: IEEE Floating Point Revision Guide for Phase Test

Mantissa Exponent Floating Point
• 15900000000000000 could be represented as
• 159 × 10^14
• 15.9 × 10^15
• 1.59 × 10^16
• A calculator might display 159 E14

Binary Fractions
The value of real binary numbers…
• Digits 1 0 1 . 1 0 1 have place values 4, 2, 1, 1/2, 1/4, 1/8
• 101.101₂ = 4 + 1 + 1/2 + 1/8 = 4 + 1 + 0.5 + 0.125 = 5.625 = 5⅝
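The place-value sum above can be sketched in Python. This is only an illustration; the function name `binary_to_decimal` is an assumption of this sketch, not something from the slides.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate a binary string such as '101.101' using place values."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer digits carry weights 1, 2, 4, ... read from right to left.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit) * 2 ** position
    # Fraction digits carry weights 1/2, 1/4, 1/8, ... read from left to right.
    for position, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2 ** -position
    return value


print(binary_to_decimal("101.101"))  # 5.625
```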

IEEE Single Precision
• The number occupies 32 bits.
• The first bit represents the sign of the number: 1 = negative, 0 = positive.
• The next 8 bits specify the exponent, stored in biased-127 form.
• The remaining 23 bits carry the mantissa, normalised to lie between 1 and 2, i.e. 1 <= mantissa < 2.

Basic Conversion
Converting a decimal number to a floating point number:
• 1. Take the integer part of the number and generate the binary equivalent.
• 2. Take the fractional part and generate a binary fraction.
• 3. Place the two parts together and normalise.

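IEEE – Example 1
• Convert 6.75 to 32-bit IEEE format.
• 1. The mantissa. The integer part first (divide by 2 repeatedly and read the remainders from last to first):
• 6 / 2 = 3 r 0
• 3 / 2 = 1 r 1
• 1 / 2 = 0 r 1, so 6 = 110₂
• 2. The fraction next (multiply by 2 repeatedly and read the integer parts from first to last):
• .75 * 2 = 1.5
• .5 * 2 = 1.0, so .75 = 0.11₂
• 3. Put the two parts together: 110.11
• Now normalise: 110.11 = 1.1011 × 2^2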
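The three conversion steps can be sketched in Python as follows. The helper names (`integer_to_binary`, `fraction_to_binary`, `normalise`) are hypothetical, and the sketch assumes a positive number whose integer part is at least 1.

```python
def integer_to_binary(n: int) -> str:
    """Divide by 2 repeatedly; the remainders, read last to first, are the bits."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        n, remainder = divmod(n, 2)
        bits = str(remainder) + bits  # later remainders go on the left
    return bits


def fraction_to_binary(f: float, max_bits: int = 23) -> str:
    """Multiply by 2 repeatedly; the integer parts, read first to last, are the bits."""
    bits = ""
    while f > 0 and len(bits) < max_bits:
        f *= 2
        bit = int(f)
        bits += str(bit)
        f -= bit
    return bits


def normalise(int_bits: str, frac_bits: str):
    """Move the point to just after the leading 1 (assumes int_bits starts with '1')."""
    exponent = len(int_bits) - 1
    mantissa = int_bits[0] + "." + int_bits[1:] + frac_bits
    return mantissa, exponent


print(integer_to_binary(6))      # 110
print(fraction_to_binary(0.75))  # 11
print(normalise("110", "11"))    # ('1.1011', 2)
```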

IEEE Biased 127 Exponent
• To generate a biased-127 exponent, take the value of the signed exponent and add 127.
• Example: 2^16 becomes 2^(127+16) = 2^143, so the stored exponent value is 143 = 10001111₂.
• So it is now simply an unsigned value.
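A minimal Python sketch of the biasing rule, assuming an 8-bit exponent field; `to_biased` and `from_biased` are illustrative names only.

```python
BIAS = 127  # bias used for the 8-bit single precision exponent


def to_biased(signed_exponent: int) -> str:
    """Add the bias and show the result as an 8-bit unsigned pattern."""
    return format(signed_exponent + BIAS, "08b")


def from_biased(pattern: str) -> int:
    """Recover the signed exponent from an 8-bit biased pattern."""
    return int(pattern, 2) - BIAS


print(to_biased(16))            # 10001111
print(to_biased(0))             # 01111111
print(from_biased("10000001"))  # 2
```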

Possible Representations of an Exponent

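Why Biased?
• The smallest exponent is 00000000.
• There is only one pattern for a zero exponent: 01111111.
• The highest exponent is 11111111.
• To increase the exponent by one, simply add 1 to the present pattern.
• Note: in IEEE 754 itself the patterns 00000000 and 11111111 are reserved for special values such as zero and infinity, so normalised numbers use 00000001 to 11111110.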

Back to the example
• Our original example revisited: 1.1011 × 2^2
• Exponent is 2 + 127 = 129, or 10000001 in binary.
• NOTE: The normalised mantissa always has a ‘1’ before the point. Storing it would waste a bit, so it is implied but not actually stored: 1.1000 is stored as .1000
• 6.75 in 32-bit IEEE floating point representation:
• 0 10000001 10110000000000000000000
• sign (1 bit), exponent (8 bits), mantissa (23 bits)
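The bit pattern for 6.75 can be checked with Python's standard `struct` module, which packs a value as a 32-bit float. The helper name `float_to_bits` is an assumption made for this sketch.

```python
import struct


def float_to_bits(x: float) -> str:
    """Return the single precision pattern of x as 'sign exponent mantissa'."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float_to_bits(6.75))  # 0 10000001 10110000000000000000000
```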

Special cases
• Zero, +infinity and -infinity.
• Zero is the pattern that contains only ‘0’s: 00000000000000000000000000000000
• Positive infinity is the pattern 0 11111111 …: sign 0, exponent all 1s, mantissa all 0s.
• Negative infinity is the pattern 1 11111111 …: sign 1, exponent all 1s, mantissa all 0s.
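As a quick check of these special patterns, the following Python sketch builds each 32-bit pattern and lets the standard `struct` module interpret it as a single precision float. `bits_to_float` is a hypothetical helper name.

```python
import math
import struct


def bits_to_float(pattern: str) -> float:
    """Interpret a 32-character bit string (spaces allowed) as a 32-bit float."""
    word = int(pattern.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value


zero = bits_to_float("0" * 32)
pos_inf = bits_to_float("0 " + "1" * 8 + " " + "0" * 23)
neg_inf = bits_to_float("1 " + "1" * 8 + " " + "0" * 23)

print(zero, pos_inf, neg_inf)  # 0.0 inf -inf
```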

Truncation and Rounding
• Arithmetic operations on a floating point number may increase the number of mantissa bits.
• Since the mantissa has fixed storage (23 places), these bits must be limited.
• The simplest approach is to truncate the result prior to storage.
• Example: 0.1101101 stored in 4 bits => 0.1101 (loss 0.0000101)

Rounding
• If the lost part is more than ½ of the LSB, add 1 to the LSB.
• Example, in 4 bits:
• 0.1101101 -> 0.1101 + 0.0001 = 0.1110 (rounded UP)
• 0.1101011 -> 0.1101 (rounded DOWN)
• NOTE: Rounding is always preferred to truncation, partly because it is intrinsically more accurate, and partly because we end up with a FAIR error: the result is as likely to be too high as too low.
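Truncation and rounding of a binary fraction can be compared with a small Python sketch working on bit strings. The function names are illustrative. The rounding rule follows the slide (round up only when the lost part is more than half an LSB); IEEE 754 hardware by default resolves an exact tie by rounding to the even neighbour, which this sketch does not attempt.

```python
def truncate(bits: str, keep: int) -> str:
    """Keep only the first `keep` fraction bits of a string like '0.1101101'."""
    whole, _, frac = bits.partition(".")
    return whole + "." + frac[:keep]


def round_bits(bits: str, keep: int) -> str:
    """Round to `keep` fraction bits: add 1 to the LSB if the lost part exceeds 1/2 LSB."""
    whole, _, frac = bits.partition(".")
    kept = int(whole + frac[:keep], 2)
    lost = frac[keep:]
    # More than half an LSB: the first lost bit is 1 and some later lost bit is also 1.
    if lost[:1] == "1" and "1" in lost[1:]:
        kept += 1
    digits = format(kept, f"0{len(whole) + keep}b")
    return digits[:-keep] + "." + digits[-keep:]


print(truncate("0.1101101", 4))    # 0.1101
print(round_bits("0.1101101", 4))  # 0.1110
print(round_bits("0.1101011", 4))  # 0.1101
```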

Other Considerations
• Truncation always undervalues the result (in magnitude), and can lead to a systematic error.
• Rounding has one major disadvantage: it requires up to two further arithmetic operations.
• Note: When we use floating point, care has to be taken when comparing numbers, because we are generating binary fractions of a predefined length. There is always the chance of recurring fractions, just as 1/3 in decimal is 0.333333333333333333333…
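The comparison problem is easy to reproduce. The sketch below uses Python's built-in float, which is 64-bit double precision rather than the 32-bit format in this guide, but the effect is the same: 0.1 and 0.2 are recurring fractions in binary, so their stored sum is not exactly the stored 0.3. Comparing with a tolerance via `math.isclose` is one common workaround.

```python
import math

total = 0.1 + 0.2

print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: exact comparison fails
print(math.isclose(total, 0.3))  # True: comparison within a tolerance
```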

From Floating Point Binary to Decimal
• Example: 1 01111011 11100000100000000000000
• Sign = 1, therefore this number is negative.
• Exponent 01111011 = 64 + 32 + 16 + 8 + 2 + 1 = 123
• Subtract 127: 123 - 127 = -4
• Mantissa = 1.111000001 (with the implied leading 1 restored)
• 1.111000001 × 2^-4 = 0.0001111000001
• = 1/16 + 1/32 + 1/64 + 1/128 + 1/8192
• Including the sign: -0.1173095703125
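The decoding steps can be written out directly in Python. This is a sketch for normalised numbers only (it does not handle the special exponent patterns), and `decode_single` is a hypothetical name.

```python
def decode_single(pattern: str) -> float:
    """Decode a 32-bit pattern 'sign exponent mantissa' for a normalised number."""
    bits = pattern.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    # Remove the bias of 127 to recover the signed exponent.
    exponent = int(bits[1:9], 2) - 127
    # Restore the implied leading 1, then add the 23 stored fraction bits.
    mantissa = 1 + int(bits[9:], 2) / 2 ** 23
    return sign * mantissa * 2.0 ** exponent


print(decode_single("1 01111011 11100000100000000000000"))  # -0.1173095703125
```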

Floating Point Maths
• Floating point addition and subtraction:
• Make sure that the two numbers have the same exponent.
• Then add or subtract the mantissas.
• Starting with the existing exponent, re-normalise if needed.

Example
• Example: 1.1 × 2^3 + 1.1 × 2^2
• Select the smaller number and make its mantissa smaller by moving the point to the left, increasing the exponent by one for each place, until the exponents match.
• 1.1 × 2^2 = 0.11 × 2^3
• Add the mantissas.
• Re-normalise.

Example
• 1.1 × 2^3 = 001.10 × 2^3
• + 1.1 × 2^2 = 000.11 × 2^3
• Sum: 010.01 × 2^3
• Re-normalise: 010.01 × 2^3 = 1.001 × 2^4
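The align, add and re-normalise procedure can be sketched in Python with each number held as a (mantissa, exponent) pair. This assumes positive mantissas and uses ordinary Python floats for the mantissa, so it illustrates the procedure rather than modelling the hardware; `fp_add` is an illustrative name. Here 1.1₂ is 1.5 and 1.001₂ is 1.125.

```python
def fp_add(m1: float, e1: int, m2: float, e2: int):
    """Add m1 * 2**e1 and m2 * 2**e2, returning a normalised (mantissa, exponent)."""
    # Make (m1, e1) the number with the larger exponent.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    # Align: shift the smaller number's mantissa right until the exponents match.
    m2 = m2 / 2 ** (e1 - e2)
    mantissa, exponent = m1 + m2, e1
    # Re-normalise so that 1 <= mantissa < 2.
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    while 0 < mantissa < 1:
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


print(fp_add(1.5, 3, 1.5, 2))  # (1.125, 4)
```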

FP math
• Floating Point Multiplication
• Assume two numbers a × 2^m and b × 2^n
• Result: (a × 2^m) × (b × 2^n) = (a × b) × 2^(m+n)
• Floating Point Division
• Assume two numbers a × 2^m and b × 2^n
• Result: (a × 2^m) / (b × 2^n) = (a / b) × 2^(m-n)
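A short Python sketch of these two rules, again using (mantissa, exponent) pairs with positive, non-zero mantissas. The names `renormalise`, `fp_mul` and `fp_div` are assumptions of this sketch. The re-normalisation step is added because a × b or a / b can fall outside the range 1 to 2.

```python
def renormalise(mantissa: float, exponent: int):
    """Adjust the pair so that 1 <= mantissa < 2 (mantissa must be positive)."""
    while mantissa >= 2:
        mantissa /= 2
        exponent += 1
    while mantissa < 1:
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


def fp_mul(a: float, m: int, b: float, n: int):
    """(a * 2**m) * (b * 2**n) = (a * b) * 2**(m + n)."""
    return renormalise(a * b, m + n)


def fp_div(a: float, m: int, b: float, n: int):
    """(a * 2**m) / (b * 2**n) = (a / b) * 2**(m - n)."""
    return renormalise(a / b, m - n)


print(fp_mul(1.5, 3, 1.5, 2))  # (1.125, 6), i.e. 12 * 6 = 72
print(fp_div(1.5, 3, 1.5, 2))  # (1.0, 1), i.e. 12 / 6 = 2
```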
