CENG536, Computer Engineering Department, Çankaya University
Advanced Computer Arithmetic: Floating-Point Arithmetic, Week 3
Floating-Point Numbers

The problem with fixed-point representation is illustrated by the following examples: the relative representation error due to truncation is quite significant for x, while it is much less severe for y. On the other hand, both x² and y² are unrepresentable, because their computations lead to underflow (a number too small) and overflow (a number too large), respectively.
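The slide's specific values for x and y are not reproduced in this copy; the following minimal Python sketch, with assumed values and an assumed 8.8 fixed-point format (8 integer bits, 8 fraction bits), illustrates the same effect:

```python
# Hypothetical 8.8 fixed-point format: 8 integer bits, 8 fraction bits.
# The operands x and y below are assumed values, not the slide's originals.
FRAC_BITS = 8
MAX_MAG = 2**8 - 2**-8        # largest representable magnitude
MIN_MAG = 2**-8               # smallest nonzero magnitude

def to_fixed(v):
    """Truncate v onto the 8.8 fixed-point grid."""
    return int(v * 2**FRAC_BITS) / 2**FRAC_BITS

for v in (0.004, 200.3):      # a very small and a fairly large operand
    err = abs(v - to_fixed(v)) / v
    print(f"{v}: stored as {to_fixed(v)}, relative error {err:.4%}")

# Squaring pushes both results outside the representable range:
print(0.004**2 < MIN_MAG)     # True -> underflow (result too small to represent)
print(200.3**2 > MAX_MAG)     # True -> overflow  (result too large to represent)
```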
These numbers can be represented in floating-point form. The exponent (-5 or +7 in the examples above) essentially indicates the direction and amount by which the radix point must be moved to produce the corresponding fixed-point representation shown above; hence the designation "floating-point numbers".
A floating-point number has four components: the sign, the significand (mantissa) s, the exponent base b, and the exponent e. The exponent base is usually a power of two, except in decimal arithmetic, where it is 10.
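In other words, the value represented is ±s × b^e. The following minimal Python sketch (an illustration of my own, not from the slides) uses math.frexp to pull these components out of an ordinary Python float, for which the exponent base is b = 2:

```python
import math

# Minimal sketch: split a Python float (an IEEE 754 double, base b = 2)
# into its sign, significand, exponent base and exponent.
def components(x):
    sign = 0 if math.copysign(1.0, x) > 0 else 1
    s, e = math.frexp(abs(x))          # abs(x) == s * 2**e with 0.5 <= s < 1
    return sign, s, 2, e

print(components(6.5))      # (0, 0.8125, 2, 3)   ->  +0.8125 * 2**3  =  6.5
print(components(-0.375))   # (1, 0.75,   2, -1)  ->  -0.75   * 2**-1 = -0.375
```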
A typical floating-point format places the sign, the exponent, and the significand in adjacent fields. A key point to observe is that two signs are involved in a floating-point number: the sign of the number itself and the sign of the exponent.
The use of a biased exponent format has virtually no effect on the speed or cost of exponent arithmetic (addition/subtraction), given the small number of bits involved. It does, however, facilitate zero detection (zero can be represented with the smallest biased exponent of 0 and an all-zero significand) and magnitude comparison (normalized floating-point numbers can be compared as if they were integers).
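As a rough illustration of the comparison property (a sketch of my own, not part of the slide): for positive normalized IEEE 754 numbers, the raw bit patterns order the same way as the values themselves, so an integer compare suffices.

```python
import struct

def bits(x):
    """Return the raw 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

a, b = 1.5, 2.75
print(bits(a) < bits(b))   # True, matching a < b: an integer compare gives the right order
print(bits(0.0))           # 0 -> zero is the all-zero pattern, easy to detect
```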
The range of values in a floating-point number representation is composed of the intervals [-max, -min] and [min, max], where max and min are the largest and smallest representable magnitudes:
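The defining equation is not reproduced in this copy; for a format with normalized significands it presumably takes the usual form

```latex
\mathrm{max} = s_{\max}\, b^{\,e_{\max}}, \qquad
\mathrm{min} = s_{\min}\, b^{\,e_{\min}}
```

where s_max and s_min are the largest and smallest normalized significands and e_max, e_min are the extreme exponents.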
Number distribution pattern and subranges in floating-point representations: there are three special or singular values, -∞, 0, and +∞. Zero is special because it cannot be represented with a normalized mantissa (significand).
Overflow occurs when a result is less than -max or greater than +max. Underflow, on the other hand, occurs for results in the range (-min, 0) or (0, min).
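A quick Python check (an illustrative sketch of my own, using IEEE 754 doubles, i.e. Python's float) shows both behaviours:

```python
import sys

huge = sys.float_info.max      # the double-precision "max"
tiny = sys.float_info.min      # the smallest normalized positive double, "min"

print(huge * 2)                # inf -> overflow: the result exceeds +max
print(tiny / 2**60)            # 0.0 -> underflow: the result falls below min
```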
The equation for the value of a floating-point number suggests that the range [-max, max] increases if we choose a larger exponent base b. A larger b also simplifies arithmetic operations on the exponents, since, for a given range, smaller exponents must be dealt with. However, if the significand is to be kept in normalized form, effective precision decreases for larger b. In the past, machines with b = 2, 8, 16, or 256 were built.
The exponent sign is almost always encoded in a biased format. As for the sign of a floating-point number, alternatives to the currently dominant signed-magnitude format include the use of 1's- or 2's-complement representation. Several variations have been tried in the past, including complementation of the significand part only, and complementation of the entire number (including the exponent part) when the number to be represented is negative.
The ANSI/IEEE Floating-Point Standard

The two representation formats in the IEEE standard for binary floating-point numbers (ANSI/IEEE Std 754-1985) are single precision (32 bits: 1 sign bit, 8 exponent bits with a bias of 127, 23 fraction bits) and double precision (64 bits: 1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits).
The standard also defines extended formats that allow an implementation to carry higher precision internally to reduce the effect of accumulated errors. Two extended formats are defined: single-extended and double-extended.
Floating-Point Conversion Example

Value: N = (-1)^S × 2^(E-127) × (1.M)

The decimal number 0.75₁₀ is to be represented in the IEEE 754 single-precision format:

0.75₁₀ = 0.11₂ (converted to a binary number) = 1.1₂ × 2⁻¹ (normalized binary number; the leading 1 is the hidden bit)

The mantissa is positive, so the sign is S = 0.
The biased exponent is E = e + 127 = -1 + 127 = 126₁₀ = 0111 1110₂.
The fractional part of the mantissa is M = .1000...000 (in 23 bits).
The IEEE 754 single-precision representation is therefore:

Sign (1 bit): 0
Exponent (8 bits): 0111 1110
Mantissa (23 bits): 1000 0000 0000 0000 0000 000
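A brief check of this example in Python (the struct-based round trip is my addition, not part of the slide):

```python
import struct

# Assemble the single-precision word for 0.75 from the fields worked out above
# and compare it with the pattern produced by the standard library.
sign, biased_exp, frac = 0, 0b0111_1110, 0b1 << 22   # S = 0, E = 126, M = .100...0
word = (sign << 31) | (biased_exp << 23) | frac
print(hex(word))                                               # 0x3f400000
print(hex(struct.unpack(">I", struct.pack(">f", 0.75))[0]))    # 0x3f400000
```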
The decimal number -2345.125₁₀ is to be represented in the IEEE 754 single-precision format:

-2345.125₁₀ = -1001 0010 1001.001₂ (converted to binary) = -1.0010 0101 0010 01₂ × 2¹¹ (normalized binary; the leading 1 is the hidden bit)

The mantissa is negative, so the sign is S = 1.
The biased exponent is E = e + 127 = 11 + 127 = 138₁₀ = 1000 1010₂.
The fractional part of the mantissa is M = .0010 0101 0010 0100 ... 000 (in 23 bits).
The IEEE 754 single-precision representation is therefore:

Sign (1 bit): 1
Exponent (8 bits): 1000 1010
Mantissa (23 bits): 0010 0101 0010 0100 0000 000
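The same kind of check for this second example (again an illustrative sketch, not from the slide):

```python
import struct

# S = 1, E = 138, M = .0010 0101 0010 01 followed by zeros.
sign, biased_exp = 1, 0b1000_1010
frac = 0b0010_0101_0010_01 << 9        # left-justify the 14 fraction bits in the 23-bit field
word = (sign << 31) | (biased_exp << 23) | frac
print(hex(word))                                                     # 0xc5129200
print(hex(struct.unpack(">I", struct.pack(">f", -2345.125))[0]))     # 0xc5129200
```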
Basic Floating-Point Algorithms

Basic arithmetic on floating-point numbers is conceptually simple. However, care must be taken in a hardware implementation to ensure correctness and avoid undue loss of precision; in addition, it must be possible to handle exceptions. Addition and subtraction are the most difficult of the elementary operations for floating-point operands. Here we deal only with addition, since subtraction can be converted to addition by flipping the sign of the subtrahend.
Consider the addition (±s1 × b^e1) + (±s2 × b^e2). Assuming e1 ≥ e2, we begin by aligning the two operands through right-shifting of the significand (mantissa) of the number with the smaller exponent.
If the exponent base b and the number representation radix (base) r are the same, we simply shift s2 to the right by e1 - e2 digits. When b = r^a, the shift amount, which is computed through direct subtraction of the biased exponents, is multiplied by a. In either case, this step is referred to as the alignment shift, or preshift (in contrast to the normalization shift, or postshift, which is needed when the resulting significand s is unnormalized).
We then perform the addition on the aligned operands: (±s1 × b^e1) + (±s2 × b^e2) = (±s1 ± s2 / b^(e1-e2)) × b^e1, so the result has significand s = ±s1 ± s2 / b^(e1-e2) and exponent e = e1, before any normalization postshift.
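A simplified Python sketch of these steps, working on (significand, exponent) pairs with b = 2 and ignoring rounding and exceptions (the function name and test values are illustrative assumptions):

```python
def fp_add(s1, e1, s2, e2):
    """Add two base-2 floating-point values given as (significand, exponent) pairs."""
    if e1 < e2:                          # make the first operand the one with the larger exponent
        s1, e1, s2, e2 = s2, e2, s1, e1
    s2 /= 2 ** (e1 - e2)                 # alignment shift (preshift)
    s, e = s1 + s2, e1
    while abs(s) >= 2:                   # postshift: normalize to the right
        s, e = s / 2, e + 1
    while s != 0 and abs(s) < 1:         # or normalize to the left (after cancellation)
        s, e = s * 2, e - 1
    return s, e

print(fp_add(1.5, 3, 1.25, 1))           # (1.8125, 3): 12 + 2.5 = 14.5 = 1.8125 * 2**3
```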
Floating-point multiplication is simpler than floating-point addition; it is performed by multiplying the significands and adding the exponents: (±s1 × b^e1) × (±s2 × b^e2) = ±(s1 × s2) × b^(e1+e2). Postshifting may be needed, since the product s1 × s2 of the two significands can be unnormalized. For example, with normalized significands in [1, 2) we have 1 ≤ s1 × s2 < 4, leading to the possible need for a single-bit right shift. Also, the computed exponent needs adjustment if the exponents are biased or if a normalization shift is performed. Overflow/underflow is possible during multiplication if e1 and e2 have like signs. Overflow is also possible due to normalization.
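A corresponding sketch for multiplication (same illustrative assumptions as the addition sketch above):

```python
def fp_mul(s1, e1, s2, e2):
    """Multiply two base-2 values: multiply significands, add exponents."""
    s, e = s1 * s2, e1 + e2
    if abs(s) >= 2:                      # product of [1, 2) significands lies in [1, 4)
        s, e = s / 2, e + 1              # single-bit right postshift
    return s, e

print(fp_mul(1.5, 3, 1.5, 1))            # (1.125, 5): 12 * 3 = 36 = 1.125 * 2**5
```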
Similarly, floating-point division is performed by dividing the significands and subtracting the exponents: (±s1 × b^e1) / (±s2 × b^e2) = ±(s1 / s2) × b^(e1-e2). Here, the problems to be dealt with are similar to those of multiplication. The ratio of the significands may have to be normalized; for example, with normalized significands in [1, 2) we have 1/2 < s1 / s2 < 2, so a single-bit left shift is always adequate. The computed exponent needs adjustment if the exponents are biased or if a normalizing shift is performed. Overflow/underflow is possible during division if e1 and e2 have unlike signs. Underflow due to normalization is also possible.
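And a sketch for division (same illustrative assumptions):

```python
def fp_div(s1, e1, s2, e2):
    """Divide two base-2 values: divide significands, subtract exponents."""
    s, e = s1 / s2, e1 - e2
    if abs(s) < 1:                       # quotient of [1, 2) significands lies in (1/2, 2)
        s, e = s * 2, e - 1              # single-bit left postshift
    return s, e

print(fp_div(1.5, 3, 1.25, 1))           # (1.2, 2): 12 / 2.5 = 4.8 = 1.2 * 2**2
```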
To extract the square root of a positive floating-point number, we first make its exponent even. This may require subtracting 1 from the exponent and multiplying the significand by b. We then use sqrt(s × b^e) = sqrt(s) × b^(e/2). In the case of IEEE floating-point numbers, the adjusted significand will be in the range 1 ≤ s < 4, which leads directly to a normalized significand for the result. Square-rooting never produces overflow or underflow.
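A sketch of the square-root rule (illustrative only; the exponent is made even first, then the significand is rooted and the exponent halved):

```python
import math

def fp_sqrt(s, e):
    """Square root of a base-2 value s * 2**e with s >= 0."""
    if e % 2 != 0:                       # odd exponent: fold one factor of 2 into s
        s, e = s * 2, e - 1              # adjusted significand now lies in [1, 4)
    return math.sqrt(s), e // 2

print(fp_sqrt(1.5625, 5))                # (1.7677..., 2): sqrt(50) ~= 7.07 = 1.7677... * 2**2
```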
Floating-Point Addition Algorithm
Floating-Point Addition Algorithm Flowchart
Floating-Point Addition Algorithm Example
Floating-Point Addition Algorithm Notes
Floating-Point Subtraction Algorithm
Floating-Point Subtraction Flowchart
Floating-Point Multiplication Algorithm
Floating-Point Multiplication Flowchart
Floating-Point Multiplication Example
Floating-Point Multiplication Notes
Floating-Point Division Algorithm
Floating-Point Error Rounding
Floating-Point Error Rounding Observations