740 likes | 768 Views
ECE 645: Lecture 6. Number Representation Part 2 Little-Endian vs. Big-Endian Representations Floating Point Representations Rounding Representation of the Galois Field elements. Required Reading. Endianness, from Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Endianness
E N D
ECE 645: Lecture 6 Number Representation Part 2 Little-Endian vs. Big-Endian Representations Floating Point Representations Rounding Representation of the Galois Field elements
Required Reading Endianness, from Wikipedia, the free encyclopedia http://en.wikipedia.org/wiki/Endianness Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 17, Floating-Point Representations
Little-Endian vs. Big-Endian Representation of Integers
Little-Endian vs. Big-Endian Representation A0 B1 C2 D3 E4 F5 67 8916 MSB LSB Little-Endian Big-Endian 0 LSB = 89 MSB = A0 67 B1 F5 C2 E4 D3 address D3 E4 C2 F5 B1 67 MSB = A0 LSB = 89 MAX
Little-Endian vs. Big-Endian Camps 0 LSB MSB . . . . . . address MSB LSB MAX Little-Endian Big-Endian Motorola 68xx, 680x0 Bi-Endian Intel IBM AMD Motorola Power PC Hewlett-Packard DEC VAX Silicon Graphics MIPS Sun SuperSPARC RS 232 Internet TCP/IP
Little-Endian vs. Big-Endian Origin of the terms Jonathan Swift, Gulliver’s Travels • A law requiring all citizens of Lilliput to break their soft-eggs • at the little ends only • A civil war breaking between the Little Endians and • the Big-Endians, resulting in the Big Endians taking refuge on • a nearby island, the kingdom of Blefuscu • Satire over holy wars between Protestant Church of England • and the Catholic Church of France
Little-Endian vs. Big-Endian Advantages and Disadvantages Big-Endian Little-Endian • easier to determine a sign of • the number • easier to compare two numbers • easier to divide two numbers • easier to print • easier addition and multiplication • of multiprecision numbers
Pointers (1) Big-Endian Little-Endian 0 int * iptr; 89 (* iptr) = 8967; (* iptr) = 6789; 67 F5 E4 address iptr+1 D3 C2 B1 A0 MAX
Pointers (2) Big-Endian Little-Endian 0 long int * lptr; 89 (* lptr) = 8967F5E4; (* lptr) = E4F56789; 67 F5 E4 address D3 lptr + 1 C2 B1 A0 MAX
The ANSI/IEEE standard floating-point number representation formats Originally IEEE 754-1985. Superseded by IEEE 754-2008 Standard.
Table 17.1 Some features of the ANSI/IEEE standard floatingpoint number representation formats
1.f 2e f = 0: Representation of f 0: Representation of NaNs f = 0: Representation of 0 f 0: Representation of denormals, 0.f 2–126 Exponent Encoding Exponent encoding in 8 bits for the single/short (32-bit) ANSI/IEEE format 0 1 126 127 128 254 255 Decimal code 00 01 7E 7F 80 FE FF Hex code Exponent value –126 –1 0 +1 +127
New IEEE 754-2008 Standard Basic Formats
New IEEE 754-2008 Standard Binary Interchange Formats
Exact result 1 + 2-1 + 2-23 + 2-24 Requirements for Arithmetic Results of the 4 basic arithmetic operations (+, -, , ) as well as square-rooting must match those obtained if all intermediate computations were infinitely precise That is, a floating-point arithmetic operation should introduce no more imprecision than the error attributable to the final rounding of a result that has no exact representation (this is the best possible) Example: (1 + 2-1) (1 + 2-23 ) Rounded result 1 + 2-1 + 2-22 Error = ½ ulp
Rounding Modes The IEEE 754-2008 standard includes five rounding modes: Default: Round to nearest, ties to even (rtne) Optional: Round to nearest, ties away from 0 (rtna) Round toward zero (inward) Round toward + (upward) Round toward – (downward)
Required Reading Parhami, Chapter 17.5, Rounding schemes Rounding Algorithms 101 http://www.diycalculator.com/popup-m-round.shtml
Rounding Rounding occurs when we want to approximate a more precise number (i.e. more fractional bits L) with a less precise number (i.e. fewer fractional bits L') Example 1: old: 000110.11010001 (K=6, L=8) new: 000110.11 (K'=6, L'=2) Example 2: old: 000110.11010001 (K=6, L=8) new: 000111. (K'=6, L'=0) The following pages show rounding from L>0 fractional bits to L'=0 bits, but the mathematics hold true for any L' < L Usually, keep the number of integral bits the same K'=K
Whole part Fractional part xk–1xk–2 . . . x1x0.x–1x–2 . . . x–lyk–1yk–2 . . . y1y0 Round Rounding Equation • y = round(x)
Rounding Techniques • There are different rounding techniques: • 1) truncation • results in round towards zero in signed magnitude • results in round towards -∞ in two's complement • 2) round to nearest number • 3) round to nearest even number (or odd number) • 4) round towards +∞ • Other rounding techniques • 5) jamming or von Neumann • 6) ROM rounding • Each of these techniques will differ in their error depending on representation of numbers i.e. signed magnitude versus two's complement • Error = round(x) – x
xk–1xk–2 . . . x1x0.x–1x–2 . . . x–lxk–1xk–2 . . . x1x0 trunc ulp 1) Truncation • Truncation in signed-magnitude results in a number chop(x) that is always of smaller magnitude than x. This is called round towards zero or inward rounding • 011.10 (3.5)10 011 (3)10 • Error = -0.5 • 111.10 (-3.5)10 111 (-3)10 • Error = +0.5 • Truncation in two's complement results in a number chop(x) that is always smaller than x. This is called round towards -∞ or downward-directed rounding • 011.10 (3.5)10 011 (3)10 • Error = -0.5 • 100.01 (-3.5)10 100 (-4)10 • Error = -0.5 The simplest possible rounding scheme: chopping or truncation
x chop( ) 4 4 3 3 2 2 1 1 x x – 4 – 3 – 2 – 1 1 2 3 4 – 4 – 3 – 2 – 1 1 2 3 4 – 1 – 1 – 2 – 2 – 3 – 3 – 4 – 4 Truncation Function Graph: chop(x) x chop( ) Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0). Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as round to -∞).
Bias in two's complement truncation • Assuming all combinations of positive and negative values of x equally possible, average error is -0.375 • In general, average error = -(2-L'-2-L )/2, where L' = new number of fractional bits
Implementation truncation in hardware • Easy, just ignore (i.e. truncate) the fractional digits from L to L'+1 xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L = yk-1 yk-2 .. y1 y0. ignore (i.e. truncate the rest)
2) Round to nearest number • Rounding to nearest number what we normally think of when say round • 010.01 (2.25)10 010 (2)10 • Error = -0.25 • 010.11 (2.75)10 011 (3)10 • Error = +0.25 • 010.00 (2.00)10 010 (2)10 • Error = +0.00 • 010.10 (2.5)10 011 (3)10 • Error = +0.5 [round-half-up (arithmetic rounding)] • 010.10 (2.5)10 010 (2)10 • Error = -0.5 [round-half-down]
Round-half-up: dealing with negative numbers • Rounding to nearest number what we normally think of when say round • 101.11 (-2.25)10 110 (-2)10 • Error = +0.25 • 101.01 (-2.75)10 101 (-3)10 • Error = -0.25 • 110.00 (-2.00)10 110 (-2)10 • Error = +0.00 • 101.10 (-2.5)10 110 (-2)10 • Error = +0.5 [asymmetric implementation] • 101.10 (-2.5)10 101 (-3)10 • Error = -0.5 [symmetric implementation]
Round to Nearest Function Graph: rtn(x)Round-half-up version Symmetric implementation Asymmetric implementation
Bias in two's complement round to nearestRound-half-up asymmetric implementation • Assuming all combinations of positive and negative values of x equally possible, average error is +0.125 • Smaller average error than truncation, but still not symmetric error • We have a problem with the midway value, i.e. exactly at 2.5 or -2.5 leads to positive error bias always • Also have the problem that you can get overflow if only allocate K' = K integral bits • Example: rtn(011.10) overflow • This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers
Implementing round to nearest (rtn) in hardware Round-half-up asymmetric implementation • Two methods • Method 1: Add '1' in position one digit right of new LSB (i.e. digit L'+1) and keep only L' fractional bits xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L + 1 = yk-1 yk-2 .. y1 y0. y-1 • Method 2: Add the value of the digit one position to right of new LSB (i.e. digit L'+1) into the new LSB digit (i.e. digit L) and keep only L' fractional bits xk-1 xk-2 .. x1 x0. x-1 x-2 .. x-L + x-1 yk-1 yk-2 .. y1 y0. ignore (i.e. truncate the rest) ignore (i.e truncate the rest)
Fig. 17.9 R* rounding or rounding to the nearest odd number. Round to Nearest Even Function Graph: rtne(x) • To solve the problem with the midway value we implement round to nearest-even number (or can round to nearest odd number) Fig. 17.8 Rounding to the nearest even number.
Bias in two's complement round to nearest even (rtne) • average error is now 0 (ignoring the overflow) • cost: more hardware
4) Rounding towards infinity • We may need computation errors to be in a known direction • Example: in computing upper bounds, larger results are acceptable, but results that are smaller than correct values could invalidate upper bound • Use upward-directed rounding (round toward +∞) • up(x) always larger than or equal to x • Similarly for lower bounds, use downward-directed rounding (round toward -∞) • down(x) always smaller than or equal to x • We have already seen that round toward -∞ in two's complement can be implemented by truncation
Rounding Toward Infinity Function Graph: up(x) and down(x) up(x) down(x) down(x) can be implemented by chop(x) intwo's complement
x inward( ) 4 3 2 1 x – 4 – 3 – 2 – 1 1 2 3 4 – 1 – 2 – 3 – 4 Two's Complement Round to Zero • Two's complement round to zero (inward rounding) also exists
Other Methods • Note that in two's complement round to nearest (rtn) involves an addition which may have a carry propagation from LSB to MSB • Rounding may take as long as an adder takes • Can break the adder chain using the following two techniques: • Jamming or von Neumann • ROM-based
5) Jamming or von Neumann Chop and force the LSB of the result to 1 Simplicity of chopping, with the near-symmetry or ordinary rounding Max error is comparable to chopping (double that of rounding)
xk–1 . . . x4x3x2x1x0.x–1x–2 . . . x–lxk–1 . . . x4y3y2y1y0 ROM ROM address ROM data 6) ROM Rounding Fig. 17.11 ROM rounding with an 8 2 table. Example: Rounding with a 32 4 table Rounding result is the same as that of the round to nearest scheme in 31 of the 32 possible cases, but a larger error is introduced when x3 = x2 = x1 = x0 = x–1 = 1