Coping With the Carry Problem • Limit Carry to Small Number of Bits • Hybrid Redundant • Residue Number Systems • Detect the End of Propagation Rather Than Wait for Worst-case Time • Asynchronous (Self-Timed) Design • Speed-up Propagation Using Carry Lookahead and Other Methods • Lookahead • Carry-skip • Ling Adder • Carry-select • Prefix Adders • Conditional Sum • Eliminate Carry Propagation Altogether • Redundant Number Systems • Signed-Digit Representations
Residue Number Systems (RNS) • Convert Arithmetic on Large Numbers to Arithmetic on Small Numbers • Significant Speedup in Some Signal Processing Algorithms • Valuable Tool for Theoretical Studies of the Limits of Fast Arithmetic
Residue Number Systems (RNS) • Integer System • Addition, Subtraction, Multiplication • Carry Free !!! • Division, Comparison, Sign Detection • Complex and Slow • Inconvenient For Fractional Representations • Generally Used For Special Purpose Applications • such as DSP Filters
Residue Number Systems (RNS) • Radix is an n-tuple of Integers $(m_n, m_{n-1}, \ldots, m_1)$ • Not a Single Base Value • Integer X Represented by the n-tuple $(x_n, x_{n-1}, \ldots, x_1)$ • $q_i$ is the Largest Integer Such That: $0 \le x_i = X - q_i m_i < m_i$ • $x_i$ is the Residue of X mod $m_i$
RNS Example Problem Chinese Scholar Sun Tzu wrote (about 1500 years ago): What number has the remainders of 2, 3 and 2 when divided by the values 7, 5 and 3 respectively? NOTATION: Sun Tzu's Problem is to find $X = (2|3|2)_{RNS(7|5|3)}$
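Sun Tzu's problem is small enough to check by exhaustive search. The sketch below is only an illustration: the function name `sun_tzu_brute_force` is made up here, and a real decoder would use the Chinese Remainder Theorem rather than a linear scan.

```python
def sun_tzu_brute_force(residues, moduli):
    """Return the smallest non-negative X with X mod m_i == r_i for every i."""
    limit = 1
    for m in moduli:
        limit *= m  # dynamic range M = product of the moduli
    for x in range(limit):
        if all(x % m == r for r, m in zip(residues, moduli)):
            return x
    return None  # only reached if the residues are inconsistent with the moduli


print(sun_tzu_brute_force((2, 3, 2), (7, 5, 3)))  # 23
```

The answer 23 is the only solution in [0, 104], since 7 x 5 x 3 = 105.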
Residue (Modulo) of a Number Many Examples in Chapter 4 of Text Use:
Moduli Selection • Dynamic Range – Product of k Relatively Prime Moduli • Product, M, is Number of Different Representable Values in the RNS • DEFINITION • $m_i$ and $m_j$ are Relatively Prime if $\gcd(m_i, m_j) = 1$ • EXAMPLE • $m_i = 4$ and $m_j = 9$, $\gcd(4, 9) = 1$ • Although Neither 4 Nor 9 is Prime, They are Relatively Prime
RNS Representation • Consider RNS(8|7|5|3) (our default RNS in this class) • 840 Distinct Representable Values • Since $8 \times 7 \times 5 \times 3 = 840$ • Can Represent Any Interval of 840 Consecutive Values, e.g. [0, 839] or [-420, 419]
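The claims about relative primality and the 840-value range can be checked directly. This is a minimal sketch; `pairwise_coprime` and `to_rns` are illustrative names, not part of the slides.

```python
from itertools import combinations
from math import gcd, prod

MODULI = (8, 7, 5, 3)  # the default RNS(8|7|5|3)


def pairwise_coprime(moduli):
    """True if every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def to_rns(x, moduli=MODULI):
    """Encode integer x as its tuple of residues, most significant modulus first."""
    return tuple(x % m for m in moduli)


M = prod(MODULI)
print(pairwise_coprime(MODULI), M)          # True 840
print(to_rns(48))                           # (0, 6, 3, 0)
print(len({to_rns(x) for x in range(M)}))   # 840 distinct codes
```

Because Python's `%` returns a non-negative result for a positive modulus, `to_rns` also encodes negative values in an interval such as [-420, 419].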
RNS Complementation • Given the RNS Representation of X, -X is Obtained by Complementing Each Digit With Respect to its Own Modulus ($x_i$ is Replaced by $m_i - x_i$). Zero Digits are Unchanged. EXAMPLE CHECK
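Digit-wise complementation can be sketched as follows. The value 21 is an arbitrary choice made here for illustration (the slide's own example did not survive extraction), and `rns_negate` is a hypothetical helper name.

```python
MODULI = (8, 7, 5, 3)


def rns_negate(digits, moduli=MODULI):
    """Complement each digit with respect to its modulus; zero digits stay zero."""
    return tuple((m - d) % m for d, m in zip(digits, moduli))


x = (5, 0, 1, 0)          # 21 in RNS(8|7|5|3)
neg_x = rns_negate(x)
print(neg_x)              # (3, 0, 4, 0)

# Check: X + (-X) must be zero in every digit position.
print(tuple((a + b) % m for a, b, m in zip(x, neg_x, MODULI)))  # (0, 0, 0, 0)
```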
Chinese Remainder Theorem • RNS can be viewed as a weighted system. EXAMPLE
RNS Encoding Efficiency • Example Requires 11 Bits: 3 (mod 8) + 3 (mod 7) + 3 (mod 5) + 2 (mod 3) • 840 Different Values Represented • $2^{11} = 2048$ • $\log_2(840) = 9.714$ • $11 - 9.714 = 1.3$ Bits Wasted
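The bit count and the wasted fraction can be reproduced with a few lines. The helper names `digit_width` and `encoding_cost` are assumptions of this sketch.

```python
from math import log2, prod


def digit_width(m):
    """Bits needed to hold a residue in [0, m - 1]."""
    return (m - 1).bit_length()


def encoding_cost(moduli):
    """Return (total bits, dynamic range M, bits wasted relative to log2(M))."""
    total_bits = sum(digit_width(m) for m in moduli)
    M = prod(moduli)
    return total_bits, M, total_bits - log2(M)


bits, M, wasted = encoding_cost((8, 7, 5, 3))
print(bits, M, round(wasted, 3))  # 11 840 1.286
```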
RNS Arithmetic • Addition, Subtraction, Multiplication Can be Performed with Independent Operations on Each Digit • Following Examples Show This Process • For Subtraction, Can Complement the Number and Add Also
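A short sketch of digit-wise arithmetic, with each position reduced by its own modulus and no carries between positions. The operands 5 and 7 are arbitrary values picked for this illustration, and `digitwise` is a hypothetical helper.

```python
import operator

MODULI = (8, 7, 5, 3)


def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)


def digitwise(op, a, b, moduli=MODULI):
    """Apply op independently in every digit position, then reduce mod m_i."""
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


a, b = to_rns(5), to_rns(7)
print(digitwise(operator.add, a, b), to_rns(5 + 7))   # both (4, 5, 2, 0)
print(digitwise(operator.sub, a, b), to_rns(5 - 7))   # both (6, 5, 3, 1)
print(digitwise(operator.mul, a, b), to_rns(5 * 7))   # both (3, 0, 0, 2)
```

Results are only meaningful while the true result stays inside the chosen 840-value interval; beyond that, the representation wraps around modulo M.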
RNS Circuit Structure • [Figure: four independent arithmetic units (mod-8, mod-7, mod-5 and mod-3), each producing one digit of the result (mod 8, mod 7, mod 5, mod 3)]
Choosing RNS Moduli • Assume we wish to represent 100,000 (decimal) Values • Standard Binary: $\log_2(100{,}000) = 16.6096$, so 17 bits • RNS(13|11|7|5|3|2), Dynamic Range M = 30,030 • Insufficient Dynamic Range • Maximum Digit Width = 4 bits, Total = 17 bits • RNS(17|13|11|7|5|3|2), Dynamic Range M = 510,510 • Dynamic Range 5.1 Times Too Large • Maximum Digit Width = 5 bits, Total = 22 bits • Adding More Prime Moduli is Inefficient
Choosing RNS Moduli • Remove $m_i = 5$ From RNS(17|13|11|7|5|3|2) • RNS(17|13|11|7|3|2), Dynamic Range M = 102,102 • Still Have Relatively Prime Moduli • Maximum Digit Width = 5 bits, Total = 19 bits • One 5-bit, Two 4-bit, One 3-bit, One 2-bit and One 1-bit Modulo Units Required • Maximum Delay is That of a 5-bit Carry-Propagate Adder • Can Combine (3,7) and (2,13) Moduli With no Speed Penalty • RNS(26|21|17|11), Dynamic Range M = 102,102 • Maximum Digit Width = 5 bits, Total = 19 bits • Three 5-bit and One 4-bit Modulo Units Required
Relatively Prime Values • Powers of Distinct Primes are Relatively Prime • Example • $\gcd(3^2, 2^2) = 1$ But $\gcd(3^2, 3) = 3$ • Can REPLACE a Modulus With its Power • Try to Use a Sequence of the SMALLEST Valued Moduli • RNS($2^2$|3), Dynamic Range M = 12 • RNS($3^2$|$2^3$|7|5), Dynamic Range M = 2,520 • RNS(11|$3^2$|$2^3$|7|5), Dynamic Range M = 27,720 • RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range M = 360,360 • Maximum Digit Width = 4 bits, Total = 21 bits • Dynamic Range 3.6 Times that Needed
Relatively Prime Values • RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range M = 360,360 • Maximum Digit Width = 4 bits, Total = 21 bits • Dynamic Range 3.6 Times that Needed • Reduce the Above by a Factor of 3 • Replace $3^2$ with 3 and Combine 3 and 5 to Get 15 • RNS(15|13|11|$2^3$|7), Dynamic Range M = 120,120 • Maximum Digit Width = 4 bits, Total = 18 bits • Dynamic Range 1.2 Times that Needed • Using This Strategy Can Generally Find the “Best” Moduli in Terms of Speed and Representation Efficiency
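The dynamic ranges and bit widths quoted for the candidate moduli sets can be recomputed with the sketch below. The function name `summarize` is an assumption made here; the moduli sets are the ones listed on the slides.

```python
from math import prod


def summarize(moduli):
    """Return (dynamic range M, widest digit in bits, total bits)."""
    widths = [(m - 1).bit_length() for m in moduli]
    return prod(moduli), max(widths), sum(widths)


candidates = [
    (13, 11, 7, 5, 3, 2),
    (17, 13, 11, 7, 5, 3, 2),
    (17, 13, 11, 7, 3, 2),
    (26, 21, 17, 11),
    (13, 11, 9, 8, 7, 5),
    (15, 13, 11, 8, 7),
]
for moduli in candidates:
    M, widest, total = summarize(moduli)
    print(moduli, M, widest, total, "enough" if M >= 100_000 else "too small")
```

Among the sets that cover 100,000 values, RNS(15|13|11|8|7) has the smallest total (18 bits) while keeping every digit at 4 bits or fewer.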
Moduli Choice for Simple Arithmetic Unit Design • Simple Units Also Lead to Speed and Cost Benefits • Modulo-ADD, SUBTRACT, MULTIPLY Units Simple to Design if $m_i = 2^{a_i}$ or $2^{a_i} - 1$ • Power-of-2 Moduli Lead to Simple Design • Standard a-bit Binary Adder • Example: Use 16 Instead of 13 • Exception in Case of Lookup Table Implementation • Moduli of the Form $2^a - 1$ Also Lead to Simple Design • Standard a-bit Binary Adder with End-around Carry • Referred to as “Low-cost” Moduli
RNS Low-Cost Moduli • Theorem: • A sufficient condition for $2^a - 1$ and $2^b - 1$ to be a relatively prime pair is that a and b are relatively prime. • Any List of Pairwise Relatively Prime Numbers: $a_{k-2} > \ldots > a_1 > a_0$ • Can be Used as a BASIS of a k-modulus RNS: RNS($2^{a_{k-2}}$ | $2^{a_{k-2}} - 1$ | ... | $2^{a_1} - 1$ | $2^{a_0} - 1$) • Widest Residues (Longest Carry-chain) are $a_{k-2}$-bit Values
Low-Cost Moduli Example • Consider the Example From Earlier, X = [0, 100,000] • Choosing the Moduli From Smallest to Largest: • RNS($2^3$ | $2^3 - 1$ | $2^2 - 1$), Basis: 3, 2, M = 168 • RNS($2^4$ | $2^4 - 1$ | $2^3 - 1$), Basis: 4, 3, M = 1,680 • RNS($2^5$ | $2^5 - 1$ | $2^3 - 1$ | $2^2 - 1$), Basis: 5, 3, 2, M = 20,832 • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), Basis: 5, 4, 3, M = 104,160 • Can’t Include 2 and 4 in Same Basis Set, gcd(2,4) = 2
Low-Cost Moduli Example • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), Basis: 5, 4, 3, M = 104,160, i.e. RNS(32|31|15|7) • Requires 5+5+4+3 = 17 bits • Requires Two 5-bit, One 4-bit and One 3-bit Modulo Units • 4 RNS Digits • Efficiency = 100,001/104,160 ≈ 0.96 (About 96%) • Comparing With Unrestricted Moduli: • RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$): 17 bits, M = 104,160, 5-bit Carry-ripple but Simpler Circuit, Fewer Digits • RNS(15|13|11|$2^3$|7): 18 bits, M = 120,120, 4-bit Carry-ripple, 1 Extra Digit
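Building a low-cost moduli set from a basis of exponents can be sketched as below. `low_cost_moduli` is a hypothetical helper written for this note; it simply follows the construction described above (one power-of-2 modulus from the largest exponent, plus $2^a - 1$ for every exponent in the basis).

```python
from itertools import combinations
from math import gcd, prod


def low_cost_moduli(basis):
    """Build RNS(2^a | 2^a - 1 | ... ) from pairwise relatively prime exponents."""
    exps = sorted(basis, reverse=True)
    if any(gcd(a, b) != 1 for a, b in combinations(exps, 2)):
        raise ValueError("basis exponents must be pairwise relatively prime")
    return (2 ** exps[0],) + tuple(2 ** a - 1 for a in exps)


for basis in [(3, 2), (4, 3), (5, 3, 2), (5, 4, 3)]:
    moduli = low_cost_moduli(basis)
    print(basis, moduli, prod(moduli))
# (5, 4, 3) gives (32, 31, 15, 7) with M = 104160
```

Calling `low_cost_moduli((4, 2))` raises `ValueError`, matching the remark that 2 and 4 cannot share a basis.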
Encoding and Decoding • Advantages of Alternative Number Systems Must Not be Outweighed By Conversions to/from the System • Encoding From Fixed Positional System to RNS Easily Accomplished Using Table-Lookup and Modulo Addition Circuits
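Encoding with Lookup Table • Conversion of Signed-Magnitude or 2’s Complement Accomplished by Converting the Magnitude and Taking the RNS Complement if the Number is Negative • Consider the Following Identity, Where $y_j$ are the Bits of the Binary Input: $\langle \sum_j 2^j y_j \rangle_{m_i} = \langle \sum_j \langle 2^j y_j \rangle_{m_i} \rangle_{m_i}$ • Idea is to Precompute the Terms $\langle 2^j \rangle_{m_i}$ for all i, j, Store Them in a Table, Then Add the Entries Selected by the 1 Bits Modulo $m_i$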
Example Lookup Table • Use Default RNS(8|7|5|3) • For $m_i = 8$ We Can Use the 3 LSBs of the Value Directly
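The table-based encoding can be sketched as follows. The 10-bit input width, the test value 164, and the names `TABLE` and `encode` are assumptions of this sketch; the slide's own table did not survive extraction.

```python
MODULI = (8, 7, 5, 3)
WIDTH = 10  # assumed input width; 2**10 = 1024 covers the 840-value range

# TABLE[m][j] holds <2^j>_m, the residue of each bit weight.
TABLE = {m: [pow(2, j, m) for j in range(WIDTH)] for m in MODULI}


def encode(y, moduli=MODULI):
    """Convert a non-negative WIDTH-bit integer to RNS using table lookups and modulo adds."""
    digits = []
    for m in moduli:
        acc = 0
        for j in range(WIDTH):
            if (y >> j) & 1:                 # bit j of the input is 1
                acc = (acc + TABLE[m][j]) % m
        digits.append(acc)
    return tuple(digits)


print(TABLE[7])      # [1, 2, 4, 1, 2, 4, 1, 2, 4, 1]
print(encode(164))   # (4, 3, 4, 2)
```

For the mod-8 column the table entries for bits 3 and above are all 0, which is why the three least significant bits can be used directly.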
RNS to Mixed-Radix Form • CRT States That a Mixed-Radix Number System (MRS) is Associated with any RNS • Solves comparison, sign detection, and overflow problems • MRS is a k-digit Weighted Positional Number System MRS($m_{k-1}$|$m_{k-2}$|...|$m_2$|$m_1$|$m_0$) • MRS Weights are Products: $(m_{k-2} \cdots m_2 m_1 m_0, \ldots, m_2 m_1 m_0, m_1 m_0, m_0, 1)$ • MRS Digit Sets in Each of k Positions: $[0, m_{k-1} - 1], \ldots, [0, m_2 - 1], [0, m_1 - 1], [0, m_0 - 1]$ • MRS Digits in Same Range as RNS Digits
RNS to MRS Example • Example Position Weights of MRS(8|7|5|3): (7)(5)(3) = 105, (5)(3) = 15, 3, 1 • $(0|3|1|0)_{MRS(8|7|5|3)}$ = (0)(105) + (3)(15) + (1)(3) + (0)(1) = 48 (decimal) • RNS to MRS Conversion Requires Finding the $z_i$ that Correspond to the $y_i$ in: $(y_{k-1}|\ldots|y_2|y_1|y_0)_{RNS} = (z_{k-1}|\ldots|z_2|z_1|z_0)_{MRS}$
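RNS to MRS Conversion • From the MRS Definition we Have: $y = z_{k-1}(m_{k-2} \cdots m_2 m_1 m_0) + \cdots + z_2 (m_1 m_0) + z_1 (m_0) + z_0$ • Easy to See that $z_0 = y_0$, Since Every Other Term is a Multiple of $m_0$ • Subtracting This Value From the RNS and MRS Values Results in: $y - y_0 = (y'_{k-1}|\ldots|y'_2|y'_1|0)_{RNS} = (z_{k-1}|\ldots|z_2|z_1|0)_{MRS}$, Where $y'_j = \langle y_j - y_0 \rangle_{m_j}$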
RNS to MRS Conversion (cont) • Next, Divide Both Representations by $m_0$: $(y''_{k-1}|\ldots|y''_2|y''_1|-)_{RNS} = (z_{k-1}|\ldots|z_2|z_1)_{MRS}$ • Thus, if We Can Divide by $m_0$, We Have an Iterative Approach for Conversion • Dividing y' (a Multiple of $m_0$) by $m_0$ is Called SCALING and is Easier Than Normal RNS Division • Accomplished by Multiplying by the Multiplicative Inverse of $m_0$
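Multiplicative Inverses • The Multiplicative Inverse of a Quantity With Respect to $m_i$ is the Value That, When Multiplied by the Quantity, Yields a Product of 1 Modulo $m_i$ • Example: Multiplicative Inverses of 3 Relative to $m_i$ = 8, 7, 5: $\langle 3 \times 3 \rangle_8 = \langle 3 \times 5 \rangle_7 = \langle 3 \times 2 \rangle_5 = 1$ • Thus, the Multiplicative Inverses are 3, 5 and 2 • Can Build a Lookup Table Circuit to Store Inverses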
Multiplicative Inverses Example • Divide the Number Y' = (0|6|3|0)RNS by 3 • Accomplish Through Multiplication by (3|5|2|-)RNS: (0|6|3|0)RNS × (3|5|2|-)RNS = (0|2|1|-)RNS
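Scaling by a modulus can be sketched with Python's built-in modular inverse, `pow(a, -1, m)` (available from Python 3.8). The helper name `scale_by` and the use of `None` for the undefined digit are choices made for this illustration.

```python
MODULI = (8, 7, 5, 3)


def scale_by(digits, divisor, moduli=MODULI):
    """Divide an RNS number known to be a multiple of `divisor` (one of the moduli).

    Each remaining digit is multiplied by the inverse of the divisor for that
    modulus; the digit in the divisor's own position becomes undefined (None).
    """
    out = []
    for d, m in zip(digits, moduli):
        if m == divisor:
            out.append(None)
        else:
            out.append(d * pow(divisor, -1, m) % m)
    return tuple(out)


print([pow(3, -1, m) for m in (8, 7, 5)])  # [3, 5, 2]
print(scale_by((0, 6, 3, 0), 3))           # (0, 2, 1, None)
```

The result matches 48 / 3 = 16, whose residues modulo 8, 7 and 5 are 0, 2 and 1.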
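RNS/MRS Conversion Example • Convert Y = (0|6|3|0)RNS to MRS • $z_0 = y_0 = 0$ • Divide by 3: (0|6|3|0) × (3|5|2|-) = (0|2|1|-) • Now We Have $z_1 = 1$; Subtract 1 and Divide by 5: (7|1|0|-) × (5|3|-|-) = (3|3|-|-) • This Gives $z_2 = 3$; Subtract 3 and Divide by 7: (0|0|-|-) × (7|-|-|-) = (0|-|-|-) • This Gives $z_3 = 0$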
RNS/MRS Conversion Example • Thus Y = (0|6|3|0)RNS is (0|3|1|0)MRS • Position Weights of MRS(8|7|5|3): (7)(5)(3) = 105, (5)(3) = 15, 3, 1 • So, Y = (0|6|3|0)RNS = (0|3|1|0)MRS = 48 (decimal)
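The subtract-then-scale iteration worked through above can be sketched in a few lines. The function names are hypothetical, and digits and moduli are listed from the most significant position ($m_{k-1}$) to the least ($m_0$), as on the slides.

```python
def rns_to_mrs(digits, moduli):
    """Convert RNS digits to mixed-radix digits by repeated subtract and scale."""
    y, m = list(digits), list(moduli)
    z = []
    while m:
        m0, z0 = m.pop(), y.pop()   # lowest remaining modulus and its digit
        z.append(z0)                # the next MRS digit equals that RNS digit
        # Subtract z0 everywhere, then divide by m0 using modular inverses.
        y = [((yi - z0) * pow(m0, -1, mi)) % mi for yi, mi in zip(y, m)]
    return tuple(reversed(z))


def mrs_to_int(z, moduli):
    """Evaluate MRS digits with weights 1, m0, m1*m0, ..."""
    value, weight = 0, 1
    for zi, mi in zip(reversed(z), reversed(moduli)):
        value += zi * weight
        weight *= mi
    return value


z = rns_to_mrs((0, 6, 3, 0), (8, 7, 5, 3))
print(z, mrs_to_int(z, (8, 7, 5, 3)))  # (0, 3, 1, 0) 48
```

Because MRS is a weighted positional system, two numbers can be compared digit by digit from the most significant MRS digit once both are converted.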
RNS/MRS Conversion • Consider Conversion of (3|2|4|2)RNS from RNS(8|7|5|3) to Decimal • Need to Determine the Values of (1|0|0|0)RNS, (0|1|0|0)RNS, (0|0|1|0)RNS and (0|0|0|1)RNS
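RNS/MRS Conversion • From the Definition of RNS, Each of These Weights is a Multiple of Every Modulus of RNS(8|7|5|3) Whose Position Holds a 0, and Satisfies $\langle Y \rangle_{m_i} = 1$ in the Position Holding the 1 • This Gives the Weights 105, 120, 336 and 280, so (3|2|4|2)RNS = $\langle 3(105) + 2(120) + 4(336) + 2(280) \rangle_{840} = \langle 2459 \rangle_{840} = 779$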
Chinese Remainder Theorem • How Did We Find $w_3$ = (1|0|0|0)RNS = 105? • Since Digits in 7, 5, 3 Places are 0, $w_3$ Must be a Multiple of (7)(5)(3) = 105 • Must Pick the Multiple of 105 Such That its Residue With Respect to 8 is 1 • Accomplished by Multiplying 105 by its Multiplicative Inverse with Respect to 8 (Here the Inverse is 1, Since $\langle 105 \rangle_8 = 1$) • This Process is Formalized in the Chinese Remainder Theorem
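Chinese Remainder Theorem THEOREM: Chinese Remainder Theorem (CRT) The magnitude of an RNS number can be obtained from the CRT formula: $y = (y_{k-1}|\ldots|y_1|y_0)_{RNS} = \langle \sum_{i=0}^{k-1} M_i \langle \alpha_i y_i \rangle_{m_i} \rangle_M$ where, by definition, $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ is the multiplicative inverse of $M_i$ with respect to $m_i$.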
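Chinese Remainder Theorem • Can Avoid Multiplications in the Conversion Process by Storing $\langle M_i \langle \alpha_i y_i \rangle_{m_i} \rangle_M$ in a Table for all i and $y_i$ (With $\alpha_i$ the Multiplicative Inverse of $M_i$ With Respect to $m_i$) • Example Table Given on page 64 of Textbook (and also in slide 33)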
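A direct transcription of the CRT formula into code is shown below as a sketch. `crt_decode` and `crt_weights` are names chosen here; a hardware version would read the precomputed products from a table instead of multiplying.

```python
from math import prod


def crt_weights(moduli):
    """Weight of each position: <M_i * alpha_i>_M, i.e. the value of (0|..|1|..|0)."""
    M = prod(moduli)
    return [(M // m) * pow(M // m, -1, m) % M for m in moduli]


def crt_decode(digits, moduli):
    """Recover the integer in [0, M) from its residues using the CRT formula."""
    M = prod(moduli)
    total = 0
    for y, m in zip(digits, moduli):
        Mi = M // m
        alpha = pow(Mi, -1, m)            # multiplicative inverse of M_i mod m_i
        total += Mi * ((alpha * y) % m)
    return total % M


print(crt_weights((8, 7, 5, 3)))               # [105, 120, 336, 280]
print(crt_decode((3, 2, 4, 2), (8, 7, 5, 3)))  # 779
print(crt_decode((0, 6, 3, 0), (8, 7, 5, 3)))  # 48
```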
Difficult RNS Operations • Sign Test • Magnitude Comparison • Overflow Detection • Generalized Division • It suffices to discuss the first three in the context of magnitude comparison, since they are essentially the same problem if M is such that M = N + P + 1, where the values represented are in the interval [-N, P].
Difficult RNS Operations • Sign Test same as Comparison with P • Overflow Detection accomplished using Signs of Operands and Results • Focus On: • Magnitude Comparison • Generalized Division
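Magnitude Comparison • Could Convert to Weighted Representation Using CRT • Too Complicated – too much Overhead • Use Approximate CRT Instead • Divide Both Sides of the CRT Equality by M (Recall $M_i = M/m_i$ by Definition): $\frac{y}{M} = \langle \sum_{i=0}^{k-1} m_i^{-1} \langle \alpha_i y_i \rangle_{m_i} \rangle_1$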
Approximate CRT • Addition of Terms is Modulo-1 • All $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Are in [0,1) • Whole Part of Result Discarded and Fractional Part Kept • Much Easier than CRT Modulo-M Addition • $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Can be Precomputed for all $y_i$ and i • Use Table Lookup Circuit and Fractional Adder (ignore carry-outs)
Magnitude Comparison Example Use approximate CRT decoding to determine the larger of the two numbers X and Y. Reading the Values from the Tables and Adding Modulo 1 Gives $X/M \approx 0.0571$ and $Y/M \approx 0.0536$. Thus, we conclude that X > Y.
Approximate CRT Error If the Maximum Error in Each Approximate CRT Table Entry is $\varepsilon$, then Approximate CRT Decoding Yields the Scaled Value of an RNS Number with Error No Greater than $k\varepsilon$. Previous Example: Table Entries Rounded to 4 Digits; Maximum Error in Each Entry is $\varepsilon = 0.00005$; k = 4 Digits; Error is $4\varepsilon = 0.0002$; $0.0571 - 0.0536 = 0.0035 > 4\varepsilon = 0.0002$, so X > Y is Safe
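The approximate CRT comparison can be sketched as follows. The operands are assumptions of this sketch: (0|6|3|0) appears elsewhere in the slides, while (5|3|0|0) is chosen here because it reproduces the scaled value 0.0536 quoted above; the original operands did not survive extraction. The names `build_table` and `approx_scaled` are likewise hypothetical.

```python
from math import prod

MODULI = (8, 7, 5, 3)


def build_table(moduli, places=4):
    """table[m][y] = <alpha * y>_m / m, rounded to `places` decimal digits."""
    M = prod(moduli)
    table = {}
    for m in moduli:
        alpha = pow(M // m, -1, m)
        table[m] = [round(((alpha * y) % m) / m, places) for y in range(m)]
    return table


def approx_scaled(digits, table, moduli=MODULI):
    """Approximate value / M: add the table entries and keep the fractional part."""
    return sum(table[m][y] for y, m in zip(digits, moduli)) % 1.0


table = build_table(MODULI)
x = approx_scaled((0, 6, 3, 0), table)   # assumed first operand (48)
y = approx_scaled((5, 3, 0, 0), table)   # assumed second operand (45)
print(round(x, 4), round(y, 4))          # 0.0571 0.0536
print(x - y > 4 * 0.00005)               # True, so X > Y is a safe conclusion
```

If the two scaled values differed by less than the error bound, the comparison would be inconclusive and a more precise table or an exact conversion would be needed.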
Redundant RNS Representations • Do Not Have to Restrict Digits in RNS to the Set $[0, m_i - 1]$ • If Digits Range Over $[0, \beta_i]$ Where $\beta_i \ge m_i$, Then the RNS is Redundant • Redundant RNS Simplifies the Modular Reduction Step for Each Arithmetic Operation
Redundant RNS Example • Consider mod-13 with Digit Set [0,15] • Redundant since: 0 ≡ 13, 1 ≡ 14 and 2 ≡ 15 (mod 13) • Addition Using Pseudo-redundancies Can be Done with Two 4-bit Adders • [Figure: two 4-bit adders; labels X, Y, Cout, 00, Ignore, SUM]
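One way to read the two-adder structure is sketched below; this is an interpretation of the figure labels, not a confirmed description of the original circuit. It assumes one operand is an ordinary residue in [0, 12] and the other may be any 4-bit value in [0, 15]; the first adder's carry-out stands for 16, which is congruent to 3 mod 13, so a second adder adds 3 when that carry is set and its own carry-out is ignored.

```python
def add_mod13_pseudo(x, y):
    """Add ordinary residue x (0..12) to pseudoresidue y (0..15), mod 13.

    Returns a 4-bit pseudoresidue z in [0, 15] with z congruent to x + y (mod 13).
    """
    if not (0 <= x <= 12 and 0 <= y <= 15):
        raise ValueError("x must be in [0, 12] and y in [0, 15]")
    s = x + y                 # first 4-bit adder
    cout = s >> 4             # its carry-out (weight 16, and 16 mod 13 = 3)
    low = s & 0xF             # its 4-bit sum
    return (low + (3 if cout else 0)) & 0xF   # second adder, carry-out ignored


print(add_mod13_pseudo(12, 15))  # 14, and (12 + 15) mod 13 = 1 = 14 mod 13
```

Under the stated operand ranges the second adder never overflows (its result is at most 14). With both operands allowed up to 15, for example 15 + 15, it would overflow, so that case needs a different arrangement.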