Explore the innovative Para-CORDIC system parallelizing CORDIC rotation for optimized performance in various applications. Learn about basic concepts, proposed methods, bottleneck solutions, comparisons, and real-world applications.
Para-CORDIC: Parallel CORDIC Rotation Algorithm and Architecture (IEEE T-CAS I, Vol. 51, No. 8, pp. 1515-1524, Aug. 2004) Tso-Bing Juang, Ph.D VLSI Design LAB, Dept. CSE, NSYSU tsobing@cse.nsysu.edu.tw
My Research – Computer Arithmetic • Applications of arithmetic components • DSP (Digital Signal Processing) • 3-D graphics • Computer communications, etc. • Topics of arithmetic [Ercegovac 2004]: • Addition/Subtraction • Multiplication/Division • Floating-point operations • CORDIC (COordinate Rotation DIgital Computer)
Academic Honors • Best Thesis Award, Xerox Co. Ltd, 1995 • Attended the Midwest Symposium on Circuits and Systems (MWSCAS), supported by NSC, 1999 • First prize (FPGA), National Intellectual Property Contest, 2000 • First prize, Full Custom Design Contest, 2001 • Attended the Asia-Pacific Conference on Circuits and Systems (APCCAS), supported by MOE, 2002 • 2005: Marquis Who’s Who in Science and Engineering, 2005-2006 edition • 2006: Marquis Who’s Who in the World
Outline • Basic Concept of CORDIC • Bottleneck of CORDIC Rotation • Proposed Methods • Previous Methods • Comparisons • Applications • Conclusions
What is CORDIC? • CORDIC (COordinate Rotation DIgital Computer) • Rotate the vector $(1, 0)$ by $\theta$ to get $(\cos\theta, \sin\theta)$ • Can evaluate many arithmetic functions • Rotation realized by shift-add operations • Convergence method (iterative) • About $n$ iterations for $n$-bit accuracy
Conventional CORDIC Rotation • In each iteration, x and y perform one micro-rotation whose direction is set by the sign of z
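Pre-computation of $\tan(\alpha_i)$ • Find $\alpha_i$ such that $\tan(\alpha_i) = 2^{-i}$ (i.e., $\alpha_i = \tan^{-1}(2^{-i})$) • Any angle can be written (to within $\alpha_n$) as $\theta = \pm\alpha_0 \pm \alpha_1 \pm \cdots \pm \alpha_n$ as long as $-99.7^\circ \le \theta \le 99.7^\circ$ (which covers $-90^\circ$ to $90^\circ$)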
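As a quick check of the constants used in the conventional algorithm, the sketch below tabulates $\alpha_i$ and the scaling constant $K$ that initializes $x_0$. It assumes double-precision floating point and $n = 24$; both choices, and the helper name `cordic_tables`, are illustrative only.

```python
import math

def cordic_tables(n):
    """Return the angles alpha_i = atan(2**-i) for i = 0..n and the gain constant K."""
    alphas = [math.atan(2.0 ** -i) for i in range(n + 1)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2**(-2i));
    # K is the reciprocal of the total stretch, so starting from x0 = K cancels it.
    k = 1.0
    for i in range(n + 1):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return alphas, k

alphas, K = cordic_tables(24)
print(f"alpha_0 = {math.degrees(alphas[0]):.1f} deg, K = {K:.6f}")
# alpha_0 = 45.0 deg, K = 0.607253
```

Because every $\alpha_i$ and $K$ depend only on $i$ and $n$, they are fixed at design time and cost nothing per rotation.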
Conventional CORDIC Rotation • Algorithm ($z$ is the current angle): • “At each step, try to drive $z$ toward zero” • Initialize $x_0 = K = 0.607253$, $y_0 = 0$, $z_0 = \theta$ • For $i = 0$ to $n$ • $\sigma_i = 1$ when $z_i \ge 0$, else $-1$ [i.e., $\sigma_i = \operatorname{sign}(z_i)$] • $x_{i+1} = x_i - \sigma_i 2^{-i} y_i$ • $y_{i+1} = y_i + \sigma_i 2^{-i} x_i$ • $z_{i+1} = z_i - \sigma_i \alpha_i$ • End For • Result: $x_{n+1} = \cos\theta$, $y_{n+1} = \sin\theta$ • Precision: $n$ bits
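The loop above translates almost line for line into software. This is a floating-point sketch for checking the algorithm, not a model of the fixed-point hardware; the function name `cordic_rotate` and the use of radians are assumptions.

```python
import math

def cordic_rotate(theta, n=24):
    """Conventional CORDIC rotation; returns approximately (cos(theta), sin(theta)).

    theta is in radians and must lie inside the convergence range
    |theta| <= sum(alpha_i).
    """
    alphas = [math.atan(2.0 ** -i) for i in range(n + 1)]
    K = 1.0
    for i in range(n + 1):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = K, 0.0, theta              # x0 = K, y0 = 0, z0 = theta
    for i in range(n + 1):
        sigma = 1 if z >= 0 else -1      # sigma_i = sign(z_i): push z toward zero
        # Shift-add micro-rotation; the tuple assignment uses the old x in the y update.
        x, y = x - sigma * 2.0 ** -i * y, y + sigma * 2.0 ** -i * x
        z -= sigma * alphas[i]           # z_{i+1} = z_i - sigma_i * alpha_i
    return x, y

c, s = cordic_rotate(math.radians(30))
print(f"{c:.6f} {s:.6f}")                # 0.866025 0.500000
```

Note that `sigma` for iteration `i` cannot be known until `z` from iteration `i - 1` is available; this data dependence is the sequential bottleneck discussed next.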
Three Important Factors of CORDIC • Large additions/subtractions • Scaling factor (constant vs. non-constant) • Sequential execution
Research Topics about CORDIC • Redundant CORDIC architecture • Error analysis of CORDIC • Application of CORDIC architectures • CORDIC algorithm with non-constant scaling factors • Parallel CORDIC architecture
Conventional CORDIC Rotation (Revisited) • Sequential determination of $\sigma_i$ based on $z_i$
The actual speed bottleneck lies in the sequential determination of the value of $\sigma_i$ • [Figure: Sequential CORDIC Rotation Architecture]
How to parallelize? • Use each bit of the input angle to determine $\sigma_i$ • Removes the bottleneck ($B$: bit accuracy) • The first $m-1$ iterations remain sequential • The remaining iterations run in parallel
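One way to see where the sequential part can end: once $2^{-i} - \tan^{-1}(2^{-i})$ (about $2^{-3i}/3$) drops below $2^{-B}$, the micro-rotation angle $\alpha_i$ equals the binary weight $2^{-i}$ at $B$-bit accuracy, so bits of the remaining angle can select the rotation directions directly. The sketch below finds that index. Treating this criterion as the definition of $m$ is an assumption for illustration, and `first_linear_index` is a hypothetical helper.

```python
import math

def first_linear_index(B):
    """Hypothetical helper: smallest i such that 2**-i - atan(2**-i) < 2**-B.

    Since atan(x) = x - x**3/3 + ..., from this index on atan(2**-i)
    and 2**-i agree to B fractional bits.
    """
    i = 0
    while 2.0 ** -i - math.atan(2.0 ** -i) >= 2.0 ** -B:
        i += 1
    return i

for B in (16, 24, 32):
    print(B, first_linear_index(B))
# 16 5
# 24 8
# 32 11
```

The index grows only as roughly $B/3$, so most of the iterations fall on the parallel side of the split.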
Our Proposed Techniques (example: $B = 24$) • MAR (Micro-rotation to Angle Recoding) • Obtain the combination of $\tan^{-1}$ terms that makes up each $2^{-i}$, $i = 1$ to $m-1$ • BBR (Binary to Bipolar Recoding) • Obtain the polarity {−1, +1} of each binary {1, 0} weight of the input angle; hardware free
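The identity behind BBR can be checked exactly with rational arithmetic. Writing $r_i = 2b_i - 1$ gives $\sum_{i=1}^{B} b_i 2^{-i} = \sum_{i=1}^{B} r_i 2^{-(i+1)} + (1 - 2^{-B})/2$, so a binary angle becomes a sum of signed terms plus a constant. The sketch assumes an unsigned $B$-bit fractional angle. How Para-CORDIC absorbs the constant offset (and the angle's actual scaling) is not reproduced here, and the helper names are hypothetical.

```python
from fractions import Fraction

def bbr(bits):
    """Binary-to-bipolar recoding of the fraction 0.b1 b2 ... bB.

    Each bit b_i in {0, 1} becomes r_i = 2*b_i - 1 in {-1, +1}. In hardware this is
    only a reinterpretation of the same wires (0 means -1), hence "hardware free".
    Returns the bipolar digits and the constant offset (1 - 2**-B) / 2.
    """
    B = len(bits)
    digits = [2 * b - 1 for b in bits]
    offset = (1 - Fraction(1, 2 ** B)) / 2
    return digits, offset

def binary_value(bits):
    # sum of b_i * 2**-i, i = 1..B
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def bipolar_value(digits, offset):
    # sum of r_i * 2**-(i+1), i = 1..B, plus the constant offset
    return sum(Fraction(r, 2 ** (i + 2)) for i, r in enumerate(digits)) + offset

bits = [1, 0, 1, 1, 0, 0, 1, 0]          # 0.10110010 in binary
digits, offset = bbr(bits)
print(digits, binary_value(bits) == bipolar_value(digits, offset))
# [1, -1, 1, 1, -1, -1, 1, -1] True
```

Each bipolar digit already has the form of a rotation direction $\pm 1$, which is why no logic is needed to derive the directions from the bits.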
Example ($B = 24$) • [Figure: Phase 1 and Phase 2] • Three extra micro-rotation stages are required
Para-CORDIC Architecture • [Figure: Para-CORDIC block diagram; labels include S(1), S(5), S(8), σ1, R(1) and R(i)]
Carry-Save Adder-Based Realization of the Micro-Rotation Stages • A 4:2 compressor is used to produce the carry-save form (a sum vector and a carry vector)
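At word level, a 4:2 compressor reduces four operands to one sum/carry pair without propagating carries. The sketch below builds it from two cascaded 3:2 carry-save adders on Python integers treated as bit vectors. It assumes non-negative operands and reads the four inputs as $x_i$ and the shifted $y_i$ term, each already in carry-save form (an interpretation of the datapath, not taken from the slides). An optimized 4:2 cell has a shorter critical path than two stacked full adders, but the arithmetic result is the same.

```python
import random

def csa(a, b, c):
    """3:2 carry-save adder applied bitwise to non-negative integers."""
    s = a ^ b ^ c                                  # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted by 2
    return s, carry                                # invariant: s + carry == a + b + c

def compress_4_2(a, b, c, d):
    """Reduce four operands to one (sum, carry) pair using two 3:2 levels."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)

# e.g. x_i in carry-save form (two vectors) plus a shifted y_i (two vectors)
ops = [random.getrandbits(24) for _ in range(4)]
s, c = compress_4_2(*ops)
print(s + c == sum(ops))                           # True
```

Only the final conversion out of carry-save form needs a carry-propagate adder, so each micro-rotation stage has a constant delay independent of the word length.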
Evaluation of the Z Datapath • Delay is: • Area is:
Merged Rotations in the Second-Half Iterations • Delay savings
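In the later iterations, the cross products $2^{-i}2^{-j}$ that distinguish a chain of micro-rotations from a single combined rotation become negligible. The sketch below compares applying rotations $i = k, \dots, B$ one by one against applying them at once with $S = \sum_i \sigma_i 2^{-i}$. It assumes this is the sense in which second-half rotations are merged; the choice $k = 13$ for $B = 24$ is illustrative. With $k$ above $B/2$, every cross term is at most $2^{-2k}$, and their accumulated effect stays at or below about $2^{-B}$.

```python
import math
import random

def rotate_sequential(x, y, sigmas, k):
    """Apply micro-rotations i = k, k+1, ... one at a time (shift-add form)."""
    for j, s in enumerate(sigmas):
        t = s * 2.0 ** -(k + j)
        x, y = x - t * y, y + t * x
    return x, y

def rotate_merged(x, y, sigmas, k):
    """Apply the same rotations as one step with S = sum(sigma_i * 2**-i)."""
    S = sum(s * 2.0 ** -(k + j) for j, s in enumerate(sigmas))
    return x - S * y, y + S * x

B, k = 24, 13                          # merge iterations k..B (illustrative choice)
sigmas = [random.choice((-1, 1)) for _ in range(B - k + 1)]
x0, y0 = math.cos(0.7), math.sin(0.7)
xs, ys = rotate_sequential(x0, y0, sigmas, k)
xm, ym = rotate_merged(x0, y0, sigmas, k)
err = max(abs(xs - xm), abs(ys - ym))
print(err < 2.0 ** -B)                 # True: cross terms are at most about 2**(1 - 2k)
```

The merged form replaces many dependent shift-add steps with one multiply-accumulate by $S$, which is where the delay saving comes from.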
Comments on Previously Proposed CORDIC Rotation Methods (1/4) • [Wang 1997]: IEEE T-Computers • The first $m-1$ iterations are sequential • Area saving
Comments on Previously Proposed CORDIC Rotation Methods (2/4) • [Phatak 1998]: IEEE T-Computers • Doubled hardware to perform clockwise and counterclockwise rotations • High area cost (signed-digit realization of the X/Y/Z iterations)
Comments on Previously Proposed CORDIC Rotation Methods (3/4) • [Kwak 2000]: Proc. MWSCAS • Complicated logic circuits to generate the first $m-1$ rotation directions
Comments on Previously Proposed CORDIC Rotation Methods (4/4) • [Kuhlmann 2002]: EURASIP • Uses a ROM to generate the first $m-1$ directions
Our Proposed Para-CORDIC • The delay and area costs of Para-CORDIC are:
ROM-Based Implementations for Sine/Cosine Generation • When $x_1$ and $y_1$ are constant ($x_1 = K$, $y_1 = 0$; then $x_{B+1} = \cos\theta$, $y_{B+1} = \sin\theta$) • The extra micro-rotation stages can be reduced
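Because $x_1$ and $y_1$ never change, the vector produced by the first few micro-rotations depends only on their direction pattern, so it can be read from a ROM with one entry per pattern. The sketch below illustrates that idea under stated assumptions: 0-based indices as in the conventional algorithm, a dictionary standing in for the ROM, $r = 4$ chosen arbitrarily, and directions still found from the $z$ datapath. It is not claimed to be the exact ROM organization used in Para-CORDIC, and all function names are hypothetical.

```python
import math
from itertools import product

def gain(n):
    """K = prod 1/sqrt(1 + 2**(-2i)), i = 0..n."""
    k = 1.0
    for i in range(n + 1):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def build_rom(r, n):
    """Tabulate (x, y) after the first r micro-rotations for all 2**r sign patterns.

    This only works because the starting vector (K, 0) is the same for every angle.
    """
    rom = {}
    for sigmas in product((1, -1), repeat=r):
        x, y = gain(n), 0.0
        for i, s in enumerate(sigmas):
            x, y = x - s * 2.0 ** -i * y, y + s * 2.0 ** -i * x
        rom[sigmas] = (x, y)
    return rom

def cordic_sin_cos_rom(theta, rom, r, n=24):
    """CORDIC rotation whose first r x/y stages are replaced by one ROM lookup."""
    alphas = [math.atan(2.0 ** -i) for i in range(n + 1)]
    z, sigmas = theta, []
    for i in range(r):                    # only the z datapath runs here
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * alphas[i]
    x, y = rom[tuple(sigmas)]             # replaces r shift-add stages
    for i in range(r, n + 1):             # remaining micro-rotations as usual
        s = 1 if z >= 0 else -1
        x, y = x - s * 2.0 ** -i * y, y + s * 2.0 ** -i * x
        z -= s * alphas[i]
    return x, y

rom = build_rom(r=4, n=24)                # 16 entries
c, s = cordic_sin_cos_rom(0.5, rom, r=4)
print(len(rom), c, s)
```

The trade-off is a ROM that doubles with each stage it replaces, so only a few leading stages are worth absorbing this way.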
Summary • Parallel CORDIC rotation (Para-CORDIC) • Improve the original sequential execution of CORDIC rotation • Complete proof of the proposed theorems • Submission information • 2003/7/11 submitted • 2004/4/21 fully accepted • 2004/8 published • Better latency/area
Future Work • Physical implementation of Para-CORDIC • Dealing with negative numbers when performing carry-save addition • Floating-point representation of data • Reduced micro-rotation stages in MAR • Parallel CORDIC Vectoring Methods • Must deal with two concurrent variables
Low-Error Fixed-Width Carry-Free Multipliers Design (to appear in IEEE T-CAS II, 2005)
Definition • An $n \times n$ fixed-width multiplier • Keeps only the $n$ most significant product bits • Needs a small compensation circuit to generate an error compensation value (ECV) • ECV • Constant • Fixed value • Simple implementation, larger errors • Adaptive • Variable value • More complex implementation, lower errors
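To make the definition concrete, the sketch below models an unsigned $n \times n$ fixed-width multiplier that drops every partial-product bit in the $n$ least significant columns and adds a constant ECV. Choosing the constant as the rounded expected value of the dropped bits (each $a_i b_j$ is 1 with probability 1/4 for uniform inputs) is one simple option assumed here. It is not the carry-free design of the cited paper, adaptive ECVs are not modeled, and the helper names are hypothetical.

```python
def split_partial_products(a, b, n):
    """Return (kept, dropped): weighted sums of partial-product bits a_i*b_j in
    columns i + j >= n (kept) and i + j < n (dropped) of an n x n unsigned multiply."""
    kept = dropped = 0
    for i in range(n):
        for j in range(n):
            bit = (a >> i) & (b >> j) & 1
            if i + j >= n:
                kept += bit << (i + j)
            else:
                dropped += bit << (i + j)
    return kept, dropped

def constant_ecv(n):
    """Rounded expected value of the dropped bits, in units of 2**n.
    Column c < n holds c + 1 bits, each equal to 1 with probability 1/4."""
    expected = sum((c + 1) * 2 ** c for c in range(n)) / 4
    return round(expected / 2 ** n)

def fixed_width_product(a, b, n, ecv):
    """n-bit result in units of 2**n (overflow of the top bit is ignored here)."""
    kept, _ = split_partial_products(a, b, n)
    return (kept >> n) + ecv

def error_stats(n, ecv):
    """Mean and maximum absolute error versus the exact a*b / 2**n, over all inputs."""
    errors = [fixed_width_product(a, b, n, ecv) - a * b / 2 ** n
              for a in range(2 ** n) for b in range(2 ** n)]
    return sum(errors) / len(errors), max(abs(e) for e in errors)

n = 6
print("truncated only :", error_stats(n, 0))                # (-1.25390625, 5.015625)
print("constant ECV   :", error_stats(n, constant_ecv(n)))  # (-0.25390625, 4.015625)
```

Even a single constant removes most of the bias of plain truncation; an adaptive ECV would additionally track how many dropped bits are actually set, at the cost of extra logic.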