
IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005

Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions. IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005. Anup Hosangadi and Ryan Kastner (ECE Department, UCSB); Farzan Fallah (Advanced CAD Research, Fujitsu Labs of America).


Presentation Transcript


  1. Reducing Hardware Complexity of Linear DSP Systems by Iteratively Eliminating Two-Term Common Subexpressions • IEEE/ACM Asia South Pacific Design Automation Conference (ASP-DAC), Shanghai, 2005 • Anup Hosangadi, Ryan Kastner (ECE Department, UCSB) • Farzan Fallah (Advanced CAD Research, Fujitsu Labs of America)

  2. Outline • Introduction • Related Work • Polynomial transformation • Common Subexpression elimination • Results • Conclusions

  3. Introduction • Multiplications by constants are encountered in many application areas • DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.) • Filtering operations in communication (FIR, IIR filters) • Multiple Input Multiple Output (MIMO) systems • Polynomials in computer graphics

  4. Introduction • Multiplication is expensive in hardware • Decompose constant multiplications into shifts and additions • 13*X = (1101)_2*X = X + X<<2 + X<<3 • Signed digits can reduce the number of additions/subtractions • Canonical Signed Digits (CSD) (Knuth'74) • (57)_10 = (0111001)_2 = (100-1001)_CSD • Further reduction possible by common subexpression elimination • Up to 50% reduction (R. Hartley, TCS'96)
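
The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `to_csd` and `shift_add` are hypothetical, and the digit-by-digit recoding shown is one common way to obtain a canonical signed-digit form, not necessarily the procedure the authors used.

```python
def to_csd(n: int) -> list[int]:
    """Canonical signed-digit form of a non-negative integer.

    Digits are returned least significant first, each in {-1, 0, 1}.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            # Choose +1 or -1 so the remainder is divisible by 4; the next
            # digit is then 0, so no two adjacent digits are nonzero.
            d = 2 - (n % 4)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            digits.append(d)
            n -= d
        n //= 2
    return digits


def shift_add(x: int, digits: list[int]) -> int:
    """Multiply x by the encoded constant using only shifts and adds/subtracts."""
    return sum(d * (x << i) for i, d in enumerate(digits) if d != 0)


# 57 = 64 - 8 + 1, i.e. (100-1001) when written most significant digit first
print(to_csd(57)[::-1])          # [1, 0, 0, -1, 0, 0, 1]
print(shift_add(5, to_csd(57)))  # 285
```

For 57 the plain binary form has four nonzero digits while the signed-digit form has three, which saves one addition.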

  5. Introduction • Common subexpressions = common digit patterns • F1 = 7*X = (0111)*X = X + X<<1 + X<<2; F2 = 13*X = (1101)*X = X + X<<2 + X<<3 (4+, 4<<) • Common pattern "0101" => D1 = X + X<<2; F1 = D1 + X<<1; F2 = D1 + X<<3 (3+, 3<<) • Good for a single variable: FIR filters (transposed form) • Multiple variables (DFT, DCT, etc.)?
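
As a quick check of the single-variable example, the sketch below computes F1 and F2 both directly and through the shared pattern D1. The function names are made up for illustration.

```python
def f1_f2_direct(x: int) -> tuple[int, int]:
    f1 = x + (x << 1) + (x << 2)   # 7*x:  2 additions, 2 shifts
    f2 = x + (x << 2) + (x << 3)   # 13*x: 2 additions, 2 shifts
    return f1, f2


def f1_f2_shared(x: int) -> tuple[int, int]:
    d1 = x + (x << 2)              # shared pattern "0101": 1 addition, 1 shift
    f1 = d1 + (x << 1)             # 1 addition, 1 shift
    f2 = d1 + (x << 3)             # 1 addition, 1 shift
    return f1, f2


for x in range(-8, 9):
    assert f1_f2_direct(x) == f1_f2_shared(x) == (7 * x, 13 * x)
```

Both versions give the same outputs; the shared version uses three additions and three shifts instead of four of each.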

  6. Related Work • Simple bipartite matching (Potkonjak et al., TCAD'95) • (10101) and (01101) => common pattern = "101" • (10010) and (010010) => cannot detect pattern "1001" • Recursive Shift and Add (RESANDS) (H. Nguyen et al., TVLSI 2000) • (10010) and (010010) => common pattern "1001" • Exhaustive enumeration of all digit patterns (Pasko et al., TCAD'99) • (1011) => "0011", "1001", "1010", "0101", "1011"

  7. Related Work • Extending techniques to multiple variables • [Y1 Y2 Y3]^T = [a11 a12 a13; a21 a22 a23; a31 a32 a33] × [X1 X2 X3]^T • Potkonjak et al., TCAD'95 • [Figure labels: all distinct SijXj and CikDk; outputs Y1, Y2, Y3]

  8. Related Work • Multiple-variable common subexpression elimination (A. Hosangadi et al., ASAP'04) • Polynomial transformation of linear systems • Uses rectangle covering methods • Cannot find subexpressions with reversed signs, e.g. (X1 - X2<<1) ≠ (X2<<1 - X1) • A common occurrence when signed digits are used • Rectangle covering has exponential complexity • Is there a method to overcome these limitations?

  9. Related Work • Algebraic methods in multi-level logic synthesis (MLLS) • Reducing literal count in a set of Boolean expressions • Factoring, decomposition: established algebraic techniques • Typically used for thousands of variables and literals • Apply these methods to optimize linear systems? • D1 = X1 + X2<<2; Y1 = D1 + D1<<3 + X1<<3; Y2 = D1 + X2<<2

  10. Linear systems and polynomial transformation • View linear systems as a set of arithmetic expressions • Expressions consisting of +, -, << operators • Develop a methodology for extracting common subexpressions • Polynomial formulation: C × X = Σ_i (±X × L^i) • (14)_10 × X = (1110)_2 × X = X<<3 + X<<2 + X<<1 = XL^3 + XL^2 + XL^1 = (100-10)_CSD × X = XL^4 - XL^1
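
The polynomial formulation can be made concrete with a small sketch. It assumes a term is stored as a `(sign, variable, exponent of L)` tuple; this representation and the names `poly_terms` and `evaluate` are illustrative choices, not taken from the slides.

```python
def poly_terms(digits_msb_first, var="X"):
    """Turn a digit list (most significant first, entries in {-1, 0, 1})
    into polynomial terms (sign, variable, exponent of L)."""
    n = len(digits_msb_first)
    return [(d, var, n - 1 - i) for i, d in enumerate(digits_msb_first) if d != 0]


def evaluate(terms, values):
    """Evaluate a list of terms, reading L^k as a left shift by k."""
    return sum(sign * (values[var] << exp) for sign, var, exp in terms)


binary_14 = poly_terms([1, 1, 1, 0])      # XL^3 + XL^2 + XL^1
csd_14 = poly_terms([1, 0, 0, -1, 0])     # XL^4 - XL^1
print(binary_14)  # [(1, 'X', 3), (1, 'X', 2), (1, 'X', 1)]
print(csd_14)     # [(1, 'X', 4), (-1, 'X', 1)]
assert evaluate(binary_14, {"X": 3}) == evaluate(csd_14, {"X": 3}) == 14 * 3
```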

  11. Linear Systems and polynomial transformation • H.264 Integer Transform: [Y0 Y1 Y2 Y3]^T = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1] × [X0 X1 X2 X3]^T • Decomposing constant multiplications: Y0 = X0 + X1 + X2 + X3; Y1 = X0<<1 + X1 - X2 - X3<<1; Y2 = X0 - X1 - X2 + X3; Y3 = X0 - X1<<1 + X2<<1 - X3 (12+, 4<<)

  12. Linear Systems and polynomial transformation • H.264 Integer Transform: [Y0 Y1 Y2 Y3]^T = [1 1 1 1; 2 1 -1 -2; 1 -1 -1 1; 1 -2 2 -1] × [X0 X1 X2 X3]^T • Polynomial transformation: Y0 = X0 + X1 + X2 + X3; Y1 = X0L + X1 - X2 - X3L; Y2 = X0 - X1 - X2 + X3; Y3 = X0 - X1L + X2L - X3 (12+, 4<<)

  13. Fx algorithm • Concurrent Decomposition and Factorization of Boolean Expressions (J. Rajski et al., TCAD'92) • Popularly known as the Fast-Extract (Fx) algorithm • Expression f = gh + r • g = (ab + c) => double-cube divisor • g = ab => single-cube divisor • Fx algorithm for linear systems?

  14. Two-term divisors • Obtained from every pair of terms in each expression • Divide by the minimum exponent of L • e.g. F = X1 + X2L + X3L^3 • {+X2L, +X3L^3}: divide by L => (X2 + X3L^2) • Divisors = (X1 + X2L), (X1 + X3L^3), (X2 + X3L^2) • Two divisors intersect if • The terms involved are distinct • (X1 - X2L) ∩ (X1 - X2L) = φ; (X1 - X2L) ∩ (-X1 + X2L) = φ (reversed signs allowed!)
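
A minimal sketch of divisor generation, assuming terms are stored as `(sign, variable, exponent of L)` tuples. The helper names and the sign normalization in `canonical` are assumptions made here so that divisors with reversed signs map to the same key; the slides do not say how matching is implemented.

```python
from itertools import combinations


def two_term_divisors(expr):
    """Form one divisor per pair of terms, dividing by the minimum exponent of L."""
    divisors = []
    for a, b in combinations(expr, 2):
        shift = min(a[2], b[2])
        divisors.append(((a[0], a[1], a[2] - shift), (b[0], b[1], b[2] - shift)))
    return divisors


def canonical(divisor):
    """Order the two terms and flip signs so the first term is positive.

    With this key, (X1 - X2L) and (-X1 + X2L) compare equal.
    """
    first, second = sorted(divisor, key=lambda t: (t[1], t[2]))
    s = first[0]
    return ((first[0] * s, first[1], first[2]), (second[0] * s, second[1], second[2]))


# F = X1 + X2L + X3L^3
F = [(1, "X1", 0), (1, "X2", 1), (1, "X3", 3)]
for d in two_term_divisors(F):
    print(d)
# ((1, 'X1', 0), (1, 'X2', 1))
# ((1, 'X1', 0), (1, 'X3', 3))
# ((1, 'X2', 0), (1, 'X3', 2))
```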

  15. Two-term divisors • Theorem: a multiple-term common subexpression exists in a set of expressions iff there is a non-overlapping intersection among the two-term divisors • Many divisors have intersections; which one to choose? • Greedily select the divisor with the largest number of intersections • Selecting a divisor changes the expressions • Perform concurrent decomposition of the expressions

  16. Algorithm (Step 1) • Creating the set of divisors {Divisors}; {Divisors} = φ; for each expression Pi { {Dnew} = Divisors for Pi; {Divisors} = {Divisors} ∪ {Dnew}; Update frequency statistics of {Divisors}; }

  17. Algorithm (Step 2) • Common Subexpression Elimination {Divisors} = Set of all 2-term divisors; while (intersections present) { Find Best_Divisor in {Divisors}; {T} = Set of terms involved in intersection; {D} = Set of divisors involving any term in {T}; {Divisors} = {Divisors} - {D}; Rewrite Expressions; {Dnew} = New Divisors involving new terms; {Divisors} = {Divisors} ∪ {Dnew}; }
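
The greedy loop can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: it recomputes every divisor on each pass instead of updating the divisor set incrementally, it does not look for common subexpressions inside the extracted divisors themselves, and ties are broken by whichever divisor is found first. Terms are `(sign, variable, exponent of L)` tuples, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def normalize(a, b):
    """Return (key, shift, sign) for the two-term divisor built from terms a and b."""
    shift = min(a[2], b[2])
    t = sorted([(a[0], a[1], a[2] - shift), (b[0], b[1], b[2] - shift)],
               key=lambda term: (term[1], term[2]))
    sign = t[0][0]  # flip both signs if needed so the first term is positive
    return tuple((s * sign, v, e) for s, v, e in t), shift, sign


def eliminate(exprs):
    """Greedily extract two-term divisors that occur at least twice."""
    exprs = {name: list(terms) for name, terms in exprs.items()}
    defs = {}
    while True:
        occurrences = defaultdict(list)
        for name, terms in exprs.items():
            used = defaultdict(set)
            for i, j in combinations(range(len(terms)), 2):
                key, shift, sign = normalize(terms[i], terms[j])
                if i in used[key] or j in used[key]:
                    continue  # overlaps an occurrence already counted
                used[key].update((i, j))
                occurrences[key].append((name, i, j, shift, sign))
        if not occurrences:
            break
        best = max(occurrences, key=lambda k: len(occurrences[k]))
        if len(occurrences[best]) < 2:
            break
        new_var = f"D{len(defs)}"
        defs[new_var] = list(best)
        removed, added = defaultdict(set), defaultdict(list)
        for name, i, j, shift, sign in occurrences[best]:
            removed[name].update((i, j))
            added[name].append((sign, new_var, shift))
        for name in removed:
            kept = [t for k, t in enumerate(exprs[name]) if k not in removed[name]]
            exprs[name] = kept + added[name]
    return defs, exprs


def count_ops(defs, exprs):
    """Count additions/subtractions and shifts over all expressions."""
    all_exprs = list(defs.values()) + list(exprs.values())
    adds = sum(len(t) - 1 for t in all_exprs)
    shifts = sum(1 for t in all_exprs for _, _, e in t if e > 0)
    return adds, shifts


def evaluate(defs, exprs, inputs):
    """Evaluate divisors in extraction order, then the rewritten expressions."""
    env = dict(inputs)
    for name, terms in defs.items():
        env[name] = sum(s * (env[v] << e) for s, v, e in terms)
    return {n: sum(s * (env[v] << e) for s, v, e in t) for n, t in exprs.items()}


H264 = {
    "Y0": [(1, "X0", 0), (1, "X1", 0), (1, "X2", 0), (1, "X3", 0)],
    "Y1": [(1, "X0", 1), (1, "X1", 0), (-1, "X2", 0), (-1, "X3", 1)],
    "Y2": [(1, "X0", 0), (-1, "X1", 0), (-1, "X2", 0), (1, "X3", 0)],
    "Y3": [(1, "X0", 0), (-1, "X1", 1), (1, "X2", 1), (-1, "X3", 0)],
}
defs, rewritten = eliminate(H264)
print(count_ops({}, H264), count_ops(defs, rewritten))  # (12, 4) (8, 2)
```

On the H.264 transform this sketch extracts four divisors and ends at 8 additions and 2 shifts. The divisors may be numbered in a different order than in the walk-through on the later slides, because the tie-breaking rule here is arbitrary.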

  18. Algorithm complexity • M×M constant matrix; N digits of precision • [Figure: matrix of M rows and M columns of N-digit constants, e.g. row Y0 = 1111 1111 1011 1001, ..., row YM-1 = 1111 1110 0011 1010] • Y0 = X0 + X0L + ... + XM-1L^3 + XM-1 • O(MN) terms per expression => O(M^2N^2) divisors
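
One way to read the divisor bound, assuming the O(MN) figure counts the terms of a single expression (M constants of up to N nonzero digits each): every pair of terms yields one two-term divisor, so the count per expression is

```latex
\binom{MN}{2} \;=\; \frac{MN\,(MN-1)}{2} \;=\; O(M^2 N^2)
```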

  19. Algorithm (Step 1) • Creating the set of divisors {Divisors}; {Divisors} = φ; for each expression Pi { {Dnew} = Divisors for Pi; {Divisors} = {Divisors} ∪ {Dnew}; Update frequency statistics of {Divisors}; } • Complexity annotations: O(M^2N^2) distinct divisors; O(M^2N^2); O(M^3N^2)

  20. Algorithm (Step 2) • Common Subexpression Elimination {Divisors} = Set of all 2-term divisors; while (intersections present) { Find Best_Divisor in {Divisors}; {T} = Set of terms involved in intersection; {D} = Set of divisors involving any term in {T}; {Divisors} = {Divisors} - {D}; Rewrite Expressions; {Dnew} = New Divisors involving new terms; {Divisors} = {Divisors} ∪ {Dnew}; } • Complexity annotations: O(M^2N^2); O(M^2N^2)

  21. Algorithm • H.264 example • Y0 = X0 + X1 + X2 + X3; Y1 = X0L + X1 - X2 - X3L; Y2 = X0 - X1 - X2 + X3; Y3 = X0 - X1L + X2L - X3 • >> Select D0 = (X0 + X3)

  22. Algorithm • H.264 example • Y0 = D0 + X1 + X2; Y1 = X0L + X1 - X2 - X3L; Y2 = D0 - X1 - X2; Y3 = X0 - X1L + X2L - X3 • >> Select D1 = (X1 - X2)

  23. Algorithm • H.264 example • Y0 = D0 + X1 + X2; Y1 = X0L + D1 - X3L; Y2 = D0 - X1 - X2; Y3 = X0 - D1L - X3 • >> Select D2 = (X1 + X2)

  24. Algorithm • H.264 example • Y0 = D0 + D2; Y1 = X0L + D1 - X3L; Y2 = D0 - D2; Y3 = X0 - D1L - X3 • >> Select D3 = (X0 - X3)

  25. Final Implementation • Extracting 4 divisors: D0 = X0 + X3; D1 = X1 - X2; D2 = X1 + X2; D3 = X0 - X3 • Y0 = D0 + D2; Y1 = D1 + D3L; Y2 = D0 - D2; Y3 = D3 - D1L • Operation counts: original 12+, 4<<; rectangle covering 10+, 3<<; this implementation 8+, 2<<
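
The final implementation can be checked against the original matrix form with a short sketch. The function names are made up for illustration.

```python
def h264_direct(x0, x1, x2, x3):
    """Row-by-row evaluation of the 4x4 integer transform matrix."""
    return (
        x0 + x1 + x2 + x3,
        2 * x0 + x1 - x2 - 2 * x3,
        x0 - x1 - x2 + x3,
        x0 - 2 * x1 + 2 * x2 - x3,
    )


def h264_shared(x0, x1, x2, x3):
    """Implementation with the four extracted divisors: 8 add/sub, 2 shifts."""
    d0 = x0 + x3
    d1 = x1 - x2
    d2 = x1 + x2
    d3 = x0 - x3
    y0 = d0 + d2
    y1 = d1 + (d3 << 1)
    y2 = d0 - d2
    y3 = d3 - (d1 << 1)
    return y0, y1, y2, y3


for xs in [(1, 2, 3, 4), (-5, 7, 0, 9), (255, 0, 128, 64)]:
    assert h264_direct(*xs) == h264_shared(*xs)
```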

  26. Experimental Setup • Goal • Reduction in the number of additions/subtractions • Effect on area/latency after synthesis • Simulate designs to estimate power consumption • Transforms: DCT, IDCT, DFT, DST, DHT • 8x8 constant matrices • 16 digits of precision (CSD representation) • Compare with • Potkonjak (TCAD'95) • RESANDS (Nguyen et al., TVLSI 2000) • Rectangle Covering (A. Hosangadi et al., ASAP'04)

  27. Experimental Results Run Time 0.81s 0.08s

  28. Experimental results (III)  RESANDS (IV)  Rect. Covering (V)  2-term CSE • Synthesis results (minimum latency constraints)

  29. Experimental results (III)  RESANDS (IV)  Rect. Covering (V)  2-term CSE • Power consumption

  30. Conclusions • A new technique for eliminating common subexpressions in linear systems • Fewer operations than known methods • Much faster than rectangle covering • Combine with scheduling on given resources

  31. Thank you • Questions??
