This paper discusses the synthesis of speed independent circuits using decomposition techniques, aiming to reduce the cost and improve efficiency.
Synthesis of Speed Independent Circuits Based on Decomposition
Tomohiro Yoneda, National Institute of Informatics / Tokyo Institute of Technology
Hiroomi Onda, Tokyo Institute of Technology
Chris Myers, University of Utah
Background
• High-level synthesis
  • plays an important role in bringing asynchronous design into wide use
• Major approach to high-level synthesis
  • Prepare basic cells that correspond to specification language constructs
  • Translate specifications into basic cell networks in a syntax-directed manner, with local optimizations
  • Very efficient
  • Global optimization may be difficult
2004/4/21 Async2004
Challenge
• Our approach to high-level synthesis
  • Translate the high-level spec to a low-level spec (time Petri nets)
  • Use a timed logic synthesis technique
• Global optimization becomes possible through
  • logic optimization
  • timing information
• The cost of synthesis is very high
How to reduce the cost
• Translation technique to low-level spec
  • guarantees that the low-level spec has CSC by adding sufficiently many state variables
  • Idea: [Yoneda, Myers 2003]
  • A Balsa compiler is under development
• Efficient logic synthesis technique (goal of this work)
  • decomposes the low-level spec w.r.t. each output
  • synthesizes each sub-circuit from its sub-spec
• This paper discusses speed-independent circuit synthesis
Decomposition based synthesis
• Input
  • STG
    • 1-safe
    • output semi-modular
    • with CSC (Complete State Coding)
    • several more restrictions
• Output
  • Reduced STG for each output
    • a g-C or atomic-gate implementation is synthesizable
• Feature
  • Only the state graphs of the reduced STGs are necessary
  • It is not necessary to explore the reachable states of the original STG
Key issue - input set determination
[Figure: an STG over the signals req1, ack1, req2, ack2, and csc, and the gC implementations obtained after reduction]
Related work
• Synthesizing each output separately
  • T.A. Chu, Synthesis of Self-Timed VLSI Circuits from Graph-theoretic Specification, PhD thesis, MIT, 1987
    • No method for input set determination is given
  • R. Puri, J. Gu, A Modular Partitioning Approach for Asynchronous Circuit Synthesis, IEEE TCAD, 1995
    • Input set determination is performed based on the state graph of the original STG
    • An input signal is hidden only if hiding it does not increase the number of CSC conflicts
  • W. Vogler, R. Wollowski, Decomposition in Asynchronous Circuit Design, Tech Report, Univ. Augsburg, 2002
    • The STG reduction technique (net contraction) is formalized
    • No general method for input set determination is given
Our approach
Step 1: Select the possible trigger signals as the initial input set
Step 2: Contract the original STG by deleting all signals except the output and those in the current input set
Step 3: If the reduced STG has CSC, done
Step 4: Otherwise, choose appropriate signals and add them to the input set
Step 5: Go to Step 2
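The five steps form a refinement loop, which can be sketched in Python as follows. This is only a minimal sketch: `has_csc` and `choose_signals` are hypothetical callbacks that do not appear in the slides. The first stands for contracting the original STG to the kept signals and checking CSC (Steps 2 and 3), the second for the signal selection of Step 4.

```python
from typing import Callable, FrozenSet, Set


def determine_input_set(
    output: str,
    triggers: Set[str],
    has_csc: Callable[[FrozenSet[str]], bool],
    choose_signals: Callable[[FrozenSet[str]], Set[str]],
) -> FrozenSet[str]:
    """Grow the input set for `output` until the reduced STG has CSC.

    `has_csc` and `choose_signals` are hypothetical stand-ins for the
    contraction plus CSC check (Steps 2-3) and the signal selection (Step 4).
    """
    input_set = set(triggers)  # Step 1: start from the possible trigger signals
    while True:
        kept = frozenset(input_set | {output})  # Step 2: signals that survive contraction
        if has_csc(kept):  # Step 3: done once the reduced STG has CSC
            return frozenset(input_set)
        extra = set(choose_signals(kept)) - kept  # Step 4: signals to add
        if not extra:
            raise RuntimeError("no further signal can be added")
        input_set |= extra  # Step 5: repeat from Step 2


# Toy stand-ins: pretend the reduced STG has CSC exactly when b is kept.
result = determine_input_set(
    output="x",
    triggers={"a"},
    has_csc=lambda kept: "b" in kept,
    choose_signals=lambda kept: {"b"},
)
print(sorted(result))
```

The loop terminates in the worst case when every signal has been added, because the original STG is assumed to have CSC.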
Contraction
• contraction: bisimilar translation (i.e., the one by W. Vogler, R. Wollowski)
[Figure: the original STG, with the possible trigger signals marked, and the reduced STG obtained by contraction]
Issues to be discussed
• If the reduced STG has CSC, is a correct speed-independent circuit synthesized from it?
• How can appropriate signals be chosen without the state graph of the original STG?
• How large is the overhead (performance degradation of the synthesized circuit)?
An example
[Figure: state graph of the example STG over the signals (a b c x), with states 0000, 0100, 0110, 110R, 111R, 1101, 1111, 011F, 1000, 1010, 0010, transitions a+/1, a-/1, a+/2, a-/2, b+, b-, c+, c-, x+, x-, and the excitation regions ES(x+) and ES(x-) marked]
If a is deleted
• CD(S): the set S extended by deleting the signals in D
[Figure: the example state graph over (a b c x) with the extended regions CD(ES(x+)) and CD(ES(x-)) marked]
Irrelevant input set
• A set D of signals is an irrelevant input set for an output x, if
  • D ⊆ (In ∪ Out) – {x}
  • CD(ES(x+)) – UR = ES(x+)
  • CD(ES(x–)) – UR = ES(x–)
• In: Input signal set of the original STG
• Out: Output signal set of the original STG
• UR: Unreachable state set of the original STG
If a is deleted
• CD(ES(x+)) – UR ≠ ES(x+)
• CD(ES(x–)) – UR ≠ ES(x–)
• {a} is not an irrelevant input set
• If a non-irrelevant input set is deleted, the reduced STG does not have CSC
[Figure: the example state graph over (a b c x) with CD(ES(x+)) and CD(ES(x-)) marked for D = {a}]
If c is deleted
• CD(ES(x+)) – UR = ES(x+)
• CD(ES(x–)) – UR = ES(x–)
• {c} is an irrelevant input set
• If an irrelevant input set (including no possible trigger signals) is deleted, a correct circuit is obtained from the reduced STG
[Figure: the example state graph over (a b c x) with CD(ES(x+)) and CD(ES(x-)) marked for D = {c}; the additional state 0101 appears in the extended region]
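The irrelevant-input-set condition can be checked mechanically on a small example. The sketch below is illustrative only: it enumerates the reachable states of the original STG, which the proposed method itself avoids, and the state codes are one reading of the example figure over (a b c x), so they should be treated as an assumption. Computing CD(S) – UR amounts to collecting the reachable states that agree with some state of S on every signal that is not deleted.

```python
def is_irrelevant(deleted, signals, reachable, es_plus, es_minus, output):
    """Check CD(ES(x+)) - UR == ES(x+) and CD(ES(x-)) - UR == ES(x-)."""
    if output in deleted:  # D must not contain the output x itself
        return False
    keep = [i for i, s in enumerate(signals) if s not in deleted]

    def project(code):
        # Drop the positions of the deleted signals from a state code.
        return tuple(code[i] for i in keep)

    def unchanged(es):
        images = {project(code) for code in es}
        # Reachable part of the extended set: CD(es) minus unreachable states.
        extended = {code for code in reachable if project(code) in images}
        return extended == set(es)

    return unchanged(es_plus) and unchanged(es_minus)


# Assumed reading of the example state graph (signal order: a b c x).
SIGNALS = ("a", "b", "c", "x")
REACHABLE = {"0000", "0100", "0110", "1100", "1110", "1101",
             "1111", "0111", "1000", "1010", "0010"}
ES_PLUS = {"1100", "1110"}   # states written 110R and 111R
ES_MINUS = {"0111"}          # state written 011F

a_irrelevant = is_irrelevant({"a"}, SIGNALS, REACHABLE, ES_PLUS, ES_MINUS, "x")
c_irrelevant = is_irrelevant({"c"}, SIGNALS, REACHABLE, ES_PLUS, ES_MINUS, "x")
print(a_irrelevant, c_irrelevant)
```

Under this reading, deleting a merges 0100 into the extended ES(x+), so {a} fails the test, while deleting c only adds the unreachable state 0101, so {c} passes.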
Theorem 1
• For an STG G that has CSC and is output semi-modular, if a reduced STG G' obtained from G by deleting some signal set V (including no possible trigger signals) has CSC, then a correct circuit is obtained from G'
• Proof outline
  • If V is not an irrelevant input set, G' cannot have CSC
  • Hence V must be an irrelevant input set
  • Hence a correct circuit is obtained from G'
Issues to be discussed • If the reduced STG has CSC, is a correct speed independent circuit synthesized from it? • How can appropriate signals be chosen without the state graph of the original STG? • How is the overhead (performance degradation of the synthesized circuit)? 2004/4/21 Async2004
Contraction with initial input set
[Figure: the original STG, with the possible trigger signals marked, and the reduced STG obtained by contraction]
Checking CSC
• Construct the state graph of the reduced STG
[Figure: state graph of the reduced STG with states 00, 1R, 11, 0F, 00, 10 and transitions a+/1, x+, a-/1, x-, a+/2, a-/2; the states 1R and 10 form a CSC conflict]
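A CSC check on a state graph can be sketched as follows. Two states conflict when they carry the same binary code but differ in which output signals are excited. The state names `s0` to `s5` and the encoding of the reduced state graph over (a x) are assumptions made for illustration: 1R is taken as code 10 with x excited to rise, and 0F as code 01 with x excited to fall.

```python
from collections import defaultdict


def csc_conflicts(states):
    """Return pairs of states with equal codes but different excited outputs.

    `states` maps a state name to (binary code, set of excited output signals).
    """
    by_code = defaultdict(list)
    for name, (code, excited) in states.items():
        by_code[code].append((name, frozenset(excited)))

    conflicts = []
    for group in by_code.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if group[i][1] != group[j][1]:
                    conflicts.append((group[i][0], group[j][0]))
    return conflicts


# Assumed encoding of the reduced state graph (signal order: a x).
reduced = {
    "s0": ("00", set()),
    "s1": ("10", {"x"}),   # written 1R: x is excited to rise
    "s2": ("11", set()),
    "s3": ("01", {"x"}),   # written 0F: x is excited to fall
    "s4": ("00", set()),
    "s5": ("10", set()),   # same code as s1, but x is stable
}
conflicts = csc_conflicts(reduced)
print(conflicts)
```

The two states with code 00 do not conflict because neither excites an output; only the pair with code 10 is reported.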
Guided Simulation
• The original trace that corresponds to an abstracted trace can be obtained by simulating the original STG; the state graph of the original STG is not required
[Figure: an abstracted trace in the state graph of the reduced STG beside the corresponding original trace in the state graph of the original STG, with interface and noninterface transitions marked]
Generating original trace
[Figure: original STG with noninterface transitions (t1, t2, t3) and interface transitions (a+, b+); the abstracted trace a+ b+ is expanded into an original trace that interleaves noninterface transitions such as t2 and t3]
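One way to picture the expansion of an abstracted trace is a token-game simulation of a 1-safe net that fires noninterface transitions until the next interface transition becomes enabled. The sketch below is an assumption-laden illustration, not the tool's algorithm: the net, the place names `p0` to `p5`, and the use of a breadth-first search per event are all hypothetical, and the search does not backtrack across events, which may be needed for STGs with conflicting transitions.

```python
from collections import deque


def fire(marking, pre, post):
    # 1-safe net: a marking is the frozenset of marked places.
    return (marking - pre) | post


def guided_simulation(net, initial, abstract_trace, interface_labels):
    """Expand an abstracted trace into a trace of the original net.

    `net` maps a transition name to (label, preset, postset).
    """
    marking, original = frozenset(initial), []
    for event in abstract_trace:
        queue, seen, found = deque([(marking, [])]), {marking}, None
        while queue and found is None:
            current, path = queue.popleft()
            for name, (label, pre, post) in net.items():
                if not pre <= current:
                    continue  # transition not enabled
                nxt = fire(current, pre, post)
                if label == event:
                    found = (nxt, path + [name])
                    break
                if label not in interface_labels and nxt not in seen:
                    seen.add(nxt)  # explore noninterface firings only
                    queue.append((nxt, path + [name]))
        if found is None:
            raise ValueError(f"cannot reach interface transition {event}")
        marking, path = found
        original.extend(path)
    return original


# Hypothetical net: t2, t3 precede a+, and t1 precedes b+.
net = {
    "t2": ("t2", frozenset({"p0"}), frozenset({"p1"})),
    "t3": ("t3", frozenset({"p1"}), frozenset({"p2"})),
    "a+": ("a+", frozenset({"p2"}), frozenset({"p3"})),
    "t1": ("t1", frozenset({"p3"}), frozenset({"p4"})),
    "b+": ("b+", frozenset({"p4"}), frozenset({"p5"})),
}
trace = guided_simulation(net, {"p0"}, ["a+", "b+"], {"a+", "b+"})
print(trace)
```

Only markings visited while chasing the abstracted trace are stored, so the full reachable state space of the original net is never built.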
Analysis of original trace
• Find a noninterface signal that certainly changes an odd number of times in the part of the original trace between the two conflicting states
• Adding b to the input set resolves this CSC conflict
• c also seems to satisfy this condition, but it actually does not, because its change is concurrent
• Select a noninterface signal that certainly changes an odd number of times in that part of the trace
[Figure: the original trace through the states of the original STG over (a b c x), with interface and noninterface signals marked]
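The parity part of this analysis is easy to sketch. The code below counts how often each noninterface signal changes in a trace segment and keeps those with an odd count. The segment is an assumed reading of the example (the part of the original trace between the two conflicting states), and parity alone is only a first filter: it reports both b and c, whereas the slide rules out c because its change is concurrent.

```python
from collections import Counter


def signal_of(transition):
    # "a-/1" -> "a": drop the instance suffix, then the direction sign.
    return transition.split("/")[0].rstrip("+-")


def odd_changers(segment, interface):
    """Noninterface signals that change an odd number of times in `segment`."""
    counts = Counter(signal_of(t) for t in segment)
    return {s for s, n in counts.items() if n % 2 == 1 and s not in interface}


# Assumed segment of the original trace between the conflicting states.
segment = ["x+", "a-/1", "x-", "c-", "b-", "a+/2"]
candidates = odd_changers(segment, interface={"a", "x"})
print(sorted(candidates))
```

Deciding that a signal "certainly" changes an odd number of times additionally requires the causality conditions on the structure of the STG.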
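Formalization
• w is odd-confined by f1:
  • w changes an odd number of times in f1
  • if w changes in f0, then w_e1 ⇒ e1
  • e1 ⇒ w_s2
  • w_e2 ⇒ e2
• Notation
  • f0: the part of the original trace from the initial state to the first state of the CSC conflict; f1: the part between the two conflicting states
  • e1, e2: the interface transitions at the two ends of f1
  • w_e1: last change of w in f0; w_s2: first change of w in f1; w_e2: last change of w in f1
  • "⇒" represents the causality relation obtained from the structure of the STG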
Analysis of original trace
• b is odd-confined by f1
• c is not odd-confined by f1
[Figure: the original trace through the states of the original STG over (a b c x), split into the segments f0 and f1, with interface and noninterface signals marked]
Theorem 2
• If w (and u_i) satisfies the following conditions, adding w (and u_i) resolves the CSC conflict in f1 (sufficient condition)
  • w is odd-confined by f1
  • w does not change in l
  • If w changes before the first interface signal, then for each odd i, u_i is odd-confined by h_i with the causality relation shown in the figure
• For one CSC conflict, there exist many candidate sets of signals
  ↓
• Analyze every CSC conflict
• Set up the covering problem and solve it
[Figure: a trace divided into the segments f0, f1, h1, h3, and l, with the transitions w+, w-, u+ and the interface transitions marked]
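The covering problem can be pictured as follows: every CSC conflict comes with several candidate signal sets, and a set of added signals is acceptable when, for each conflict, it contains at least one of that conflict's candidates. The slides do not say how the problem is solved, so the exhaustive smallest-first search below is purely an assumption for illustration, and the signal names are hypothetical.

```python
from itertools import combinations


def min_cover(conflict_candidates):
    """Smallest signal set that contains one candidate set per conflict.

    `conflict_candidates` holds one list of candidate sets per CSC conflict.
    Returns None when some conflict has no candidate at all.
    """
    universe = sorted({s for cands in conflict_candidates for c in cands for s in c})
    for size in range(len(universe) + 1):
        for chosen in combinations(universe, size):
            chosen_set = set(chosen)
            # Every conflict must have a candidate fully inside the choice.
            if all(any(set(c) <= chosen_set for c in cands)
                   for cands in conflict_candidates):
                return chosen_set
    return None


# Hypothetical candidates for three CSC conflicts.
candidates_per_conflict = [
    [{"b"}, {"c", "d"}],
    [{"b"}, {"e"}],
    [{"d"}, {"b", "e"}],
]
cover = min_cover(candidates_per_conflict)
print(sorted(cover))
```

The search is exponential in the number of candidate signals, so a practical implementation would likely use a covering heuristic or a dedicated solver instead.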
Drawback
• For an STG with conflicting transitions, backtracking may be needed
  • Once conflicting transitions are deleted, finding the noninterface transitions that actually fired is no longer deterministic
  • If many conflicting transitions exist, backtracking can be expensive
• Approaches that seem practical are to
  • keep all conflicting transitions, even if they are not related to backtracking, or
  • manually specify some of the necessary conflicting transitions
• Our compiler from a high-level language can specify those conflicting transitions automatically
Issues to be discussed • If the reduced STG has CSC, is a correct speed independent circuit synthesized from it? • How can appropriate signals be chosen without the state graph of the original STG? • How is the overhead (performance degradation of the synthesized circuit)? 2004/4/21 Async2004
Experimental results
• Experiments
  • Implementation of the proposed method in C
  • Pentium 2.8GHz, 4GB memory
  • Final logic synthesis tool: petrify -gc -eqn
• Benchmarks
  • Instruction cache controller of TITAC2
    • generated from a high-level spec by our compiler
    • large, but simple → input signal sets are small
    • compiler decisions are used for specifying conflicting transitions
  • Controllers of various filters
    • manually designed
    • medium-sized, but complicated → input signal sets are large
    • all conflicting transitions are kept
  • Async Benchmarks
    • small and simple
    • all conflicting transitions are kept
Experimental results
• CPU times, memory usage
  • Benchmark1: significantly reduced
  • Benchmark2: reduced
• Quality of synthesized circuits
  • Area (number of transistors): almost no overhead
Experimental results
• For Async Benchmarks
  • Quality: no overhead (exactly the same area)
  • Cost: advantageous only for the largest specs
[Figure: CPU times (sec), Proposed vs. Petrify]
Conclusion
• New algorithm to find input signal sets for the decomposition based synthesis method
  • the state graph of the original STG is not necessary
  • can handle larger circuits
• Logic synthesis tool: NUTAS
  • Linux binary is downloadable from http://research.nii.ac.jp/~yoneda
• Future work
  • extend the algorithms to support timed circuit synthesis
  • finish the compiler development for high-level synthesis and integrate both