A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits Behnam Ghavami and Hossein Pedram Presented by Wei-Lun Hung
Outline • Introduction • AsyncTool: Synthesis of QDI Asynchronous Circuits • Statistic Performance Analyzing • Transistor’s Parameters Assignment • Experimental Results • Conclusion
Introduction • VLSI design challenges • High power consumption • Synchronization problems • Robustness issues • One possible solution: asynchronous circuits • Low power consumption • No clock skew • Low electromagnetic interference (EMI)
Asynchronous Circuits • Not controlled by a global clock • Eliminate clock skew • Potentially faster • Low power consumption • Low EMI • Rely on handshaking between components • Limitations • Lack of automatic synthesis tools • Hard to evaluate the performance of asynchronous circuits
Transistor’s Parameters • Vth, Vdd and gate size are the parameters that affect circuit performance • A heuristic search finds a good tradeoff according to the optimization goal • Optimization of synchronous circuits • Multiple-Vth and multiple-Vdd assignment • E.g., gates on critical paths operate at the higher Vdd or the lower Vth • Optimization of asynchronous circuits • A critical path cannot be computed as in synchronous circuits • Performance depends on dynamic factors, e.g., the number of tokens
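For contrast with the asynchronous case, the synchronous approach on this slide can be sketched as: find the static critical path of a combinational netlist, then give its gates the higher Vdd. This is a minimal sketch, assuming a hypothetical netlist format (gate mapped to its delay and fan-in list) and hypothetical voltage values; real flows also use slack, Vth and sizing, not a single path.

```python
def critical_path(netlist):
    """Return (delay, gates) of the longest path in an acyclic netlist.

    netlist (hypothetical format): gate -> (delay, list of fan-in gates).
    """
    order, seen = [], set()

    def visit(g):
        # Depth-first post-order yields a topological order (fan-ins first).
        if g in seen:
            return
        seen.add(g)
        for f in netlist[g][1]:
            visit(f)
        order.append(g)

    for g in netlist:
        visit(g)

    arrival, pred = {}, {}
    for g in order:
        delay, fanin = netlist[g]
        # The latest-arriving fan-in determines this gate's arrival time.
        latest = max(fanin, key=lambda f: arrival[f], default=None)
        arrival[g] = (arrival[latest] if latest is not None else 0.0) + delay
        pred[g] = latest

    # Walk back from the gate with the largest arrival time.
    end = max(arrival, key=arrival.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = pred[node]
    return arrival[end], path[::-1]


def assign_vdd(netlist, vdd_high=1.2, vdd_low=0.9):
    """Critical-path gates get the higher Vdd, all others the lower (values hypothetical)."""
    _, path = critical_path(netlist)
    on_path = set(path)
    return {g: (vdd_high if g in on_path else vdd_low) for g in netlist}
```

For example, with gates `a` (1.0), `b` (2.0), `c` (1.5, fed by `a` and `b`), `d` (0.5, fed by `a`) and `e` (1.0, fed by `c` and `d`), the critical path is `b → c → e` with delay 4.5. This kind of static analysis is exactly what the slide says is unavailable for asynchronous circuits, whose timing depends on token flow.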
Asynchronous Circuit Model • Delay-insensitive (DI) • Most robust of all asynchronous circuit delay models • Makes no assumptions on the delay of wires or gates • Any transition on an input to a gate must be seen on the output • Not practical due to the heavy restrictions • Quasi delay-insensitive (QDI) • Like DI, but • Assumes that the delays of the branches of a fork are equal (isochronic forks) • The framework uses Verilog-CSP code
AsyncTool: Synthesis of QDI Asynchronous Circuits • Uses Pre-Charge logic Full-Buffer (PCFB) templates as its predefined templates • Encapsulates all isochronic forks inside the templates • Eliminates the isochronic-fork constraint • Three parts • Arithmetic function extractor (AFE) • E.g., addition, subtraction, comparison, ... • Implements them with pre-synthesized standard templates • Decomposition • Template Synthesizer (TSYN) • One-bit operators, e.g., AND, OR, XOR, … • An expander converts multi-bit expressions into one-bit operations
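The expander step can be illustrated with a minimal sketch, assuming a hypothetical tuple representation `(op, dst, a, b)` for one-bit statements; it is not AsyncTool's actual data structure. A multi-bit bitwise statement is split into one one-bit statement per bit position, and evaluating the expanded form reproduces the multi-bit result.

```python
import operator

# Illustrative subset of one-bit operators a template synthesizer could map to.
ONE_BIT_OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def expand(op, dst, a, b, width):
    """Expand the multi-bit statement `dst = a op b` into `width` one-bit statements."""
    if op not in ONE_BIT_OPS:
        raise ValueError(f"unsupported operator: {op}")
    return [(op, f"{dst}[{i}]", f"{a}[{i}]", f"{b}[{i}]") for i in range(width)]


def to_bits(name, value, width):
    """Split an integer into named single-bit signals, LSB first."""
    return {f"{name}[{i}]": (value >> i) & 1 for i in range(width)}


def run(stmts, env):
    """Evaluate one-bit statements in order, writing each result into env."""
    for op, dst, x, y in stmts:
        env[dst] = ONE_BIT_OPS[op](env[x], env[y])
    return env
```

For instance, expanding a 4-bit `y = a XOR b` and running it with `a = 0b1100`, `b = 0b1010` sets the bits `y[0..3]` to `0b0110`, the same value as the multi-bit XOR. Only bitwise operators expand this simply; arithmetic such as addition carries between bits, which is why the slide routes it through the AFE instead.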
Decomposition (1/2) • Decompose the original description into an equivalent collection of smaller interacting processes • Convert to dynamic single assignment form • Projection • Dynamic Single Assignment form
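The conversion to single-assignment form can be sketched for the simplest case. This is a minimal sketch, assuming straight-line code given as hypothetical `(target, expression)` pairs; in straight-line code dynamic single assignment reduces to versioned renaming, whereas the real transformation must also handle loops and conditionals.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def to_single_assignment(stmts):
    """Rename straight-line (target, expr) assignments so each name is assigned once.

    Uses in an expression refer to the latest version of each variable;
    variables never assigned (primary inputs) keep their names.
    """
    version = {}
    out = []

    def rename_use(match):
        name = match.group(0)
        return f"{name}_{version[name]}" if name in version else name

    for target, expr in stmts:
        new_expr = IDENT.sub(rename_use, expr)         # rename uses first,
        version[target] = version.get(target, -1) + 1  # then create a new version
        out.append((f"{target}_{version[target]}", new_expr))
    return out
```

For example, `x = a + b; x = x * c; y = x + a` becomes `x_0 = a + b; x_1 = x_0 * c; y_0 = x_1 + a`. Once every variable is written exactly once, each assignment can be projected into its own small process that communicates through channels.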
Decomposition (2/2) • Projection • Break the program up into a concurrent system of smaller modules
Petri-Nets • Used to model concurrency and synchronization • Represented as a bipartite graph • Defined as a four-tuple N = (P, T, F, m0) • P: set of places • T: set of transitions • F ⊆ (P × T) ∪ (T × P): flow relation • m0: initial marking • A marking is a mapping M: P → ℕ
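The four-tuple definition and the marking map directly onto a small class. This is a minimal sketch, assuming an ordinary Petri net (unit arc weights) and the standard firing rule, which the slide does not spell out: an enabled transition consumes one token from each input place and produces one in each output place.

```python
class PetriNet:
    """Ordinary Petri net N = (P, T, F, m0) with unit arc weights."""

    def __init__(self, places, transitions, flow, m0):
        self.P, self.T, self.F = set(places), set(transitions), set(flow)
        for x, y in self.F:
            # F ⊆ (P × T) ∪ (T × P): arcs join a place and a transition.
            if not ((x in self.P and y in self.T) or (x in self.T and y in self.P)):
                raise ValueError(f"invalid arc {x} -> {y}")
        # Marking M: P -> ℕ; places missing from m0 start with no tokens.
        self.marking = {p: m0.get(p, 0) for p in self.P}

    def preset(self, t):
        return {x for x, y in self.F if y == t and x in self.P}

    def postset(self, t):
        return {y for x, y in self.F if x == t and y in self.P}

    def enabled(self, t):
        return all(self.marking[p] >= 1 for p in self.preset(t))

    def fire(self, t):
        """Consume a token from each input place, produce one in each output place."""
        if not self.enabled(t):
            raise RuntimeError(f"transition {t} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] += 1
```

For example, a two-place ring `req → send → ack → recv → req` with one token in `req` alternates between enabling `send` and enabling `recv`, much like a handshake cycle.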
Timed Petri-Net • A Petri net in which transitions or places are annotated with delays • For a cycle Ck, the cycle metric is • CM(Ck) = D(Ck)/M(Ck) • D(Ck) = ∑di, ∀i ∈ Ck (the total delay on the cycle) • M(Ck): the number of tokens on the cycle • The performance of a Timed Petri-Net is dictated by its cycle time, i.e., the largest cycle metric • CTime = MAX[CM(Ck)], ∀Ck ∈ TPN • Can be computed with maximum mean-cycle algorithms
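For small nets the cycle time can be checked by brute force. This is a minimal sketch, assuming a marked-graph view (transitions as nodes carrying delays, places as arcs carrying tokens) and a hypothetical adjacency-list format. It enumerates every simple cycle and takes the maximum of D(Ck)/M(Ck), which is exponential in the worst case; the maximum mean-cycle algorithms the slide mentions are the scalable alternative.

```python
def simple_cycles(graph):
    """Enumerate the simple cycles of a small directed graph.

    graph: node -> list of (successor, tokens on that arc). Each cycle is
    reported once, starting from its smallest node (nodes must be sortable).
    """
    rank = {n: i for i, n in enumerate(sorted(graph))}
    cycles = []

    def dfs(start, node, path, tokens):
        for succ, tok in graph[node]:
            if succ == start:
                cycles.append((list(path), tokens + tok))
            elif rank[succ] > rank[start] and succ not in path:
                path.append(succ)
                dfs(start, succ, path, tokens + tok)
                path.pop()

    for s in sorted(graph):
        dfs(s, s, [s], 0)
    return cycles


def cycle_time(graph, delay):
    """CTime = max over cycles Ck of CM(Ck) = D(Ck) / M(Ck)."""
    best = 0.0
    for path, tokens in simple_cycles(graph):
        if tokens == 0:
            raise ValueError(f"token-free cycle {path}: the net deadlocks")
        best = max(best, sum(delay[n] for n in path) / tokens)
    return best
```

With transitions `a`, `b`, `c` (delays 2, 1, 4) forming two one-token loops `a ⇄ b` and `b ⇄ c`, the cycle metrics are 3 and 5, so the cycle time is 5: the slower loop bounds the throughput.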
The Average-Case Performance Metric • For a P-TPN (probabilistic TPN) with only one choice with n outcomes • Convert it to n TPN models • For a P-TPN with more than one choice • Apply the following formula recursively
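One plausible reading of the average-case metric is an expected value over choice outcomes. This is a minimal sketch under two assumptions that are not from the slide: the choices are independent, and the metric is the probability-weighted average of the cycle times of the resolved TPN models. The `cycle_time_of` callback is hypothetical and stands for any per-model cycle-time computation.

```python
from itertools import product


def average_cycle_time(choices, cycle_time_of):
    """Expected cycle time over all combinations of choice outcomes.

    choices: one list of (outcome, probability) pairs per choice in the P-TPN.
    cycle_time_of: maps a tuple of outcomes (one per choice) to the cycle time
    of the TPN obtained by resolving every choice that way.
    """
    total = 0.0
    for combo in product(*choices):
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence assumption: probabilities multiply
        outcomes = tuple(o for o, _ in combo)
        total += prob * cycle_time_of(outcomes)
    return total
```

With a single choice this reduces to converting the P-TPN into n TPNs and weighting each cycle time by its outcome probability. Enumerating the product of all choices plays the role of the recursion, at a cost exponential in the number of choices.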
Probability Model • Use the static ranges of the primary inputs of the circuit to determine the static ranges of internal signals • Independent vs. dependent inputs
Computing the Static Range (1/3) • The set of tagged static ranges of a variable v is denoted TSR(v); each r ∈ TSR(v) is a triple <r.ct, r.vt, r.sr> • r.ct: the conditional tag • r.vt: the variable-expression tag • r.sr: the static range
Computing the Static Range (2/3) • Given the static ranges of the right-hand-side variables, the static range of the left-hand-side variable can be computed from them, where ° is a standard operator on data values and • is the corresponding operation on static ranges
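The `•` operation on static ranges can be illustrated with interval arithmetic. This is a minimal sketch, assuming static ranges are closed integer intervals `(lo, hi)` and that the conditional tags of combined ranges are merged by set union; both the interval semantics and the tag encoding are hypothetical stand-ins for the paper's definitions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TaggedRange:
    ct: frozenset  # conditional tag: conditions under which the range holds (hypothetical encoding)
    vt: str        # variable-expression tag
    sr: tuple      # static range as a closed interval (lo, hi)


def range_op(op, a, b):
    """Interval counterpart (•) of the data operator (°) op."""
    if op == "+":
        return (a[0] + b[0], a[1] + b[1])
    if op == "-":
        return (a[0] - b[1], a[1] - b[0])
    if op == "*":
        # The extremes of a product lie at the interval corners.
        corners = [x * y for x, y in product(a, b)]
        return (min(corners), max(corners))
    raise ValueError(f"unsupported operator: {op}")


def assign(op, tsr_a, tsr_b):
    """TSR of v = a op b: combine every pair of tagged ranges of a and b."""
    return [
        TaggedRange(ra.ct | rb.ct, f"({ra.vt} {op} {rb.vt})", range_op(op, ra.sr, rb.sr))
        for ra in tsr_a
        for rb in tsr_b
    ]
```

For example, `a ∈ [0, 3]` and `b ∈ [1, 2]` give `a + b ∈ [1, 5]` and `a - b ∈ [-2, 2]`, and `[-1, 2] * [3, 4]` gives `[-4, 8]`. Treating the operands as independent can over-approximate the range when they are correlated, which is where the independent vs. dependent distinction matters.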
Computing the Static Range (3/3) • For a loop
Computing Choice Probabilities (1/3) • For a condition variable CV(X>Y)
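The probability of a condition such as X > Y can be computed from the static ranges of X and Y. This is a minimal sketch, assuming X and Y are independent and uniformly distributed over inclusive integer ranges; dependent variables or non-uniform distributions would need a different model.

```python
from fractions import Fraction


def prob_greater(x_range, y_range):
    """P(X > Y) for independent X, Y uniform over inclusive integer ranges."""
    xlo, xhi = x_range
    ylo, yhi = y_range
    favorable = 0
    for x in range(xlo, xhi + 1):
        # Count the y values in [ylo, yhi] that are strictly below x.
        favorable += max(0, min(yhi, x - 1) - ylo + 1)
    total = (xhi - xlo + 1) * (yhi - ylo + 1)
    return Fraction(favorable, total)
```

For X, Y ∈ [0, 3] there are 6 favorable pairs out of 16, so the choice probability is 3/8. For X ∈ [5, 6] and Y ∈ [0, 3] the condition always holds (probability 1). The loop is linear in the width of X's range; a closed form exists but the loop keeps the counting explicit.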
Template’s Parameters Assignment • Vth, Vdd and gate size are the parameters that affect circuit performance • Dual-Vdd, dual-Vth and eight sizes for each type of template • Adopts a quantum genetic algorithm
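Dual-Vdd, dual-Vth and eight sizes give 2 × 2 × 8 = 32 = 2⁵ configurations per template, so each template fits in 5 bits. This is a minimal sketch, assuming a hypothetical bit layout (1 Vdd bit, 1 Vth bit, 3 size bits) and hypothetical voltage and size values; the paper's actual encoding is not given on the slide.

```python
# Hypothetical parameter levels; only their counts (2, 2, 8) come from the slide.
VDD_LEVELS = (0.9, 1.2)              # dual Vdd, in volts
VTH_LEVELS = (0.2, 0.35)             # dual Vth, in volts
SIZES = (1, 2, 3, 4, 5, 6, 7, 8)     # eight size multipliers
BITS_PER_TEMPLATE = 5                # 1 (Vdd) + 1 (Vth) + 3 (size)


def decode(bits):
    """Decode a flat bit list into one (vdd, vth, size) tuple per template."""
    if len(bits) % BITS_PER_TEMPLATE:
        raise ValueError("bit string length must be a multiple of 5")
    configs = []
    for k in range(0, len(bits), BITS_PER_TEMPLATE):
        b = bits[k:k + BITS_PER_TEMPLATE]
        size_index = b[2] << 2 | b[3] << 1 | b[4]  # 3-bit size field, MSB first
        configs.append((VDD_LEVELS[b[0]], VTH_LEVELS[b[1]], SIZES[size_index]))
    return configs
```

Because every 5-bit pattern decodes to a legal configuration, any bit string produced by the search is a valid circuit configuration, so no repair step is needed after crossover or observation.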
The Genetic Algorithm • A search technique used in computing to find exact or approximate solutions to optimization problems • Uses techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover • Population: abstract representations of candidate solutions • Repopulation: generate a second-generation population of solutions from those selected, through genetic operators • Fitness function: determines each individual's chance of survival
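The ingredients listed above fit in one short loop. This is a minimal sketch of a classical bit-string GA, assuming binary tournament selection, one-point crossover, per-bit mutation and elitism; these operator choices are illustrative and not taken from the paper.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=12, generations=100, p_mut=0.05, seed=0):
    """Maximize fitness over bit strings; returns (best individual, best fitness per generation)."""
    rng = random.Random(seed)
    # Population: random candidate solutions encoded as bit lists.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []

    def tournament(scored):
        # Selection: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        new_pop = [scored[0][:]]  # elitism: the best individual survives unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_bits)  # one-point crossover (inheritance)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop  # repopulation: the next generation replaces the current one
    return max(pop, key=fitness), history
```

With `fitness=sum` (maximize the number of 1 bits) the per-generation best fitness never decreases, because elitism carries the best individual forward unchanged.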
The Quantum Genetic Algorithm • The circuit configuration information is encoded into qubits • A qubit may be in the ‘1’ or ‘0’ state, or in any superposition of the two, represented as |Ψ〉 = α|1〉 + β|0〉, where |α|² + |β|² = 1; |α|² and |β|² give the probabilities that the qubit will be found in the ‘1’ or ‘0’ state, respectively
The Quantum Genetic Algorithm • The population of m-qubit individuals at generation g is denoted Q(g) = {q_1^g, q_2^g, …, q_n^g}, where q_j^g = [α_1 α_2 … α_m ; β_1 β_2 … β_m], i.e., one (α, β) amplitude pair per qubit
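A quantum-inspired GA of this kind can be sketched as follows: each qubit is stored as an angle θ with α = sin θ and β = cos θ, so |α|² + |β|² = 1 and P(‘1’) = |α|² as on the previous slide. Each generation observes every individual, tracks the best solution, and rotates disagreeing qubits toward it. This is a minimal sketch, assuming a simplified fixed-step rotation gate rather than the rotation lookup table used in published QGA variants, and a generic fitness to maximize.

```python
import math
import random


def quantum_ga(fitness, m, n=10, generations=200, delta=0.01 * math.pi, seed=0):
    """Maximize fitness over m-bit strings with a population of n qubit individuals."""
    rng = random.Random(seed)
    # theta = pi/4 gives alpha = beta = 1/sqrt(2): an equal superposition.
    Q = [[math.pi / 4] * m for _ in range(n)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Observation: each qubit collapses to '1' with probability |alpha|^2 = sin(theta)^2.
        observed = [[1 if rng.random() < math.sin(t) ** 2 else 0 for t in q] for q in Q]
        for x in observed:
            f = fitness(x)
            if f > best_fit:
                best, best_fit = x, f
        # Simplified rotation gate: nudge each disagreeing qubit toward the best bit.
        for q, x in zip(Q, observed):
            for i, (xi, bi) in enumerate(zip(x, best)):
                if xi != bi:
                    step = delta if bi == 1 else -delta
                    q[i] = min(math.pi / 2, max(0.0, q[i] + step))
    return best, best_fit
```

Because each individual is a probability distribution rather than one bit string, a population of about 10 can still explore many configurations, which is consistent with the small population sizes reported in the control-parameter slide.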
Fitness Function • Power • The leakage of a template depends on the number of transistors that are turned off under a given input pattern • Calculate the gate leakage under each input pattern • Area • A qubit individual has little chance of survival if its area exceeds the area constraint • Performance
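The three fitness components can be combined as in this minimal sketch. It assumes all input patterns are equally likely, uses a hypothetical per-pattern leakage table (for example, from characterization) in arbitrary units, and models "little chance of survival" as zero fitness when the area or cycle-time constraint is violated. The exact weighting in the paper is not given on the slide.

```python
from itertools import product


def average_leakage(leakage_of_pattern, n_inputs):
    """Mean leakage over all 2**n_inputs input patterns, assumed equally likely."""
    patterns = list(product((0, 1), repeat=n_inputs))
    return sum(leakage_of_pattern(p) for p in patterns) / len(patterns)


def fitness(leakage, area, cycle_time, area_limit, cycle_time_limit):
    """Higher is better; constraint violations leave (almost) no chance of survival."""
    if area > area_limit or cycle_time > cycle_time_limit:
        return 0.0  # infeasible configuration
    return 1.0 / leakage  # lower leakage means a fitter individual
```

For a hypothetical two-input template with leakage 1, 2, 2, 3 under inputs 00, 01, 10, 11, the average leakage is 2.0 and a feasible configuration scores 0.5. Weighting patterns by the signal probabilities from the static-range analysis would be a natural refinement of the uniform assumption.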
Control Parameters • Population size • For a small population, the genetic diversity may not increase for many generations • A large population increases the computing time per generation but may need fewer generations to find the best solutions • Small populations of size 10 to 15 perform very well • Termination condition • Stop when the power reduction over the last 200 generations is less than 0.0005%
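The termination condition can be written as a check on the history of best power values. This is a minimal sketch, assuming the history holds the best total power found at each generation and that the reduction is measured relative to the value 200 generations earlier.

```python
def should_terminate(power_history, window=200, threshold_pct=0.0005):
    """True when the best power improved by less than threshold_pct percent over the last window generations."""
    if len(power_history) <= window:
        return False  # not enough generations yet to measure the window
    old, new = power_history[-window - 1], power_history[-1]
    reduction_pct = (old - new) / old * 100.0
    return reduction_pct < threshold_pct
```

For example, 201 generations stuck at the same power trigger termination, while a 1% drop within the window does not.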
Conclusion • An efficient design framework for reducing total power consumption while maintaining high circuit performance • Uses a probabilistic timed Petri-net (P-TPN) model to capture the dynamic behavior of the system • The proposed threshold-voltage assignment, supply-voltage assignment and template-sizing method is based on a quantum genetic algorithm • 5X to 7X power savings with a 2.5% performance penalty
Comments • Not scalable? • The static ranges of the circuit's inputs must be specified • The connection between synthesis and parameter assignment is weak • Experimental results are questionable • Many typos