Logic Synthesis Sequential Synthesis
Introduction • Design optimization from system level to layout • far too complex to approach in one big step • ⇒ divide-and-conquer approach with a fine-tuned balance between • capability to apply clean mathematical modeling and abstraction • algorithmic complexity to compute solutions • loss of optimality based on hard partitioning • design and verification methodology that requires user guidance • sweet spots change over time due to: • semiconductor technology improvements • changes of design architectures/requirements • new algorithmic solutions, etc.
Introduction • Example: traditional ASIC methodology: • RTL verification based on simulation • logic synthesis from RTL to gate level using combinational paradigm • static timing analysis • formal equivalence checking based on combinational paradigm • ATPG and scan-based testing based on combinational paradigm • standard cell place & route methodology with zero clock-skew distribution
Introduction • However: • clean boundaries between modeling levels get blurred • larger chips and shrinking device sizes require more detailed modeling • aggressive performance and power requirements • new modeling and algorithmic approaches • Example: • RTL sign-off methodology • combined approach to logic synthesis and physical design
Overview of Circuit Optimizations [Figure: techniques shown are combinational optimization, clock skew scheduling, retiming, architectural restructuring, and system-level optimization; axis labels are optimization space, distance from physical implementation, verification challenge, and necessity of integrated solution.]
Sequential Optimization Techniques • State assignment • Lots of theory; practical only for small FSMs, and even then only when targeting 2-level control logic • Sequential don't cares • Compute unreachable states and use them as external don't cares for the next-state logic • State minimization • Easy for completely specified FSMs ($n \log n$ algorithm) • Incompletely specified FSMs • Retiming • balancing of path delays by moving registers within circuit topology • interleaving with combinational optimization techniques
Integration in Design Flow • Optimization Space • significantly more optimization freedom for improving performance, power, and area • Distance from Physical Implementation • difficult to accurately model impact on final implementation • difficult to mathematically characterize optimization space • Verification Challenge • departure from combinational comparison model would break formal equivalence checking • different simulation behavior causes acceptance problems
Retiming [Figure: example circuit with registers r1 to r4 and gate delays 2, 3, 4, 5, shown before and after retiming r1 to r'1. Annotations: Skew = 0, Tcycle = 8; Skew = -1, Tcycle = 7; Dmax = 6, 0, 8; Dmin = 2, 0, 3.]
Retiming • Advantages (+): • Only setup time constraint (0 clock skew) • Simple integration with other logical (e.g. combinational) or physical optimizations • Easy combination with clock skew scheduling to obtain global optimum • Disadvantages (-): • Changes combinational model of design • severe impact on verification methodology • Inaccurate delay model if applied globally • Computation of equivalent reset state required
Retiming - Architectural Restructuring [Figure: circuit with registers r1 to r4 before and after restructuring (with r'1 and r'4); delay labels include 20 and 2 before, and 10, 10 and 2 after.]
Retiming - Architectural Restructuring • Advantages (+): • Smooth extension of regular retiming • Potential to alleviate global performance bottlenecks by adding sequential redundancy and pipelining • Disadvantages (-): • Significant change of design structure • substantial impact on verification methodology • Flexible architectural restructuring changes I/O behavior • existing RTL specification methods not always applicable
Example • Design example: 360 I/O, 2240 flip-flops, 41665 timing edges • Target cycle time (norm): 1.5 • Worst slack: -0.079 (5%) • Distribution: 20%: 30 edges; 40%: 63 edges; 60%: 130 edges; 80%: 249 edges; 100%: 425 edges
Verification • Timing verification unchanged • Sequential optimizations change the next-state and output functions • traditional combinational equivalence checking not applicable • simulation runs not recognizable by designer - acceptance problems • Generic solution: • preserve retime function (mapping function) from synthesis for: • reducing sequential EC problem back to combinational case • no false positives possible!!!! • modifying simulation model to reproduce original simulation output
Optimizing Circuits by Retiming • Netlist of gates and registers [Figure: netlist between inputs and outputs] • Various goals: • Reduce clock cycle time • Reduce area • Reduce number of latches
Retiming • Problem: pure combinational optimization can be suboptimal since relations across register boundaries are disregarded • Solutions • Retiming: move register(s) so that • the clock cycle decreases or the number of registers decreases, and • input-output behavior is preserved • RnR: combine retiming with combinational optimization techniques • Move latches out of the way temporarily • optimize larger blocks of combinational logic
Circuit Representation [Leiserson, Rose and Saxe (1983)] • Circuit represented as retiming graph $G(V, E, d, w)$ • $V$: set of gates • $E$: set of connections • $d(v)$ = delay of gate/vertex $v$, with $d(v) \ge 0$ • $w(e)$ = number of registers on edge $e$, with $w(e) \ge 0$
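Circuit Representation • Example: Correlator (from Leiserson and Saxe) (simplified) • Operation delays: $\delta$ has delay 3, $+$ has delay 7 • $\delta(x, y) = 1$ if $x = y$, $0$ otherwise • [Figure: correlator circuit with Host and signals a, b, and its directed retiming graph with vertex delays 0, 3, 3, 7 and edge weights 2, 0, 0, 0, 0] • Every cycle in the graph has at least one register, i.e. no combinational loops.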
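Preliminaries • For a path $p = u_0 \xrightarrow{e_1} u_1 \cdots \xrightarrow{e_k} u_k$: $d(p) = \sum_{i=0}^{k} d(u_i)$ and $w(p) = \sum_{i=1}^{k} w(e_i)$ • Clock cycle: $c = \max_{p \,:\, w(p) = 0} d(p)$ • [Figure: correlator retiming graph with a path with $w(p) = 0$ highlighted] • For the correlator, $c = 13$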
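The clock-cycle definition can be checked on the simplified correlator with a few lines of Python. This is only a sketch: the names `DELAY`, `EDGES` and `clock_period` are illustrative, and the edge list is an assumption reconstructed from the W and D matrices on the later slides, since the figure is not reproduced here.

```python
# Simplified correlator as a retiming graph (assumed reconstruction of the figure).
DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}            # d(v)
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}              # w(e)

def clock_period(delay, edges):
    """Return c = max d(p) over all paths p with w(p) = 0.

    Assumes every cycle carries at least one register, so the
    zero-register subgraph is acyclic and the recursion terminates.
    """
    # Successors reachable through register-free edges only.
    succ = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    longest = {}  # memo: largest delay of a zero-register path starting at v

    def longest_from(v):
        if v not in longest:
            tail = max((longest_from(s) for s in succ[v]), default=0)
            longest[v] = delay[v] + tail
        return longest[v]

    return max(longest_from(v) for v in delay)

print(clock_period(DELAY, EDGES))  # 13, via v1 -> v2 -> v3 -> v0
```

The critical path found is the one quoted on the slides, 3 + 3 + 7 = 13.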
Basic Operation • Movement of registers from input to output of a gate or vice versa • Does not affect gate functionalities • Mathematical formulation: • $r: V \to \mathbb{Z}$, an integer vertex labeling • $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$ • [Figure: a gate retimed by -1 and by 1]
Basic Operation • Thus in the example, $r(u) = -1$, $r(v) = -1$ results in [Figure: correlator retiming graph before and after retiming vertices u and v] • For a path $p: s \to t$, $w_r(p) = w(p) + r(t) - r(s)$ • Retiming: • $r: V \to \mathbb{Z}$, an integer vertex labeling • $w_r(e) = w(e) + r(v) - r(u)$ for edge $e = (u, v)$ • A retiming $r$ is legal if $w_r(e) \ge 0$, $\forall e \in E$
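The edge-weight update and the legality test translate directly into code. The sketch below assumes the same reconstructed correlator edge list as a dictionary `EDGES`; `retime`, `is_legal` and the two labelings are hypothetical names chosen for illustration.

```python
# Correlator edge weights w(e); the edge list is an assumed reconstruction.
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}

def retime(edges, r):
    """Apply w_r(e) = w(e) + r(v) - r(u) to every edge e = (u, v)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(edges, r):
    """A retiming is legal if no edge ends up with a negative register count."""
    return all(w >= 0 for w in retime(edges, r).values())

r_good = {"v0": 0, "v1": -1, "v2": -1, "v3": 0}
r_bad = {"v0": 0, "v1": 1, "v2": 0, "v3": 0}   # pushes edge (v1, v2) to -1

print(retime(EDGES, r_good))
print(is_legal(EDGES, r_good), is_legal(EDGES, r_bad))  # True False
```

Because the labels at the two ends of a path telescope, the register count around any cycle is unchanged: the cycle v0, v1, v3, v0 carries 2 registers before and 1 + 1 + 0 = 2 after applying `r_good`.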
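Retiming for Minimum Clock Cycle • Problem statement (minimum cycle time): given $G(V, E, d, w)$, find a legal retiming $r$ so that $c = \max_{p \,:\, w_r(p) = 0} d(p)$ is minimized • Retiming: 2 important matrices • Register weight matrix: $W(u, v) = \min_{p: u \to v} w(p)$ • Delay matrix: $D(u, v) = \max_{p: u \to v,\ w(p) = W(u, v)} d(p)$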
Retiming for minimum clock cycle
• W = register path-weight matrix (minimum # latches on all paths between u and v)
• D = path delay matrix (maximum delay on all minimum-latch paths between u and v)

W    V0  V1  V2  V3
V0    0   2   2   2
V1    0   0   0   0
V2    0   2   0   0
V3    0   2   2   0

D    V0  V1  V2  V3
V0    0   3   6  13
V1   13   3   6  13
V2   10  13   3  10
V3    7  10  13   7

• [Figure: correlator retiming graph with vertices v0 to v3, vertex delays 0, 3, 3, 7 and edge weights 2, 0, 0, 0, 0]
• $c \le \alpha \iff \forall p$: if $d(p) > \alpha$ then $w(p) \ge 1$
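One way to obtain W and D is an all-pairs shortest-path computation on edge weights $(w(e), -d(u))$ compared lexicographically, as in Leiserson and Saxe's WD algorithm. The following is a minimal sketch using Floyd-Warshall; `wd_matrices` is a hypothetical helper and the correlator edge list is an assumed reconstruction of the figure.

```python
from itertools import product

DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}            # d(v)
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}              # w(e), assumed

def wd_matrices(delay, edges):
    """Compute W(u,v) and D(u,v) with Floyd-Warshall on pairs (w, -d)."""
    V = list(delay)
    inf = float("inf")
    dist = {(u, v): (inf, 0) for u, v in product(V, V)}
    for v in V:
        dist[v, v] = (0, 0)                     # empty path
    for (u, v), w in edges.items():
        dist[u, v] = min(dist[u, v], (w, -delay[u]))
    for k, i, j in product(V, V, V):            # k varies slowest (outer loop)
        if dist[i, k][0] == inf or dist[k, j][0] == inf:
            continue
        cand = (dist[i, k][0] + dist[k, j][0], dist[i, k][1] + dist[k, j][1])
        if cand < dist[i, j]:                   # tuples compare lexicographically
            dist[i, j] = cand
    W = {(u, v): dist[u, v][0] for u, v in product(V, V)}
    # The pair sums -d over every vertex except the last, so add d(v) back.
    D = {(u, v): delay[v] - dist[u, v][1] for u, v in product(V, V)}
    return W, D

W, D = wd_matrices(DELAY, EDGES)
for u in DELAY:
    print(u, [W[u, v] for v in DELAY], [D[u, v] for v in DELAY])
```

Minimizing the first component gives the fewest registers; among those paths, minimizing the negated delay picks the slowest one, which is the entry D needs.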
Conditions for Retiming • Assume that we are asked to check if a retiming exists for a clock cycle $\alpha$ • Legal retiming: $w_r(e) \ge 0$ for all $e$. Hence $w_r(e) = w(e) + r(v) - r(u) \ge 0$, or $r(u) - r(v) \le w(e)$ • For all paths $p: u \to v$ such that $d(p) > \alpha$, we require $w_r(p) \ge 1$ • Thus $1 \le w_r(p) = w(p) + r(v) - r(u)$, i.e. $r(u) - r(v) \le w(p) - 1$ • Take the least $w(p)$ (tightest constraint): $r(u) - r(v) \le W(u, v) - 1$ • Note: this is independent of the path from $u$ to $v$, so we just need to apply it to $u, v$ such that $D(u, v) > \alpha$
Solving the constraints • All constraints in difference-of-2-variable form • Related to shortest path problem • Correlator: $\alpha = 7$ • $D > 7$: $r(u) - r(v) \le W(u, v) - 1$ • Legal: $r(u) - r(v) \le w(e)$ • [Figure: correlator retiming graph with its W and D matrices, repeated from the previous slide]
Solving the constraints • Do shortest path on constraint graph: $O(|V||E|)$ (Bellman-Ford algorithm) • A solution exists if and only if there exists no negative-weight cycle • [Figure: constraint graph on r(0), r(1), r(2), r(3), with edge weights taken from $D > 7$: $r(u) - r(v) \le W(u, v) - 1$ and Legal: $r(u) - r(v) \le w(e)$] • A solution is $r(v_0) = r(v_3) = 0$, $r(v_1) = r(v_2) = -1$
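The feasibility check for a given period can be sketched as a Bellman-Ford relaxation over the difference constraints. In the code below, `W` and `D` are the correlator matrices from the slides, indexed v0..v3; the edge list and the function name `feasible_retiming` are assumptions made for illustration.

```python
# Correlator data. EDGES holds (u, v, w(e)) and is an assumed reconstruction.
EDGES = [(0, 1, 2), (1, 2, 0), (1, 3, 0), (2, 3, 0), (3, 0, 0)]
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]

def feasible_retiming(n, edges, W, D, alpha):
    """Return a labeling r meeting all constraints for period alpha, or None."""
    # Each triple (u, v, c) encodes the constraint r[u] - r[v] <= c.
    cons = list(edges)                                    # legality
    cons += [(u, v, W[u][v] - 1)                          # timing
             for u in range(n) for v in range(n) if D[u][v] > alpha]
    # Bellman-Ford from an implicit source with 0-weight edges to every
    # vertex: start at r = 0 and relax until nothing changes.
    r = [0] * n
    for _ in range(n):
        changed = False
        for u, v, c in cons:
            if r[v] + c < r[u]:
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after n passes: negative-weight cycle

print(feasible_retiming(4, EDGES, W, D, 7))  # [0, -1, -1, 0]
print(feasible_retiming(4, EDGES, W, D, 6))  # None
```

For a period of 7 the sketch returns the labeling quoted on the slide. A period of 6 fails because the adder alone has delay 7, which produces the unsatisfiable constraint $r(v_3) - r(v_3) \le -1$.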
Retiming • To find the minimum cycle time, do a binary search among the entries of the D matrix ($O(|V||E| \log |V|)$) • [Figure: correlator retiming graph with its W and D matrices, repeated from the previous slides] • Retimed correlator: clock cycle = 3 + 3 + 7 = 13 before retiming, clock cycle = 7 after retiming [Figure: correlator circuits with Host and signals a, b, before and after retiming]
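The binary search can be sketched on top of a constraint-based feasibility test. The achievable period is always some entry of D, and feasibility is monotone in the period (a larger period only removes constraints), so bisecting the sorted distinct entries is enough. The names `feasible` and `min_cycle_time` and the edge list are assumptions for this example.

```python
EDGES = [(0, 1, 2), (1, 2, 0), (1, 3, 0), (2, 3, 0), (3, 0, 0)]  # (u, v, w), assumed
W = [[0, 2, 2, 2], [0, 0, 0, 0], [0, 2, 0, 0], [0, 2, 2, 0]]
D = [[0, 3, 6, 13], [13, 3, 6, 13], [10, 13, 3, 10], [7, 10, 13, 7]]

def feasible(n, edges, W, D, alpha):
    """Bellman-Ford on constraints r[u] - r[v] <= c; None if infeasible."""
    cons = list(edges) + [(u, v, W[u][v] - 1) for u in range(n)
                          for v in range(n) if D[u][v] > alpha]
    r = [0] * n
    for _ in range(n):
        changed = False
        for u, v, c in cons:
            if r[v] + c < r[u]:
                r[u], changed = r[v] + c, True
        if not changed:
            return r
    return None

def min_cycle_time(n, edges, W, D):
    """Binary search over the sorted distinct entries of D."""
    candidates = sorted({D[u][v] for u in range(n) for v in range(n)})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        r = feasible(n, edges, W, D, candidates[mid])
        if r is not None:
            best = (candidates[mid], r)   # feasible: try a smaller period
            hi = mid - 1
        else:
            lo = mid + 1                  # infeasible: need a larger period
    return best

print(min_cycle_time(4, EDGES, W, D))  # (7, [0, -1, -1, 0])
```

On the correlator the search probes 6 (infeasible), 10 (feasible) and 7 (feasible), and settles on 7.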
Retiming: 2 more algorithms • 1. Relaxation based: • Repeatedly find critical path; • retime vertex at end of path by +1 ($O(|V||E| \log |V|)$) • 2. Also, Mixed Integer Linear Program formulation • [Figure: critical path from u to v, with v retimed by +1]
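The relaxation-based method can be sketched along the lines of Leiserson and Saxe's FEAS procedure: for a target period, repeat $|V| - 1$ times, and in each round add 1 to the label of every vertex whose arrival time exceeds the target. The helper names, the `DELAY`/`EDGES` dictionaries and the reconstructed edge list are assumptions.

```python
DELAY = {"v0": 0, "v1": 3, "v2": 3, "v3": 7}            # d(v)
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}              # w(e), assumed

def arrival_times(delay, edges):
    """Delta(v): largest delay of a zero-register path ending at v."""
    preds = {v: [] for v in delay}
    for (u, v), w in edges.items():
        if w == 0:
            preds[v].append(u)
    memo = {}

    def delta(v):
        if v not in memo:
            memo[v] = delay[v] + max((delta(u) for u in preds[v]), default=0)
        return memo[v]

    return {v: delta(v) for v in delay}

def feas(delay, edges, c):
    """Return a retiming achieving clock period <= c, or None."""
    def retimed(r):
        return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

    r = {v: 0 for v in delay}
    for _ in range(len(delay) - 1):
        delta = arrival_times(delay, retimed(r))
        for v in delay:
            if delta[v] > c:        # v ends a path that is too slow
                r[v] += 1           # move a register in front of v
    if max(arrival_times(delay, retimed(r)).values()) <= c:
        return r
    return None

print(feas(DELAY, EDGES, 7))  # {'v0': 1, 'v1': 0, 'v2': 0, 'v3': 1}
print(feas(DELAY, EDGES, 6))  # None
```

The labeling returned for a period of 7 differs from $r(v_0) = r(v_3) = 0$, $r(v_1) = r(v_2) = -1$ only by a uniform shift of +1, which leaves every $w_r(e)$ unchanged, so both describe the same retimed circuit.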
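Retiming for Minimum Area • Goal: minimize number of registers used: $\min \sum_{e \in E} w_r(e) = \sum_{e \in E} w(e) + \sum_{v \in V} a_v\, r(v)$, where $a_v = |\mathrm{FI}(v)| - |\mathrm{FO}(v)|$ (number of incoming minus number of outgoing edges of $v$) is a constant.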
Minimum Registers - Formulation • Minimize: $\sum_{v \in V} a_v\, r(v)$ • Subject to: $w_r(e) = w(e) + r(v) - r(u) \ge 0$ • Reducible to a flow problem
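To make the objective concrete, the sketch below computes the constants $a_v$ as incoming minus outgoing edge counts and minimizes $\sum_v a_v r(v)$ over legal retimings of the correlator. It uses brute-force enumeration over a small label range purely for illustration, as a stand-in for the flow formulation, and it counts registers per edge without modeling sharing across fanout. All names and the edge list are assumptions.

```python
from itertools import product

VERTICES = ["v0", "v1", "v2", "v3"]
EDGES = {("v0", "v1"): 2, ("v1", "v2"): 0, ("v1", "v3"): 0,
         ("v2", "v3"): 0, ("v3", "v0"): 0}              # w(e), assumed

def area_coefficients(vertices, edges):
    """a_v = (number of incoming edges) - (number of outgoing edges)."""
    a = {v: 0 for v in vertices}
    for (u, v) in edges:
        a[v] += 1
        a[u] -= 1
    return a

def register_count(edges, r):
    """Total registers after retiming: sum of w_r(e) over all edges."""
    return sum(w + r[v] - r[u] for (u, v), w in edges.items())

def min_registers_bruteforce(vertices, edges, bound=2):
    """Enumerate labelings with r(first vertex) = 0 and |r(v)| <= bound."""
    a = area_coefficients(vertices, edges)
    host, rest = vertices[0], vertices[1:]
    best = None
    for labels in product(range(-bound, bound + 1), repeat=len(rest)):
        r = {host: 0, **dict(zip(rest, labels))}
        if any(w + r[v] - r[u] < 0 for (u, v), w in edges.items()):
            continue                                    # illegal retiming
        cost = sum(a[v] * r[v] for v in vertices)
        if best is None or cost < best[0]:
            best = (cost, r)
    return best

A = area_coefficients(VERTICES, EDGES)
cost, r_min = min_registers_bruteforce(VERTICES, EDGES)
print(A, cost, register_count(EDGES, r_min))
```

For the correlator the objective reduces to $r(v_3) - r(v_1)$, which legality forces to be non-negative, so the original count of 2 registers is already minimal. The minimum-period retiming with $r(v_1) = r(v_2) = -1$ uses 3.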
Problems with Retiming • Computation of equivalent initial states • do not necessarily exist • General solution requires replication of logic for initialization • Timing models • too far away from actual implementation • [Figure: example with register values 1, ?, ?, 0]