ECE 551: Digital System Design & Synthesis • Lecture 10: Synthesis Techniques
Lecture 10 Topics • Synthesis Process Revisited • Optimization Stages in Synthesis • Advanced Synthesis Strategies
Synthesis • Verilog files aren’t hardware yet! • Need to “synthesize” them • Tool reads hardware descriptions • Figures out what hardware to make • Done automatically • Faster! • Easier! • Designers still have to understand hardware! • Avoid pre- vs. post-synthesis discrepancies • Describe EFFICIENT hardware
Useful Documentation • Fairly complete documentation is available for the Synopsys tools using: /afs/engr.wisc.edu/apps/eda/synopsys/syn_Y-2006.06-SP1/sold • See especially (through Design Compiler link) • Design Vision User Guide • Design Compiler User Guide • Design Compiler Reference Manuals • HDL Compiler (Presto Verilog) Reference Manual • HDL Compiler for Verilog Reference Manual • Use as references
HDL Compiler for Verilog Reference Manual, pg. 1-5. • HDL Compiler is called by Design Compiler and Design Vision • Why do we need to compare synthesized code to initial code?
Design Compiler User Guide, pg. 2-17 • Design Vision is the GUI for Design Compiler: use design_vision • Can also run Design Compiler directly using dc_shell • To compile using a synthesis script, use dc_shell -tcl_mode -f file_name
Synthesis Script Example [1]
# To run, place in the directory with all the Verilog files
# and type: dc_shell -tcl_mode -f script.tcl

# Analyze input files.
analyze -library WORK -format verilog {./prob5.v ./prob1.v ./prob2.v}

# Elaborate the design.
elaborate GF_multiplier_mword -architecture verilog -library WORK

# Sets clock constraint of 2ns with 50% duty cycle on signal "clock".
create_clock -name "clk" -period 2 -waveform {0 1} {clock}
set_dont_touch_network [ find clock clk ]

# Sets the area constraint for the design
set_max_area 50000
Synthesis Script Example [2]
# Check and compile the design
check_design > check_design.txt
uniquify
compile -map_effort medium

# Export netlist for post-synthesis simulation into synth_netlist.v
change_names -rule verilog -hierarchy
write -format verilog -hierarchy -output synth_netlist.v

# Generate reports
report_resources > resource_report.txt
report_area > area_report.txt
report_timing > timing_report.txt
report_constraint -all_violators > violator_report.txt
report_register -level_sensitive > latch_report.txt
exit
Internal Synthesizer Flow (Synopsys) • [Figure: flow diagram. Stages shown: HDL Description, Syntax Checking, Synthesizer Policy Checking, Elaboration & Translation, Structural Representation, Architectural Optimization, Multi-Level Logic Optimization, Technology Mapping (using the Technology Library), Technology-Based Implementation]
Initial Steps • Parsing for Syntax and Semantics Checking • Gives error messages and warnings to user • User may modify the HDL description in response • Synthesizer Policy Checking (“Check Design”) • Check for adherence to allowable language constructs • Are you using unsupported operators or constructs? Combinational feedback? Multiple drivers to non-tristate? • This is where you find out you can’t use certain Verilog constructs • This is synthesizer-dependent • Example: Advanced DesignWare library allows modulo with any value; most other tools only allow modulo with powers of 2. • Certain things common to MOST synthesizers • See HDL Compiler for Verilog Reference Manual for constructs
Elaboration & Translation • Unrolls loops, substitutes macros & parameters, computes constant functions, evaluates generate conditionals • Builds a structural representation of the design • Like a netlist, but includes larger components • Not just gate-level, may include adders, etc. • Gives additional errors or warnings to the user • Issues in initial transformation to hardware. • For example, port sizes do not match • Affects quality achieved by optimization steps • Structural representation depends on HDL quality • Poor HDL can prevent optimization
Importance of Translation • It is important for the tool to recognize the sort of logic structures you are trying to describe. • If it sees a 32-bit full adder, the tool has built-in solutions for optimizing adders • Ripple-carry, carry-save, carry look-ahead, etc. • If it just sees a Boolean function with 65 inputs, it has to work a lot harder to achieve the same results • Do you think it can invent a CLA on the fly?
Implications of Translation • Writing clear, easy to understand code not only benefits other engineers, but may give you better synthesis results. • Another reason for standard coding guidelines • Brush up on the list in “Verilog Styles That Kill” • If you have a decent synthesis tool, it’s usually better to use Verilog’s built-in arithmetic operators rather than trying to build them from gates or Boolean equations
Optimization in Synthesis • None of these are guaranteed! • Most synthesizers will make at least some attempt • Detect and eliminate redundant logic • Detect combinational feedback loops • Exploit don't-care conditions • Try to detect unused states • Detect and collapse equivalent states • Make state assignments if not made already • Synthesize multi-level logic equations subject to: • constraints on area and/or speed • available technology (library)
Optimization Process • Optimization modifies the generic netlist resulting from elaboration and translation. • Uses cells from the technology library (mapping) • Attempts to meet all specified constraints • The process is divided into major phases • All or some selection of the major phases may be performed during optimization • Phase selection can be controlled by the user • Some optimizations can be disabled (ex: set_structure) or forced (ex: set_flatten)
Optimization Phases • Major Optimization Stages • Architectural • Logic-Level • Gate-Level • Architectural optimization • High-level optimizations that occur before the design is mapped to the logic-level • Based on constraints and high-level coding style • After optimization circuit function is represented by a generic, technology-independent netlist (GTECH)
Architectural Optimization • In Synopsys, optimizations include: • Sharing common mathematical subexpressions • Sharing resources • Selecting DesignWare* implementations • Replacing the generic representation from Translation with pre-built, optimized circuits • Reordering operators • Identifying arithmetic expressions for datapath synthesis *DesignWare is Synopsys’s library of pre-designed circuit implementations
Architectural Optimization • Examples: • Replace an adder used as a counter with an incrementer: count = count + 1; • Replace an adder and a separate subtractor with a single adder/subtractor if they are not used simultaneously: if (~sub) z = a + b; else z = a - b; • Performs selection of pre-designed components (Synopsys DesignWare) • adders, multipliers, shifters, comparators, muxes, etc. • Need good code for the synthesizer to do this • The designer knows more about the project than the tool does! It can only do so much on its own.
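The adder/subtractor replacement works because, in two's complement, a - b equals a + ~b + 1. The following is a minimal Python model of that idea, not the circuit any particular tool emits. The 8-bit width and the names `add_sub` and `reference` are assumptions made for this illustration.

```python
WIDTH = 8                  # assumed operand width for this illustration
MASK = (1 << WIDTH) - 1

def add_sub(a: int, b: int, sub: int) -> int:
    """One adder serves both operations: a - b == a + ~b + 1."""
    b_eff = (b ^ (MASK if sub else 0)) & MASK  # XOR gates conditionally invert b
    return (a + b_eff + sub) & MASK            # sub doubles as the carry-in

def reference(a: int, b: int, sub: int) -> int:
    """Two separate operators, as written in the HDL."""
    return (a - b) & MASK if sub else (a + b) & MASK
```

Here `sub` must be 0 or 1. The shared version costs one adder plus a row of XOR gates instead of an adder and a subtractor.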
Logic/Gate-Level Optimization • Works on the generic netlist created by logic synthesis • Produces a technology-specific netlist. • In Synopsys, it consists of four stages: • Mapping • Delay optimization • Design rule fixing • Area optimization • This phase often runs in multiple iterations if constraints are not met on the first try
Logic/Gate-Level Optimization • Mapping • Generates a gate level implementation using tech library • Tries to meet timing and area goals • Delay optimization • Tries to fix delay violations from mapping phase. • Does not fix design rule violations or meet area constraints. • Design rule fixing • Tries to correct design rule violations • Inserting buffers or resizing existing cells • If necessary, violates optimization constraints • Area optimization • Tries to meet area constraints, which have lowest priority
Boolean Logic-Level Optimizations • [Figure: block diagram. Verilog Description → Translation Engine → Two-level Logic Functions → Optimization Engine → Optimized Multi-level Logic Functions → Mapping Engine (with Technology Libraries) → Technology Implementation]
Logic Optimizations • Area • Number of gates fewer == smaller • Size of gates (# inputs) fewer == smaller • Delay • Number of logic levels fewer == faster • Size of gates (# inputs) fewer == faster • Note that examples that follow ignore NOT gates for gate count / levels of circuits • This is because many libraries offer gate cells with one or more inputs already inverted.
Logic Optimizations • Decomposition • Extraction • Factoring • Substitution • Elimination • You don’t have to remember the names of these • But should understand logic optimization • Different techniques targeting area vs. delay
Decomposition • Find common expressions in a single function • Reduce redundancy • Reduce area (number/size of gates) • May increase delay • More levels of logic • Define a G(x) cost function to compare expressions • G(inverter) = 0 • G(basic gate) = #inputs to the gate • Basic gates: AND, OR, NAND, NOR • Based on the concept that the size of a gate is proportional to the number of inputs
Decomposition Example • F = abc + abd + a’c’d’ + b’c’d’ • F = ab(c + d) + c’d’(a’ + b’) • F = ab(c + d) + (c + d)’(ab)’ • X = ab: 1 gate, 1 level • Y = c + d: 1 gate, 1 level • F = XY + X’Y’: 3 gates, 2 levels (5 gates, 3 levels total) • G(Original) = 16 (four 3-input gates, one 4-input gate) • G(Decomposed) = 10 (five 2-input gates)
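The decomposition can be checked by brute force over all 16 input combinations, and the G costs recomputed from the gate input counts. This is a small Python sketch for that purpose. The function names are chosen for this illustration only.

```python
from itertools import product

def f_original(a, b, c, d):
    """F = abc + abd + a'c'd' + b'c'd' (two-level form)."""
    return ((a and b and c) or (a and b and d)
            or (not a and not c and not d)
            or (not b and not c and not d))

def f_decomposed(a, b, c, d):
    """F = XY + X'Y' with X = ab and Y = c + d."""
    x = a and b
    y = c or d
    return (x and y) or (not x and not y)

def cost(gate_input_counts):
    """G = sum of gate input counts; inverters are free (G(inverter) = 0)."""
    return sum(gate_input_counts)

equivalent = all(bool(f_original(*v)) == bool(f_decomposed(*v))
                 for v in product([False, True], repeat=4))
g_original = cost([3, 3, 3, 3, 4])    # four 3-input ANDs, one 4-input OR
g_decomposed = cost([2, 2, 2, 2, 2])  # five 2-input gates
```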
Extraction • Find common sub-expressions between functions • Like decomposition, but across more than one function • Reduce redundancy • Reduce area (number/size of gates) • May increase delay if more logic levels introduced
Extraction Example • F = (a + b)cd + e: 3 gates, 3 levels • G = (a + b)e’: 2 gates, 2 levels • H = cde: 1 gate, 1 level • Common subexpressions: X = a + b, Y = cd: 1 gate, 1 level (each) • F = XY + e: 4 gates, 3 levels • G = Xe’: 2 gates, 2 levels • H = Ye: 2 gates, 2 levels • Before: • (3) 2-input ORs, (2) 3-input ANDs, (1) 2-input AND • G(original) = 6 + 6 + 2 = 14 • After: • (2) 2-input ORs, (4) 2-input ANDs • G(extracted) = 4 + 8 = 12
Factoring • Traditional two-level logic is sum-of-products • Sometimes better expressed by product-of-sums • Fewer literals => less area • May increase delay if logic equation not completely factored (becomes multi-level)
Factoring Example • Definitely good: • F = ac + ad + bc + bd 7 gates, 3 levels* • F = (a + b)(c + d) 3 gates, 2 levels • Maybe good: • F = ac + ad + e 3 gates, 2 levels (G=7) • F = a(c + d) + e 3 gates, 3 levels (G=6) • This one might improve area... • But will likely increase delay (tradeoff) *Assuming 2-input gates
Substitution • Similar to Extraction • When one function is a sub-function of another • Reduce area • Fewer gates • Can increase delay if more logic levels
Substitution Example • G = a + b: 1 gate, 1 level • F = a + b + c: 1 gate, 1 level • F = G + c: 2 gates, 2 levels • Before: • (1) 2-input OR, (1) 3-input OR • After: • (2) 2-input ORs (better area but increased levels) • With compile_ultra, the sub-expressions do not have to explicitly match, i.e. a + b would still be identified if F = b + c + a
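Order-independent matching can be pictured as comparing operand sets rather than operand sequences. The sketch below only illustrates that idea for a single OR gate. It is not a description of how compile_ultra is implemented, and the function `substitute` is hypothetical.

```python
def substitute(f_operands, g_name, g_operands):
    """Rewrite an OR of f_operands to reuse g when g's operands are a subset.

    Operands are compared as sets, so b + c + a still matches a + b.
    """
    f_set, g_set = frozenset(f_operands), frozenset(g_operands)
    if g_set <= f_set:
        return sorted(f_set - g_set) + [g_name]
    return sorted(f_set)

# F = b + c + a with G = a + b becomes F = c + G
rewritten = substitute(["b", "c", "a"], "G", ["a", "b"])
```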
Elimination (Flattening) • Opposite of previous optimizations • Goal is to reduce delay • Make signals travel through as few logic levels as possible • But will likely increase area • Gate replication / redundant logic • Can force/disable this step using set_flatten true / set_flatten false
Elimination Example • G = c + d: 1 gate, 1 level • F = Ga + G’b: 3 gates, 3 levels • G = c + d: 1 gate, 1 level • F = ac + ad + bc’d’: 4 gates, 2 levels • Before: • (2) 2-input ORs, (2) 2-input ANDs • After: • (1) 2-input OR, (1) 3-input OR, (2) 2-input ANDs, (1) 3-input AND (worse area, but fewer levels)
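The area/delay trade-off in this example can be reproduced with a toy netlist model that counts logic levels and gate inputs. It is a rough Python sketch: the netlist encoding, the intermediate signal names (t1, t2, p1, p2, p3), and the helper functions are all assumptions made for this illustration. Inverted inputs are treated as free, as in the slides.

```python
from itertools import product

# Netlists as {signal: (gate_type, [inputs])}. A trailing ' marks an inverted
# input, which costs nothing and adds no level.
BEFORE = {
    "G": ("OR", ["c", "d"]),
    "t1": ("AND", ["G", "a"]),
    "t2": ("AND", ["G'", "b"]),
    "F": ("OR", ["t1", "t2"]),
}
AFTER = {
    "G": ("OR", ["c", "d"]),
    "p1": ("AND", ["a", "c"]),
    "p2": ("AND", ["a", "d"]),
    "p3": ("AND", ["b", "c'", "d'"]),
    "F": ("OR", ["p1", "p2", "p3"]),
}

def evaluate(net, name, values):
    """Compute the logic value of a signal for given primary-input values."""
    inverted = name.endswith("'")
    base = name.rstrip("'")
    if base in net:
        gate, ins = net[base]
        bits = [evaluate(net, i, values) for i in ins]
        result = all(bits) if gate == "AND" else any(bits)
    else:
        result = values[base]
    return (not result) if inverted else result

def levels(net, name):
    """Number of gates on the longest path from a primary input to name."""
    base = name.rstrip("'")
    if base not in net:
        return 0
    return 1 + max(levels(net, i) for i in net[base][1])

def cost(net):
    """Sum of gate input counts (the G cost function)."""
    return sum(len(ins) for _, ins in net.values())
```

Under this model, flattening takes F from 3 levels to 2 while the cost rises from 8 to 12.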
compile_ultra Optimizations • Ultra-high mapping effort, 2-pass Compilation • Automatic hierarchical ungrouping • Ungroups small modules before mapping • Ungroups critical path based on delay • Automatic datapath extraction * • E.g. carry-save adders, sharing/unsharing • Boundary optimization • Propagates logic across hierarchical boundaries (constants, NC inputs/outputs, NOT) • Sequential inversion * • Sequential elements can have their outputs inverted
Datapath Extraction Optimizations • Uses carry-save adders where beneficial • Carry-propagate adders only when result is needed
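A carry-save adder reduces three operands to two (a sum word and a carry word) with no carry chain between bit positions, so the slow carry-propagate addition is needed only once at the end. The following is a minimal Python model on non-negative integers. The function names are assumptions for this sketch.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: independent full adders per bit, no carry chain."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry

def sum_operands(operands):
    """Reduce many operands with CSAs; one carry-propagate add at the end."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return sum(ops)  # the single carry-propagate addition
```

Each `carry_save_add` call stands for one row of full adders whose delay does not depend on the word width.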
Datapath Extraction Optimizations • Comparator sharing • A>B, A=B, A<B use a single subtractor with multiple outputs • Optimization of parallel constant multipliers • SOP to POS transformation • Operand reordering • Explores trade-offs of common sub-expression sharing and mutually exclusive resource sharing
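Comparator sharing can be sketched as follows: one subtraction A - B yields a carry-out and a result word, and all three comparison outputs are decoded from those. This Python model assumes unsigned 8-bit operands. The width and the function name `compare` are assumptions for this illustration.

```python
WIDTH = 8                  # assumed unsigned operand width
MASK = (1 << WIDTH) - 1

def compare(a: int, b: int):
    """Return (gt, eq, lt) for unsigned a, b using a single subtraction."""
    diff = a + ((~b) & MASK) + 1      # a - b computed as a + ~b + 1
    carry_out = (diff >> WIDTH) & 1   # 1 means no borrow, i.e. a >= b
    zero = (diff & MASK) == 0
    lt = carry_out == 0
    eq = zero
    gt = carry_out == 1 and not zero
    return gt, eq, lt
```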
Sharing and Unsharing • Expression sharing may be overridden later due to timing • Z1 <= A + B + C • Z2 <= A + B + D • Arrival time is A < B < D < C
Sharing and Unsharing • Mutually exclusive operations can share resources • if(SEL) Z = A + B • else Z = C + D • When would this kind of sharing be a bad idea?
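One way to approach the question is with a simple arrival-time model. When the adder is shared, SEL must pass through the operand muxes and then the adder. When it is not shared, SEL only passes through the output mux. The delay values and function names below are hypothetical and chosen only for this sketch.

```python
ADD_DELAY = 4.0  # hypothetical adder delay
MUX_DELAY = 1.0  # hypothetical mux delay

def shared_arrival(t_a, t_b, t_c, t_d, t_sel):
    """Muxes pick the operands, then ONE adder: SEL feeds mux and adder."""
    left = max(t_a, t_c, t_sel) + MUX_DELAY
    right = max(t_b, t_d, t_sel) + MUX_DELAY
    return max(left, right) + ADD_DELAY

def unshared_arrival(t_a, t_b, t_c, t_d, t_sel):
    """TWO adders run in parallel, then a mux: SEL only feeds the mux."""
    sum1 = max(t_a, t_b) + ADD_DELAY
    sum2 = max(t_c, t_d) + ADD_DELAY
    return max(sum1, sum2, t_sel) + MUX_DELAY
```

With all inputs arriving at time 0, both structures finish at 5.0. If SEL arrives at time 5, the shared version finishes at 10.0 and the unshared version at 6.0, so a late-arriving select is one case where the tool may choose to unshare.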
Sequential Inversion • set compile_seqmap_enable_output_inversion true • Useful if the available flip-flops do not have the same asynchronous input (preset or clear) as required in the design
Register Retiming • At the HDL level, determining the optimal placement of registers is difficult and tedious at best, or just plain impossible at worst • The register retiming tool moves registers through the synthesized combinational logic network to improve timing and/or area • Equalize delay (i.e. reduce critical path delay by increasing delay in other paths) • Reduce the number of flip-flops if timing criteria are met • Usually propagate registers forward • Be aware that this may change the values of some internal signals compared to pre-synthesis.
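The delay-equalizing goal can be illustrated on a deliberately simplified case: a single chain of gates with one register to place. The clock period is set by the slower of the two stages, so the best cut is the one that balances them. Real retiming works on a full network with many registers. The function name and the delay values below are hypothetical.

```python
def best_register_position(gate_delays):
    """Try each cut point in a chain of gates; return (position, clock period).

    position = number of gates placed before the register.
    """
    best = None
    for cut in range(len(gate_delays) + 1):
        period = max(sum(gate_delays[:cut]), sum(gate_delays[cut:]))
        if best is None or period < best[1]:
            best = (cut, period)
    return best

# Register written after the first gate in the HDL: period = max(1, 7) = 7.
# Moving it forward by one gate balances the stages: period = max(3, 5) = 5.
position, period = best_register_position([1, 2, 3, 2])
```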
DC Topographical Mode • When optimizing for delay, the synthesis engine is not aware of the net delays, since place-and-route has not yet been performed • Delays can be back-annotated and synthesis repeated after place-and-route, until closure is reached • Layout-aware synthesis attempts to get faster timing closure by predicting the physical design and using that information in synthesis and optimization, particularly with respect to delay • Estimates the placement and routing • Predicts and uses net capacitances in synthesis and optimization
Further Reading • There are many more commands out there to give you greater control over the synthesis process if you want it. • See: • Synopsys Online Documentation (SOLD) • Design Compiler man pages