Logic Synthesis – 3 Optimization. Ahmed Hemani. Sources: Synopsys Documentation.
Synthesis flow (steps and associated commands):
• Develop HDL files
• Specify Libraries (library objects): link_library, target_library, symbol_library, synthetic_library
• Read Design: analyze, elaborate, read_file
• Define Design Environment: set_operating_conditions, set_wire_load_model, set_drive, set_driving_cell, set_load, set_fanout_load, set_min_library
• Set Design Constraints
  • Design Rule Constraints: set_max_transition, set_max_fanout, set_max_capacitance
  • Design Optimisation Constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area
• Select Compile Strategy: Top Down, Bottom Up
• Optimize the Design: compile
• Analyze and Resolve Design Problems: check_design, report_area, report_constraint, report_timing
• Save the Design database: write
Phases of Optimization • Algorithmic level optimisation • High level optimisation • Logic level optimisation: flattening, structuring • Gate level optimisation: technology mapping, in-place, translation, boundary • RTL: state minimisation, state assignment, retiming
High Level Optimization • Resource Sharing. • Implementation Selection. • Arithmetic Optimization. [Figure: An Example of Resource Sharing]
HLO – Resource Sharing • Although "resource" in general covers computational, storage and interconnect elements, here it refers only to computational elements. • Two simple rules decide the amount of resources required by an RTL specification: • Each type of operator requires a unique resource type. For instance, the ’+’ operator requires an adder and ’>’ requires a comparator. • The maximum number of resources required for each operator type is the number of times that operator is used in the RTL specification. • As an optimisation measure, the above two rules are extended: • Some operators can be mapped to a common resource type. For instance, the ’+’ and ’-’ operators can be mapped to an add-subtract unit. • Multiply with constant numbers ??? • Operators in different clock cycles can share the same resource. This is determined by analysing whether there are any data flow or control flow conflicts, which is discussed later. [Operators listed on the slide: * + - > >= < <=]
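The two counting rules and their extension can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator-to-resource tables and the per-operation clock-cycle tags are hypothetical, and control flow and data flow conflicts are ignored.

```python
from collections import Counter

# Hypothetical operator-to-resource tables (illustrative only).
DEDICATED = {"+": "adder", "-": "subtractor", ">": "comparator"}
# Extended rule: '+' and '-' map to one common add-subtract unit.
MERGED = {**DEDICATED, "+": "addsub", "-": "addsub"}

def resources_without_sharing(ops, mapping=DEDICATED):
    """Basic rules: one resource per operator occurrence."""
    return Counter(mapping[op] for op, _cycle in ops)

def resources_with_sharing(ops, mapping=MERGED):
    """Operators in different cycles share a unit, so each resource type
    needs only as many units as its busiest cycle uses."""
    per_cycle = Counter((mapping[op], cycle) for op, cycle in ops)
    need = Counter()
    for (rtype, _cycle), count in per_cycle.items():
        need[rtype] = max(need[rtype], count)
    return need

# Each entry is (operator, clock cycle in which it executes).
ops = [("+", 0), ("-", 1), ("+", 1), (">", 0), ("+", 2)]
print(dict(resources_without_sharing(ops)))
print(dict(resources_with_sharing(ops)))
```

With these five operations the basic rules ask for three adders, one subtractor and one comparator, while the extended rules need two add-subtract units and one comparator.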
Scope and Restrictions for Sharing • Resources can be shared only if they are in the same process.
Control Flow Conflicts • Two operations can share a resource only if no execution path from the start of the block to the end of the block reaches both operations. For example, if two operations lie in separate branches of an if or case statement, they are not on the same path (and can be shared).
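A minimal sketch of this test, assuming each operation is described by the list of (statement, branch) pairs that enclose it. The statement labels such as `if_SEL` are hypothetical, and this is not how any particular tool represents control flow.

```python
def mutually_exclusive(path_a, path_b):
    """True if some if/case statement puts the two operations in different
    branches, so that no single execution path reaches both."""
    arms_a = dict(path_a)
    return any(stmt in arms_a and arms_a[stmt] != arm for stmt, arm in path_b)

def can_share(path_a, path_b):
    """Control-flow rule only; data flow conflicts are not checked here."""
    return mutually_exclusive(path_a, path_b)

# if SEL then Z <= A + B; else Z <= C + D; end if;
add_then = [("if_SEL", "then")]
add_else = [("if_SEL", "else")]
# Y <= E + F;  (executed unconditionally, outside the if)
add_always = []

print(can_share(add_then, add_else))    # different branches of one if
print(can_share(add_then, add_always))  # some path reaches both
```

The check is conservative: two separate if statements that happen to test the same condition count as different statements, so operations inside them are treated as conflicting.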
Data Flow Conflicts • When the A+B addition is shared with the TEMP_2+F addition on an adder called R1 and the D+E addition is shared with the TEMP_1+C addition on an adder called R2, a feedback loop results. The variable TEMP_1 connects the output of R1 to the input of R2. The variable TEMP_2 connects the output of R2 to the input of R1, and a feedback loop is created. The circuit is not faulty, because the multiplexing conditions never allow the entire path to be activated simultaneously. Still, the VHDL Compiler resource sharing mechanism does not allow combinational feedback paths to be created, because most timing verifiers cannot handle combinational feedback paths properly.
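The feedback loop described above can be found with an ordinary cycle search on a graph whose nodes are the shared resources. The sketch below assumes a simple dictionary encoding of the four additions. The output names `Z1` and `Z2` are hypothetical, since the slide does not name the results of TEMP_1+C and TEMP_2+F.

```python
def has_combinational_loop(ops, binding):
    """ops: op name -> (output variable, input variables).
    binding: op name -> resource. Returns True if sharing creates a cycle."""
    producer = {out: binding[name] for name, (out, _ins) in ops.items()}
    edges = {}
    for name, (_out, ins) in ops.items():
        for var in ins:
            if var in producer:  # variable is driven by another resource
                edges.setdefault(producer[var], set()).add(binding[name])

    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:
            return True  # came back to a node on the current path
        visiting.add(node)
        found = any(dfs(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(dfs(res) for res in list(edges))

ops = {
    "op1": ("TEMP_1", ("A", "B")),
    "op2": ("Z1", ("TEMP_1", "C")),   # Z1 is a hypothetical name
    "op3": ("TEMP_2", ("D", "E")),
    "op4": ("Z2", ("TEMP_2", "F")),   # Z2 is a hypothetical name
}
# Sharing from the text: A+B with TEMP_2+F on R1, D+E with TEMP_1+C on R2.
looping = {"op1": "R1", "op4": "R1", "op3": "R2", "op2": "R2"}
# Alternative binding: both producers on R1, both consumers on R2.
loop_free = {"op1": "R1", "op3": "R1", "op2": "R2", "op4": "R2"}

print(has_combinational_loop(ops, looping))
print(has_combinational_loop(ops, loop_free))
```

The first binding yields the edges R1 -> R2 (through TEMP_1) and R2 -> R1 (through TEMP_2), which is exactly the loop the text describes; the second binding only has R1 -> R2.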
Implementation Selection • Different implementations of the DesignWare components have different area and timing characteristics. • Design constraints determine the appropriate DesignWare component.
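As a rough illustration of constraint-driven selection, the sketch below picks the smallest implementation that meets a delay budget. The implementation names and all area and delay figures are made up for the example and are not taken from any DesignWare data sheet; falling back to the fastest candidate when nothing meets the budget is likewise an assumption.

```python
# Hypothetical adder implementations with made-up area/delay figures.
ADDERS = [
    {"name": "ripple_carry", "area": 100, "delay": 8.0},
    {"name": "carry_lookahead", "area": 160, "delay": 4.0},
    {"name": "carry_select", "area": 220, "delay": 3.0},
]

def select_implementation(candidates, max_delay):
    """Smallest-area candidate that meets max_delay; if none does,
    fall back to the fastest one (an assumption of this sketch)."""
    feasible = [c for c in candidates if c["delay"] <= max_delay]
    if feasible:
        return min(feasible, key=lambda c: c["area"])
    return min(candidates, key=lambda c: c["delay"])

for budget in (10.0, 5.0, 2.0):
    print(budget, select_implementation(ADDERS, budget)["name"])
```

A loose budget selects the small, slow adder; a tighter budget forces a larger, faster one.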
Common Sub Expression Sharing • Default behaviour controlled by variable hlo_share_common_subexpressions • Use set_share_cse to set the share_cse attribute for a design
Two Level vs. Multi Level Source: MIT. Course 6.375. Lecture L05. 2006
Flattening • The goal is to create a two-level sum-of-products (S.O.P.) form. • Hierarchy is preserved. • Double-edged sword: • Removes bad structure. • Removes good structure as well. • Speed is the motive. • A 2-level S.O.P. does not always mean a 2-level delay: • Library limitations. • ??
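To make the idea concrete, here is a small sketch that flattens a factored expression into a sum of products by multiplying out. It handles positive literals only (no complements, no absorption) and is a toy model, not the algorithm used by any synthesis tool.

```python
def var(name):
    """A single literal as a one-term sum of products."""
    return {frozenset([name])}

def OR(*sops):
    """Sum: union of the product terms."""
    out = set()
    for sop in sops:
        out |= sop
    return out

def AND(x, y):
    """Product: distribute every term of x over every term of y."""
    return {p | q for p in x for q in y}

def to_string(sop):
    return " + ".join(sorted("".join(sorted(term)) for term in sop))

def literal_count(sop):
    return sum(len(term) for term in sop)

# Structured form: f = (a + b)(c + d), four literals.
flat = AND(OR(var("a"), var("b")), OR(var("c"), var("d")))
print(to_string(flat), "literals:", literal_count(flat))
```

The two-level result has four product terms and eight literals against four literals in the factored form, which shows how flattening removes good structure along with bad.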
Flattening – contd. • Flattening guidelines: • Use it to gain speed for unstructured / random logic. • Follow it up with structuring. • Do not use it when the logic is: • Structured. Contains EXORs and MUXes. • > 20 inputs. • Flattening options: flattening, minimisation, phase inversion. • Minimisation modes: single output, multiple output. • Flattening is not the same as ungroup. [Figure labels: Single Output, Multiple Output]
Flattening – contd. • Controlling flattening: • Off by default. • To turn it on: • dc_shell> set_flatten true • Options: • dc_shell> set_flatten -minimize single_output • dc_shell> set_flatten -minimize multiple_output • dc_shell> set_flatten -phase true • Effort: • dc_shell> set_flatten -effort medium [Figure label: Phase Inversion]
Logic Level - Structuring • Adds intermediate variables and thus logic structure. • Sharing expressions -> area efficiency. • Negative effect on delay. • Structuring options: structuring, Boolean optimisation, timing driven structuring (TDS). • Timing Driven Structuring (TDS): • The default Synopsys optimisation strategy. • Takes delay constraints into account while structuring. • Tries to optimise the critical path by flattening it. • Adds structure to less critical paths. • Important to constrain designs accurately. • Boolean Optimisation: • Uses Boolean algebra rules like a + a = a, a + a’ = 1 etc. to optimise area at the expense of delay. • Boolean Optimisation and TDS should not be ON at the same time: Boolean Optimisation creates deep logic, which TDS cannot undo. • CPU intensive. • Off by default.
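The area-versus-delay trade-off of structuring can be checked on a small hand-made example. The sketch below compares a flat form with a structured one by exhaustive simulation; the function `f = abc + abd + e` is an assumed example, not one from the slides.

```python
from itertools import product

def equivalent(f, g, n_inputs):
    """Exhaustively compare two Boolean functions of n_inputs variables."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_inputs))

def flat(a, b, c, d, e):
    # Two-level form: 7 literals, AND level followed by one OR level.
    return (a and b and c) or (a and b and d) or e

def structured(a, b, c, d, e):
    # Intermediate variables add structure: 5 literals, but 3 logic levels.
    t = a and b
    u = c or d
    return (t and u) or e

print(equivalent(flat, structured, 5))
# The Boolean algebra rules quoted above hold as well:
print(equivalent(lambda a: a or a, lambda a: a, 1))         # a + a = a
print(equivalent(lambda a: a or not a, lambda a: True, 1))  # a + a' = 1
```

The structured version saves two literals by sharing `a and b`, at the cost of one extra logic level on the path through `c` and `d`.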
Structuring Options - Example [Figure panels: Structuring; Timing Driven Structuring; Multi-Level Boolean Optimization]
Gate Level – Combinational Mapping • Maps the combinational parts of the design to the current technology library to meet design goals.
Mapping via DAG Covering Source: MIT. Course 6.375. Lecture L05. 2006
Sample Library Source: MIT. Course 6.375. Lecture L05. 2006
Sample Library 2 Source: MIT. Course 6.375. Lecture L05. 2006
Trivial Covering Source: MIT. Course 6.375. Lecture L05. 2006
Covering #1 Source: MIT. Course 6.375. Lecture L05. 2006
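The covering idea can be sketched with dynamic programming on a tree, which is the usual simplification of DAG covering (a DAG is split into trees at its fanout points). The library, its costs and the subject tree below are hypothetical and are not the ones from the cited lecture. Pattern matching here does not try swapped NAND inputs, and there is no memoisation.

```python
# Hypothetical library: (gate name, cost, pattern over NAND/INV primitives).
LIBRARY = [
    ("INV", 2, ("INV", "x")),
    ("NAND2", 3, ("NAND", "x", "y")),
    ("AND2", 4, ("INV", ("NAND", "x", "y"))),
    ("OR2", 4, ("NAND", ("INV", "x"), ("INV", "y"))),
]

def match(pattern, node):
    """Return the subtrees bound to the pattern's leaves, or None."""
    if isinstance(pattern, str):
        return [node]  # a pattern leaf matches any subtree
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    bound = []
    for sub_pattern, child in zip(pattern[1:], node[1:]):
        sub = match(sub_pattern, child)
        if sub is None:
            return None
        bound += sub
    return bound

def best_cover(node, library):
    """Minimum-cost cover of a subject tree: (cost, list of gates used)."""
    if isinstance(node, str):
        return 0, []  # primary input
    best = None
    for name, cost, pattern in library:
        leaves = match(pattern, node)
        if leaves is None:
            continue
        total, gates = cost, [name]
        for leaf in leaves:
            leaf_cost, leaf_gates = best_cover(leaf, library)
            total += leaf_cost
            gates += leaf_gates
        if best is None or total < best[0]:
            best = (total, gates)
    return best

# Subject tree for f = (a AND b) AND c, built from NAND2 and INV only.
subject = ("INV", ("NAND", ("INV", ("NAND", "a", "b")), "c"))
print(best_cover(subject, LIBRARY[:2]))  # trivial cover: INV and NAND2 only
print(best_cover(subject, LIBRARY))      # cover using the full library
```

With only INV and NAND2 available the cover costs 10; with the full library the search finds two AND2 cells at a cost of 8.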
Gate Level – Fixing Violated Design Rule Constraints • The Mapping Process: • Two phases. • In the first phase, it tries to meet the optimisation goals. • In the second phase, it tries to meet the design rule constraints. • If the initial input is a netlist of gates, i.e., already mapped, the gate structure is destroyed and a new netlist is built. • Incremental Mapping: • An existing gate structure is treated as a starting point. • A new gate structure is accepted only if it lowers the cost. • Normal flattening and structuring are not done. • Local structuring is tried. • dc_shell> compile -incremental_mapping • In-place optimisation: • Pre-layout fanout-based estimates of net length, resistance and capacitance could differ from post-layout numbers. • To change the mapping to take these differences into account, use in-place optimisation. • dc_shell> compile -in_place
Boundary Optimization • A gate level optimisation that uses port connection information such as unconnected, opposite, logic one and logic zero. • Removes any gate which drives output ports that are not connected outside a design. • Able to consider swapping of input ports to minimise logic. • Two ways to invoke: • Apply to the entire design hierarchy. • Apply to selected designs in the hierarchy. • Use it with care: it can change the functionality of a sub-design.
Multi Cycle Paths • A timing path that is not expected to propagate a signal in one cycle. • [Figure annotation: This input changes once every 2nd cycle.] • To undo a set_multicycle_path command, use reset_path or reset_design.
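The effect of a multicycle setting on a setup check can be shown with simple arithmetic. The sketch below assumes an ideal clock with the launch edge at time 0 and no skew or uncertainty; the numeric values are hypothetical.

```python
def setup_slack(arrival, period, setup, multiplier=1):
    """Setup slack when the capture edge is `multiplier` cycles after launch.
    Assumes an ideal clock: launch at time 0, no skew, no uncertainty."""
    required = multiplier * period - setup
    return required - arrival

# Hypothetical numbers: 10 ns clock, 0.5 ns setup, data arrives after 14 ns.
print(setup_slack(arrival=14.0, period=10.0, setup=0.5))                # one cycle
print(setup_slack(arrival=14.0, period=10.0, setup=0.5, multiplier=2))  # two cycles
```

Treated as a single-cycle path, the example violates by 4.5 ns; declared as a two-cycle path, it has 5.5 ns of positive slack.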
False Paths • You can exclude false paths from a Static Timing Analysis run. • False paths are considered unconstrained.
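A toy model of what exclusion means for a timing report: paths declared false are simply left out when the worst slack is computed. The path records and register names below are hypothetical.

```python
def worst_slack(paths, false_paths=()):
    """Worst slack over all paths not declared false.
    Returns None when every path is excluded (nothing is constrained)."""
    excluded = set(false_paths)
    slacks = [p["slack"] for p in paths if (p["from"], p["to"]) not in excluded]
    return min(slacks, default=None)

# Hypothetical timing paths (slack in ns).
paths = [
    {"from": "cfg_reg", "to": "out_reg", "slack": -1.2},
    {"from": "a_reg", "to": "b_reg", "slack": 0.3},
]
print(worst_slack(paths))
print(worst_slack(paths, false_paths=[("cfg_reg", "out_reg")]))
```

Declaring the first path false removes the reported violation, which is why a false path exception should only be applied to paths that truly cannot be sensitised.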
Pipelining – a fully manual approach • Increases the throughput of a design so that it can meet tight timing constraints.
Re-timing – a semi-automatic approach • Increases the throughput of a design so that it can meet tight timing constraints.
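A toy model of what re-timing buys: on a linear chain of combinational blocks, moving a pipeline register changes the longest register-to-register delay without changing the number of registers. The block delays are hypothetical, and the brute-force search below stands in for whatever algorithm a real tool uses.

```python
from itertools import combinations

def min_period(delays, cuts):
    """Longest register-to-register delay when registers sit at `cuts`
    (a cut at position i places a register between block i-1 and block i)."""
    bounds = [0, *sorted(cuts), len(delays)]
    return max(sum(delays[i:j]) for i, j in zip(bounds, bounds[1:]))

def retime(delays, n_registers):
    """Try every placement of n_registers and keep the best one.
    Needs n_registers <= len(delays) - 1."""
    positions = range(1, len(delays))
    return min(combinations(positions, n_registers),
               key=lambda cuts: min_period(delays, cuts))

delays = [4, 3, 2, 5]        # hypothetical block delays in ns
manual = (1,)                # register placed by hand after the first block
best = retime(delays, 1)
print(manual, min_period(delays, manual))
print(best, min_period(delays, best))
```

The hand-placed register gives a 10 ns critical stage, whereas moving it one block later balances the two stages at 7 ns each.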
Re-timing - Limitations • Only works on mapped/compiled designs.
Re-timing – registered outputs • How to get the output registers right?
Synthesis of Finite State Machines • Idea: make Synopsys aware that the logic represents an FSM.
Synthesis Flow of FSMs in Synopsys
• Extraction of the state vector in a design where the state vector is not the only sequential element:
dc> analyze -f vhdl state_vector.vhdl
dc> elaborate fsm -arch fsm_behave
dc> group -fsm -design_name extracted_fsm
dc> current_design = extracted_fsm
dc> replace_synthetic
dc> extract
dc> report -fsm
• Extraction of the state vector in a design where the state-vector attribute is not set in the HDL:
• Existing registers:
U1: FLIP_FLOP port map (NEXT_STATE[0], CLK, STATE[0]);
U2: FLIP_FLOP port map (NEXT_STATE[1], CLK, STATE[1]);
• How to give the registers the state attributes:
set_fsm_state_vector { U1, U2 }
set_fsm_encoding { "S0=2#00", "S1=2#01", "S2=2#10", "S3=2#11" }
• Use the script above for the rest.
Synthesis flow of FSMs in Synopsys • FSM minimize • Example one-hot encoding: "S0=2#0001" "S1=2#0010" "S2=2#0100" "S3=2#1000"
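The two encodings that appear on these slides, binary (2#00 to 2#11) and one-hot (2#0001 to 2#1000), can be generated mechanically. The helper names below are hypothetical; only the `state=2#bits` string format is taken from the slides.

```python
def binary_encoding(states):
    """Minimum-width binary code: state i gets the binary value of i."""
    width = max(1, (len(states) - 1).bit_length())
    return {s: format(i, f"0{width}b") for i, s in enumerate(states)}

def one_hot_encoding(states):
    """One flip-flop per state: state i gets a single 1 at bit position i."""
    n = len(states)
    return {s: format(1 << i, f"0{n}b") for i, s in enumerate(states)}

def as_fsm_encoding(encoding):
    """Format as the 'state=2#bits' strings used with set_fsm_encoding."""
    return [f"{state}=2#{code}" for state, code in encoding.items()]

states = ["S0", "S1", "S2", "S3"]
print(as_fsm_encoding(binary_encoding(states)))
print(as_fsm_encoding(one_hot_encoding(states)))
```

Binary encoding uses two flip-flops for four states, while one-hot uses four, typically trading register count for simpler next-state logic.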