VHDL Design Tips and Low Power Design Techniques. Jonathan Alexander Applications Consulting Manager Actel Corporation MAPLD 2004. Agenda. Advanced VHDL ProASIC Plus Synthesis, Options and Attributes Timing Specifications Design Hints Power-Conscious Design Techniques Summary .
VHDL Design Tips and Low Power Design Techniques • Jonathan Alexander, Applications Consulting Manager, Actel Corporation • MAPLD 2004
Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary
Actel ProASICPlus Design Flow [Figure: VHDL Source (with Directives) feeds Synthesis (Logic Optimization and Technology Mapping, guided by Attributes and Timing constraints), which feeds Place & Route (Technology Implementation, guided by Timing, Pin and Placement constraints)]
What is Synthesis? • The mapping of a behavioral description to a specific target technology, i.e., generating a structural netlist from an HDL description • Includes optimization steps • Optimizes the design implementation for • Higher Speed • Smaller Area • Lower Power
ProASICPlus HDL Attributes and Directives • Attributes are used to direct the way your design is optimized and mapped during synthesis. • Directives control the way your design is analyzed prior to synthesis. Because of this, directives must be included in your VHDL source code. • Three important ProASICPlus attributes or directives are available: • “syn_maxfan” (attribute) • “syn_keep” (directive) • “syn_encoding” (attribute)
ProASICPlus HDL Attributes and Directives (cont’d) • syn_maxfan = “Value” • “Value” Range > 4 • Can be assigned to an input port, register output, or a net • Overrides the global “Fanout Limit” setting • The tool will replicate the signal if this attribute is associated with it • Syntax • In the HDL code • attribute syn_maxfan of data_in : signal is 1000; • In the constraint file • define_attribute {clk} syn_maxfan {200}
ProASICPlus HDL Attributes and Directives (cont’d) • syn_keep = 1 • When associated with a signal, this directive prevents Synplify from combining or collapsing the node. • This directive can be associated with combinatorial signals only • Syntax • In the HDL code (with syn_keep declared as a boolean attribute) • attribute syn_keep of st : signal is true; • In the constraint file • define_attribute {st} syn_keep {1};
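The one-line syntax shown on the attribute slides needs an attribute declaration before it compiles. The sketch below shows one way the three attributes could sit in a complete design unit. The entity `attr_demo`, its ports and the two-state machine are hypothetical, and the attribute types (integer, boolean, string) are assumptions based on common Synplify usage; check them against the tool's reference manual. Simulators ignore these attributes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity attr_demo is
  port (
    clk     : in  std_logic;
    data_in : in  std_logic;
    a, b    : in  std_logic;
    q       : out std_logic
  );
  -- An attribute of a port is specified in the entity declarative part.
  attribute syn_maxfan : integer;
  attribute syn_maxfan of data_in : signal is 1000;
end entity attr_demo;

architecture rtl of attr_demo is
  -- Assumed attribute types; declared locally so the file is self-contained.
  attribute syn_keep     : boolean;
  attribute syn_encoding : string;

  type state_t is (IDLE, RUN);
  signal state : state_t := IDLE;
  signal st    : std_logic;

  attribute syn_keep of st        : signal is true;    -- keep this combinatorial node
  attribute syn_encoding of state : signal is "gray";  -- requested FSM encoding
begin
  st <= a and b;

  process (clk)
  begin
    if rising_edge(clk) then
      case state is
        when IDLE =>
          if data_in = '1' then state <= RUN; end if;
        when RUN =>
          if st = '1' then state <= IDLE; end if;
      end case;
    end if;
  end process;

  q <= '1' when state = RUN else '0';
end architecture rtl;
```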
Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary
Timing Constraints Specification • Synplify ProASICPlus mapper allows specification of the following: • Global Design Frequency • Multi-clock design • Skew between two clocks • Input and output delays • Functional multi-cycle and false paths • All these timing specifications are available in the GUI; this presentation covers the sdc constructs only.
Design Frequency Specification • Multiple Clocks • Graphical User Interface “Frequency” item allows specification of a global value for all clocks • This setting influences the operator architecture selection (speed or area) during mapping • This value should be set to the highest frequency required in the design • To specify individual values for different clocks, use the following sdc construct • define_clock {clock_1} -freq <Value1> • define_clock {clock_2} -freq <Value2>
Skew Specification in Synplify • To define a skew between two clocks, use the following constraint: • define_clock_delay -rise {clock1} -rise {clock2} “value” • Example • define_clock_delay -rise {CLK19M} -rise {MPU_CLK} 1.0 • define_clock_delay -rise {MPU_CLK} -rise {CLK19M} 2.0
Input Delay • Specifies the input arrival time of a signal in relation to the clock. • It is used at the input ports, to model the interface of the inputs of the FPGA with the outside environment. • The value entered should represent the delay outside of the chip before the signal arrives at the input pin • To specify the “input delay” on an input port, use the following constraint: • define_input_delay {InputPortName} “Value”
Output Delay • Specifies the delay of the logic outside the FPGA driven by the top-level outputs. • Used to model the interface of the outputs of the FPGA with the outside environment. • To specify the “output delay”, use the following constraints: • define_output_delay {OutputPortName} “Value”
Functional False Path • “define_false_path” allows the user to specify paths which will be ignored for timing analysis, but will still be optimized, without priority, within Synplify. • The following options are available: • -from <a register or input pin> • -to <a register or output pin> • -through <a net> • Example • define_false_path -from Register_A • #Paths from Register_A are ignored • define_false_path -to Register_B • #Paths to Register_B are ignored • define_false_path -through test_net • #Paths through test_net are ignored
Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary
Late Arrival Signals: Prioritization
-- Initial Description
case State is
  when WAIT =>
    if Critical then Target <= Source_1; else Target <= Source_2; end if;
  when ACTIVE =>
    if Critical then Target <= Source_1; else Target <= Source_3; end if;
  when ….
end case;
-- Modified Description
if Critical then
  Target <= Source_1;
else
  case State is
    when WAIT => Target <= Source_2;
    when ACTIVE => Target <= Source_3;
    when ….
  end case;
end if;
[Figure: resulting multiplexer structures; in the modified description the late-arriving Critical signal controls the final multiplexer in front of Target]
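The modified description can be turned into a complete design unit along the following lines. This is a minimal sketch: the entity name `late_select`, the 8-bit widths and the one-bit `state_active` port (standing in for the slide's State signal, '0' for WAIT and '1' for ACTIVE) are assumptions made only to keep the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity late_select is
  port (
    state_active : in  std_logic;  -- hypothetical stand-in for State
    critical     : in  std_logic;  -- late-arriving control signal
    source_1     : in  std_logic_vector(7 downto 0);
    source_2     : in  std_logic_vector(7 downto 0);
    source_3     : in  std_logic_vector(7 downto 0);
    target       : out std_logic_vector(7 downto 0)
  );
end entity late_select;

architecture rtl of late_select is
begin
  process (state_active, critical, source_1, source_2, source_3)
  begin
    -- Testing the late signal first places it on the last multiplexer,
    -- so it crosses a single logic level before reaching target.
    if critical = '1' then
      target <= source_1;
    else
      case state_active is
        when '0'    => target <= source_2;  -- WAIT
        when others => target <= source_3;  -- ACTIVE
      end case;
    end if;
  end process;
end architecture rtl;
```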
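Late Arrival Signal: Another Hint!
-- Initial Description: A_late passes through the adder and then the comparator
…….
begin
  if ((A_late + B) >= Max) then Dout <= C; else Dout <= D; end if;
  … …
end Process;
-- Modified Description: Max - B is computed from the early signals, so A_late passes through the comparator only
if (A_late >= (Max - B)) then Dout <= C; else Dout <= D; end if;
[Figure: initial structure (adder on A_late and B, then >= against Max, selecting C or D through a mux) and modified structure (A_late compared directly against a value derived from Max and B, selecting C or D through a mux)]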
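The rearrangement relies on `A_late + B >= Max` being equivalent to `A_late >= Max - B`. That holds for integer arithmetic, but not for fixed-width unsigned vectors if the subtraction wraps. The sketch below therefore uses range-constrained integers. The entity name `late_compare`, the 0 to 255 ranges and the port name `max_val` are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity late_compare is
  port (
    a_late  : in  integer range 0 to 255;  -- late-arriving operand
    b       : in  integer range 0 to 255;
    max_val : in  integer range 0 to 255;
    c, d    : in  std_logic;
    dout    : out std_logic
  );
end entity late_compare;

architecture rtl of late_compare is
  -- Signed range so that max_val - b cannot wrap.
  signal limit : integer range -255 to 255;
begin
  -- Depends only on early signals, so it is off the critical path.
  limit <= max_val - b;

  -- a_late now feeds the comparator directly.
  dout <= c when a_late >= limit else d;
end architecture rtl;
```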
Signal vs Variable • Variable assignments are sensitive to order. • Variables are updated immediately. • Signal assignments are order independent. • Signal assignments are scheduled.
-- Signals, in order: Trgt3 is the output of three cascaded registers
Process (Clk) begin
  if (Clk'Event and Clk='1') then
    Trgt1 <= In1 xor In2; Trgt2 <= Trgt1; Trgt3 <= Trgt2;
  end if;
end process;
-- Signals, reversed order: same three cascaded registers
Process (Clk) begin
  if (Clk'Event and Clk='1') then
    Trgt2 <= Trgt1; Trgt3 <= Trgt2; Trgt1 <= In1 xor In2;
  end if;
end process;
-- Variables, in order: Trgt3 is In1 xor In2 behind a single register
Signal Trgt3 : std_logic;
Process (Clk) Variable vTrgt1, vTrgt2 : ... begin
  if (Clk'Event and Clk='1') then
    vTrgt1 := In1 xor In2; vTrgt2 := vTrgt1; Trgt3 <= vTrgt2;
  end if;
end process;
-- Variables, reversed order: each variable is read before it is written, so three cascaded registers again
Process (Clk) Variable vTrgt1, vTrgt2 : ... begin
  if (Clk'Event and Clk='1') then
    Trgt3 <= vTrgt2; vTrgt2 := vTrgt1; vTrgt1 := In1 xor In2;
  end if;
end process;
[Figure: register structure driving Trgt3 for each process]
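The difference can be checked in simulation with a small design that runs the signal version and the in-order variable version side by side. This is a sketch only: the entity `sig_vs_var` and its port names are made up for the illustration, and the intermediate signals are initialized to '0' to keep simulation output defined.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sig_vs_var is
  port (
    clk, in1, in2 : in  std_logic;
    sig_out       : out std_logic;  -- three clock edges after the inputs
    var_out       : out std_logic   -- one clock edge after the inputs
  );
end entity sig_vs_var;

architecture rtl of sig_vs_var is
  signal trgt1, trgt2 : std_logic := '0';
begin
  -- Signal assignments are scheduled: each line reads the old value,
  -- so this infers three cascaded flip-flops.
  sig_p : process (clk)
  begin
    if rising_edge(clk) then
      trgt1   <= in1 xor in2;
      trgt2   <= trgt1;
      sig_out <= trgt2;
    end if;
  end process sig_p;

  -- Variables update immediately: v2 already holds the new XOR result
  -- when var_out is assigned, so this infers a single flip-flop.
  var_p : process (clk)
    variable v1, v2 : std_logic;
  begin
    if rising_edge(clk) then
      v1 := in1 xor in2;
      v2 := v1;
      var_out <= v2;
    end if;
  end process var_p;
end architecture rtl;
```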
Resource Sharing and “Operand” Alignment • HDL Code
process (X, Y, Z, Sel) begin
  if (Sel = '0') then Res <= X * Y;
  else Res <= Y * Z;
  end if;
end process;
• Implementations • Without Resource Sharing (Larger and Slower) • With Resource Sharing (Smaller) • Operand Alignment (Faster*) • (*) Especially if Y is a Late Arrival Signal
[Figure: the three implementations, built from multipliers (*) and multiplexers on X, Y, Z and Sel, driving Res]
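One way to write the shared, operand-aligned form explicitly is to select between X and Z first and feed the common operand Y straight into a single multiplier. The sketch below assumes 8-bit unsigned operands and a hypothetical entity name `shared_mult`. Whether a synthesis tool produces this structure on its own from the original process depends on its resource-sharing settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_mult is
  port (
    sel     : in  std_logic;
    x, y, z : in  unsigned(7 downto 0);
    res     : out unsigned(15 downto 0)
  );
end entity shared_mult;

architecture rtl of shared_mult is
  signal operand : unsigned(7 downto 0);
begin
  -- Only the non-common operand is multiplexed.
  operand <= x when sel = '0' else z;

  -- Y goes directly to the single multiplier, with no multiplexer
  -- in its path, which helps when Y arrives late.
  res <= y * operand;
end architecture rtl;
```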
Resource Sharing to Avoid • Buses • VHDL Code
process (X, Y, Z, T, Sel) begin
  if (Sel = '0') then Eq <= (X = Y);
  else Eq <= (Z = T);
  end if;
end process;
• Implementation • With Resource Sharing (Larger and Slower): the 16-bit operands are multiplexed ahead of a single comparator • Without Resource Sharing (Smaller and Faster): two comparators feed a 1-bit multiplexer
[Figure: both implementations, with 16-bit buses X, Y, Z, T, select Sel and 1-bit output Eq]
Internal Three-state Buffers • At the VHDL Level • Either use the multiplexer-based modified VHDL code, or • Replace the three-state structure with the following equivalent AND-OR structure
[Figure: four three-state buffers (tri_in1..tri_in4, enabled by tri_en1..tri_en4) driving tri_out; the multiplexer-based equivalent (mux_in1..mux_in4, mux_en1..mux_en3) driving mux_out; and the AND-OR equivalent driving tri_out]
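The AND-OR replacement can be sketched as below, with the four enables and four data inputs grouped into vectors. The entity name `and_or_bus` and the vector ports are assumptions. The sketch also assumes the enables are mutually exclusive, as they must be on a three-state bus, and note that it drives '0' rather than 'Z' when no source is enabled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_or_bus is
  port (
    tri_en  : in  std_logic_vector(1 to 4);  -- tri_en1 .. tri_en4
    tri_in  : in  std_logic_vector(1 to 4);  -- tri_in1 .. tri_in4
    tri_out : out std_logic
  );
end entity and_or_bus;

architecture rtl of and_or_bus is
begin
  -- Each source is masked by its enable, then the results are ORed.
  tri_out <= (tri_en(1) and tri_in(1)) or
             (tri_en(2) and tri_in(2)) or
             (tri_en(3) and tri_in(3)) or
             (tri_en(4) and tri_in(4));
end architecture rtl;
```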
Agenda • Advanced VHDL • Power-Conscious Design Techniques • Data Path Selection • FSM Encoding • Gating Clocks and Signals • Advanced Power Design Practices • Summary
Sources of Dynamic Power Consumption • Switching • CMOS circuits dissipate power during switching • The more logic levels used, the more switching activity needed • Frequency • Dynamic power increases linearly with frequency • Loading • Dynamic power increases with capacitive loading • Glitch Propagation • Glitches cause excessive switching to occur at relatively high frequencies. • Clock Trees • Clock Trees operate at high frequency under heavy loading, so they contribute significantly to the total power consumption.
Data Path Elements Selection • Basic block selection is critical as the power/speed tradeoff has to be well identified • Power depends on switching activity, and therefore on the input data pattern • Watch the architecture of the basic arithmetic and logic blocks • Check area/speed and fanout distribution/number of logic levels • High fanout + large number of logic levels = higher glitch propagation • Investigate the effect of pipelining on power dissipation • Impact on clock tree power consumption • Impact on block fanout distribution
Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multipliers • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing
Review: Ripple Adder • Carry signal switching propagates through all the stages and consumes power
Review: Carry Look-Ahead Adder • Carry signal switching propagates through fewer stages • However, there is a higher number of logic levels
Carry Select Adder Overview • Carry signal switching propagates through fewer stages • However, higher duplication and complexity • Principle: • Do it twice (considering Carry=0 and Carry=1), then, when the actual Carry is ready, select the appropriate result
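The carry-select principle can be sketched for an 8-bit adder split into two 4-bit halves: the upper half is computed for both possible carries, and the carry out of the lower half selects between them. The entity name `carry_select_add8` and the 4/4 split are assumptions for illustration. A synthesis tool may restructure the `+` operators, so this shows the principle rather than a guaranteed netlist.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity carry_select_add8 is
  port (
    a, b : in  unsigned(7 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(7 downto 0);
    cout : out std_logic
  );
end entity carry_select_add8;

architecture rtl of carry_select_add8 is
  signal cin_v    : unsigned(0 downto 0);
  signal lo       : unsigned(4 downto 0);  -- low nibble sum plus carry out
  signal hi0, hi1 : unsigned(4 downto 0);  -- high nibble for carry 0 and 1
  signal hi       : unsigned(4 downto 0);
begin
  cin_v(0) <= cin;

  lo  <= resize(a(3 downto 0), 5) + resize(b(3 downto 0), 5) + cin_v;

  -- "Do it twice": both candidates are ready before lo(4) is known.
  hi0 <= resize(a(7 downto 4), 5) + resize(b(7 downto 4), 5);
  hi1 <= resize(a(7 downto 4), 5) + resize(b(7 downto 4), 5) + 1;

  -- The actual carry only drives a multiplexer.
  hi  <= hi1 when lo(4) = '1' else hi0;

  sum  <= hi(3 downto 0) & lo(3 downto 0);
  cout <= hi(4);
end architecture rtl;
```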
Adder Architectures • Forward Carry Look Ahead (CLF): Fastest but also largest • Brent and Kung (BK): Almost same speed as CLF but drastically smaller • Carry Look Ahead (CLA): Relatively small and slow • Ripple (RPL): Smallest but slowest • Brent and Kung: Best area/speed tradeoff
Adders Power Dissipation • Brent and Kung: Lowest Power Dissipation • Fewest logic levels • Lowest fanout
Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multipliers • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing
Multiplier Power Consumption • Wallace Advantages Over Carry-Save Multiplier (CSM) • Uniform switching propagation • Fewer logic levels • Lower average fanout
Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multiplier • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing
Pipelining for Glitch Reduction • A logically deep internal net is typically affected by more primary inputs switching, and is therefore more susceptible to glitches • Pipelining shortens the depth of combinatorial logic by inserting pipeline registers • Pipelining is very effective for data path elements such as parity trees and multipliers
[Figure: a Processing Unit between two Registers, versus two 1/2 Processing Units separated by an added pipeline Register]
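As an illustration of the parity-tree case, the sketch below splits a 16-bit parity computation into two stages: each byte's parity is registered first, and the two partial results are combined in a second registered stage. The entity `parity_pipe`, the 16-bit width and the stage boundary are all assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity parity_pipe is
  port (
    clk    : in  std_logic;
    data   : in  std_logic_vector(15 downto 0);
    parity : out std_logic  -- valid two clock edges after data
  );
end entity parity_pipe;

architecture rtl of parity_pipe is
  signal p_lo, p_hi : std_logic := '0';

  -- XOR of all bits of a vector.
  function xor_reduce(v : std_logic_vector) return std_logic is
    variable r : std_logic := '0';
  begin
    for i in v'range loop
      r := r xor v(i);
    end loop;
    return r;
  end function xor_reduce;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: half-depth XOR trees; the registers stop their glitches.
      p_lo <= xor_reduce(data(7 downto 0));
      p_hi <= xor_reduce(data(15 downto 8));
      -- Stage 2: combine the registered partial results.
      parity <= p_lo xor p_hi;
    end if;
  end process;
end architecture rtl;
```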
Pipelining Effect on Power • Pipelining increases clock tree power, but overall power is lowered
[Chart: power of the Pipelined FFT and Non-Pipelined FFT, and of the Pipelined and Non-Pipelined Clock Trees]
Pipelining vs. Re-timing • Pipelining introduces new registers • Re-timing does not introduce new registers • Example: FIR re-timing • Re-timing also reduces power • Registers prevent glitch propagation through paths with many logic levels (e.g., multipliers)
Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding • Gating Clocks and Signals • Advanced Power Design Practices • Summary
Counter Power Measurement on ProASIC • Power dissipation for 200 instances of 8-bit counters • As expected, Gray counters dissipate less power (~25%)
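A Gray counter changes exactly one state bit per count, which is why fewer register outputs toggle than in a binary counter. The sketch below keeps its state in Gray code and derives the next value by converting to binary, incrementing and converting back. The entity `gray_counter`, the synchronous reset and this particular next-state construction are assumptions, not the counters measured on the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_counter is
  generic (N : positive := 8);
  port (
    clk, rst : in  std_logic;
    gray     : out std_logic_vector(N-1 downto 0)
  );
end entity gray_counter;

architecture rtl of gray_counter is
  signal g_reg : std_logic_vector(N-1 downto 0) := (others => '0');
begin
  process (clk)
    variable b : unsigned(N-1 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        g_reg <= (others => '0');
      else
        -- Gray to binary: each bit is the XOR of all higher Gray bits.
        b(N-1) := g_reg(N-1);
        for i in N-2 downto 0 loop
          b(i) := b(i+1) xor g_reg(i);
        end loop;
        b := b + 1;
        -- Binary to Gray: b xor (b shifted right by one).
        g_reg <= std_logic_vector(b xor shift_right(b, 1));
      end if;
    end if;
  end process;

  gray <= g_reg;
end architecture rtl;
```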
Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding and Effect on Power • Gating Clocks & Signals • Advanced Power Design Practices • Summary
Signal Gating • There are several logic implementations of signal gating Latch or FF & Tri-state buffer
Gating Clocks • Most used mechanism to gate clocks • Gating clock signals with combinatorial logic is not recommended: glitches are easily created by the clock gate, which may result in incorrect triggering of the register
[Figure labels: FSM, LD_Enable, LATCH, CLK_En, CLK, New_Data (N Bits), Data_Out (N Bits)]
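One plausible reading of the latch-based scheme in the figure is sketched below: the enable is held in a latch that is transparent only while the clock is low, so the gated clock cannot glitch while the clock is high. The entity `gated_reg`, the internal name `gclk` and the exact wiring are assumptions. Whether a gated clock is acceptable at all depends on the target device's clock resources and on the timing analysis flow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gated_reg is
  generic (N : positive := 8);
  port (
    clk       : in  std_logic;
    ld_enable : in  std_logic;  -- from the FSM
    new_data  : in  std_logic_vector(N-1 downto 0);
    data_out  : out std_logic_vector(N-1 downto 0)
  );
end entity gated_reg;

architecture rtl of gated_reg is
  signal clk_en : std_logic := '0';
  signal gclk   : std_logic;
begin
  -- Latch: follows ld_enable while clk is low, holds while clk is high.
  process (clk, ld_enable)
  begin
    if clk = '0' then
      clk_en <= ld_enable;
    end if;
  end process;

  -- The enable is stable during the high phase, so gclk is glitch-free.
  gclk <= clk and clk_en;

  process (gclk)
  begin
    if rising_edge(gclk) then
      data_out <= new_data;
    end if;
  end process;
end architecture rtl;
```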
Gating Signals: Address Decoder Example • Switching activity on one of the inputs of the decoder will induce a large number of toggling outputs • An Enable/Select signal prevents the propagation of this switching activity
[Figure: decoder from IN0, IN1 to OUT0..OUT3, without and with an Enable/Select input]
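A 2-to-4 decoder with an enable could be written as follows. This is a minimal sketch: the entity `dec2to4` and the vector port names are assumptions. While `enable` is low, all outputs stay at '0' whatever the address inputs do.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dec2to4 is
  port (
    enable : in  std_logic;                     -- Enable/Select
    sel    : in  std_logic_vector(1 downto 0);  -- IN1, IN0
    outs   : out std_logic_vector(3 downto 0)   -- OUT3 .. OUT0
  );
end entity dec2to4;

architecture rtl of dec2to4 is
begin
  process (enable, sel)
  begin
    -- Default: all outputs quiet, so address activity does not propagate.
    outs <= (others => '0');
    if enable = '1' then
      case sel is
        when "00"   => outs(0) <= '1';
        when "01"   => outs(1) <= '1';
        when "10"   => outs(2) <= '1';
        when others => outs(3) <= '1';
      end case;
    end if;
  end process;
end architecture rtl;
```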
Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding and Effect on Power • Gating Clocks and Signals • Advanced Practices • Summary
VHDL Coding Effect on Power • Example: IF … THEN …. ELSE ….; • Re-organizing the code helps to prevent propagation of switching activity
[Figure: two multiplexer arrangements of a Stable Expression and a Glitchy Expression]
Delay Balancing • If all primary inputs have the same arrival time and the same switching probability, balancing trees eliminates switching propagation
[Figure: Un-Balanced adder chain versus Balanced adder tree summing X, Y, Z and T]
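In VHDL the two structures differ only in how the additions are parenthesized. The sketch below computes both for comparison. The entity `sum4`, the 8-bit inputs and the 10-bit results are assumptions, and whether the parenthesization survives synthesis depends on the tool's optimization settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sum4 is
  port (
    x, y, z, t : in  unsigned(7 downto 0);
    chain_sum  : out unsigned(9 downto 0);  -- un-balanced
    tree_sum   : out unsigned(9 downto 0)   -- balanced
  );
end entity sum4;

architecture rtl of sum4 is
begin
  -- Un-balanced: T sees one adder, X and Y see three in series.
  chain_sum <= ((resize(x, 10) + y) + z) + t;

  -- Balanced: every input passes through exactly two adders.
  tree_sum <= (resize(x, 10) + y) + (resize(z, 10) + t);
end architecture rtl;
```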
Guarded Evaluation • Technique used to reduce switching activity by adding latches or floating gates at the inputs of combinatorial blocks whose outputs are not always used. • Example: the result of a multiplier may or may not be used depending on a condition. Adding transparent latches or AND gates on the inputs avoids power dissipation, as they mask useless input activity.
[Figure: Multiplier feeding a Mux controlled by Condition, without guarding and with Condition-controlled latches on the multiplier inputs]
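The AND-gate variant can be sketched as below: the multiplier operands are forced to zero whenever the condition says the product will not be used, so input activity does not ripple through the multiplier. The entity `guarded_mult`, the `bypass` input selected when the condition is false, and the operand widths are assumptions for the example. The latch variant would hold the previous operands instead of zeroing them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity guarded_mult is
  port (
    cond   : in  std_logic;              -- '1' when the product is used
    a, b   : in  unsigned(7 downto 0);
    bypass : in  unsigned(15 downto 0);  -- hypothetical alternative result
    res    : out unsigned(15 downto 0)
  );
end entity guarded_mult;

architecture rtl of guarded_mult is
  signal a_g, b_g : unsigned(7 downto 0);
begin
  -- Guard: equivalent to ANDing every operand bit with cond.
  a_g <= a when cond = '1' else (others => '0');
  b_g <= b when cond = '1' else (others => '0');

  -- The multiplier only sees activity when its result is selected.
  res <= a_g * b_g when cond = '1' else bypass;
end architecture rtl;
```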
Pre-computation Based Power Reduction
[Figure labels: Common Clock, Input, R1, Combinatorial Logic, Outputs, Pre-Computation Logic, R2, Gated Input]