
VHDL Design Tips and Low Power Design Techniques

VHDL Design Tips and Low Power Design Techniques. Jonathan Alexander, Applications Consulting Manager, Actel Corporation. MAPLD 2004. Agenda: Advanced VHDL; ProASIC Plus Synthesis, Options and Attributes; Timing Specifications; Design Hints; Power-Conscious Design Techniques; Summary.




Presentation Transcript


  1. VHDL Design Tips and Low Power Design Techniques • Jonathan Alexander, Applications Consulting Manager, Actel Corporation • MAPLD 2004

  2. Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary

  3. Actel ProASICPlus Design Flow [Flow diagram: VHDL Source (Directives) → Synthesis: Logic Optimization and Technology Mapping (Attributes, Timing) → Place & Route: Technology Implementation (Timing, Pin, Placement)]

  4. What is Synthesis? • The mapping of a behavioral description to a specific target technology, i.e., it generates a structural netlist from an HDL description • Includes optimization steps • Optimize the design implementation for • Higher Speed • Smaller Area • Lower Power

  5. ProASICPlus HDL Attributes and Directives • Attributes are used to direct the way your design is optimized and mapped during synthesis. • Directives control the way your design is analyzed prior to synthesis. Because of this, directives must be included in your VHDL source code. • Three important ProASICPlus attributes or directives are available: • “syn_maxfan” (attribute) • “syn_keep” (directive) • “syn_encoding” (attribute)

  6. ProASICPlus HDL Attributes and Directives (cont’d) • syn_maxfan = “Value” • “Value” Range > 4 • Can be assigned to an input port, register output, or a net • Overrides the global “Fanout Limit” setting • The tool will replicate the signal if this attribute is associated with it • Syntax • In the HDL code • attribute syn_maxfan of data_in : signal is 1000; • In the constraint file • define_attribute {clk} syn_maxfan {200}

  7. ProASICPlus HDL Attributes and Directives (cont’d) • syn_keep = 1 • When associated with a signal, this directive prevents Synplify from combining or collapsing the node. • This directive can be associated with combinatorial signals only • Syntax • In the HDL code • attribute syn_keep : boolean; • attribute syn_keep of st : signal is true; • In the constraint file • define_attribute {st} syn_keep {1};
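
As a rough illustration of where the two HDL-side declarations from slides 6 and 7 could sit in a design, here is a minimal sketch. The entity name `attr_demo`, its ports and the logic are hypothetical; the attributes are declared locally so the file is self-contained, and the boolean type for `syn_keep` is an assumption based on common Synplify usage.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity attr_demo is  -- hypothetical example entity
  port (
    clk     : in  std_logic;
    data_in : in  std_logic;
    a, b    : in  std_logic;
    q       : out std_logic
  );
  -- Attributes of a port are specified in the entity declarative part.
  attribute syn_maxfan : integer;
  attribute syn_maxfan of data_in : signal is 1000;
end entity attr_demo;

architecture rtl of attr_demo is
  signal st : std_logic;

  -- Ask the synthesis tool to keep the combinatorial node "st".
  attribute syn_keep : boolean;  -- assumed type, see lead-in
  attribute syn_keep of st : signal is true;
begin
  st <= a and b;

  process (clk)
  begin
    if rising_edge(clk) then
      q <= st xor data_in;
    end if;
  end process;
end architecture rtl;
```

Tools that do not know these attributes simply ignore them, so the design still simulates unchanged.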

  8. Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary

  9. Timing Constraints Specification • Synplify ProASICPlus mapper allows specification of the following: • Global Design Frequency • Multi-clock design • Skew between two clocks • Input and output delays • Functional multi-cycle and false paths • All these timing specifications are available in the GUI; this presentation covers the sdc constructs only.

  10. Design Frequency Specification • Multiple Clocks • Graphical User Interface “Frequency” item allows specification of a global value for all clocks • This setting influences the operator architecture selection (speed or area) during mapping • This value should be set to the highest frequency required in the design • To specify individual values for different clocks, use the following sdc construct • define_clock {clock_1} -freq <Value1> • define_clock {clock_2} -freq <Value2>

  11. Skew Specification in Synplify • To define a skew between two clocks, use the following constraint: • define_clock_delay -rise {clock1} -rise {clock2} “value” • Example • define_clock_delay -rise {CLK19M} -rise {MPU_CLK} 1.0 • define_clock_delay -rise {MPU_CLK} -rise {CLK19M} 2.0

  12. Input Delay • Specifies the input arrival time of a signal in relation to the clock. • It is used at the input ports, to model the interface of the inputs of the FPGA with the outside environment. • The value entered should represent the delay outside of the chip before the signal arrives at the input pin • To specify the “input delay” on an input port, use the following constraint: • define_input_delay {InputPortName} “Value”

  13. Output Delay • Specifies the delay of the logic outside the FPGA driven by the top-level outputs. • Used to model the interface of the outputs of the FPGA with the outside environment. • To specify the “output delay”, use the following constraints: • define_output_delay {OutputPortName} “Value”

  14. Functional False Path • “define_false_path” lets the user specify paths that are ignored for timing analysis; Synplify still optimizes them, but without priority. • The following options are available: • -from <a register or input pin> • -to <a register or output pin> • -through <a net> • Example • define_false_path -from Register_A • #Paths from Register_A are ignored • define_false_path -to Register_B • #Paths to Register_B are ignored • define_false_path -through test_net • #Paths through test_net are ignored

  15. Agenda • Advanced VHDL • ProASICPlus Synthesis, Options and Attributes • Timing Specifications • Design Hints • Power-Conscious Design Techniques • Summary

  16. Late Arrival Signals: Prioritization

      -- Initial Description
      case State is
        when WAIT =>
          if Critical then
            Target <= Source_1;
          else
            Target <= Source_2;
          end if;
        when ACTIVE =>
          if Critical then
            Target <= Source_1;
          else
            Target <= Source_3;
          end if;
        when ….
      end case;

      -- Modified Description
      if Critical then
        Target <= Source_1;
      else
        case State is
          when WAIT   => Target <= Source_2;
          when ACTIVE => Target <= Source_3;
          when ….
        end case;
      end if;

      [Figures: multiplexer structures for both descriptions, with inputs State, Critical, Source_1 and Source_2 and output Target.]
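
A compilable version of the modified description might look like the sketch below. It assumes a two-state machine and a `std_logic` flag for `Critical`; the package `late_pkg`, the entity `late_select` and the state names `ST_WAIT`/`ST_ACTIVE` are hypothetical (`wait` is a reserved word in VHDL, so it cannot be used as an enumeration literal).

```vhdl
package late_pkg is
  -- Hypothetical two-state type; real designs will have more states.
  type state_t is (ST_WAIT, ST_ACTIVE);
end package late_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.late_pkg.all;

entity late_select is
  port (
    State    : in  state_t;
    Critical : in  std_logic;  -- the late-arriving signal
    Source_1 : in  std_logic;
    Source_2 : in  std_logic;
    Source_3 : in  std_logic;
    Target   : out std_logic
  );
end entity late_select;

architecture prioritized of late_select is
begin
  process (State, Critical, Source_1, Source_2, Source_3)
  begin
    -- Critical is tested first, so it only controls the last selection
    -- stage in front of Target.
    if Critical = '1' then
      Target <= Source_1;
    else
      case State is
        when ST_WAIT   => Target <= Source_2;
        when ST_ACTIVE => Target <= Source_3;
      end case;
    end if;
  end process;
end architecture prioritized;
```

Because the state decode no longer depends on `Critical`, it can settle before the late signal arrives.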

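  17. Late Arrival Signal: Another Hint!

      -- Initial description: A_late passes through the adder and then the comparator
      …….
      begin
        if ((A_late + B) >= Max) then
          Out = C;
        else
          Out = D;
        end if;
        …
      end Process;

      -- Modified description: Max - B is computed early, so A_late passes through the comparator only
      if (A_late >= (Max - B))
        Out = C;
      else
        Out = D;

      [Figures: in both cases the comparator (>=) controls a mux that selects C or D for Out.]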
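
The rearrangement relies on `A_late + B >= Max` being equivalent to `A_late >= Max - B`, which only holds if the subtraction cannot wrap around. The sketch below assumes 8-bit unsigned operands and widens them to signed values so the comparison stays exact; the entity name `late_compare`, the port `Result` (used because `out` is a reserved word) and the widths are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity late_compare is
  port (
    A_late : in  unsigned(7 downto 0);  -- late-arriving operand
    B, Max : in  unsigned(7 downto 0);  -- early operands
    C, D   : in  std_logic;
    Result : out std_logic
  );
end entity late_compare;

architecture rearranged of late_compare is
  signal a_s, b_s, max_s : signed(9 downto 0);
begin
  -- Zero-extend, then reinterpret as signed so Max - B may go negative.
  a_s   <= signed(resize(A_late, 10));
  b_s   <= signed(resize(B, 10));
  max_s <= signed(resize(Max, 10));

  -- Only the comparator sits between A_late and the output mux.
  Result <= C when a_s >= (max_s - b_s) else D;
end architecture rearranged;
```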

  18. Signal vs Variable
      • Variable assignments are sensitive to order.
      • Variables are updated immediately
      • Signal assignments are order independent.
      • Signal assignments are scheduled

      -- Signals, assigned in order
      Process (Clk)
      begin
        if (Clk'Event and Clk='1') then
          Trgt1 <= In1 xor In2;
          Trgt2 <= Trgt1;
          Trgt3 <= Trgt2;
        end if;
      end process;

      -- Variables, assigned in order
      Signal vTrgt3 : std_logic;
      Process (Clk)
        Variable vTrgt1, vTrgt2: ...
      begin
        if (Clk'Event and Clk='1') then
          vTrgt1 := In1 xor In2;
          vTrgt2 := vTrgt1;
          vTrgt3 <= vTrgt2;
        end if;
      end process;

      -- Variables, assigned in reverse order
      Process (Clk)
        Variable vTrgt1, vTrgt2 : ...
      begin
        if (Clk'Event and Clk='1') then
          Trgt3 <= vTrgt2;
          vTrgt2 := vTrgt1;
          vTrgt1 := In1 xor In2;
        end if;
      end process;

      -- Signals, assigned in reverse order
      Process (Clk)
      begin
        if (Clk'Event and Clk='1') then
          Trgt2 <= Trgt1;
          Trgt3 <= Trgt2;
          Trgt1 <= In1 xor In2;
        end if;
      end process;

      [Figures: resulting register structures driving Trgt3.]
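
To make the order sensitivity of variables concrete, the following sketch places the two variable-based processes side by side in one hypothetical entity (`sig_vs_var`, with invented output names). When the variables are written before they are read, only the final signal assignment is registered; when they are read before they are written, each variable has to remember its value across clock edges, which gives a three-stage chain.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sig_vs_var is
  port (
    Clk, In1, In2 : in  std_logic;
    Q_one_stage   : out std_logic;  -- variables written, then read
    Q_three_stage : out std_logic   -- variables read, then written
  );
end entity sig_vs_var;

architecture rtl of sig_vs_var is
begin
  in_order : process (Clk)
    variable v1, v2 : std_logic;
  begin
    if rising_edge(Clk) then
      v1 := In1 xor In2;   -- updated immediately
      v2 := v1;            -- already sees the new v1
      Q_one_stage <= v2;   -- one register after the xor
    end if;
  end process in_order;

  reverse_order : process (Clk)
    variable v1, v2 : std_logic := '0';
  begin
    if rising_edge(Clk) then
      Q_three_stage <= v2;  -- value stored two edges ago
      v2 := v1;             -- value stored one edge ago
      v1 := In1 xor In2;    -- stored for the next edge
    end if;
  end process reverse_order;
end architecture rtl;
```

With signals instead of variables, both statement orders would produce the three-stage chain.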

  19. Resource Sharing and “Operand” Alignment

      -- HDL Code
      process (X, Y, Z, Sel)
      begin
        if (Sel = '0') then
          Res <= X * Y;
        else
          Res <= Y * Z;
        end if;
      end process;

      Implementations:
      • Without Resource Sharing (Larger and Slower)
      • With Resource Sharing (Smaller)
      • Operand Alignment (Faster*)
      (*) Especially if Y is a Late Arrival Signal

      [Figures: multiplier (*) and mux arrangements of X, Y, Z and Sel producing Res for each implementation.]
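
One way to express the operand-aligned structure explicitly is to select between X and Z first and feed Y straight into the single multiplier, as in this sketch. It assumes 8-bit unsigned operands; the entity name `aligned_mult` and the internal signal `operand` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity aligned_mult is
  port (
    X, Y, Z : in  unsigned(7 downto 0);
    Sel     : in  std_logic;
    Res     : out unsigned(15 downto 0)
  );
end entity aligned_mult;

architecture rtl of aligned_mult is
  signal operand : unsigned(7 downto 0);
begin
  -- Y is common to both products, so only the other operand is muxed.
  operand <= X when Sel = '0' else Z;

  -- One shared multiplier; Y reaches it without passing through a mux.
  Res <= operand * Y;
end architecture rtl;
```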

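  20. Resource Sharing to Avoid
      • Buses

      -- VHDL Code
      process (X, Y, Z, T, Sel)
      begin
        if (Sel = '0') then
          Eq <= (X = Y);
        else
          Eq <= (Z = T);
        end if;
      end process;

      Implementations:
      • With Resource Sharing (Larger and Slower): 16-bit muxes select the operands in front of a single comparator
      • Without Resource Sharing (Smaller and Faster): two comparators followed by a 1-bit mux that drives Eq

      [Figures: both implementations, with 16-bit inputs X, Y, Z, T, select Sel and 1-bit output Eq.]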

  21. Internal Three-state Buffers • At the VHDL Level • Either use the multiplexer-based modified VHDL code, or • Replace the three-state structure with the equivalent AND-OR structure [Figures: four three-state buffers (tri_in1 to tri_in4, enabled by tri_en1 to tri_en4) driving tri_out; the AND-OR structure on the same inputs driving tri_out; a multiplexer structure with inputs mux_in1 to mux_in4 and enables mux_en1 to mux_en3 driving mux_out.]
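
The AND-OR replacement can be written directly as a concurrent assignment. This sketch reuses the signal names from the slide; the entity name `tri_replacement` is hypothetical, and it assumes that at most one enable is active at a time. Unlike a three-state bus, the output is '0' rather than high impedance when no enable is active.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tri_replacement is
  port (
    tri_en1, tri_en2, tri_en3, tri_en4 : in  std_logic;
    tri_in1, tri_in2, tri_in3, tri_in4 : in  std_logic;
    tri_out                            : out std_logic
  );
end entity tri_replacement;

architecture and_or of tri_replacement is
begin
  -- Each enable masks its data input; the OR merges the four branches.
  tri_out <= (tri_en1 and tri_in1) or
             (tri_en2 and tri_in2) or
             (tri_en3 and tri_in3) or
             (tri_en4 and tri_in4);
end architecture and_or;
```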

  22. Agenda • Advanced VHDL • Power-Conscious Design Techniques • Data Path Selection • FSM Encoding • Gating Clocks and Signals • Advanced Power Design Practices • Summary

  23. Sources of Dynamic Power Consumption • Switching • CMOS circuits dissipate power during switching • The more logic levels used, the more switching activity needed • Frequency • Dynamic power increases linearly with frequency • Loading • Dynamic power increases with capacitive loading • Glitch Propagation • Glitches cause excessive switching to occur at relatively high frequencies. • Clock Trees • Clock Trees operate at high frequency under heavy loading, so they contribute significantly to the total power consumption.

  24. Data Path Elements Selection • Basic block selection is critical as the power/speed tradeoff has to be well identified • Power is switching activity dependent, thus input data pattern dependent • Watch the architecture of the basic arithmetic and logic blocks • Check area/speed and fanout distribution/number of logic levels • High fanout + large number of logic levels = higher glitch propagation • Investigate pipelining effect on power dissipation • Impact on clock tree power consumption • Impact on block fanout distribution

  25. Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multipliers • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing

  26. Review: Ripple Adder • Carry signal switching propagates through all the stages and consumes power

  27. Review: Carry Look-Ahead Adder • Carry signal switching propagates through fewer stages • However, it has a higher number of logic levels

  28. Carry Select Adder Overview • Carry signal switching propagates through fewer stages • However, higher duplication and complexity • Principle: • Compute the result twice (for Carry=0 and for Carry=1), then select the appropriate result once the actual Carry is ready

  29. Adder Architectures • Forward Carry Look Ahead (CLF): Fastest but also largest • Brent and Kung (BK): Almost same speed as CLF but drastically smaller • Carry Look Ahead (CLA): Relatively small and slow • Ripple (RPL): Smallest but slowest • Brent and Kung: Best area/speed tradeoff

  30. Adders Power Dissipation • Brent and Kung: Lowest Power Dissipation • Lowest logic levels • Lowest fanout

  31. Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multipliers • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing

  32. Multiplier’s Power Consumption • Wallace Advantages Over Carry-Save Multiplier (CSM) • Uniform switching propagation • Fewer logic levels • Lower average fanout

  33. Data Path Architectures • Adders Architectures • Architecture Evaluation • Test Results • Multiplier • Architectures and Power Implications • Pipelined Configurations • Pipeline Effect on Power • Pipelining vs re-Timing

  34. Pipelining for Glitch Reduction • A logically deep internal net is typically affected by more primary inputs switching, and is therefore more susceptible to glitches • Pipelining shortens the depth of combinatorial logic by inserting pipeline registers • Pipelining is very effective for data path elements such as parity trees and multipliers [Figure: a Processing Unit followed by a Register, compared with two 1/2 Processing Units each followed by a Register.]

  35. Pipelining Effect on Power • Pipelining increases clock tree power, but overall power is lowered [Chart: power of the pipelined and non-pipelined FFT and of their clock trees.]

  36. Pipelining vs. Re-timing • Pipelining introduces new registers • Re-timing does not introduce new registers • Example: FIR re-timing • Re-timing also reduces power • Registers prevent glitch propagation through high logic-level paths (i.e., multipliers)

  37. Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding • Gating Clocks and Signals • Advanced Power Design Practices • Summary

  38. FSM and Counter Encoding: Impact on Power

  39. Counters and FSMs: State Register Transitions

  40. Counter’s Power Measurement on ProASIC • Power dissipation for 200 instances of 8-bit counters • As expected, Gray counters dissipate less power (~25%)
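
For reference, a Gray counter changes exactly one state-register bit per clock, which is why it toggles less than a binary counter. The sketch below is one possible way to build one (convert the Gray state to binary, increment, convert back); it is not necessarily the implementation that was measured, and the entity name `gray_counter`, the generic `WIDTH` and the synchronous reset are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_counter is
  generic (WIDTH : positive := 8);
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- synchronous reset (assumption)
    gray : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity gray_counter;

architecture rtl of gray_counter is
  signal gray_reg : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
    variable bin : unsigned(WIDTH - 1 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        gray_reg <= (others => '0');
      else
        -- Gray to binary: each bit is the xor of all higher Gray bits.
        bin(WIDTH - 1) := gray_reg(WIDTH - 1);
        for i in WIDTH - 2 downto 0 loop
          bin(i) := bin(i + 1) xor gray_reg(i);
        end loop;

        bin := bin + 1;

        -- Binary to Gray: b xor (b >> 1). Only one bit of gray_reg
        -- changes per count.
        gray_reg <= std_logic_vector(bin xor shift_right(bin, 1));
      end if;
    end if;
  end process;

  gray <= gray_reg;
end architecture rtl;
```

The conversion logic in front of the register still switches internally, so the saving applies mainly to the state register and everything it drives.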

  41. FSM Encoding: Effects on Power

  42. Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding and Effect on Power • Gating Clocks & Signals • Advanced Power Design Practices • Summary

  43. Signal Gating • There are several logic implementations of signal gating [Figure labels: Latch or FF, &, Tri-state buffer.]

  44. Gating Clocks • Most used mechanism to gate clocks [Figure: an FSM generates LD_Enable, which passes through a LATCH to produce CLK_En; CLK_En gates CLK for the register that loads New_Data (N bits) into Data_Out (N bits).] • Gating clock signals with combinatorial logic is not recommended. Glitches are easily created by the clock gate, which may result in incorrect triggering of the register.
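
A latch-based clock gate of the kind named on this slide could be described as in the sketch below. It assumes the latch is transparent while CLK is low, so that CLK_En can only change when the gated clock is already low; the entity name `latch_clock_gate`, the generic `N` and the internal signal `gated_clk` are hypothetical, and whether a given FPGA flow handles such a structure well has to be checked against the vendor documentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity latch_clock_gate is
  generic (N : positive := 8);
  port (
    CLK       : in  std_logic;
    LD_Enable : in  std_logic;  -- from the FSM
    New_Data  : in  std_logic_vector(N - 1 downto 0);
    Data_Out  : out std_logic_vector(N - 1 downto 0)
  );
end entity latch_clock_gate;

architecture rtl of latch_clock_gate is
  signal CLK_En    : std_logic := '0';
  signal gated_clk : std_logic;
begin
  -- Latch: follows LD_Enable while CLK is low, holds while CLK is high.
  enable_latch : process (CLK, LD_Enable)
  begin
    if CLK = '0' then
      CLK_En <= LD_Enable;
    end if;
  end process enable_latch;

  -- CLK_En is stable during the high phase, so the AND cannot glitch there.
  gated_clk <= CLK and CLK_En;

  data_reg : process (gated_clk)
  begin
    if rising_edge(gated_clk) then
      Data_Out <= New_Data;
    end if;
  end process data_reg;
end architecture rtl;
```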

  45. Gating Signals: Address Decoder Example • Switching activity on one of the inputs of the decoder induces a large number of toggling outputs • An Enable/Select signal prevents the propagation of that switching activity [Figures: a decoder with inputs IN0, IN1 and outputs OUT0 to OUT3, without and with an Enable/Select input.]
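
A 2-to-4 decoder with an enable input, using the port names from the figure, might be sketched as follows. The entity name `gated_decoder` and the active-high polarity of `Enable` are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity gated_decoder is
  port (
    IN0, IN1               : in  std_logic;
    Enable                 : in  std_logic;  -- active high (assumption)
    OUT0, OUT1, OUT2, OUT3 : out std_logic
  );
end entity gated_decoder;

architecture rtl of gated_decoder is
begin
  process (IN0, IN1, Enable)
    variable sel : std_logic_vector(1 downto 0);
  begin
    sel := IN1 & IN0;

    -- Defaults: all outputs stay low while the decoder is disabled,
    -- so input activity does not reach the logic behind it.
    OUT0 <= '0';
    OUT1 <= '0';
    OUT2 <= '0';
    OUT3 <= '0';

    if Enable = '1' then
      case sel is
        when "00"   => OUT0 <= '1';
        when "01"   => OUT1 <= '1';
        when "10"   => OUT2 <= '1';
        when "11"   => OUT3 <= '1';
        when others => null;
      end case;
    end if;
  end process;
end architecture rtl;
```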

  46. Agenda • Advanced VHDL • Power Conscious Design Techniques • Data Path Selection • FSM Encoding and Effect on Power • Gating Clocks and Signals • Advanced Practices • Summary

  47. VHDL Coding Effect on Power • Example: IF … THEN …. ELSE ….; • Re-organizing the code helps to prevent propagation of switching activity [Figures: two arrangements of a Stable Expression and a Glitchy Expression feeding a Mux.]

  48. Delay Balancing • If all primary inputs have the same arrival time and the same switching probability, balancing trees eliminates switching propagation [Figures: Un-Balanced and Balanced adder trees summing X, Y, Z and T.]
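
In VHDL source, the shape of an adder tree can be suggested through parentheses, as in this sketch with both forms side by side. The entity name `adder_trees`, the 8-bit inputs and the 10-bit sums are assumptions, and a synthesis tool may still restructure the expression.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder_trees is
  port (
    X, Y, Z, T     : in  unsigned(7 downto 0);
    Sum_unbalanced : out unsigned(9 downto 0);
    Sum_balanced   : out unsigned(9 downto 0)
  );
end entity adder_trees;

architecture rtl of adder_trees is
  signal xw, yw, zw, tw : unsigned(9 downto 0);
begin
  -- Widen first so that the sum of four 8-bit values cannot overflow.
  xw <= resize(X, 10);
  yw <= resize(Y, 10);
  zw <= resize(Z, 10);
  tw <= resize(T, 10);

  -- Chain: X and Y pass through three adders, T through only one.
  Sum_unbalanced <= ((xw + yw) + zw) + tw;

  -- Tree: every input passes through exactly two adders.
  Sum_balanced <= (xw + yw) + (zw + tw);
end architecture rtl;
```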

  49. Guarded Evaluation • Technique used to reduce switching activity by adding latches or floating gates at the inputs of combinatorial blocks if their outputs are not used. • Example: The result of a multiplier may or may not be used, depending on the condition. Adding transparent latches or AND gates on the inputs avoids power dissipation, as they mask useless input activity. [Figures: a Multiplier feeding a Mux controlled by Condition, without and with a Latch on the multiplier inputs controlled by Condition.]
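
The AND-gate variant of guarded evaluation can be sketched as below: the multiplier inputs are forced to zero whenever the product is not selected. The entity name `guarded_mult`, the operand widths and the alternative input `Bypass` are hypothetical. A latch-based guard would hold the previous operand values instead of forcing zeros, which avoids the extra transition when the guard closes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity guarded_mult is
  port (
    Condition : in  std_logic;              -- '1' when the product is used
    A, B      : in  unsigned(7 downto 0);
    Bypass    : in  unsigned(15 downto 0);  -- hypothetical alternative value
    Result    : out unsigned(15 downto 0)
  );
end entity guarded_mult;

architecture rtl of guarded_mult is
  signal a_g, b_g : unsigned(7 downto 0);
begin
  -- Guards: activity on A and B is masked while Condition = '0'.
  a_g <= A when Condition = '1' else (others => '0');
  b_g <= B when Condition = '1' else (others => '0');

  -- The multiplier only sees changing operands when its result is selected.
  Result <= a_g * b_g when Condition = '1' else Bypass;
end architecture rtl;
```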

  50. Pre-computation Based Power Reduction [Figure labels: Common Clock, Input, Gated Input, Pre-Computation Logic, R1, R2, Combinatorial Logic, Outputs.]
