Design Flows and Tools Peter A. Beerel University of Southern California USC Asynchronous CAD/VLSI Group (async.usc.edu)
Part II - Agenda Design Flows • Design via decomposition • Modeling design using System Verilog Design Automation – The Proteus-A flow • Legacy RTL • Added System Verilog CSP front-end • Asynchronous optimizations Final Flow Considerations • Analog Verification • Design for Test and Debug
Design via Process Decomposition • Collection of Processes linked by Channels • Channels pass messages with guaranteed delivery • Processes synchronize • Processes can be decomposed into smaller processes
Modeling Asynchronous Design via SystemVerilogCSP (SVC) • SystemVerilog interface abstracts channel wires as well as communication protocol • Send/Receive • Blocking tasks (Flow control) [Figure: Sender and Receiver connected through an SVC Interface; abstract communication]
SVC - Waveform view • No one is Sending or Receiving • Sender pending on Send • Receiver performs Receive, Communication happens • Receiver pending on Receive • Sender performs Send, Communication happens
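The blocking Send/Receive behavior described in the slides above can be mimicked in software. The following is only a minimal Python sketch, not SVC itself: the `Channel` class, the `producer` and `doubler` processes, and the `None` end-of-stream sentinel are all hypothetical names introduced for illustration. It assumes a rendezvous channel in which whichever side arrives first stays pending until the other side arrives.

```python
import threading


class Channel:
    """Toy rendezvous channel (hypothetical, for illustration only).

    send() blocks until a receiver has taken the value, and receive()
    blocks until a sender has offered one.
    """

    def __init__(self):
        self._data_ready = threading.Semaphore(0)  # sender -> receiver
        self._data_taken = threading.Semaphore(0)  # receiver -> sender
        self._send_lock = threading.Lock()         # one sender at a time
        self._value = None

    def send(self, value):
        with self._send_lock:
            self._value = value
            self._data_ready.release()   # announce the value
            self._data_taken.acquire()   # pending on Send until it is received

    def receive(self):
        self._data_ready.acquire()       # pending on Receive until a Send arrives
        value = self._value
        self._data_taken.release()       # complete the communication
        return value


def producer(out_ch, items):
    """Process that sends each item, then a None sentinel (software-only convention)."""
    for x in items:
        out_ch.send(x)
    out_ch.send(None)


def doubler(in_ch, out_ch):
    """Small process obtained by decomposition: doubles every token it receives."""
    while True:
        x = in_ch.receive()
        if x is None:
            out_ch.send(None)
            break
        out_ch.send(2 * x)


def run_pipeline(items):
    """Link producer -> doubler -> collector with two channels."""
    a, b = Channel(), Channel()
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        y = b.receive()
        if y is None:
            break
        results.append(y)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` links three small processes through two channels. Because both `send` and `receive` block, the processes synchronize on every token without any shared clock.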
The Proteus-A Flow – Legacy RTL Key Features • Re-uses synchronous EDA tools • Seamless integration into existing flows • Back-end design style agnostic • Up to 2X higher performance Tool Status • Commercialized version in production for 2+ years • Uses proprietary QDI library • Academic version (Proteus-A) enhanced significantly at USC Recent Advances • Power optimization algorithms [Flow diagram: Design Goals, Synth. RTL → Synthesis → Image Netlist, Constraints → ClockFree → Async Netlist, Constraints → Physical Design → Final Layout; other diagram labels: Sync Library, Proteus/Sync Library, Clock Gating, Clock Tree Synthesis]
Flow Demo – Legacy RTL [Flow diagram: Legacy RTL Specification (Synth. RTL) → Synthesis → Synthesized Image Netlist → Clockfree → Asynchronous Gate-level Netlist → Physical Design → Final Layout]
Amber23 – Proteus-A Case Study • Download from http://opencores.com/project,amber • ARM-compatible 32-bit RISC processor • 3 stages : FETCH, DECODE and EXECUTE Register bank Barrel shifter ALU Multiplexer instruction control Cache Bus interface Decode State machine Cache Bus interface Read data Address, write data Zhang, USC Summer Research, 2012
Amber23 – Performance Comparison [Comparison figure not recoverable from the extracted text] Zhang, USC Summer Research, 2012
The Proteus-A Flow – SVC RTL Key New Features • Supports System Verilog CSP front-end • Enables user-defined conditional communication • Saves power at architectural level Tool Status • Proprietary version starting from CAST developed at Fulcrum • System Verilog version subsequently developed at USC • Used in current research at USC and Technion and 40+ person async class [Flow diagram: Design Goals, SystemVerilog → SVC2RTL → Synth. RTL, Constraints → Synthesis → Image Netlist, Constraints → ClockFree → Async Netlist, Constraints → Physical Design → Final Layout; other diagram labels: Sync Library, Proteus/Sync Library, Clock Gating, Clock Tree Synthesis]
Key to Low-Power - Conditional Communication Conditional communication reduces token flow, saving power • Traditionally - manually introduced via user-created decomposition • Recent research - automatically introduced via Operand Isolation [Figure: Add/Sub and Mult units on inputs A, B, placed between a MUX and a DEMUX and selected by op] Saifhashemi, PATMOS 2012
SVC2RTL – Enables User-Defined Conditional Communication [Figure labels: control values 0 and 1; "Dummy value", "Not sent", "Not received"]
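To make the difference between sending a dummy value and not sending at all concrete, here is a small Python sketch that counts the tokens reaching a multiplier for a stream of operations. The function name, the `"mul"` opcode string, and the one-token-per-operation model are assumptions made for illustration; they are not taken from SVC2RTL.

```python
def multiplier_tokens(ops, conditional):
    """Count tokens delivered to a hypothetical multiplier.

    ops         -- sequence of opcode strings, e.g. ["add", "mul", "sub"]
    conditional -- if False, every operation sends a token to the multiplier
                   (a dummy value when the op is not "mul"); if True, a token
                   is sent only when the multiplier is actually needed.
    """
    tokens = 0
    for op in ops:
        if op == "mul":
            tokens += 1      # real operands are always sent
        elif not conditional:
            tokens += 1      # unconditional style: a dummy value is still sent
        # conditional style: nothing is sent, so nothing is received
    return tokens


ops = ["add", "mul", "add", "sub"]
unconditional = multiplier_tokens(ops, conditional=False)  # every op sends a token
conditioned = multiplier_tokens(ops, conditional=True)     # only the "mul" sends
```

In this toy stream the unconditional style delivers four tokens to the multiplier and the conditional style delivers one. Fewer tokens means less switching activity in the unit, which is the intuition behind the power saving.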
Power Optimization Overview • Conditioning • Automatically add conditional communication • Reconditioning • Optimize the existing conditionality
Power Saving - The Opportunity + Unnecessary calculation
Our Solution - Adding Isolation Cells • All inputs/outputs are unconditional • Operand Isolation • And-based isolation cells • Generated by synchronous RTL synthesizer • Does not prevent switching in asynchronous circuits Isolation cells are not effective in asynchronous circuits
Our Solution - Conditioning [Figure: conditioned datapath; the unused unit shows "No Activity"]
Power Optimization Results • Case study: 32-bit ALU placed and routed • Back annotated switching activity using a VCD file • Results: • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2 • 53% power reduction when only isolating MUL (rf=0.25) • Area cost of isolating MUL is about 4%, with no performance penalty Saifhashemi, PATMOS 2012
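One way to see why isolating a frequently used unit can hurt while isolating a rarely used one helps is a first-order energy model. The sketch below is an assumption-laden toy, not the model used in the cited study: it reads the usage ratio as the fraction of operations that need the unit, and the parameters `e_unit` (energy when the unit switches) and `e_cond` (per-operation overhead of the added conditional cells) are hypothetical.

```python
def energy_per_op(usage, e_unit, e_cond=0.0, conditioned=False):
    """Average energy per operation under a toy first-order model.

    Unconditional: the unit switches on every operation.
    Conditioned:   the unit switches only for the fraction `usage` of
                   operations, but the conditioning overhead is paid every time.
    """
    if not 0.0 <= usage <= 1.0:
        raise ValueError("usage must be a fraction between 0 and 1")
    if conditioned:
        return usage * e_unit + e_cond
    return e_unit


def conditioning_saving(usage, e_unit, e_cond):
    """Positive when conditioning saves energy, negative when it costs energy."""
    return (energy_per_op(usage, e_unit)
            - energy_per_op(usage, e_unit, e_cond, conditioned=True))
```

Under this model, conditioning pays off only when `e_cond < (1 - usage) * e_unit`, so the break-even usage ratio depends on how large the overhead is relative to the unit being isolated. The numbers used with it are illustrative inputs, not measurements.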
Power Savings – The Opportunity • Conditional communication is explicit and only at primary IO [Figure: datapath with control values 0/1; two regions marked "Unnecessary activity"]
The Reconditioning Problem Definition (The Reconditioning Problem): Rearrange the locations of RECEIVE and SEND cells to minimize power consumption while preserving functional behavior.
Power Results RECON2: Conditional multiplier RECON1: Dual-mode arithmetic unit ALU-OI ALU after operand isolation Saifhashemi, PhD Thesis, 2012
Mode Based Conditional Slack Matching Conditional Slack Matching Advantage – Conditional behavior yields fewer stalls, so fewer pipeline buffers are needed • Previously ignored – conservatively modeled as unconditional [Figure: Add/Sub and Mult units on inputs A, B between a MUX and a DEMUX, selected by op, with cells labeled S and R] Najibii, 2012
Conditional Slack Matching - Results 33% fewer buffers on average Najibii, 2012
Design Flow Demo [Flow diagram: Design Goals, SystemVerilog → SVC2RTL → Synth. RTL, Constraints → Synthesis → Image Netlist, Constraints → ClockFree (Proteus/Sync Library) → Async Netlist, Constraints → Physical Design → Final Layout]
Final Flow Considerations Static Timing Analysis • Verifying timing constraints and performance is a must • Trick traditional tools into working with asynchronous circuits Analog Verification • Domino logic used in QDI flows is sensitive to charge sharing • Asynchronous channels cannot tolerate cross-talk glitches • Special SPICE-based tools developed Asynchronous Scan • Asynchronous scan is a must, but doable Design for Silicon Debug • Chip deadlock is still difficult to debug
Conclusions The Asynchronous Design Flow/CAD Landscape • Synchronous design rigidity continues to hamper quality design • Asynchronous design offers solutions but has many design flow challenges Design Flow Requirements • Design flows must easily integrate into synchronous designs • Circuit quality must compete very well to warrant switching design styles Our approach • Proteus provides a good design framework for automation of both legacy RTL and SystemVerilog CSP • Final considerations of analog and timing verification, scan, and debug should not be overlooked