Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines. Arash Saifhashemi Peter A. Beerel University of Southern California USC Asynchronous CAD/VLSI Group (async.usc.edu) (Thanks to a grant from Intel and NSF)
Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines Arash Saifhashemi Peter A. Beerel University of Southern California USC Asynchronous CAD/VLSI Group (async.usc.edu) (Thanks to a grant from Intel and NSF) Patmos 2012, Sep 2012, Newcastle upon Tyne
Asynchronous Circuit Design - Today Applications • 3D networks-on-chip (STMicroelectronics) • Ethernet switches (Intel SRD) • Ultra-high-speed FPGAs (Achronix) • Process variation • Low-power chip design (Encryption – Tiempo, …) Basic challenges: Automation Proteus design flow (USC) • Uses commercial synchronous CAD tools • Starts from a high-level specification written in SVC (SystemVerilogCSP) STMicroelectronics WIOMING 3D-IC (July 2012) Achronix FPGA: 1.7 M LUTs, 2.1 Gbps IO Tiempo TAM16: clockless 16-bit microcontroller Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G): 1.2 B transistors, 90% asynchronous, 13% Proteus
The Proteus Flow Key Features • Re-uses synchronous EDA tools • Seamless integration into existing flows • Up to 2X higher performance Tool Status • Started at USC Async CAD/VLSI • Commercialized by TimeLess (2008) • Acquired by Fulcrum (2010) • Intel acquired Fulcrum (2011) • Used in Intel Ethernet Alta FM6000 chip The Problem • Limited and manual power optimization [Flow diagram; box labels: SystemVerilog, Verilog, Design Goals, SVC2RTL, RTL Synthesis, Image Netlist, Proteus/Sync Library, Clock Gating, Clock Tree Synthesis, ClockFree Netlist, Async Netlist, Constraints, Physical Design, Final Layout]
Conditional Communication in Proteus [Figure; labels: Dummy value, Not sent, Not received]
Example: ALU SVC Description No conditionality in high-level description
Reconverging fanouts + Unnecessary calculation
Adding Isolation Cells • All inputs/outputs are unconditional • Operand isolation • AND-based isolation cells • Generated by the synchronous RTL synthesizer • Do not prevent switching in asynchronous circuits Isolation cells are not effective in asynchronous circuits
Three-valued logic • Formal justification of conditioning • Three-valued logic image model • Each iteration is modeled by a clock cycle • Each variable can be 0, 1, or N (no token) One iteration Status of each channel
3VL Unconditional Functions Unconditional functions • Can be represented only by , , operators • Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, … • Lemma 1: the output is N iff at least one of the inputs is N.
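One way to read Lemma 1 is as a rule for lifting an ordinary Boolean gate into the three-valued model: the gate fires only when every input carries a token. The Python sketch below is illustrative and not taken from the slides; the helper names `lift` and `lemma1_holds`, and the string `"N"` used for "no token", are assumptions.

```python
from itertools import product

N = "N"  # assumed encoding of "no token"; 0 and 1 are ordinary logic values

def lift(fn):
    """Lift a Boolean function to a three-valued unconditional function."""
    def lifted(*inputs):
        # An unconditional node needs a token on every input before it fires.
        if any(v == N for v in inputs):
            return N
        return fn(*inputs)
    return lifted

# A few gates of the kind found in a typical cell library.
nand = lift(lambda a, b: 1 - (a & b))
xor = lift(lambda a, b: a ^ b)
aoi21 = lift(lambda a, b, c: 1 - ((a & b) | c))

def lemma1_holds(gate, arity):
    """Exhaustively check: the output is N exactly when some input is N."""
    return all(
        (gate(*vals) == N) == (N in vals)
        for vals in product((0, 1, N), repeat=arity)
    )
```

Because the lifting is defined this way, `lemma1_holds` returns `True` for each of these gates by construction; the check is mainly useful as a guard when hand-writing three-valued models of other cells.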
SEND/RECEIVE Operators • Conditional Communication • RECEIVE and SEND are modeled as Ⓡ and Ⓢ operators Behave like buffers when E=1
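A minimal sketch of the two operators in the three-valued model, written in Python. Only the buffer behaviour for E=1 comes from the slide. The behaviour for E=0 is an assumption based on the figure labels of the earlier slide ("Not sent", "Not received", "Dummy value"): SEND is assumed to produce no token, and RECEIVE is assumed to produce a dummy 0 without consuming a token. The function names and the `"N"` encoding are also assumptions.

```python
N = "N"  # assumed encoding of "no token"

def send(x, e):
    """SEND: forward x when e == 1; otherwise put no token on the channel."""
    if e == N or x == N:
        return N          # nothing to act on without both tokens
    return x if e == 1 else N

def receive(x, e):
    """RECEIVE: forward x when e == 1; otherwise emit a dummy value."""
    if e == N:
        return N          # no enable token, so no output token
    if e == 1:
        return x          # behaves like a buffer
    return 0              # assumed dummy value when the input is not received

def isolated(x, e):
    """A SEND followed by a RECEIVE sharing the same enable."""
    channel = send(x, e)  # N here means the channel between them is idle
    return channel, receive(channel, e)
```

Under these assumptions, `isolated(1, 1)` passes the value through, while `isolated(1, 0)` leaves the intermediate channel at `N` and hands a dummy 0 to the downstream logic.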
SEND Reconditioning Assuming y=f(x) is unconditional and e TFO(y) • Lemma 2: Application: SEND cells can be moved through logic • Similar to retiming in synchronous circuits Fewer SENDs Less switching when e=0
Observability in 3V Networks Local Observability Partial Care (LOPC) • OPC(f,C,xj) of input xj of a node representing a function f is the condition under which f’s output is not affected as xj changes in C ⊆ {0,1,N} Global Observability Partial Care (GOPC) • GOPC(C,x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N} • Example: s = 1: changes of i1 in {0,1} are not observable when… i2 = 0 or i2 = 1
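The role of the value set C can be illustrated by brute force on a small node. The Python sketch below assumes a 2:1 multiplexer that selects i2 when s = 1 and is unconditional in the sense of Lemma 1; this circuit, the function names, and the `"N"` encoding are illustrative assumptions, not the slide's figure.

```python
N = "N"  # assumed encoding of "no token"

def mux(s, i1, i2):
    """Unconditional 2:1 mux (assumed circuit): any N input gives N."""
    if N in (s, i1, i2):
        return N
    return i2 if s == 1 else i1

def unobservable(f, fixed, name, C):
    """True if f's output is identical for every value of input `name` in C.

    fixed: dict of the remaining inputs, held constant
    C:     the set of values that `name` is allowed to range over
    """
    outputs = {f(**fixed, **{name: v}) for v in C}
    return len(outputs) == 1

# With s = 1 and i2 = 0, i1 does not matter as long as it carries a token...
hidden_in_01 = unobservable(mux, {"s": 1, "i2": 0}, "i1", (0, 1))
# ...but removing the token (i1 = N) changes the output to N.
hidden_in_01N = unobservable(mux, {"s": 1, "i2": 0}, "i1", (0, 1, N))
# With s = 0 the mux selects i1, so its value is observable.
hidden_when_selected = unobservable(mux, {"s": 0, "i2": 0}, "i1", (0, 1))
```

The contrast between the first two results is why the definitions are parameterized by C: an input whose value is irrelevant can still be observable through the presence or absence of its token.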
GOPC Conditioning When xj is not observable… • Add a SEND followed by a RECEIVE • Move the SENDs using SEND reconditioning SEND Reconditioning • Lemma 3: N N 0 N 1 N 0 or 1 N
Conditioning [Figure; annotation: No Activity]
Inserting Isolating Nodes and Recognizing Enable Domains Synchronous synthesis tools can insert isolating nodes • Constrained to insert isolating nodes only on non-critical paths Node u is in e’s Enable Domain OIED(e) if • All paths starting from a primary input and ending at u include an isolating node controlled by e • Detected using a depth-first search (DFS)
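The enable-domain condition can be checked with a single depth-first traversal: walk forward from the primary inputs without crossing any isolating node controlled by e, and treat every node that is never reached as part of the domain. The Python sketch below is one possible formulation, not the tool's implementation; the netlist format (a fanout dictionary) and all node names are hypothetical.

```python
def enable_domain(fanout, primary_inputs, isolators):
    """Return nodes whose every path from a primary input crosses an isolator.

    fanout:         dict mapping node -> list of successor nodes (assumed format)
    primary_inputs: iterable of primary input nodes
    isolators:      set of isolating nodes controlled by the enable of interest
    """
    unisolated = set()  # nodes reachable without passing through an isolator
    stack = [p for p in primary_inputs if p not in isolators]
    while stack:
        node = stack.pop()
        if node in unisolated:
            continue
        unisolated.add(node)
        for succ in fanout.get(node, ()):
            # Stop at isolators: anything beyond them is reached only via e.
            if succ not in isolators and succ not in unisolated:
                stack.append(succ)
    all_nodes = set(fanout) | {s for succs in fanout.values() for s in succs}
    # The isolating nodes themselves satisfy the definition and are included.
    return all_nodes - unisolated

# Hypothetical ALU fragment: the multiplier sits behind two isolating nodes,
# while the adder is driven directly by the primary inputs.
example_fanout = {
    "a": ["iso_a", "add"],
    "b": ["iso_b", "add"],
    "iso_a": ["mul"],
    "iso_b": ["mul"],
    "mul": ["mux"],
    "add": ["mux"],
}
example_domain = enable_domain(example_fanout, ["a", "b"], {"iso_a", "iso_b"})
```

In this example the output mux is excluded from the domain because it is also reachable through the adder, which has no isolating node in front of it.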
Pre-layout Analysis • Wu : power of receiving data on all inputs and sending the output (unconditional nodes) • K: power of conditional nodes • rf: activity factor Power of each domain Total power Domain power after isolation (n inputs) Benefit of isolating each domain
Post-layout Experimental Results • Case study: 32-bit ALU placed and routed • Back-annotated switching activity using a VCD file • Results: • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2 • 53% power reduction when only isolating MUL (rf=0.25) • Area cost of isolating MUL is about 4%, with no performance penalty
Conclusions and Future Work Conditional communication in async. circuits is not free • Creates area and performance overheads • Requires manual or automatic optimization Asynchronous circuits can/should leverage sync. tools • This paper is the first to use 3-valued logic and observability don’t cares for power optimization of asynchronous circuits Our future work • Evaluate the proposed method on bigger designs • Adopt other sync. power optimization techniques such as clock gating • Optimize the location of SEND/RECEIVE nodes (Reconditioning)