
Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines

Arash Saifhashemi, Peter A. Beerel. University of Southern California, USC Asynchronous CAD/VLSI Group (async.usc.edu). (Thanks to a grant from Intel and NSF)


Presentation Transcript


  1. Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines
     Arash Saifhashemi, Peter A. Beerel
     University of Southern California, USC Asynchronous CAD/VLSI Group (async.usc.edu)
     (Thanks to a grant from Intel and NSF)
     PATMOS 2012, Sep 2012, Newcastle upon Tyne

  2. Asynchronous Circuit Design - Today
     Applications
     • 3D networks-on-chip (STMicroelectronics)
     • Ethernet switches (Intel SRD)
     • Ultra-high-speed FPGAs (Achronix)
     • Process variation
     • Low-power chip design (encryption: Tiempo, …)
     Basic challenges: Automation
     Proteus design flow (USC)
     • Uses commercial synchronous CAD tools
     • Starts from a high-level specification written in SVC (SystemVerilogCSP)
     Example chips
     • STMicroelectronics WIOMING 3D-IC (July 2012)
     • Achronix FPGA: 1.7 M LUTs, 2.1 Gbps I/O
     • Tiempo TAM16: clockless 16-bit microcontroller
     • Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G): 1.2 B transistors, 90% asynchronous, 13% Proteus

  3. The Proteus Flow
     Key Features
     • Re-uses synchronous EDA tools
     • Seamless integration into existing flows
     • Up to 2X higher performance
     Tool Status
     • Started at USC Async CAD/VLSI
     • Commercialized by TimeLess (2008)
     • Acquired by Fulcrum (2010)
     • Intel acquired Fulcrum (2011)
     • Used in Intel Ethernet Alta FM6000 chip
     The Problem
     • Limited and manual power optimization
     [Flow diagram labels: SystemVerilog, Verilog, Design Goals, SVC2RTL, Constraints, RTL Synthesis, Proteus/Sync Library, Clock Gating, ClockFree, Netlist, Async Netlist, Physical Design, Clock Tree Synthesis, Final Layout]

  4. Conditional Communication in Proteus
     [Figure labels: enable values 0 and 1; Dummy value 0; Not sent; Not received]

  5. Example: ALU SVC Description
     No conditionality in high-level description

  6. Reconverging fanouts + Unnecessary calculation

  7. Adding Isolation Cells
     • All inputs/outputs are unconditional
     • Operand isolation
     • AND-based isolation cells
     • Generated by synchronous RTL synthesizer
     • Does not prevent switching in asynchronous circuits
     Isolation cells are not effective in asynchronous circuits

  8. Three-valued logic
     • Formal justification of conditioning
     • Three-valued logic (3VL) model
     • Each iteration is modeled by a clock cycle
     • Each variable can be 0, 1, or N (no token)
     [Figure labels: One iteration; Status of each channel]

  9. 3VL Unconditional Functions
     Unconditional functions
     • Can be represented using only the ∧, ∨, ¬ operators
     • Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
     • Lemma 1: the output is N iff at least one of the inputs is N.
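The three-valued model and Lemma 1 can be illustrated with a small sketch. It assumes that an unconditional gate simply propagates N (no token) whenever any input carries N, which is how Lemma 1 reads; the helper names `lift` and `satisfies_lemma1` are hypothetical and not taken from the presentation.

```python
from itertools import product

N = "N"              # "no token" on a channel during this iteration
VALUES = (0, 1, N)   # the three values a variable can take

def lift(gate):
    """Lift a Boolean gate to 3VL (assumption: any N input yields N)."""
    def lifted(*inputs):
        if N in inputs:
            return N
        return gate(*inputs)
    return lifted

# A few unconditional functions of the kind found in a cell library.
and3 = lift(lambda a, b: a & b)
or3 = lift(lambda a, b: a | b)
not3 = lift(lambda a: 1 - a)
nand3 = lift(lambda a, b: 1 - (a & b))
xor3 = lift(lambda a, b: a ^ b)

def satisfies_lemma1(gate, arity):
    """Check exhaustively: output is N iff at least one input is N."""
    return all(
        (gate(*xs) == N) == (N in xs)
        for xs in product(VALUES, repeat=arity)
    )
```

Because `lift` builds the N-propagation in, the check mainly documents the property rather than discovering it.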

  10. SEND/RECEIVE Operators
      • Conditional communication
      • RECEIVE and SEND are modeled as the Ⓡ and Ⓢ operators
      Behave like buffers when E=1
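A minimal sketch of the two operators follows. Only the buffer behavior at E=1 is stated on the slide; the E=0 cases are an assumption based on the "Not sent" and "Dummy value 0" labels of the conditional-communication figure, and the handling of E=N is likewise assumed. The function names `send` and `receive` are hypothetical.

```python
N = "N"  # "no token" on a channel during this iteration

def send(x, e):
    """SEND: forwards x when e == 1.

    Assumption: when e is 0 (or carries no token) nothing is sent,
    so the output channel holds N.
    """
    return x if e == 1 else N

def receive(x, e):
    """RECEIVE: forwards x when e == 1.

    Assumptions: when e == 0 the input is not read and a dummy value 0
    is produced; when e carries no token the output is N.
    """
    if e == N:
        return N
    if e == 0:
        return 0  # dummy value
    return x
```

Under these assumptions a SEND followed by a RECEIVE with the same enable acts as a buffer when the enable is 1, and yields the dummy value with no token on the channel in between when the enable is 0.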

  11. SEND Reconditioning
      Assuming y=f(x) is unconditional and e ∉ TFO(y) (the transitive fanout of y)
      • Lemma 2:
      Application: SEND cells can be moved through logic
      • Similar to retiming in synchronous circuits
      Fewer SENDs
      Less switching when e=0
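The statement that SEND cells can be moved through logic can be checked exhaustively on a small gate. The sketch below assumes one reading of it: a SEND on the output of an unconditional function f gives the same result as SENDs with the same enable on every input of f. It also assumes that SEND emits N when the enable is not 1. The names `send`, `nand3`, and `send_commutes` are hypothetical.

```python
from itertools import product

N = "N"
VALUES = (0, 1, N)

def send(x, e):
    """Assumed SEND semantics: buffer when e == 1, otherwise no token."""
    return x if e == 1 else N

def nand3(a, b):
    """Unconditional 3VL NAND: N if any input is N."""
    if N in (a, b):
        return N
    return 1 - (a & b)

def send_commutes(f, arity):
    """Compare SEND-after-f with f-after-SENDs over all 3VL assignments."""
    for *xs, e in product(VALUES, repeat=arity + 1):
        after = send(f(*xs), e)                    # one SEND at the output
        before = f(*(send(x, e) for x in xs))      # one SEND per input
        if after != before:
            return False
    return True
```

The enable is enumerated independently of the data inputs, which matches the requirement that e does not depend on y.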

  12. Observability in 3V Networks
      Local Observability Partial Care (LOPC)
      • OPC(f,C,xj) of input xj of a node representing a function f is the condition under which f's output is not affected as xj changes in C ⊆ {0,1,N}
      Global Observability Partial Care (GOPC)
      • GOPC(C,x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}
      • Example (s=1): changes of i1 in {0,1} are not observable when i2=0 or i2=1
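The local definition can be sketched by brute-force enumeration. The example assumes that the multiplexer in the slide is an unconditional 3VL function that selects i2 when s=1 and i1 when s=0; the function names `mux3` and `opc` and the tuple-based encoding of the condition are hypothetical.

```python
from itertools import product

N = "N"
VALUES = (0, 1, N)

def mux3(s, i1, i2):
    """Assumed unconditional 2:1 mux: i2 when s == 1, i1 when s == 0."""
    if N in (s, i1, i2):
        return N
    return i2 if s == 1 else i1

def opc(f, arity, j, C):
    """Assignments to the other inputs for which f's output is the same
    for every value of input j taken from C (j is a 0-based position)."""
    result = []
    for others in product(VALUES, repeat=arity - 1):
        outputs = set()
        for v in C:
            xs = others[:j] + (v,) + others[j:]
            outputs.add(f(*xs))
        if len(outputs) == 1:   # x_j cannot be observed at the output
            result.append(others)
    return result

# Condition on (s, i2) under which i1 is unobservable for changes in {0, 1}.
unobservable_01 = opc(mux3, 3, 1, (0, 1))
# Same question when i1 may also carry no token.
unobservable_01N = opc(mux3, 3, 1, (0, 1, N))
```

With C = {0,1} the pair (s=1, i2=0) is in the result, matching the slide's example. With C = {0,1,N} it is not, because under the assumed mux a missing token on i1 forces the output to N.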

  13. GOPC Conditioning
      When xj is not observable…
      • Add a SEND followed by a RECEIVE
      • Move the SENDs using SEND reconditioning
      SEND Reconditioning
      • Lemma 3:
      [Figure values: N N 0 N 1 N 0 or 1 N]

  14. Conditioning
      [Figure labels: &, +, +, 0, 0, No Activity]

  15. Inserting Isolating Nodes and Recognizing Enable Domains
      Synchronous synthesis tools can insert isolating nodes
      • Constrained to insert isolating nodes only on non-critical paths
      Node u is in e's Enable Domain OIED(e) if
      • All paths starting from a primary input and ending at u include an isolating node controlled by e
      • Detected using a depth-first search (DFS)
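One way to detect an enable domain with a depth-first search is sketched below. It relies on an equivalent formulation of the definition: u is in OIED(e) exactly when u cannot be reached from any primary input once the isolating nodes controlled by e are treated as blocked. The netlist encoding (a fanout dictionary) and the name `enable_domain` are assumptions for illustration, not the tool's actual data structures.

```python
def enable_domain(fanout, primary_inputs, isolating):
    """Return the nodes whose every path from a primary input passes
    through a node in `isolating` (the isolating nodes controlled by e).

    fanout: dict mapping each node to a list of its successor nodes.
    The isolating nodes themselves are included in the result.
    """
    reached = set()
    # Iterative DFS from the primary inputs that never enters an isolating node.
    stack = [p for p in primary_inputs if p not in isolating]
    while stack:
        node = stack.pop()
        if node in reached:
            continue
        reached.add(node)
        for succ in fanout.get(node, ()):
            if succ not in isolating and succ not in reached:
                stack.append(succ)
    all_nodes = set(fanout) | {s for succs in fanout.values() for s in succs}
    return all_nodes - reached

# Hypothetical ALU fragment: only the multiplier sits behind isolating nodes.
fanout = {
    "a": ["iso_a", "add"],
    "b": ["iso_b", "add"],
    "iso_a": ["mul"],
    "iso_b": ["mul"],
    "mul": ["mux"],
    "add": ["mux"],
}
domain = enable_domain(fanout, ["a", "b"], {"iso_a", "iso_b"})
```

In this example the multiplexer is outside the domain because the adder gives it an unisolated path from the primary inputs. Nodes with no path from any primary input would also be reported, so a real flow would likely filter those separately.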

  16. Pre-layout Analysis
      • Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
      • K: power of conditional nodes
      • rf: activity factor
      [Equations: power of each domain; total power; domain power after isolation (n inputs); benefit of isolating each domain]

  17. Post-layout Experimental Results
      • Case study: 32-bit ALU, placed and routed
      • Back-annotated switching activity using a VCD file
      • Results:
        • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
        • 53% power reduction when only isolating MUL (rf=0.25)
        • Area cost of isolating MUL is about 4%, with no performance penalty

  18. Conclusions and Future Work
      Conditional communication in async. circuits is not free
      • Creates area and performance overheads
      • Requires manual or automatic optimization
      Asynchronous circuits can/should leverage sync. tools
      • This paper is the first to use three-valued logic and observability don't-cares for power optimization of asynchronous circuits
      Our future work
      • Evaluate the proposed method on bigger designs
      • Adopt other sync. power optimization techniques such as clock gating
      • Optimize the location of SEND/RECEIVE nodes (reconditioning)
