A Cycle-Based Decomposition Method for Burst-Mode Asynchronous Controllers Melinda Y. Agyekum Steven M. Nowick Department of Computer Science Columbia University {melinda, nowick}@cs.columbia.edu
Research Objective Transform a Burst-Mode (BM) or Extended Burst-Mode (XBM) asynchronous controller into a set of decomposed controllers Decomposed BM & XBM controllers must • Collectively maintain the same behavior as the original • Individually adhere to all BM & XBM controller rules [Figure: the original BM controller and the resulting decomposed BM controllers]
Challenges & Motivation Decomposition • Technique used to divide a controller into smaller controllers • Synchronous Decomposition • A large amount of work in this area Challenges • Asynchronous more challenging than synchronous • No regular clock or discrete schedule system • Loosely coupled concurrent system • Limited work in this area Motivation • Our Primary Goal: • Improve runtime of CAD tool (esp. for larger controllers)
Challenges & Motivation (continued) Motivation • Our Secondary Goals: • Reduce next-state complexity • Decomposed controllers: much smaller next-state logic • Simplifies timing requirements • Narrows BM fundamental mode timing constraint • Potential reduction in power consumption • Only a single controller is active at a time • Control passed from controller to controller • Assists the designer • Alleviating manual decomposition • Providing a higher level of abstraction • Can write a single testbench for original BM controller • Apply it to the set of decomposed controllers
Contributions • Novel Method for Decomposition • For Burst-Mode: 4 major parts • Decomposition algorithm • Controller micro-architecture • Inter-controller communication protocol • Auxiliary hardware • Optimizations to eliminate or simplify hardware • Method for Extended Burst-Mode (XBM) • CAD Tool Implementation • For both BM & XBM
Contributions (continued) • Improved Synthesis Results • Runtime: 16-200x improvement • 1st time synthesis of some examples • Using a burst-mode synthesis tool (Minimalist/3D) • Combinational blocks of logic • For several decomposed controllers
Outline • Background • Review of Burst-Mode Controllers • Overview of Approach • Decomposition Method • Details of Hardware Implementation • Decomposed Controllers • Output Generators • XBM Extension • Experimental Results • Related Work and Conclusions
Background: Burst-Mode Specification INPUTS: Ain Bin Cin Din OUTPUTS: Yout Zout • Each arc is labeled with its INPUTS and OUTPUTS (e.g. inputs Ain+, outputs Zout+) TRANSITION PROPERTIES: 1) Non-empty input burst 2) Maximal Input Set 3) Unique Entry Point [Figure: example burst-mode specification with states 0-5; the individual arc labels are not recoverable from the extraction]
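The first two transition properties listed above can be checked mechanically. The sketch below is only an illustration: the arc encoding, the function name `check_burst_mode` and the two example fragments are made up here and are not taken from the slides or from any tool. It reads "Maximal Input Set" in the usual burst-mode sense, that no input burst leaving a state may be a subset of another input burst leaving the same state. Property 3 (unique entry point) is not checked.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: (source state, input burst, output burst, destination state).
Arc = Tuple[int, FrozenSet[str], FrozenSet[str], int]


def check_burst_mode(arcs: List[Arc]) -> List[str]:
    """Report violations of properties 1 and 2 (property 3 is not checked)."""
    problems: List[str] = []
    bursts_by_state: Dict[int, List[FrozenSet[str]]] = {}
    for src, inputs, _outputs, _dst in arcs:
        if not inputs:  # property 1: non-empty input burst
            problems.append(f"state {src}: empty input burst")
        bursts_by_state.setdefault(src, []).append(inputs)
    for src, bursts in bursts_by_state.items():
        for i, a in enumerate(bursts):
            for j, b in enumerate(bursts):
                if i != j and a <= b:  # property 2: maximal input set
                    problems.append(
                        f"state {src}: {sorted(a)} is a subset of {sorted(b)}"
                    )
    return problems


def arc(src: int, inputs: List[str], outputs: List[str], dst: int) -> Arc:
    """Convenience constructor that freezes the two bursts."""
    return (src, frozenset(inputs), frozenset(outputs), dst)


# Made-up fragments, not the specification drawn on the slide.
GOOD = [arc(0, ["Ain+"], ["Zout+"], 1), arc(1, ["Bin+"], ["Yout+"], 2),
        arc(1, ["Cin+"], ["Zout-"], 0)]
BAD = [arc(0, ["Ain+"], ["Zout+"], 1), arc(0, ["Ain+", "Bin+"], ["Yout+"], 2)]

print(check_burst_mode(GOOD))
print(check_burst_mode(BAD))
```

In `BAD`, state 0 cannot tell whether Ain+ alone is a complete burst or the start of the larger burst, which is the ambiguity the maximal-set rule forbids.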
Background: Burst-Mode Implementation Burst-Mode Controllers • A Huffman-Style asynchronous state machine • Consists of: • Primary inputs • Primary outputs • Fed-back state • State is stored in the fed-back loops [Figure: combinational logic with primary inputs i1, i2, primary outputs o1, o2, and state signals s1, s2 fed back through a delay]
Background: Burst-Mode Implementation Two Simple One-Sided Timing Constraints • Fed-back State Requirement • Fed-back path must be slower than the worst-case forward output path • Generalized Fundamental Mode Requirement • New inputs cannot arrive until the entire machine has stabilized from the previous input burst • Hold-time requirement [Figure: combinational logic with primary inputs i1, i2, primary outputs o1, o2, and state signals s1, s2 fed back through a delay]
Burst-Mode Applications BM Machines in Practice • Used in a large number of applications • Fabricated chips for: • Hewlett-Packard – Mayfly & Stetson projects • NASA Goddard Space Flight Center (2006-present) • Uses Minimalist & BM controllers for space instrumentation • First fabricated chip has just come back • Additional substantial real-world applications • Cache, Diff-eq Solver, DRAM- & SCSI-controllers Several of these projects perform manual decomposition for complex specifications
Overview of Approach: Decomposition Method Example GOAL: Govern inter-controller communication & synchronization on channels [Figure: the original monolithic spec. next to the decomposed specs. (a top-level controller plus child controllers, each with its own entry pt.); child monitoring arcs such as ACKinA+ | REQ1- and ACKinA- | REQ1+ appear in the parent specs., and arcs on REQ1, REQ2, ACK1a, ACK1b and ACK2 appear at the children's entry points; the remaining arc labels are not recoverable from the extraction]
Overview of Approach: Partial Micro-Architecture • Specification: implicit connection between a parent and its children • Micro-Architecture: add a communication channel for each such connection [Figure: top-level controller BM Cntrl A (parent) connects over Channel 1 (REQ1, ACK1a, ACK1b) to BM Cntrl B (leaf) and BM Cntrl C (parent); BM Cntrl C connects over Channel 2 (REQ2, ACK2) to BM Cntrl D (leaf); specification fragments show arcs such as … | REQ1+, ACKinA+ | REQ1-, ACKinC- | REQ2+ and ACKinC+ | REQ2-]
Overview of Approach: Top-Level Communication Protocol 1) Top-level controller is active 2) The parent broadcasts a REQ to all of its children (each child is polled) 3) Only a single child responds with an ACK (that child wins; the others lose and stay disabled) 4) Parent de-asserts REQ, passes control to the child, and suspends 5) The child is now active and continues 6) At some point the child completes and de-asserts ACK 7) The parent then becomes active and resumes control [Figure: micro-architecture with BM Cntrl A (parent, top-level controller), BM Cntrl B (leaf), BM Cntrl C (parent) and BM Cntrl D (leaf); Channel 1 carries REQ1, ACK1a, ACK1b and Channel 2 carries REQ2, ACK2; state annotations: Parent sends REQ, Parent Suspends, Parent Active, Child is polled, Child wins, Child loses, Child Active, Child Disabled]
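One parent-to-child control transfer from this slide can be written out as an event trace. This is a minimal sketch, not the hardware protocol itself: the function name `handshake` is hypothetical, and the slide does not say how the winning child is selected, so the winner is simply passed in as an argument here.

```python
from typing import List


def handshake(children: List[str], winner: str) -> List[str]:
    """Return the event trace of one parent-to-child control transfer."""
    if winner not in children:
        raise ValueError("winner must be one of the polled children")
    trace = ["parent active"]
    # The parent broadcasts REQ to every child on the channel.
    trace.append("REQ+ broadcast to " + ", ".join(children))
    for child in children:
        outcome = "wins" if child == winner else "loses, stays disabled"
        trace.append(f"{child} polled: {outcome}")
    # Only the winning child answers with ACK.
    trace.append(f"ACK+ from {winner}")
    trace.append("REQ-, parent suspends")
    trace.append(f"{winner} active")
    # The child finishes and hands control back by de-asserting ACK.
    trace.append(f"{winner} completes: ACK-")
    trace.append("parent active")
    return trace


for event in handshake(["BM Cntrl B", "BM Cntrl C"], winner="BM Cntrl C"):
    print(event)
```

The trace starts and ends with the parent active, and exactly one controller is active between any two events, matching the slide's point that control is passed rather than shared.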
Complete System Micro-Architecture • Decomposed controllers: BM Cntrl A (parent, top-level controller), BM Cntrl B (leaf), BM Cntrl C (parent), BM Cntrl D (leaf), linked by REQ1/ACK1a/ACK1b and REQ2/ACK2 • Primary inputs: ain, bin, cin, din • Intermediate output signals: A_to_zout, B_to_zout, C_to_zout, C_to_yout, D_to_yout • Primary output generators: one output generator combines the *_to_zout signals into primary output zout, another combines the *_to_yout signals into primary output yout [Figure: complete system micro-architecture]
Decomposition Method: Intuition on Approach Goal: determine where partitioning is possible Idea: start at root and traverse graph • “Region” = self-contained sub-graph • If region is “closed” • Cut & form a new partition • Continue hierarchical exploration • If region is “not closed” • Indicates multiple ways to exit region • Do not cut & continue exploring hierarchically
Decomposition Method: Formal View Main Idea: Identify and cut “closed regions” • Region = self-contained sub-graph • Reachable via “ancestor path” = simple path from root to a decision point • Region starts at a decision point • Includes a given outgoing arc • Contains all reachable states & arcs not previously visited • Closed = only a single point for entry and exit • Must enter and exit region through same point! [Figure: a closed region; labels: start point, ancestor path, decision pt, outgoing arcs, region, entry, exit]
Decomposition Method: Formal View (continued) [Figure: a region that is NOT closed; labels: start point, ancestor path, two decision pts, outgoing arcs, region, entry, exit]
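The "closed region" test can be approximated on a plain state graph. The sketch below is a simplified reading of the definition above, not the formal algorithm from the paper: the function name `closed_region`, the adjacency-list encoding and the example graph are all assumptions made for illustration. A region is grown from one outgoing arc of the decision point; it is rejected if it reaches an earlier state on the ancestor path, or if any state outside the region (other than the decision point) has an arc into it.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # state -> successor states


def closed_region(graph: Graph, ancestor_path: List[int],
                  first_state: int) -> Tuple[bool, Set[int]]:
    """Grow the region entered via decision_point -> first_state and test it."""
    decision_point = ancestor_path[-1]
    ancestors = set(ancestor_path[:-1])
    region: Set[int] = set()
    stack = [first_state]
    while stack:
        state = stack.pop()
        if state == decision_point or state in region:
            continue  # back at the single entry/exit point, or already seen
        if state in ancestors:
            return False, region  # exits through an ancestor: not closed
        region.add(state)
        stack.extend(graph.get(state, []))
    # A second entry point from outside the region also breaks closure.
    for state, successors in graph.items():
        outside = state not in region and state != decision_point
        if outside and any(s in region for s in successors):
            return False, region
    return True, region


# Hypothetical graph: root 0, decision point 1 with outgoing arcs to 2 and 4.
GRAPH: Graph = {0: [1], 1: [2, 4], 2: [3], 3: [1], 4: [0]}

print(closed_region(GRAPH, [0, 1], 2))
print(closed_region(GRAPH, [0, 1], 4))
```

In this made-up graph the region {2, 3} returns only to the decision point and can be cut, while the region starting at 4 leaves through ancestor 0 and must stay with its parent.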
Decomposition Method: Example Example with two decision points • Top-level segment = ancestor path • Can we cut here? [Figure: graph with states 0-4; labels: decision point, entry point, two exit points, two cut-points, two closed regions]
Decomposition Method: Example Example with two decision points • Top-level segment = ancestor path • Can we cut here? • Do not cut! Hit ancestor decision pt. [Figure: graph with states 0-4; labels: decision point, entry point, exit points, cut-points, closed region]
Decomposition Method: Example Example with two decision points • 4 partitions created [Figure: the resulting partitions: a top-level controller and child controllers, each with its own entry pt.; the decision point and an uncut region are marked]
Decomposition Method: Algorithm Formal Algorithm • Graph-Based algorithm performs modified DFS • Forward Direction • Explores reachable regions • Only marks decision points (revisits non-decision points) • Backward Direction • Controller strands grown • Tests for “closed reachability” • When detected, cut strands (= create a new controller) Complete details of the formal algorithm are presented in the paper
Details of Hardware Imp.: Decomposed Controllers Primary Input Latches • Transparent D-latches • Control when primary inputs can be received • By default all primary inputs are blocked Input Latch Enable: controlled by “activation channel” • Handles two scenarios: • “Controller as a child” = activated • “Controller as a parent” = activating [Figure: decomposed controller = core BM controller plus generic input latch structure; a latch enable derived from the activation channel (REQ, ACK) gates each primary input through a D-latch into a filtered input]
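The blocking behavior of the input latches can be pictured with a behavioral model of a transparent D-latch. This is only a sketch under assumptions: the class name `InputLatch` is hypothetical, and the enable is driven by hand here because the slides give the enable logic only as a figure.

```python
class InputLatch:
    """Behavioral model of a transparent D-latch on one primary input."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial  # filtered input seen by the core BM controller

    def update(self, d: int, enable: int) -> int:
        """Pass d through while enabled; otherwise hold the last value."""
        if enable:
            self.q = d
        return self.q


latch = InputLatch()
print(latch.update(d=1, enable=0))  # controller not activated: input blocked
print(latch.update(d=1, enable=1))  # controller activated: input passes
print(latch.update(d=0, enable=0))  # control handed on: last value is held
```

With the enable low, changes on the primary input never reach the core controller, which is how an inactive controller is kept from reacting to bursts meant for another controller.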
Details of Hardware Imp.: Decomposed Controllers “Controller as a Child” • Handles case when controller is activated [Figure: BM specification fragment (entry pt.; arcs Ain+ | Zout+ and Bin+ | Zout-) and the latch structure for input Ain: a latch enable unit driven by the activation channel (REQ, ACK) gates Ain through a D-latch into Ain_i]
Details of Hardware Imp.: Decomposed Controllers “Controller as a Parent” Idea: • “Parent” has its input latches disabled when control is passed to the child • Latches are re-enabled when the child completes [Figure: generic input latch structure and its gate-level implementation: an enabling unit and a disabling unit, with inputs Parent’s REQ, Parent’s ACK, Child’s REQ and Child’s ACK, control the D-latch that turns the primary input into the filtered input]
Details of Hardware Imp.: Output Generator Block View • Each decomposed BM controller (BM CntrlA, BM CntrlB, …, BM CntrlD) has its own output logic producing an intermediate signal (CntrlA_To_Output, CntrlB_To_Output, …, CntrlD_To_Output) • The primary output generator combines these signals into the primary output • Can be XOR, XNOR, AND, OR, or a single wire • The output generator is determined by the initial output values of the decomposed controllers and of the original BM controller
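Of the gate choices listed above, the XOR/XNOR case is easy to picture in software. The sketch below assumes, as an illustration only, that each intermediate signal toggles whenever its controller wants the primary output to change, and that only one controller changes its signal at a time; the function name `output_generator` and the `invert` flag (standing in for the XNOR variant) are hypothetical.

```python
from functools import reduce
from operator import xor
from typing import Iterable


def output_generator(intermediate: Iterable[int], invert: bool = False) -> int:
    """Combine intermediate outputs with XOR (XNOR when invert is True)."""
    value = reduce(xor, intermediate, 0)
    return value ^ 1 if invert else value


# Intermediate signals A_to_zout, B_to_zout, C_to_zout, all starting at 0.
print(output_generator([0, 0, 0]))  # initial value of zout
print(output_generator([1, 0, 0]))  # controller A toggles its signal
print(output_generator([1, 0, 1]))  # later, controller C toggles its signal
```

Under these assumptions every single toggle on any intermediate signal flips the combined output, regardless of which controller made it.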
Extended Burst-Mode (XBM) Extension Decomposition method can also be applied to XBM XBM Background • More expressive form of BM controller • Supports 2 new features: • Directed Don’t Cares (DDCs) • Allow concurrent inputs and outputs • Conditionals • Permits level sampling of signals XBM can handle glitchy inputs! [Figure: example XBM specification with states 0-6, using directed don't-care labels such as Rin* and conditionals such as <Cnd+> and <Cnd->; signals shown: ok, Rin, FAin, FRout, Aout]
Extended Burst-Mode (XBM) Extension XBM Decomposition Method • Graph-based decomposition: • Uses same method as for BM • New: simple post-processing step • Remove/modify some XBM signals: locally mimic BM spec • Most signals remain unaffected
Experimental Results Automated CAD Tool (bm-decomp) • Approx. 2100 lines of C code • Fully automated & implements all optimizations Benchmarks: From a wide range of academic & industrial projects • Cutoff criteria used to focus on larger examples • BM examples: 12-71 states & up to 16 inputs/19 outputs • XBM examples: 9-28 states & up to 21 inputs/24 outputs BM Synthesis Flow • Uses Minimalist CAD framework [Fuhrer/Nowick] • Default runs: Used existing speed script, up to 10 hours • With optimal state assignment • If run failed: Used command-line mode • With basic critical-race-free state assignment XBM Synthesis Flow • Uses 3D CAD tool [Yun/Dill] • Default runs: Used script given with tool • If run failed: No backup mode (3D has only one mode)
Experimental Results (Burst-Mode) [Results table not recoverable from the extraction; only the unlabeled values 83.72 and 2.58 and the callouts below survive] • Less than 1 second to run on all examples • 1 benchmark for 1 decomposed controller failed on optimal script • 5 failed to complete optimal script after 10 hrs • Over 200x runtime improvement • 1 failed manual run • Over 400x runtime improvement • Produced simple combinational logic blocks for 7 out of 10 runs
Experimental Results (XBM) [Results table not recoverable from the extraction; surviving callouts below] • No implementation returned • In some cases between 4-12x runtime improvement
Experimental Results: Input Optimizations Basic Goal: Remove or simplify input latches Two Techniques • Reduction in Strength • Complete Input Latch Removal Results • 31%: Unlatched inputs • 44%: 2-input gates • 25%: Latched inputs
Experimental Results: Output Optimization Reduction in Strength • Basic Idea: XOR/XNOR can always be used • Goal: Replace with AND/OR or single wire (when possible) AND, OR, & single wire used 84% of the time
Related Work System-Level Decomposition: asynchronous A large system is decomposed into datapath & control • Handshake circuits synthesis (Berkel92, Bardsley97) • Quasi-Delay insensitive (QDI) flow (Martin86, 90) • High-Level synthesis flow (Theobald01) • Differences: • Do not focus on individual controllers • Fairly coarse-grained • High-Level synthesis flow (Kudva96) • Control partitioned into sub-controllers • Limitations: Specification must follow a strict series-parallel structure Controller-Based Decomposition: sync and async • Synchronous: • Decomposition for low power (Benini98) • Differences: Partitions based on computational locality
Related Work (continued) Controller-Based Decomposition • Asynchronous: QDI Circuits • Net contraction (Chu87, Yoneda04) • Projects a Petri-net specification into smaller controllers • Source language-level decomposition technique (Kapoor04) • Introduce heuristics to resolve state coding conflicts • Direct mapping approach (Bystrov02) • Template-based mapping of places into David Cells • Differences: • Limited structure • Alternative methods for decomposing • Asynchronous: Burst-Mode Controllers (Beister99) • Output partitioning to translate a Petri-net into XBM controller • Variant of net contraction • Limitations: • A complicated basic method • No benchmark results reported
Conclusions Decomposition Approach • Decomposition technique for BM and XBM controllers • Main Idea: Partition wherever a sub-region is “closed” • Inter-controller communication protocol • Additional hardware • Optimizations proposed to remove & reduce hardware • CAD tool developed • Significant improvements: • 16-200x runtime improvement • 1st time synthesis of several larger examples