Emerging Technologies of Computation. Montek Singh COMP790-084 Oct 27, 2011. Today: Basics of Asynchronous Design. Introduction to Asynchronous Design What is asynchronous design? Why do we want to do it? Data Representation and Communication
Emerging Technologies of Computation Montek Singh COMP790-084 Oct 27, 2011
Today: Basics of Asynchronous Design • Introduction to Asynchronous Design • What is asynchronous design? • Why do we want to do it? • Data Representation and Communication • How is data represented in an asynchronous system? • How is information exchanged?
Introduction: Clocked Digital Design [Figure: clock] Most current digital systems are synchronous: • Clock: a global signal that paces the operation of all components Benefit of clocking: enables discrete-time representation • all components operate exactly once per clock tick • component outputs need to be ready by the next clock tick • allows “glitchy” or incorrect outputs between clock ticks
Microelectronics Trends Current and Future Trends: Significant Challenges • Large-Scale “Systems-on-a-Chip” (SoC) • 100 Million ~ 1 Billion transistors/chip • Very High Speeds • multiple GigaHertz clock rates • Explosive Growth in Consumer Electronics • demand for ever-increasing functionality … • … with very low power consumption (limited battery life) • Higher Portability/Modularity/Reusability • “plug ’n play” components, robust interfaces
Challenges to Clocked Design Breakdown of Single-Clock Paradigm: • Chip will be partitioned into multiple timing domains • challenge: gluing together multiple timing domains • glue logic is susceptible to “metastability” (= incorrect values transferred) and latency overheads Increasing Difficulties with Clocked Design: • Clock distribution: requires significant designer effort • Performance bottleneck: a single slow component • Clock burns a large fraction of chip power (~40-70%) • Fixed clock rate: poor match for • designing reusable components • interfacing with mixed-timing environments
What is Asynchronous Design? [Figure: Synchronous System (Centralized Control), labeled “clock”, beside Asynchronous System (Distributed Control), labeled “handshaking interface”] • Digital design with no centralized clock • Synchronization using local “handshaking”
Why Asynchronous Design? (1) • Higher Performance • May obtain “average-case” operation (not “worst-case”) • not limited by slowest component • Avoids overheads of multi-GHz clock distribution • Lower Power • No clock power expended • Inactive components consume negligible power • Better Electromagnetic Compatibility • Smooth radiation spectra: no clock spikes • Much less interference with sensitive receivers [e.g., Philips pagers, smartcards] • Greater Flexibility/Modularity • Naturally adapt to variable-speed environments • Supports reusable components
Why Asynchronous Design? (2) • The world already is mostly asynchronous! • Events at the level of (or in between) large-scale systems are asynchronous • several seconds to several milliseconds • e.g., PC-printer communication, keyboard inputs, network comm. • Events at the board level (or between chips) are often asynchronous • milliseconds to 100 nanoseconds • e.g., CPU-memory interface, interface with I/O subsystem (interrupts) • Events within a chip, at the level of functional units (e.g., adders, control logic) are currently mostly synchronous • several nanoseconds to 100 picoseconds • Events at the level of a single logic gate are asynchronous • 10 picoseconds • Events at the quantum level are asynchronous • picoseconds to femtoseconds • So, why bother with clocks at all?! • make everything asynchronous → greater elegance and robustness
Challenges of Asynchronous Design • Hazards: potential “glitches” on wires • no problem for clocked systems [Figure: hazardous signals become clean signals by the clock tick] • asynchronous communication must be hazard-free! • special design challenge = “hazard-free synthesis” • Testability Issues: absence of clock means no “single-stepping” • Lack of Commercial CAD Tools: chicken-and-egg problem
Asynchronous Design: Past & Present Async Design: In existence for 50 years, but … … many recent technical advances: • Hazard-Free Circuit Design: • several practical techniques for controllers [Stanford/Columbia] • Design for Testability: • several test solutions, e.g. Philips Research • Maturing Computer-Aided-Design (“CAD”) Tools: • software tools for automated design [Philips,Columbia,Manchester] • recent DARPA program [Boeing,Philips,UNC,Columbia,…] • Successful Fabricated Chips: • embedded processors, high-speed pipelines, consumer electronics…
Recent Commercial Interest (1) Several commercial asynchronous chips: • Philips: asynchronous 80c51 microcontrollers • used in commercial pagers [1998] and smartcards [2001] • Univ. of Manchester: async ARM processor [2000] • Motorola: async divider in PowerPC chip [2000] • HAL: async floating-point divider • in HAL-I and II processors [early 1990’s] Recent experimental chips: • IBM, Sun and Intel: • fast pipelines, arbiters, instruction-length decoder… • IBM/Columbia/UNC: asynchronous digital FIR filter Several recent startups: • Handshake Solutions, Theseus Logic, Codetronix, Fulcrum, Silistix, …
Recent Commercial Interest (2) Major DARPA program: • ~$13M • Goals: • commercial-strength automated CAD tool (=silicon compiler) • direct translation from algorithms to chip layout • capable of producing chips with 50M transistors or more • rich suite of analysis and optimization tools • demonstration chip • Boeing application • show dramatic improvements in: design time, power consumption, noise pollution, speed (?) • Team: • led by Boeing • async startups: Theseus, Handshake Solutions, Codetronix • universities: UNC, Columbia, UW, OrSU
A 5-minute Homework Problem [Figure: Alice and Bob] Alice and Bob live on opposite sides of a wide river. Alice is supposed to send a message (say, a “Yes”/“No”) across to Bob around midnight. Both have flashlights, but neither owns a watch. What should they do? Suggest several strategies, and discuss the pros and cons of each.
Solution 1 Alice uses 2 lamps: • 1 to indicate that she is ready with the message (“ready”), and • 1 for the message itself (“yes/no”) Bob uses 1 lamp: • to indicate that he has received the message (“got it”)
Solution 2 Alice uses 2 lamps: • Green lamp to indicate “yes” • Red lamp to indicate “no” Bob uses 1 lamp: • to indicate that he has received the message (“got it”)
Solution 3 What if Alice and Bob could keep time? Alice uses 1 lamp for the message: • At 12 midnight: turns the lamp on if message = “yes” • At 12:01: turns the lamp off Bob needs no lamps! • Takes down the message between 12 and 12:01 Pros: Fewer signals, less processing needed Cons: Alice and Bob must keep their clocks closely synchronized • If Bob’s watch is off by a minute, incorrect communication is possible
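The clock-skew failure in Solution 3 can be made concrete with a small sketch. It assumes Bob looks at the lamp once, 30 seconds past midnight by his own watch; the function name `bob_reads` and its parameters are illustrative and do not come from the slides.

```python
def bob_reads(message_is_yes: bool, bob_skew_s: float,
              sample_at_s: float = 30.0) -> bool:
    """Return True if Bob sees the lamp lit (and so records "yes").

    Times are in seconds after midnight on Alice's watch. Alice keeps the
    lamp on during [0, 60) only when the message is "yes". Bob samples when
    his own watch reads `sample_at_s`; his watch is `bob_skew_s` seconds
    behind Alice's (negative means ahead). All values are assumptions.
    """
    alice_time = sample_at_s + bob_skew_s  # Alice's time when Bob looks
    return message_is_yes and 0.0 <= alice_time < 60.0


# With synchronized watches a "yes" gets through; with Bob's watch a full
# minute behind, the same "yes" is read as "no".
print(bob_reads(True, bob_skew_s=0.0))
print(bob_reads(True, bob_skew_s=60.0))
```

The clocked scheme is cheap in signals, but its correctness rests entirely on the skew staying inside the one-minute window.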
Homework! • Think of all scenarios in which Solution #1 can fail • Are any of those scenarios a problem for Solution #2 as well?
Data Representation and Communication How is data represented in an asynchronous system? How is information exchanged?: control signaling (handshake styles)
Data Encoding: “Bundled Data” [Figure: function block with inputs bit 1 … bit n and outputs bit 1 … bit m; a matched delay between “request” and “done”; “done” indicates valid data] Single-rail “Bundled Datapath”: simplest approach • widely used Features: • datapath: 1 wire per bit (e.g. standard sync blocks) • matched delay: produces delayed “done” signal • worst-case delay: longer than slowest path • Practical style: can reuse sync components; small area • Fixed (worst-case) completion time
Bundled Data: Completion Sensing [Figure labels: request, bank of delays, MUX, delay selector, done] Delay Matching: • either single worst-case delay • or, fine-grain delay Speculative completion: • choose delay “on the fly” • start with shortest delay; increase as needed
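As a rough behavioral sketch of the delay-selection idea (not a circuit), the function below picks the shortest entry in a bank of matched delays that still covers the operation's actual delay, and compares it with the single worst-case delay. The name `select_delay` and the delay values are hypothetical.

```python
def select_delay(actual_delay: float, delay_bank: list[float]) -> float:
    """Pick the shortest matched delay that is at least `actual_delay`.

    Mirrors "start with shortest delay; increase as needed": the candidates
    are tried from shortest to longest until one is long enough.
    """
    for candidate in sorted(delay_bank):
        if candidate >= actual_delay:
            return candidate
    raise ValueError("no matched delay in the bank is long enough")


bank = [2.0, 5.0, 9.0]           # hypothetical matched delays (arbitrary units)
worst_case = max(bank)           # what a single fixed matched delay would cost
chosen = select_delay(3.0, bank) # an operation that actually needs 3.0 units
print(chosen, worst_case)
```

The selected delay must never be shorter than the real datapath delay; otherwise “done” would be asserted before the data is valid.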
Data Encoding: Dual-Rail [Figure: function block with inputs bit 1 … bit n and outputs bit 1 … bit m, two wires per bit] Dual-rail: uses 2 wires per data bit Each Dual-Rail Pair: provides both data value and validity • provides robust data-dependent completion • needs completion detectors
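A minimal sketch of how one wire pair can carry both value and validity. The slide does not give the exact code, so this assumes one common convention: (1, 0) for logic 1, (0, 1) for logic 0, (0, 0) as the “no data yet” spacer, and (1, 1) as illegal. All names are illustrative.

```python
SPACER = (0, 0)  # assumed "reset / no data" code for a dual-rail pair


def encode_bit(bit: int) -> tuple[int, int]:
    """Encode one bit as a (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)


def decode_pair(pair: tuple[int, int]):
    """Return 0 or 1 for valid data, or None while the pair is a spacer."""
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    if pair == SPACER:
        return None
    raise ValueError("(1, 1) is not a legal dual-rail code in this sketch")


word = [encode_bit(b) for b in (1, 0, 1)]
print(word)                             # three wire pairs for three bits
print([decode_pair(p) for p in word])   # values recovered from the pairs
```

Because validity is carried by the data wires themselves, a receiver can tell when a word has arrived without a separate timing assumption.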
Dual-Rail: Completion Sensing [Figure: each of bit0, bit1, …, bitn feeds an OR gate; the OR outputs feed a C-element that produces “Done”] Dual-Rail Completion Detector: • combines dual-rail signals • indicates when all bits are valid (or reset) • OR together 2 rails per bit • Merge results using a Muller “C-element” C-element: • if all inputs=1, output 1 • if all inputs=0, output 0 • else, maintain output value
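The detector described above can be sketched behaviorally: OR the two rails of each bit, then merge the results with a C-element that holds its previous output whenever its inputs disagree. The class and function names are illustrative, and the sketch assumes every pair is either valid data or an all-zero spacer.

```python
class CElement:
    """Behavioral C-element: follows its inputs only when they all agree."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, inputs) -> int:
        values = list(inputs)
        if all(v == 1 for v in values):
            self.out = 1
        elif all(v == 0 for v in values):
            self.out = 0
        # otherwise: keep the previous output value
        return self.out


def completion(c: CElement, word) -> int:
    """OR the two rails of every bit, then merge the ORs with `c`."""
    return c.update(t | f for (t, f) in word)


c = CElement()
# The spacer word, one bit arriving, all bits arrived, one bit reset, all reset
for word in ([(0, 0), (0, 0)], [(1, 0), (0, 0)], [(1, 0), (0, 1)],
             [(0, 0), (0, 1)], [(0, 0), (0, 0)]):
    print(word, completion(c, word))
```

“Done” rises only once every bit is valid and falls only once every bit has returned to the spacer, which is what the hold behavior of the C-element provides.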
Handshaking Styles: 4-phase [Figure: Request rises (“start event”), Acknowledge rises (“event done”), Request falls (“get ready for next event”), Acknowledge falls (“ready for next event”)] 4-Phase: requires 4 events per handshake • “Level-sensitive”: simpler logic implementation • Overhead of “return-to-zero” (RTZ or resetting) • extra events which do no useful computation
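The four events per handshake can be listed with a short trace generator. This is only a sequencing sketch with no timing or data; the function name `four_phase` is illustrative.

```python
def four_phase(n_transfers: int) -> list[tuple[int, int]]:
    """Return the (request, acknowledge) levels after each event."""
    req = ack = 0
    trace = []
    for _ in range(n_transfers):
        req = 1                   # sender: start event
        trace.append((req, ack))
        ack = 1                   # receiver: event done
        trace.append((req, ack))
        req = 0                   # sender: return to zero
        trace.append((req, ack))
        ack = 0                   # receiver: ready for next event
        trace.append((req, ack))
    return trace


print(four_phase(1))
```

The last two events of each cycle are the return-to-zero overhead: they move no data and only restore both wires to their idle level.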
Handshaking Styles: 2-phase [Figure: Request toggles (“start event”), Acknowledge toggles (“event done”), Request toggles again (“start next event”), Acknowledge toggles again (“next event done”)] 2-Phase: requires 2 events per handshake • a.k.a. transition signaling • Elegant: no return-to-zero • Slower logic implementation: • logic primitives are inherently level-sensitive, not event-based (at least in CMOS)
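In transition signaling every wire toggle is an event, regardless of direction. The sketch below lists the wire levels after each event; the function name `two_phase` is illustrative, and no timing or data is modeled.

```python
def two_phase(n_transfers: int) -> list[tuple[int, int]]:
    """Return the (request, acknowledge) levels after each event."""
    req = ack = 0
    trace = []
    for _ in range(n_transfers):
        req ^= 1                  # sender toggles Request: start event
        trace.append((req, ack))
        ack ^= 1                  # receiver toggles Acknowledge: event done
        trace.append((req, ack))
    return trace


# Two complete transfers take four events in total.
print(two_phase(2))
```

Since a rising and a falling edge mean the same thing, a receiver has to detect changes rather than levels, which is where the extra logic cost mentioned on the slide comes from.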
Handshaking Styles: Pulse Mode Pulse Mode: combines benefits of 2-phase and 4-phase • use pulses to represent events [Figure: Request pulses (“start event”, “start next event”), each followed by an Acknowledge pulse (“event done”, “next event done”)] • No return-to-zero (like 2-phase) • Level-based implementation (like 4-phase) • Need a timing constraint on pulse width
Handshaking Styles: Single-Track [Figure: separate Request (req) and Acknowledge (ack) wires replaced by one “req + ack” wire] Single-Track: combines req and ack onto a single wire! • one wire used for bidirectional communication • sender raises, receiver lowers • Efficient protocol: no return-to-zero, level-based • Need aggressive low-level design techniques • much effort to ensure reliability, satisfy timing constraints
Handshaking + Data Representation [Figure: block A sends bit 1 … bit m to block B; B returns ack] Several combinations possible: • dual-rail 4-phase, single-rail 4-phase, dual-rail 2-phase, and single-rail 2-phase Example: dual-rail 4-phase • dual-rail data: functions as an implicit “request” • 4-phase cycle: between acknowledge and implicit request
Other Data Representation Styles [Figure labels: data, phase] • Level-Encoded Dual-Rail (LEDR) • 2 wires per bit: “data” and “phase” • exactly one wire per bit changes value • if new value is different, “data” wire changes value • else “phase” wire changes value • M-of-N Codes • N wires used for a data word • M wires (M <= N) change value • Values of N and M: have impact on… • information transmitted, power consumed and logic complexity • Knuth codes, Huffman codes, …
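The LEDR rule above (“data” changes if the value differs, otherwise “phase” changes) can be sketched as a state update on one wire pair. The initial state (0, 0) and the function names are assumptions made for illustration.

```python
def ledr_next(state: tuple[int, int], new_bit: int) -> tuple[int, int]:
    """Return the next (data, phase) pair when `new_bit` is sent."""
    data, phase = state
    if new_bit != data:
        return (new_bit, phase)   # value differs: only "data" changes
    return (data, phase ^ 1)      # same value: only "phase" changes


def ledr_trace(bits, start=(0, 0)):
    """Wire states after sending each bit, beginning from `start`."""
    states, state = [], start
    for b in bits:
        state = ledr_next(state, b)
        states.append(state)
    return states


print(ledr_trace([1, 1, 0, 0]))
```

Each new value flips exactly one of the two wires, so the receiver sees one transition per bit sent and no return-to-zero phase is needed.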
Which to use? Depends on several performance parameters: • speed • single-rail vs. dual-rail • single-rail may be faster (if designed aggressively) • dual-rail may be faster (if completion times vary widely) • 2-phase vs. 4-phase • 2-phase may be faster (if logic overhead is small) • 4-phase may be faster (if overhead of return-to-zero is small) • power consumption • 2-phase typically has fewer gate transitions (→ lower power) • amount of logic used (#gates/wires/pins → chip area) • single-rail needs fewer gates/wires/pins • design and verification effort • dual-rail, 1-of-N, M-of-N, Knuth codes…: • delay-insensitive: robust in the presence of arbitrary delays • single-rail: requires greater timing verification effort
Homework! • Suppose you are given N wires • Which M-of-N encoding (i.e. what M) encodes most information? • Suppose you have to encode 4-bit values • Which M-of-N encoding yields fewest wires? • Suppose you can switch at most 2 wires • Which M-of-N encoding yields fewest wires for 4-bit values?
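One way to explore these questions is to count codewords. The helper below assumes that an M-of-N codeword is any choice of exactly M of the N wires, so the number of codewords is the binomial coefficient C(N, M); the function names are illustrative.

```python
from math import comb, floor, log2


def codewords(n: int, m: int) -> int:
    """Number of distinct M-of-N codewords: choose m wires out of n."""
    return comb(n, m)


def whole_bits(n: int, m: int) -> int:
    """Largest number of whole bits that an M-of-N code can carry."""
    return floor(log2(codewords(n, m)))


# Small worked cases: 1-of-4 has 4 codewords (2 bits); 2-of-4 has 6 codewords,
# which still carries only 2 whole bits.
print(codewords(4, 1), whole_bits(4, 1))
print(codewords(4, 2), whole_bits(4, 2))
```

Tabulating `codewords(n, m)` over a range of N and M gives the raw numbers needed to reason about each question.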