Physical Limits of Computing
Dr. Mike Frank
CIS 6930, Sec. #3753X, Spring 2002
Lecture #24: Adiabatic CMOS (cont.)
Wed., Mar. 13
Administrivia & Overview
• Don’t forget to keep up with homework!
• We are 8 out of 14 weeks into the course.
• You should have earned ~57 points by now.
• Course outline:
  • Parts I & II, Background, Fundamental Limits - done
  • Part III, Future of Semiconductor Technology - done
  • Part IV, Potential Future Computing Technologies - done
  • Part V, Classical Reversible Computing
    • Fundamentals of Adiabatic Processes & Logic - last Wed. & Fri.
    (----------------------- Spring Break ------------------------)
    • Adiabatic electronics & CMOS logic families - Mon. & TODAY
    • Limits of adiabatics: Leakage and clock/power supplies - TODAY
    • RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
    • RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
    • (plus ~7 more lectures…)
  • Part VI, Quantum Computing
  • Part VII, Cosmological Limits, Wrap-Up
Adiabatic Computing in CMOS
• Monday: Adiabatic switching, split-level retractile & pipelined logic.
• Today: 2-Level Adiabatic Logic (2LAL), general adiabatic logic.
Some Timing Terminology
For sequential adiabatic circuits:
• Tick: Time for a single ramp transition
  • adiabatic speed fraction f times the RC gate delay.
• Phase: Latency for a data value to propagate forward by 1 pipeline stage.
• Cycle: Minimum period for all timing information to return to its initial state.
• Diadic: Two retractile levels per gate
  • permits inverting or non-inverting logic.
• Monadic: Only 1 retractile level per gate.
• Dual rail: Two wires per logic value
  • permits universal logic with monadic gates.
Some Figures of Demerit
• Some quantities we may wish to minimize:
  • Ticks/phase:
    • proportional to logic propagation latency
  • Ticks/cycle:
    • inversely proportional to data throughput rate
  • Transistor-ticks/cycle:
    • inversely proportional to hardware cost-efficiency
  • Number of required clock/power input signals:
    • supplying these may be a significant component of system cost
  • Number of distinct voltage levels required:
    • may affect the reliability/power tradeoff
Some Interesting Questions
• About pipelined, sequential, fully-adiabatic CMOS logic:
  • Q: Does it require an intermediate voltage level?
    • A: No, you can get by with only 2 different levels.
  • Q: What is the minimum number of externally provided timing signals you can get away with?
    • A: 4 (12 if split levels are used).
  • Q: Can the order-N different timing signals needed for long retractile cascades be generated internally within an adiabatic circuit?
    • A: Yes, but not statically, unless order-N² hardware is used,
      • where N is the number of stages per full sequential cycle.
• We now demonstrate these answers.
Some Timing Examples
See next slide for some detailed timing diagrams.
• N-level retractile cascades:
  • 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
• 3-phase fully-static diadic SCRL:
  • 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
• 2-phase fully-static monadic SCRL:
  • 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
• 2-phase fully-static diadic SCRL:
  • 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
• 6-tick/cycle dynamic SCRL detailed previously:
  • 1 tick/phase × 6 phases/cycle = 6 ticks/cycle
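Every entry on this slide follows one rule: ticks/cycle = (ticks/phase) × (phases/cycle). A minimal Python sketch that recomputes them from the quoted figures; the dictionary labels and function names are our own, hypothetical, and not from the lecture.

```python
"""Recompute ticks/cycle = (ticks/phase) x (phases/cycle) for the slide's examples."""

# (ticks per phase, phases per cycle), as quoted on the slide; labels are hypothetical.
SCHEMES = {
    "3-phase fully-static diadic SCRL": (8, 3),
    "2-phase fully-static monadic SCRL": (5, 2),
    "2-phase fully-static diadic SCRL": (6, 2),
    "6-tick/cycle dynamic SCRL": (1, 6),
}


def ticks_per_cycle(ticks_per_phase: int, phases_per_cycle: int) -> int:
    """A cycle passes through every phase once, so its length is the product."""
    return ticks_per_phase * phases_per_cycle


def retractile_ticks_per_cycle(n_levels: int) -> int:
    """N-level retractile cascade: 2N ticks/phase x 1 phase/cycle."""
    return ticks_per_cycle(2 * n_levels, 1)


if __name__ == "__main__":
    for name, (tpp, ppc) in SCHEMES.items():
        print(f"{name}: {tpp} x {ppc} = {ticks_per_cycle(tpp, ppc)} ticks/cycle")
    for n in (2, 4, 8):
        print(f"{n}-level retractile cascade: {retractile_ticks_per_cycle(n)} ticks/cycle")
```

The retractile case is the only one whose cycle length grows with logic depth, which is why deep retractile networks are costly in throughput.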
2LAL: 2-Level Adiabatic Logic
• Dual-rail T-gate symbol
• Basic buffer element:
  • cross-coupled T-gates
• Only 4 different timing signals, 4 ticks per cycle:
  • Signal P_i rises during tick i, falls during tick (i+2) mod 4
  • 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
• Optimizes latency & throughput per gate.
[Figure: dual-rail T-gate symbol (terminals A, B); buffer element with input "in", clock P, output "out"; waveforms of P0, P1, P2, P3 over ticks 0, 1, 2, 3]
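To make the 4-signal schedule concrete, this Python sketch tabulates what each timing signal is doing during each tick. It assumes only the rule stated on the slide (signal i rises during tick i and falls during tick (i+2) mod 4) and that a signal holds its level between ramps; the function names are hypothetical, and this is a timing table, not a circuit simulation.

```python
N_SIGNALS = 4  # 2LAL timing signals P0..P3 over a 4-tick cycle


def activity(i: int, tick: int, n: int = N_SIGNALS) -> str:
    """What signal P_i does during the given tick, under the slide's rule."""
    t = tick % n
    if t == i:
        return "rise"
    if t == (i + 2) % n:
        return "fall"
    if t == (i + 1) % n:
        return "high"  # between its rising and falling ramps
    return "low"       # between its falling ramp and the next rising ramp


def schedule(n: int = N_SIGNALS) -> list:
    """Rows are ticks 0..n-1; columns are signals P0..P(n-1)."""
    return [[activity(i, t, n) for i in range(n)] for t in range(n)]


if __name__ == "__main__":
    print("tick " + " ".join(f"P{i:<4}" for i in range(N_SIGNALS)))
    for t, row in enumerate(schedule()):
        print(f"{t:>4} " + " ".join(f"{a:<5}" for a in row))
```

Under this rule each tick has exactly one rising and one falling signal, consistent with the 1 tick/phase figure above.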
2LAL Cycle of Operation
[Figure: one 2LAL buffer cycle over ticks 0, 1, 2, 3, showing input rails in1/in0 and output rails out1/out0 (rail states 11, 10, 01, 00), for the example in = 0, out = 0]
Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers)
(2) Clock applies bias force; conditional bit flip
(3) Input removed, raising barrier & locking in state change
(4) Clock bias can retract.
2LAL is an example of this.
[Figure: sketches of states 0 and 1 across steps (1) to (4), annotated "Input pulse" and "Pulse ends"]
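The four-step protocol can be written as a toy state machine. This Python sketch is a behavioral abstraction under our own assumptions: a two-well bit whose state can change only while the barrier is lowered, and a clock bias that pushes toward state 1. It does not model 2LAL's transistors, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BarrierBit:
    state: int = 0              # which well the bit occupies (0 or 1)
    barrier_up: bool = True     # a raised barrier forbids any state change
    bias: Optional[int] = None  # well the clock is pushing toward, if any
    trace: List[str] = field(default_factory=list)

    def log(self, step: str) -> None:
        self.trace.append(
            f"{step}: state={self.state} barrier_up={self.barrier_up} bias={self.bias}"
        )


def latch_cycle(bit: BarrierBit, input_pulse: bool, bias_target: int = 1) -> BarrierBit:
    # (1) Input conditionally lowers the barrier.
    bit.barrier_up = not input_pulse
    bit.log("(1) input")
    # (2) Clock applies a bias; the bit moves only if the barrier is down.
    bit.bias = bias_target
    if not bit.barrier_up:
        bit.state = bias_target
    bit.log("(2) bias")
    # (3) Input removed: the barrier rises and locks in the (possibly new) state.
    bit.barrier_up = True
    bit.log("(3) lock")
    # (4) Bias retracts; with the barrier up, the stored state is unaffected.
    bit.bias = None
    bit.log("(4) retract")
    return bit


if __name__ == "__main__":
    for pulse in (False, True):
        b = latch_cycle(BarrierBit(), input_pulse=pulse)
        print(f"input_pulse={pulse}")
        print("\n".join(b.trace))
```

The point the model captures is ordering: the barrier is raised in step (3) before the bias retracts in step (4), so removing the bias cannot undo the conditional flip.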
Shift Register Structure
• 1-tick delay per logic stage:
  [Figure: chain of buffer stages from in to out, clocked by phases 1, 2, 3, 4, 1, 2, 3, 4, ...]
• Logic pulse timing & propagation:
  [Figure: pulse waveforms at successive stages]
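The 1-tick-per-stage propagation can be checked with a short timing sketch in Python. It assumes stage k is driven by timing signal P(k mod 4), which rises during tick k mod 4 and falls two ticks later (the rule on the 2LAL slide). Under that assumption a pulse injected at stage 0 in tick 0 rises at stage k in tick k and falls in tick k + 2, so each stage's input is already fully high while its own clock ramps up. All names are hypothetical.

```python
from typing import Tuple

N_SIGNALS = 4  # timing signals P0..P3; stage k is assumed to use P(k mod 4)


def pulse_window(stage: int, inject_tick: int = 0) -> Tuple[int, int]:
    """(rise tick, fall tick) of the pulse at a stage, with 1-tick delay per stage."""
    rise = inject_tick + stage
    return rise, rise + 2


def clock_for(stage: int) -> str:
    return f"P{stage % N_SIGNALS}"


def waveform(stage: int, n_ticks: int, inject_tick: int = 0) -> str:
    """ASCII trace: '/' rising ramp, '\\' falling ramp, '#' high, '.' low."""
    rise, fall = pulse_window(stage, inject_tick)
    chars = []
    for t in range(n_ticks):
        if t == rise:
            chars.append("/")
        elif t == fall:
            chars.append("\\")
        elif rise < t < fall:
            chars.append("#")
        else:
            chars.append(".")
    return "".join(chars)


def input_stable_while_clocking(stage: int) -> bool:
    """Is the previous stage's output fully high during this stage's rising tick?"""
    rise, _ = pulse_window(stage)
    prev_rise, prev_fall = pulse_window(stage - 1)
    return prev_rise < rise < prev_fall


if __name__ == "__main__":
    for k in range(6):
        print(f"stage {k} ({clock_for(k)}): {waveform(k, 10)}")
```

In this model the previous stage's output falls while the current stage is holding high, which matches step (3) of the latching protocol: the input is removed only after the state change has been made.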
More Complex Logic Functions
• Non-inverting Boolean functions:
  [Figure: T-gate networks computing non-inverting functions of A and B, e.g. AB]
• For inverting functions, must use quad-rail logic encoding:
  [Figure: quad-rail encoding of A (rails A0, A1) for A = 0 and A = 1]
  • To invert, just swap the rails!
  • Zero-transistor “inverters.”
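Why swapping rails inverts can be illustrated at the Boolean level. This Python sketch models a one-hot rail pair (exactly one of r0, r1 asserted) and builds AND from a series condition on the 1-rails and a parallel condition on the 0-rails. That is the standard dual-rail construction, assumed here rather than read off the slide's figure, and the sketch omits the additional rails of the quad-rail encoding. Names are hypothetical.

```python
from typing import NamedTuple


class Rails(NamedTuple):
    r0: bool  # asserted when the logical value is 0
    r1: bool  # asserted when the logical value is 1


def encode(bit: int) -> Rails:
    return Rails(r0=not bit, r1=bool(bit))


def decode(x: Rails) -> int:
    if x.r0 == x.r1:
        raise ValueError(f"invalid rail state: {x}")
    return int(x.r1)


def rail_not(x: Rails) -> Rails:
    # Inversion is only a relabeling of the two wires: zero transistors.
    return Rails(r0=x.r1, r1=x.r0)


def rail_and(a: Rails, b: Rails) -> Rails:
    # 1-rail: series path (both inputs 1); 0-rail: parallel path (either input 0).
    return Rails(r0=a.r0 or b.r0, r1=a.r1 and b.r1)


def rail_or(a: Rails, b: Rails) -> Rails:
    # Dual of AND: parallel on the 1-rails, series on the 0-rails.
    return Rails(r0=a.r0 and b.r0, r1=a.r1 or b.r1)


def rail_nand(a: Rails, b: Rails) -> Rails:
    # An inverting function built from a non-inverting one plus a rail swap.
    return rail_not(rail_and(a, b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            y = rail_nand(encode(a), encode(b))
            print(f"NAND({a}, {b}) = {decode(y)}  rails={y}")
```

Only non-inverting networks (AND, OR) need actual switches; every inverting function is one of those followed by a free rail swap.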
Hardware Efficiency Issues
• Hardware efficiency: How many logic operations per unit hardware per unit time?
• Hardware spacetime complexity: How much hardware for how much time per logic op?
• We’re interested in minimizing: (# of transistors) × (# of ticks) / (gate cycle)
• SCRL inverter, w. return path:
  • (8 transistors) × (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
  • (16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
  • (23 transistors) × (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
  • (48 transistors) × (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL minimizes the # of rails and the # of ticks/cycle, it does not minimize overall spacetime complexity.
• Whether 6-tick SCRL really minimizes per-op spacetime complexity among pipelined fully-adiabatic CMOS logics is still an open question.
• An opportunity for you to make a contribution!
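The transistor-tick figures in the SCRL vs. 2LAL comparison are simple products. A minimal Python sketch that recomputes them and the 2LAL-to-SCRL cost ratio for each pair; the transistor and tick counts are the ones quoted on the slides, and the names are hypothetical.

```python
# (transistors, ticks per gate cycle), as quoted on the slides.
DESIGNS = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}


def transistor_ticks(transistors: int, ticks: int) -> int:
    """Spacetime cost per gate cycle: hardware occupied for that many ticks."""
    return transistors * ticks


def ratio(design_a: str, design_b: str) -> float:
    """How many times costlier design_a is than design_b, in transistor-ticks."""
    return transistor_ticks(*DESIGNS[design_a]) / transistor_ticks(*DESIGNS[design_b])


if __name__ == "__main__":
    for name, (t, k) in DESIGNS.items():
        print(f"{name}: {t} x {k} = {transistor_ticks(t, k)} T-ticks")
    buf = ratio("Quad-rail 2LAL buffer stage", "SCRL inverter, w. return path")
    gate = ratio("Quad-rail 2LAL AND", "SCRL reversible NAND, w. all inverters")
    print(f"buffer, 2LAL/SCRL: {buf:.2f}")
    print(f"AND vs. NAND, 2LAL/SCRL: {gate:.2f}")
```

Both ratios come out above 1 (64/48 and 192/138), which is the slide's conclusion: fewer ticks per cycle does not by itself give 2LAL lower spacetime complexity.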
Minimizing Power-Clock Signals
• How many external clock signals are required?
  • N-level-deep retractile cascade logic:
    • 2N waveforms × 1 phase = 2N signals
  • 6-tick/cycle, 6-phase dynamic SCRL:
    • 6 waveforms × 6 phases = 36 signals
  • 24-tick/cycle, 3-phase static SCRL:
    • 12 waveforms × 3 phases = 36 signals
  • 4-tick/cycle 2LAL:
    • 1 waveform × 4 phases = 4 signals!
• It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
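The external-signal counts are likewise products of (distinct waveforms) × (phases). This short Python sketch recomputes them, reading each phase as needing its own time-shifted copy of every waveform (our interpretation of the slide's product); the labels and function names are hypothetical.

```python
# (distinct waveforms, phases), as quoted on the slide.
STYLES = {
    "6-tick/cycle, 6-phase dynamic SCRL": (6, 6),
    "24-tick/cycle, 3-phase static SCRL": (12, 3),
    "4-tick/cycle 2LAL": (1, 4),
}


def signal_count(waveforms: int, phases: int) -> int:
    """One time-shifted copy of each waveform per phase."""
    return waveforms * phases


def retractile_signals(n_levels: int) -> int:
    """N-level-deep retractile cascade: 2N waveforms x 1 phase."""
    return signal_count(2 * n_levels, 1)


if __name__ == "__main__":
    for name, (w, p) in STYLES.items():
        print(f"{name}: {w} x {p} = {signal_count(w, p)} signals")
    for n in (2, 6, 10):
        print(f"{n}-level retractile cascade: {retractile_signals(n)} signals")
```

A 10-level retractile cascade already needs 20 external signals this way, more than the 12 the slide says suffice once timing signals are generated on-chip.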
How to Do It
• Circular 2LAL shifter; pulse-gated clocks
[Figure: clock waveforms P0, P1, P2, P3 over ticks 0 to 3; circular shifter whose stages are clocked by P0, P1, P2, P3 in turn, from in to out]
12-Rail System: Pros & Cons
• Pros:
  • Completely solves the adiabatic timing design problem
  • Enables mixtures of retractile, SCRL, and other logic styles on 1 chip
  • Enables simple fully-adiabatic SRAM & DRAM
• Cons:
  • Timing signals are dynamic
    • Known fully-static alternatives use order-N² gates and signals for N-tick-long cycles
    • N can be large in a chip that includes deep retractile networks
  • Energy is wasted driving the source/drain junction capacitances of all the T-gates, even when the timing pulse isn’t present (SOI reduces these parasitics)
Fully-Adiabatic DRAM Cell
• 6T, 6 lines/row, 1 line/column (in/out together)
• Read cycle:
  • Initially: clock lines neutral, out neutral, R off
  • R for desired row turns on
  • Clock lines for desired row split, driving out column
  • R turns off, out is read
  • Clock lines merge, out is reset
• Write cycle:
  • First, do read cycle.
  • in is set to out
  • W turns on
  • in changed to new value...
Fully-Adiabatic SRAM
• 10T, 10 lines/row, 1 line/column
• Operation similar to DRAM, except:
  • Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
  • Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
[Figure: SRAM cell with nodes N1, N2, M, transistors T1, T2, T3, and lines in, out]