Reversible Computing Theory I: Reversible Logic Models
Reversible Logic Models
• It is useful to have a logic-level (Boolean) model of adiabatic circuits.
• Can model all logic using maximally pipelined logic elements, which consume their inputs.
  • A.k.a. “input-consuming” gates.
• Warning: in such models, memory requires recirculation! Thus, this is not necessarily more energy-efficient in practice, for all problems, than retractile (non-input-consuming) approaches!
  • There is a need for more flexible logic models.
• If inputs are consumed, then the input-to-output logic function must be invertible.
Input-consuming inverter
• Before/after transition table:

    Before        After
    in  out       in  out
    0   -         -   1
    1   -         -   0

• Invertible! (Symmetric.)
• The input arrow in the symbol indicates that input data is consumed by the element. (Alternate symbol: in → out.)
• E.g., SCRL implementation.
An Irreversible Consuming Gate
• Input-consuming NAND gate:

    Before           After
    A  B  out        A  B  out
    0  0  -          -  -  1
    0  1  -          -  -  1
    1  0  -          -  -  1
    1  1  -          -  -  0

• 4 possible input states, but only 2 possible output states: at least 2 of the 4 possible input cases must lead to dissipation!
• Because it’s irreversible, it has no implementation in SCRL (or any fully adiabatic style) as a stand-alone, pipelined logic element!
NAND w. 1 input copied?
• Still not invertible (A is passed through a delay buffer to the extra output A′):

    Before              After
    A  B  A′ out        A  B  A′ out
    0  0  -  -          -  -  0  1
    0  1  -  -          -  -  0  1
    1  0  -  -          -  -  1  1
    1  1  -  -          -  -  1  0

• At least 1 of the 2 transitions into the A′=0, out=1 final state must involve energy dissipation of order k_B T.
  • How much, exactly? See exercise.
NAND w. 2 inputs copied?
• Finally, invertible!

    Before                 After
    A  B  A′ B′ out        A  B  A′ B′ out
    0  0  -  -  -          -  -  0  0  1
    0  1  -  -  -          -  -  0  1  1
    1  0  -  -  -          -  -  1  0  1
    1  1  -  -  -          -  -  1  1  0

• Any function can be made invertible by simply preserving copies of all inputs in extra outputs. (See the sketch below.)
• Note: not all output combinations here are legal!
• Note that there are more outputs than inputs; we call this an expanding operation.
• But, copied inputs can be shared by many gates.
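A minimal Python sketch (added here for illustration; not part of the original slides) checks both claims by brute force: the plain input-consuming NAND merges input states, while the expanding version that preserves copies of its inputs is injective, hence reversible.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

# Plain input-consuming NAND: 4 input states but only 2 output states.
plain = {(a, b): (nand(a, b),) for a, b in product((0, 1), repeat=2)}
assert len(set(plain.values())) == 2          # states merge: irreversible

# "Expanding" NAND: (A, B) -> (A', B', out) with A' = A, B' = B.
expanding = {(a, b): (a, b, nand(a, b)) for a, b in product((0, 1), repeat=2)}
assert len(set(expanding.values())) == 4      # injective: reversible
print("plain NAND merges states; expanding NAND is invertible")
```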
SCRL Pipelined NAND
• [Figure: SCRL schematic of a pipelined NAND gate computing out = NOT(AB) from inputs A, B, with a “5T” stage labeled and inverters restoring A and B.]
• The inverters are only needed to restore A, B; they can be shared with other gates that take A, B as inputs.
• Including inverters: 23 transistors.
• Not including inverters: 7 transistors.
Non-Expanding Gates
• Controlled-NOT (CNOT), or input-consuming XOR:

    A  B  →  A′  C
    0  0  →  0   0
    0  1  →  0   1
    1  0  →  1   1
    1  1  →  1   0

  (A′ = A, C = A ⊕ B.)
• Not universal for classical reversible computing. (Even together with all other 1- and 2-bit reversible gates.)
• However, if we add 1-input, 1-output quantum gates, the resulting gate set is universal!
  • More on quantum computing in a couple of weeks.
• Can implement with a dyadic gate in SCRL.
Toffoli Gate (CCNOT)
• A′ = A, B′ = B, C′ = C ⊕ AB:

    A B C  →  A′ B′ C′
    0 0 0  →  0 0 0
    0 0 1  →  0 0 1
    0 1 0  →  0 1 0
    0 1 1  →  0 1 1
    1 0 0  →  1 0 0
    1 0 1  →  1 0 1
    1 1 0  →  1 1 1
    1 1 1  →  1 1 0

• Subsumes AND, NAND, XOR, NOT, FAN-OUT, …
• Note that this gate is its own inverse.
• Our first universal reversible gate!
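Both properties are quick to verify exhaustively; here is a small Python check (mine, not the slides’):

```python
from itertools import product

def toffoli(a, b, c):
    # CCNOT: the target C is flipped iff both controls are 1.
    return (a, b, c ^ (a & b))

states = list(product((0, 1), repeat=3))
# Its own inverse: applying CCNOT twice is the identity.
assert all(toffoli(*toffoli(*s)) == s for s in states)
# Subsumes NAND: with C preset to 1, C' = 1 XOR AB = NAND(A, B).
assert all(toffoli(a, b, 1)[2] == 1 - (a & b) for a, b in product((0, 1), repeat=2))
print("CCNOT is self-inverse and computes NAND when C = 1")
```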
Fredkin Gate
• The first universal reversible logic gate to be discovered. (Ed Fredkin, mid-’70s.)
• B and C are swapped if A = 1, else passed unchanged:

    A B C  →  A′ B′ C′
    0 0 0  →  0 0 0
    0 0 1  →  0 0 1
    0 1 0  →  0 1 0
    0 1 1  →  0 1 1
    1 0 0  →  1 0 0
    1 0 1  →  1 1 0
    1 1 0  →  1 0 1
    1 1 1  →  1 1 1

• Is also conservative: it conserves 1s and 0s.
• Thus in theory it requires no separate power input, even if 1 and 0 have different energy levels!
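Again easy to check exhaustively; a quick Python sketch (added for illustration) confirms that the Fredkin gate is a bijection on 3-bit states and conserves the number of 1s:

```python
from itertools import product

def fredkin(a, b, c):
    # Controlled swap: B and C are exchanged iff A = 1.
    return (a, c, b) if a == 1 else (a, b, c)

states = list(product((0, 1), repeat=3))
outs = [fredkin(*s) for s in states]
assert sorted(outs) == states                               # bijection: reversible
assert all(sum(o) == sum(s) for s, o in zip(states, outs))  # conserves 1s and 0s
print("Fredkin is a conservative, reversible permutation")
```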
Reversible Computing Theory II: Emulating Irreversible Machines
Motivation for this study
• We want to know how to carry out any arbitrary computation in a way that is reversible to an arbitrarily high degree.
  • Up to limits set by leakage, power supply, etc.
• We want to do this as efficiently as possible:
  • Using as few “device ticks” as possible (spacetime): minimizes HW cost & leakage losses.
  • Using as few adiabatic transitions as possible (ops): minimizes frictional losses.
• But, a desired computation may be originally specified in terms of irreversible primitives.
General-Case vs. Special-Case
• We’d like to know two kinds of things:
• For arbitrary general-purpose computations:
  • How can we automatically emulate them in a fairly efficient reversible way, without needing new intelligent/creative design work in each case?
  • Topic of today’s lecture.
• For various specific computations of interest:
  • What are the most efficient reversible algorithms? Or at least, the most efficient that we can find?
  • Note: these may not necessarily look anything like the most efficient irreversible algorithms!
  • More on this point later.
The Landauer embedding
• The obvious embedding of irreversible ops into “expanding” reversible ones leads to a linear increase in space through time. (Landauer ’61)
  • Or, an increase in the width of an input-consuming circuit.
• [Figure: the input enters a circuit of “expanding” operations (e.g., AND); as circuit depth (time) grows, “garbage” bits accumulate alongside the desired output.]
Lecerf Reversal
• Lecerf (’63) was interested in the group-theory question of whether an iterated permutation of items would eventually return to the initial item.
• He proved this undecidable by reducing Turing’s halting problem to it, using a reversible TM.
  • The reversible TM reverses direction instead of halting.
  • It returns to its initial state iff the irreversible TM would halt.
• Only problem with this: no useful output data!
• [Figure: Input → f → (desired output + garbage) → f⁻¹ → copy of input; the garbage is uncomputed, but so is the output.]
The Bennett Trick
• Bennett (’73) pointed out that you could simply fan-out (reversibly copy) the desired output before reversing.
• [Figure: Input → f → (desired output + garbage) → copy output → f⁻¹ → (copy of input + desired output).]
• Note: O(T) storage is still temporarily needed!
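The structure is easy to see in code. Below is a Python sketch of my own (purely illustrative): the history list stands in for the reversible machine’s growing garbage, and popping it stands in for running the compute steps in reverse.

```python
def bennett_emulate(step, x0, t):
    """Emulate t irreversible applications of `step` in Bennett's style:
    compute forward saving a history, copy the answer, then uncompute."""
    history = [x0]                 # O(t) temporary storage: the "garbage"
    for _ in range(t):             # forward phase
        history.append(step(history[-1]))
    answer = history[-1]           # fan-out: reversibly copy the desired output
    while len(history) > 1:        # reverse phase: uncompute the history,
        history.pop()              # running each step backwards
    return history.pop(), answer   # the input is restored, alongside the copy

# Toy non-injective step function, so direct reversal would be impossible:
x, y = bennett_emulate(lambda v: (v * v) % 10, 2, 5)
print(x, y)   # -> 2 6: the original input plus the result of 5 emulated steps
```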
Improving Spacetime Efficiency
• Bennett ’73 transforms a computation taking spacetime S·T into one taking Θ(S·T²) spacetime in the worst case.
  • Can we do better?
• Bennett ’89: described a technique that takes spacetime Θ(S·T^(log 3/log 2)) ≈ Θ(S·T^1.58).
  • Actually, one can generalize slightly and arrange for the exponent on T to be 1+ε, where ε→0 (very slowly).
• Lange, McKenzie, Tapp ’97: space Θ(S) is even possible, if you use time Θ(exp(S)).
  • Not any more spacetime-efficient than Bennett.
Reversible “Climbing Game”
• Suppose a guy armed with a hammer, N spikes, & a rope is trying to climb a cliff, while obeying the following rules.
• Question: how high can he climb?
• Rules:
  • Standing on the ground or on a spike, he can insert & remove a spike 1 meter higher up.
  • He can raise & lower himself between spikes & the ground using his rope.
  • He can’t insert or remove a spike while dangling from a higher spike!
    • Maybe not enough leverage/stability?
Analogy w. Emulation Problem
• Height on the cliff represents: how many steps of progress we have made through the irreversible computation.
• The number of spikes represents: the available memory of the reversible machine.
• A spike in the cliff at height H represents: using a unit of memory to record the state of the irreversible machine after H steps.
• Adding/removing a spike at height H+1 when there is a spike at height H represents: computing/uncomputing the state at H+1 steps, given the state at H.
Let’s Climb!
0. Standing on ground.
1. Insert spike @ 1.
2. Insert spike @ 2.
3. Remove spike @ 1.
4. Insert spike @ 3.
5. Insert spike @ 4.
6. Remove spike @ 3.
7. Insert spike @ 1.
8. Remove spike @ 2.
9. Remove spike @ 1.
10. Can use remaining 3 spikes to climb up another 4 if desired!
How high can we climb?
• Using only N spikes, and the strategy illustrated, we can climb to height 2^N − 1 (wow!)
• Li & Vitányi (theorem): this is the optimal strategy for this game.
• Open question: are there more efficient general reversiblization techniques that are not based on this game model?
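Here is a Python simulation of the recursive strategy (my own reconstruction, not from the slides). Every move is checked against the rules, the n = 3 trace begins with exactly the 9-step sequence above, and the final height is 2^n − 1.

```python
def climb(n):
    """Climb to height 2**n - 1 using n spikes, checking every move's legality."""
    spikes, trace = set(), []

    def move(h, inserting):
        assert h - 1 == 0 or h - 1 in spikes, "no foothold one meter below"
        assert (h not in spikes) if inserting else (h in spikes)
        (spikes.add if inserting else spikes.remove)(h)
        trace.append(("insert" if inserting else "remove", h))

    def plant(n, base, inserting=True):
        """Plant (or, if inserting=False, recover) a spike at base + 2**(n-1)."""
        if n == 1:
            move(base + 1, inserting)
            return
        mid = base + 2 ** (n - 2)
        plant(n - 1, base, True)       # build a foothold halfway up
        plant(n - 1, mid, inserting)   # plant/remove the target spike from it
        plant(n - 1, base, False)      # uncompute the halfway foothold

    height = 0
    for i in range(n, 0, -1):          # advance with n, n-1, ..., 1 free spikes
        plant(i, height)
        height += 2 ** (i - 1)
    return height, trace

height, trace = climb(3)
assert height == 2 ** 3 - 1
print(trace[:9])   # matches the slide's steps 1-9 exactly
```

Note how the plant/recover recursion mirrors the compute/uncompute structure of the Bennett trick; a spike is exactly a checkpoint.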
Triangle representation
• [Figure: triangle diagrams of the recursive emulation, for k = 2, n = 3 and for k = 3, n = 2.]
Analysis of Bennett Algorithm
• n = # of recursive levels of the algorithm.
• k = # of lower-level iterations needed to go forward 1 higher-level step.
• T_r = # of reversible lowest-level steps executed = 2(2k−1)^n.
• T_i = # of irreversible steps emulated = k^n.
• So n = log_k T_i, and thus
  T_r = 2(2k−1)^(log T_i / log k) = 2e^(log(2k−1)·log(T_i)/log k) = 2·T_i^(log(2k−1)/log k).
• (Uses n+1 spikes.)
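These counts are easy to tabulate; a short Python sketch (added for illustration) evaluates the formulas above and confirms the exponent relation:

```python
from math import log

def bennett_counts(k, n):
    """T_i = k**n irreversible steps emulated; T_r = 2*(2k-1)**n reversible
    lowest-level steps executed; they are related by T_r = 2 * T_i**e."""
    t_i = k ** n
    t_r = 2 * (2 * k - 1) ** n
    e = log(2 * k - 1) / log(k)
    assert abs(t_r - 2 * t_i ** e) < 1e-6 * t_r
    return t_i, t_r, e

for k in (2, 3, 4):
    t_i, t_r, e = bennett_counts(k, 10)
    print(f"k={k}: T_i={t_i}, T_r={t_r}, T_r ~ 2*T_i^{e:.3f}")
```

Note how the exponent falls toward 1 as k grows (about 1.585 at k = 2, 1.465 at k = 3), at the price of more checkpoint space per level; this is the 1+ε tradeoff mentioned on the previous slide.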
Linear-Space Emulation (Lange, McKenzie, Tapp ’97)
• [Figure: reversible traversal of the irreversible machine’s tree of configurations.]
• Unfortunately, the tree may have 2^S nodes! (Hence the exp(Θ(S)) time.)
Can we do better?
• Bennett ’73 takes order-T time; LMT ’97 takes order-S space.
  • Can some technique achieve both, simultaneously?
• Theorem (Frank & Ammer ’97): the problem of iterating a black-box function cannot generally be done in time O(T) & space O(S) on a reversible machine.
  • The proof really does cover all possible algorithms!
  • The paper also proves loose lower bounds on the extra space required by a linear-time simulation.
• The results might also be extended to the problem of iterating a cryptographic one-way function.
  • It’s not yet clear if this can be made to work.
One-Way Functions
• …are invertible functions f such that f is easy to compute (e.g., takes polynomial time) but f⁻¹ is hard to compute (e.g., takes exponential time).
• A simple example: consider f(p,q) = p·q with p, q prime.
  • Multiplication of integers is easy.
  • Factoring is hard (except using quantum computers).
• The “one-way-ness” of this function is essential to the security of the RSA public-key cryptosystem.
• No function has yet been proven to be one-way.
  • However, certain kinds of one-way functions are known to exist if P ≠ NP.
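A toy Python illustration (the numbers here are laughably small and insecure; they are chosen only to make the asymmetry visible): multiplying the primes is a single operation, while inverting by trial division already takes on the order of 10^5 steps.

```python
def f(p, q):
    return p * q            # the easy direction: one multiplication

def invert(n):
    # The hard direction: recover the factors by brute-force trial division.
    for p in range(2, int(n ** 0.5) + 1):
        if n % p == 0:
            return p, n // p

n = f(100003, 100019)       # two (small) primes
print(n, invert(n))         # ~1e5 trial divisions, even at this tiny scale
```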
Elements of Frank-Ammer Proof
• Consider a chain of bit-strings (of size S each) that is incompressible by a certain compressor.
  • Such a chain is easily proven to exist. (See next slide.)
• The machine’s job is to follow this chain from one node to the next by using a black-box function.
• The compressor can run a reversible machine backwards, to reconstruct earlier nodes in the chain from later machine configurations.
• If the reversible machine only uses order-S space in its configurations, then the chain is compressible!
  • This contradicts the choice of an incompressible chain; QED.
Existence of Incompressibles
• A decompressor or description system s: {0,1}* → {0,1}* maps any bit-string description d to the described string x.
  • (Notation f:D means a unary operator on D, i.e., f: D → D.)
• x is compressible in s iff ∃d: s(d) = x with |d| < |x|.
  • (Notation |b| means the length of bit-string b in bits.)
• Theorem: every decompressor has an incompressible input of any given length ℓ.
• Proof: there are 2^ℓ length-ℓ bit-strings, but only 2^ℓ − 1 shorter descriptions. ∎
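The pigeonhole count is trivial to check mechanically; a tiny Python sketch (added here) tallies the strings of each length against the strictly shorter descriptions:

```python
# For each length l: 2**l strings of length l, but only 2**l - 1 strictly
# shorter descriptions, so some length-l string must be incompressible.
for l in range(1, 8):
    n_strings = 2 ** l
    n_shorter = sum(2 ** j for j in range(l))   # descriptions of length 0..l-1
    assert n_shorter == n_strings - 1
    print(f"l={l}: {n_strings} strings vs. {n_shorter} shorter descriptions")
```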
Cost-Efficiency Analysis: Cost Efficiency, Cost Measures in Computing, Generalized Amdahl’s Law
Cost-Efficiency
• The cost-efficiency of anything is %$ = $_min/$:
  • the fraction of the actual cost $ that really needed to be spent to get the thing, using the best possible method.
• It measures the relative number of instances of the thing that can be accomplished per unit cost, compared to the maximum number possible.
• Inversely proportional to cost $.
• Maximizing %$ means minimizing $, regardless of what $_min actually is.
• In computing, the “thing” is a computational task that we wish to carry out.
Components of Cost
• The cost $ of a computation may generally be a sum of terms for many different components:
• Time-proportional (or related) costs:
  • Cost to the user of having to wait for results, e.g., missing deadlines, incurring penalties.
  • May increase nonlinearly with time, for long times.
• Spacetime-proportional (or related) costs:
  • Cost of the raw physical spacetime occupied by the computation.
  • Cost to rent the space.
  • Cost of hardware (amortized over its lifetime):
    • Cost of raw mass-energy, particles, atoms.
    • Cost of materials, parts.
    • Cost of assembly.
  • Cost of parts/labor for operation & maintenance.
  • Cost of SW licenses.
More cost components
• (Continued…)
• Area-time-proportional (or related) costs:
  • Cost to rent a portion of an enclosing convex hull for getting things in & out of the system:
    • energy, heat, information, people, materials, entropy.
  • Some examples incurring area-time-proportional costs: chip area, power level, cooling capacity, I/O bandwidth, desktop footprint, floor space, real estate, planetary surface.
  • Note that area-time costs also scale with the maximum number of items that can be sent/received.
• Energy-expenditure-proportional (or related) costs:
  • Cost of raw free-energy expenditure (entropy generation).
  • Cost of the energy-delivery system. (Amortized.)
  • Cost of the cooling system. (Amortized.)
General Cost Measures
• The most comprehensive cost measure includes terms for all of these potential kinds of costs:
  $_comprehensive = $_Time + $_SpaceTime + $_AreaTime + $_FreeEnergy
• $_Time is a non-decreasing function f(t_start→end) of the elapsed time.
  • Simple model: $_Time ∝ t_start→end.
• $_FreeEnergy is most generally a function of the total free energy expended (entropy generated) over the run.
  • Simple model: $_FreeEnergy ∝ S_generated.
• $_SpaceTime and $_AreaTime are most generally functions of the volume and enclosing area occupied over time.
  • Simple models: $_SpaceTime ∝ Space · Time (∝ the max # of ops that could be done); $_AreaTime ∝ Area · Time (∝ the max # of items that could be I/O’d).
Generalized Amdahl’s Law
• Given any cost that is a sum of components, $_tot = $_1 + … + $_n:
• There are diminishing proportional returns to be gained from reducing any single cost component (or subset of components) to much less than the sum of the remaining components. (See the numerical sketch below.)
• ∴ Design-optimization effort should concentrate on those cost components that dominate the total cost for the application of interest.
• At a “design equilibrium,” all cost components will be roughly equal (unless externally driven).
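A small numerical illustration (with made-up costs, added here): if $tot = $1 + $2 with $1 = 80 and $2 = 20, then even an unbounded speedup of the $1 component alone improves the total by at most ($1 + $2)/$2 = 5×.

```python
def total_cost(c1, c2, f):
    """Total cost after reducing component c1 by a factor f."""
    return c1 / f + c2

c1, c2 = 80.0, 20.0
for f in (1, 2, 10, 100, 1e6):
    gain = (c1 + c2) / total_cost(c1, c2, f)
    print(f"f = {f:>9}: overall gain = {gain:.2f}x")   # saturates at 5x
```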
Reversible vs. Irreversible
• We want to compare their cost-efficiency under various cost measures:
  • time, entropy, area-time, spacetime.
• Or, for some applications, one quantity might be minimized while another one (space, time, area) is constrained by some hard limit.
• Note that space (volume, mass, etc.) by itself as a cost measure is only significant if either:
  • (a) the computer isn’t reusable, so the cost to build it dominates operating costs, or
  • (b) I/O latency (∝ V^(1/3)) affects other costs.
Time Cost Comparison
• For computations with unlimited power/cooling capacity, and no communication requirements:
  • Reversible is worse than irreversible by a factor of ~s > 1 (the adiabatic slowdown factor), times perhaps a small constant depending on the logic style used:
    $_r,Time ≈ $_i,Time · s
Time Cost Comparison, cont.
• For parallelizable, power-limited applications:
  • With nonzero leakage: $_r,Time ≈ $_i,Time / R_on/off^g.
    • Worst-case computations: g ≈ 0.4.
    • Best-case computations: g = 0.5.
• For parallelizable, area-limited, entropy-flux-limited, best-case applications:
  • With leakage → 0: $_r,Time ≈ $_i,Time / d^(1/2),
    • where d is the system’s physical diameter.
  • (See transparency.)
Time cost comparison, cont.
• For entropy-flux-limited, parallel, heavily communication-limited, best-case applications:
  • With leakage approaching 0: $_r,Time ≈ $_i,Time^(3/4),
  • where $_i,Time scales up with the space requirement V as $_i,Time ∝ V^(2/9),
  • so the reversible speedup $_i,Time/$_r,Time = $_i,Time^(1/4) ∝ (V^(2/9))^(1/4) = V^(1/18): it scales with the 1/18 power of system size.
    • Not super-impressive! (Details later.)
Bennett ’89 alg. is not optimal
• [Figure: the Bennett-’89 triangle diagrams again, for k = 2, n = 3 and k = 3, n = 2.]
• Just look at all the spacetime it wastes!!!
Parallel “Frank02” algorithm
• We can simply scrunch the triangles closer together to eliminate the wasted spacetime!
• The resulting algorithm is linear-time for all n and k, and dominates Ben89 for time, spacetime, & energy!
• [Figure: the scrunched triangles, plotted as emulated time vs. real time, for k=2, n=3; k=3, n=2; and k=4, n=1.]
Setup for Analysis
• For the energy-dominated limit, let cost “$” equal energy.
  • c_$ = energy coefficient; r_$ = r_$(min) = leakage power.
  • $_i = energy dissipation per irreversible state-change.
• Let the on/off ratio be R_on/off = r_$(max)/r_$(min) = P_max/P_min.
• Note that c_$ ≈ $_i·t_min = $_i·($_i/r_$(max)), so r_$(max) ≈ $_i²/c_$.
• So R_on/off ≈ $_i²/(c_$·r_$(min)) = $_i²/(c_$·r_$).
Time Taken
• There are n levels of recursion.
• Each level multiplies the width of the base of the triangle by k.
• Lowest-level triangles take time c·t_op.
• Total time is thus c·t_op·k^n.
• [Figure: a k = 4, n = 1 triangle whose base is 4 sub-units wide.]
Number of Adiabatic Ops
• Each triangle contains k + (k−1) = 2k−1 immediate sub-triangles.
• There are n levels of recursion.
• Thus the number of adiabatic ops is c·(2k−1)^n.
• [Figure: for k = 3, n = 2 there are 5² = 25 little triangles (adiabatic operations).]
Spacetime Usage
• Each triangle includes the spacetime usage of all 2k−1 of its immediate sub-triangles,
• plus additional spacetime units, each consisting of 1 storage unit (1 state of the irreversible machine being stored) held for time t_op·k^(n−1).
  • [Figure: for k = 5, n = 1 there are 1+2+3 = 6 such units.]
• The resulting recurrence relation:
  ST(k,0) = 1 (or c)
  ST(k,n) = (2k−1)·ST(k,n−1) + ((k²−3k+2)/2)·k^(n−1)
Reversible Cost
• Adiabatic cost plus spacetime cost:
  $_r = $_adia + $_st = (2k−1)^n·c_$/t + ST(k,n)·r_$·t
• Minimizing over the transition time t gives:
  $_r = 2[(2k−1)^n·ST(k,n)·c_$·r_$]^(1/2)
• But in the energy-dominated limit, c_$·r_$ ≈ $_i²/R_on/off, so:
  $_r = 2$_i·[(2k−1)^n·ST(k,n)/R_on/off]^(1/2)
Tot. Cost, Orig. Cost, Advantage
• The total cost is $_i for the irreversible operation performed at the end of the algorithm, plus the reversible cost:
  $_tot = $_i·{1 + 2[(2k−1)^n·ST(k,n)/R_on/off]^(1/2)}
• The original irreversible machine performing k^n ops would use cost $_orig = $_i·k^n, so
• the advantage ratio between irreversible & reversible cost is
  R_$(i/r) = $_orig/$_tot = k^n / {1 + 2[(2k−1)^n·ST(k,n)/R_on/off]^(1/2)}.
Optimization Algorithm
• For any given value of R_on/off:
  • Scan the possible values of n (up to some limit);
  • for each of those, scan the possible values of k,
  • until the maximum R_$(i/r) for that n is found
    • (the function only has a single local maximum in k);
  • and return the max R_$(i/r) over all n tried. (A runnable sketch follows.)
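Here is a runnable Python reconstruction of this scan (the function names, loop bounds, and the constant c = 1 are my choices, not the slides’); it implements the ST(k,n) recurrence and the advantage ratio R_$(i/r) from the preceding slides.

```python
from functools import lru_cache
from math import sqrt

@lru_cache(maxsize=None)
def ST(k, n):
    """Spacetime recurrence from the slides (with the constant c = 1):
    ST(k,0) = 1;  ST(k,n) = (2k-1)*ST(k,n-1) + ((k-1)(k-2)/2)*k**(n-1)."""
    if n == 0:
        return 1.0
    return (2 * k - 1) * ST(k, n - 1) + (k - 1) * (k - 2) / 2 * k ** (n - 1)

def advantage(k, n, r_onoff):
    """R_$(i/r) = k**n / (1 + 2*sqrt((2k-1)**n * ST(k,n) / Ron/off))."""
    return k ** n / (1 + 2 * sqrt((2 * k - 1) ** n * ST(k, n) / r_onoff))

def optimize(r_onoff, n_max=25, k_max=10000):
    """Scan n, and for each n scan k until just past the single local maximum."""
    best_r, best_k, best_n = 0.0, None, None
    for n in range(1, n_max + 1):
        prev = 0.0
        for k in range(2, k_max):
            r = advantage(k, n, r_onoff)
            if r < prev:              # past the peak in k; move on to the next n
                break
            prev = r
            if r > best_r:
                best_r, best_k, best_n = r, k, n
    return best_r, best_k, best_n

r, k, n = optimize(1e9)
print(f"Ron/off = 1e9: max R_$(i/r) = {r:,.0f} at k = {k}, n = {n}")
```

Scans of this kind, at R_on/off = 10^9, are where energy-savings figures like the roughly 1,200× quoted on the next slides come from.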
• [Figure: surface plots over (k, n) of the energy saved and of the spacetime blowup.]
Asymptotic Scaling
• The potential energy-savings factor scales as R_$(i/r) ∝ R_on/off^~0.4,
• while the spacetime overhead grows only as R_$(i/r)^~0.45, i.e., as R_on/off^~0.18.
• E.g., with an R_on/off of 10^9, you can do worst-case computation in an adiabatic circuit with:
  • an energy savings of up to a factor of 1,200×!
  • But this point is 700,000× less hardware-efficient, if the Frank02 algorithm is used for the emulation.