Circuit Performance and Adders
• Recap from last time
• Hardware Design is Complicated Because We Want Circuits to Go Fast
• Combinational Logic: Used A Simple Model of Delay
  • Integer Delay on Each Gate
  • Reduction of Circuit to Directed Acyclic Graph
  • Delay of Circuit (= Clock Period) is longest path in graph
• Making Circuits Go Fast = Shortening Longest Path
  • Exploit Asymmetry between path lengths
• Shorten Longest Path by
  • Introducing Redundant Logic
  • Moving Logic from Long to Short Paths
• We will see a different technique today!
CS 150 – Spring 2008 – Lec #15: Ckt Performance

Delay Model of a Circuit
• Translate circuit into graph
• Weights on nodes are delay through gates
• Delay through circuit is longest path through graph
• Easy, linear-time algorithm
[Figure: example graph with nodes A, B, C, D and delay weights 2, 1, 1]

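
The longest-path computation described on this slide can be sketched as a single pass over the gates in topological order. This is a minimal sketch, not the course's own code: the function name `circuit_delay`, the dictionary layout, and the four-node example graph are assumptions made for illustration (the wiring of the slide's figure is not recoverable).

```python
from graphlib import TopologicalSorter


def circuit_delay(gate_delay, fanin):
    """Longest node-weighted path through a DAG of gates.

    gate_delay: dict mapping each node to the delay through it
    fanin:      dict mapping a node to the list of nodes that drive it
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors,
    # so each arrival time is computed exactly once (linear time overall).
    for node in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin.get(node, [])), default=0)
        arrival[node] = latest_input + gate_delay[node]
    return max(arrival.values())


# Hypothetical wiring: A and B drive C, C drives D.
delays = {"A": 2, "B": 1, "C": 1, "D": 1}
wiring = {"C": ["A", "B"], "D": ["C"]}
print(circuit_delay(delays, wiring))
```

With these assumed weights the critical path is A, C, D, so the sketch reports a delay of 4.
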
Circuit Performance Model
[Figure: Latches -> Combinational Logic -> Latches]
• Inputs stabilize at t = 0
• Logic finishes when last output stabilizes

Circuit Performance Model
• Outputs of latches are stable only at clock edge
• Inputs to latches must be stable by next clock edge
• Time between clock edges must be > delay of combinational logic
[Figure: Latches -> Combinational Logic -> Latches]

Adders
• Highly-Studied Circuit, so a good case study in design
• “Ripple-carry” adder: standard adder where carry ripples from one bit to the next
  • Longest path for n-bit adder is O(n)
  • Number of gates for n-bit adder is O(n)
• “Carry Lookahead”: Accelerate carry chain
  • Collapse carry into all bits
  • O(log n) delay (optimal!)
  • O(n^3) gates (terrible!)
• Practical Compromise is block-accelerated adders
  • Block-carry lookahead
  • Carry-select adder

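
To make the O(n) carry chain concrete, here is a small bit-level sketch of a ripple-carry adder. The function name `ripple_carry_add` and its argument order are assumptions for illustration; the propagate and generate signals follow the usual definitions (P = A xor B, G = A and B).

```python
def ripple_carry_add(a, b, n, carry_in=0):
    """Add two n-bit integers one bit at a time; return (sum, carry_out)."""
    carry = carry_in
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi                  # propagate: pass the incoming carry along
        g = ai & bi                  # generate: create a carry regardless
        total |= (p ^ carry) << i    # sum bit i
        carry = g | (p & carry)      # bit i+1 must wait for this carry
    return total, carry


print(ripple_carry_add(0xFFD, 0x011, 12))
```

Each loop iteration depends on the carry from the previous one, which is the serial dependency that the faster adders in this lecture try to break.
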
Hierarchical Carry Lookahead
• PG, GG used as propagate, generate inputs to hierarchical block
[Figure: three m-bit CLA adders, each producing PG and GG; (PG0, GG0), (PG1, GG1), (PG2, GG2) feed a Carry Lookahead Block]

Synopsis of Hierarchical Carry-Lookahead
• n-bit adder, m-bit blocks, n/m blocks
• Delay is 2 log n + 2 log m
• Size is max(nm^2, (n/m)^3)
• Best is m = n^(2/5)
• Delay is (14/5) log n, size is O(n^(9/5))

Analysis of the Carry-Lookahead Adder
• n-bit adder, m-bit blocks, n/m blocks
• Delay through the adder: 2 * delay through the lookahead block + delay through the super-lookahead block
  • Lookahead block: 2 log m
  • Super-block: 2 log(n/m) = 2 log n - 2 log m
  • Total: 2 * 2 log m + 2 log n - 2 log m = 2 log n + 2 log m
• Logic: scales like the lookahead blocks
  • Size-p block: O(p^3) from before
  • Two sizes of blocks: n/m blocks of size m, one block of size n/m
  • Total: (n/m) * m^3 = nm^2 for the blocks, (n/m)^3 for the super-block
• Choose m to minimize max(nm^2, (n/m)^3)
• Solution at m = n^(2/5). Total is O(n^(9/5))

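
The choice of block size can be worked out explicitly. Since nm^2 grows with m while (n/m)^3 shrinks, the maximum of the two is smallest where they are equal. The derivation below uses only the delay and size expressions from the slides:

```latex
\begin{align*}
nm^2 = \left(\frac{n}{m}\right)^3
  &\;\Longrightarrow\; m^5 = n^2
  \;\Longrightarrow\; m = n^{2/5}, \\
\text{size} &= n \cdot m^2 = n \cdot n^{4/5} = n^{9/5}, \\
\text{delay} &= 2\log n + 2\log m
  = 2\log n + \tfrac{4}{5}\log n = \tfrac{14}{5}\log n .
\end{align*}
```
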
Carry Select Adder
• “Combinational Speculative Execution”
• Basic intuition:
  • Adders spend time waiting to see what carry-in is
• Therefore
  • Go ahead and guess each way
  • Pick the right answer when the carry comes by

Carry-Select Adder
• Each block is doubled
• One block computes with carry-in = 0, the other with carry-in = 1
• Actual carry-in (carry-out from previous block) selects the result
  • m sum bits
  • 1 carry-out bit
[Figure: Block 0 followed by doubled m-bit blocks (carry-in 0 and carry-in 1) whose m-bit outputs feed 0/1 multiplexers]

Analysis of Carry-Select Adder
• Delay analysis: Worst-case path is through Block 0, then the control of the multiplexer chain
  • O(m) gates in Block 0
  • O(p = n/m) gates in multiplexer chain
• Choose m to minimize max(n/m, m)
• Minimum is to choose m = √n
[Figure: Block 0 followed by block pairs (Block 1_0, Block 1_1), (Block 2_0, Block 2_1), ..., (Block p_0, Block p_1)]

Twelve-bit Carry-Select Example
• Problem: add -3 (0xffd, 111111111101) to 17 (0x011, 000000010001)
• Use 4-bit carry select blocks
• Result is 0x00e (14)
[Figure: speculative sums and carries for each of the three 4-bit blocks]

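
The twelve-bit example can be replayed with a short behavioral sketch of a carry-select adder. The function name `carry_select_add` is an assumption, and for simplicity the sketch speculates on every block, including the first one, which the slides leave undoubled.

```python
def carry_select_add(a, b, n, m):
    """Add two n-bit integers using m-bit carry-select blocks (m divides n)."""
    mask = (1 << m) - 1
    carry = 0
    total = 0
    for k in range(n // m):
        a_blk = (a >> (k * m)) & mask
        b_blk = (b >> (k * m)) & mask
        # Speculate: compute the block result for carry-in 0 and carry-in 1.
        guess = [a_blk + b_blk + cin for cin in (0, 1)]
        # The actual carry-in is the multiplexer select.
        chosen = guess[carry]
        total |= (chosen & mask) << (k * m)
        carry = chosen >> m
    return total, carry


result, carry_out = carry_select_add(0xFFD, 0x011, 12, 4)
print(hex(result), carry_out)
```

For -3 + 17 the low block produces 0xe with no carry, and the upper two blocks each select their carry-in-dependent guess, giving 0x00e (14) with the final carry-out discarded in twelve-bit two's complement.
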
Hardware for the Carry Select Adder
• √n blocks, each of √n gates
• Additional hardware is √n multiplexers + an additional adder for each block but the first
  • n - √n additional adder bits
• Therefore √n + 2n - √n = 2n gates
• Exactly twice the size of an ordinary adder, but delay is √n instead of n

Carry-Bypass Adder
• Like the carry-select adder, has O(√n) delay
• But even more efficient (in terms of gates) than the carry-select
  • Has only n + √n log n gates
• However, it broke every timing analyzer…
  • Instead of shortening the longest path, it made it longer!
• How can this be? Isn’t the delay of the circuit the length of the longest path?

What is the delay of the Circuit?
• The delay of a circuit is the time at which the last output settles
• This can be the length of the longest path, but sometimes isn’t
• The longest path is an upper bound on the delay of the circuit, but this bound isn’t always tight

Example
• Long paths are from X, Y -> out through the bottom of the circuit
• But no signal can travel down these paths!

Example
[Figure: signal values in the example circuit at t = 0, 1, 2, 3, 4, 6]

Timing Analysis
• Longest path is 8, but no signal ever travels down it!

What happened?
• Long Paths are false
  • A->B requires z=1
  • B->C requires z=0
  • Conflict! No signal can propagate down this path
• This analysis doesn’t quite work
  • Analysis has to take into account delays
  • Complete theory not understood till 1993
  • This is good enough for carry-bypass adder

Announcements
• Prof. Pister will lecture on wireless protocol Thursday
  • Need this for your project
• Spring Break
• Tuesday 4/1 – TBD
• Thursday 4/3 – MT review
• Tuesday 4/8 – MT 2

False Paths and Adders
• Key idea: Don’t make the critical paths in the adder short
  • That was the idea behind Carry Lookahead and Carry-Select adders
• Instead, make long paths false
• Critical Path is Through the Carry Chain
  • Only exercised when the propagate bit through every block is set
  • (Question: is this likely?)
• Therefore: when a signal would propagate through the carry chain, skip the block!
• Recall from block carry-lookahead adder: Group Propagate PG = P0 P1 P2 P3
• When PG = 1, have the carry skip the whole block!

Carry-Skip Block
[Figure: m-bit ripple-carry adder with Carry-in; a multiplexer controlled by PG selects between the ripple carry-out (port 0) and the block's Carry-in (port 1) to produce the Carry-in to the next block]

Suppose Carry-in Propagates to Carry-Out…
[Figure: carry-skip block]

Then PG = 1
[Figure: carry-skip block]

So Path goes Through the 1-port of the MUX
• Delay is 1 MUX delay, not 4 propagate delays!
[Figure: carry-skip block]

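
A behavioral sketch of one carry-skip block may help here. The name `carry_skip_block` and the return tuple are assumptions for illustration. The sketch models logic values only, not timing, so it shows why the bypass is safe: whenever PG = 1 the ripple carry-out would equal the carry-in anyway.

```python
def carry_skip_block(a_blk, b_blk, m, carry_in):
    """One m-bit carry-skip block; returns (sum_bits, carry_out, pg)."""
    carry = carry_in
    total = 0
    pg = 1
    for i in range(m):
        ai = (a_blk >> i) & 1
        bi = (b_blk >> i) & 1
        p = ai ^ bi
        pg &= p                            # PG = P0 P1 ... P(m-1)
        total |= (p ^ carry) << i
        carry = (ai & bi) | (p & carry)    # ripple carry inside the block
    # Multiplexer: port 1 (bypass carry-in) when PG = 1, port 0 (ripple) otherwise.
    carry_out = carry_in if pg else carry
    return total, carry_out, pg


print(carry_skip_block(0xF, 0x0, 4, 1))
```

With a = 1111 and b = 0000 every bit propagates, so PG = 1 and the incoming carry reaches the next block through the multiplexer instead of through four ripple stages.
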
Full Carry-Bypass Adder
• As before, an array of n/m m-bit blocks
[Figure: Block 0, Block 1, ..., Block n/m - 1 chained from Carry-in through 0/1 multiplexers controlled by PG]

Full Carry-Bypass Adder: Worst-case path
• Worst-case path goes through m - 1 bits of Block 0, n/m - 2 multiplexer gates, and m - 1 bits of Block n/m - 1
[Figure: Block 0, Block 1, ..., Block n/m - 1 chained from Carry-in through 0/1 multiplexers controlled by PG]

Timing and Size Analysis
• Delay = 2 * (m - 1) + n/m - 2
• Choose m to minimize delay => m = √n (up to a constant factor)
• We have Delay = 2 * (√n - 1) + √n - 2 = 3√n - 4
• What’s the additional circuitry?
  • log m gates to build PG (1 per block)
  • 1 two-input multiplexer per block
  • n/m blocks
  • => (n/m)(log m + 1)
  • m = √n => √n((log n)/2 + 1)
• Same delay as carry-select, but much smaller: n + O(√n log n) vs 2n

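
The delay formula on this slide is easy to tabulate for candidate block sizes. The helper names `bypass_delay` and `delay_by_block_size` are assumptions for illustration, and the sketch only considers block sizes that divide n evenly.

```python
def bypass_delay(n, m):
    """Worst-case delay from the slide's model: 2(m - 1) + n/m - 2."""
    return 2 * (m - 1) + n // m - 2


def delay_by_block_size(n):
    """Modeled delay for every block size m (1 < m < n) that divides n."""
    return {m: bypass_delay(n, m) for m in range(2, n) if n % m == 0}


for n in (16, 64):
    print(n, delay_by_block_size(n))
```

For n = 64 the table shows m = 8 giving 20 = 3 * 8 - 4, matching the slide, with m = 4 tying it under this simple model.
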
Full Carry-Bypass Adder: Longest path
• Longest path goes through all blocks and all multiplexers: m * n/m + n/m = n + n/m
[Figure: Block 0, Block 1, ..., Block n/m - 1 chained from Carry-in through 0/1 multiplexers controlled by PG]

Longest Path vs Circuit Delay
• Longest Path is n + √n
• Worst-case path is O(√n)
• Worst-case path for ripple-carry is n
• Made things better, but a timing analyzer thinks it’s worse!
• Stimulated tremendous interest in timing analyzers!

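
The gap between what a purely topological timing analyzer reports and what the adder actually does can be put in numbers. This sketch assumes the unit-delay model from the preceding slides; the function name `bypass_paths` is made up for illustration.

```python
def bypass_paths(n, m):
    """Return (topological longest path, true worst-case delay) in gate units."""
    longest_path = m * (n // m) + n // m     # every block plus every multiplexer
    true_delay = 2 * (m - 1) + n // m - 2    # first block, mux chain, last block
    return longest_path, true_delay


for n, m in ((16, 4), (64, 8)):
    longest, actual = bypass_paths(n, m)
    print(f"n={n}: ripple={n}, bypass longest path={longest}, bypass delay={actual}")
```

For n = 16 the analyzer would see a path of length 20, longer than the 16 of a plain ripple-carry adder, even though the modeled delay is only 8.
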
Adder Summary

A comment on n
• Asymptotic results tell us what happens at infinity
• For our purposes, n = 16, 32, 64
  • Means: √n is roughly 4 to 8
  • Means: log n = 4 to 6
• For the sizes we are interested in, carry-select and carry-bypass are as fast as block CLA

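
A quick numeric comparison illustrates the point. This sketch plugs practical word sizes into the two delay expressions derived earlier in the lecture (3√n - 4 for carry-bypass, (14/5) log n for hierarchical CLA). It assumes base-2 logarithms and that the two formulas count comparable gate delays, which the slides do not state explicitly; the function name `compare` is hypothetical.

```python
import math


def compare(n):
    """Evaluate the lecture's delay formulas for an n-bit adder."""
    sqrt_n = math.sqrt(n)
    log_n = math.log2(n)                    # assumption: log means log base 2
    return {
        "sqrt_n": sqrt_n,
        "log2_n": log_n,
        "bypass_delay": 3 * sqrt_n - 4,     # carry-bypass analysis
        "block_cla_delay": 14 / 5 * log_n,  # hierarchical CLA synopsis
    }


for n in (16, 32, 64):
    print(n, compare(n))
```

Under these assumptions the two expressions stay within a small factor of each other across n = 16 to 64, which is the slide's point about asymptotics versus practical sizes.
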
Remaining Questions (just for fun)
• How often does the worst-case delay path occur in the carry-bypass adder?
• How do we automatically analyze for false paths?

How often does (near) worst-case delay occur?
• Worst case delay: Pi = 1 for all i > j, small j
• Pi = Ai ⊕ Bi
• How often is Pi = Ai ⊕ Bi = 1? Only two of nine cases, but they happen frequently

How hard is it to analyze false paths?
• Hard!
• Problem noticed in early timing verifiers in the 1970s
• Early researchers (Hitchcock, Jouppi, Ousterhout) used hand-done rules
  • Often wrong (if it’s hard to analyze automatically, it’s hard to guess right by hand)
• Next: “Static sensitization”
  • Assert “non-controlling” values on side inputs (0 for OR/NOR, 1 for AND/NAND)
  • Make sure assignments are consistent
  • Problem: Values are changing!

Example
• To sensitize a->d->f->g, note: a->d requires b=1
• But b=1 => e=0, and f->g requires e=1
• A similar argument says you can’t sensitize b->d->f->g

But…
• Delay of the circuit is 3!
• Path a->d->f->g really was true

Key Problem
• All inputs are changing…
• “a->d requires b=1” means b=1 stable at t=0
• But b changes to 0 at t=0
• Therefore, value of b is unknown (X)
• Also, delays of gates are unknown
  • “1” really means [0,1]

Key Idea: Derive Function for each time
• d = 1 at 1
• d = 0 at 1
• d = X at 1

Key Idea: Derive Function for each time
• (d = 1 at 1) = (a = 1 at 0) and (b = 1 at 0)

Key Idea: Derive Function for each time
• (d = 0 at 1) = (a = 0 at 0) or (b = 0 at 0)

Key Idea: Derive Function for each time
• (d = X at 1) = (d = 1 at 1) nor (d = 0 at 1)

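
The three per-time functions for d can be written out as a small three-valued evaluator. This is a minimal sketch assuming d is a unit-delay AND of a and b, as the preceding formulas suggest; the name `and_at` and the use of the characters '0', '1', 'X' are illustrative choices.

```python
def and_at(a_prev, b_prev):
    """Value of d = AND(a, b) one time unit after inputs a_prev, b_prev.

    Each value is '0', '1', or 'X' (unknown or still changing).
    """
    d_is_1 = a_prev == "1" and b_prev == "1"   # (d=1 at 1) = (a=1 at 0) and (b=1 at 0)
    d_is_0 = a_prev == "0" or b_prev == "0"    # (d=0 at 1) = (a=0 at 0) or (b=0 at 0)
    if d_is_1:
        return "1"
    if d_is_0:
        return "0"
    return "X"                                 # neither known case holds (the NOR)


print(and_at("1", "1"), and_at("0", "X"), and_at("1", "X"))
```

A controlling 0 on either input settles the output even if the other input is still X, while a 1 paired with an X leaves the output unknown.
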
Delay of the Circuit
• Delay of the circuit is the latest t such that the function (“output = X at t”) is not identically 0
• Problem is NP-complete
• Size of problem is linear in (number of time slices) x (number of gates)
• Mathematical machinery fairly massive
• “Special Theory”: 1989 – handled symmetric gates, zero-lower-bounded delays (all signals were X until they hit their final values)
  • Other cases were conservatively approximated
• “General Theory”: 1993 – handled all gates, general delay models
  • Gave exact answers for all delay types
• Still hasn’t quite reached industrial practice!