Digital Integrated Circuits A Design Perspective

Jan M. Rabaey, Anantha Chandrakasan, Borivoje Nikolić
Designing Combinational Logic Circuits

Combinational vs. Sequential Logic
• Combinational: Output = f(In)
• Sequential: Output = f(In, Previous In)

Static CMOS Circuit
• At every point in time (except during the switching transients), each gate output is connected to either VDD or VSS via a low-resistance path.
• The outputs of the gates assume at all times the value of the Boolean function implemented by the circuit (ignoring, once again, the transient effects during switching periods).
• This is in contrast to the dynamic circuit class, which relies on temporary storage of signal values on the capacitance of high-impedance circuit nodes.

Static Complementary CMOS
• PUN (pull-up network): PMOS only; makes a connection from VDD to F when F(In1, In2, …, InN) = 1
• PDN (pull-down network): NMOS only; makes a connection from GND to F when F(In1, In2, …, InN) = 0
• Both networks are driven by the same inputs In1, In2, …, InN and share the output node F
The PUN and PDN are structured in a mutually exclusive fashion such that exactly one of them is conducting in steady state.
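The mutual-exclusion property can be checked with a small switch-level sketch. This is only an illustration under the assumption that each transistor is an ideal switch; the function names are hypothetical and a 2-input NAND is used as the example gate.

```python
from itertools import product

def nand2_pun(a, b):
    # Two PMOS in parallel: connects F to VDD when either input is low.
    return (not a) or (not b)

def nand2_pdn(a, b):
    # Two NMOS in series: connects F to GND only when both inputs are high.
    return a and b

def is_complementary(pun, pdn, n_inputs):
    """True if exactly one network conducts for every input pattern."""
    return all(
        bool(pun(*bits)) != bool(pdn(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )

def steady_state_output(pun, pdn, *inputs):
    """1 if the PUN ties F to VDD, 0 if the PDN ties F to GND."""
    up, down = bool(pun(*inputs)), bool(pdn(*inputs))
    if up == down:
        raise ValueError("F is floating or shorted for this input pattern")
    return 1 if up else 0

if __name__ == "__main__":
    print(is_complementary(nand2_pun, nand2_pdn, 2))
    for a, b in product([False, True], repeat=2):
        print(int(a), int(b), steady_state_output(nand2_pun, nand2_pdn, a, b))
```

If a pair of networks fails `is_complementary`, some input pattern either leaves F floating or creates a direct VDD-to-GND path.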

Static Complementary CMOS
• Functionally, a transistor can be thought of as a switch. An NMOS switch in the PDN is on when its input signal is high and off when it is low; a PMOS switch in the PUN is on when its input signal is low and off when it is high.
• The PDN is constructed using NMOS devices and the PUN using PMOS devices. The primary reason is that NMOS transistors produce “strong zeros” while PMOS transistors generate “strong ones”. Why?

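Threshold Drops
• PUN: a PMOS pull-up charges CL from 0 → VDD; an NMOS pull-up would only charge it from 0 → VDD − VTn, because the NMOS turns off once VGS falls to VTn.
• PDN: an NMOS pull-down discharges CL from VDD → 0; a PMOS pull-down would only discharge it from VDD → |VTp|, because the PMOS turns off once |VGS| falls to |VTp|.
That is why PMOS is used in the PUN and NMOS in the PDN.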

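NMOS Transistors in Series/Parallel Connection
• A transistor can be thought of as a switch controlled by its gate signal
• An NMOS switch closes when its control input is high
• Series connection (X: GND, Y: output): Y is connected to X only when both inputs are high; this is the pull-down network of a NAND
• Parallel connection (X: GND, Y: output): Y is connected to X when either input is high; this is the pull-down network of a NOR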

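PMOS Transistors in Series/Parallel Connection
• A PMOS switch closes when its control input is low
• Series connection (X: VDD, Y: output): Y is connected to X only when both inputs are low; this is the pull-up network of a NOR
• Parallel connection (X: VDD, Y: output): Y is connected to X when either input is low; this is the pull-up network of a NAND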

Complementary CMOS Logic Style • Number of transistors required to implement an N-input logic gate is 2N

Example Gate: NAND

Example Gate: NOR

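Complex CMOS Gate
$\mathrm{OUT} = \overline{D + A \cdot (B + C)}$
[Figure: transistor schematic of the gate with inputs A, B, C, D in both the pull-up and pull-down networks]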
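As a sanity check on this gate, the sketch below enumerates all 16 input patterns. It assumes the gate is inverting (a single-stage static CMOS gate always is), so the pull-down network implements D + A • (B + C) and the pull-up network is its dual. The function names are illustrative only.

```python
from itertools import product

def pdn_conducts(a, b, c, d):
    # NMOS network: D in parallel with (A in series with (B parallel C)).
    return bool(d or (a and (b or c)))

def pun_conducts(a, b, c, d):
    # PMOS dual network: series and parallel swapped, devices on when input is low.
    return bool((not d) and ((not a) or ((not b) and (not c))))

def out(a, b, c, d):
    # The output is high exactly when the pull-up network conducts.
    return int(pun_conducts(a, b, c, d))

truth_table = {bits: out(*bits) for bits in product([0, 1], repeat=4)}

# The two networks must never conduct together and never both be off.
networks_are_dual = all(
    pun_conducts(*bits) != pdn_conducts(*bits) for bits in truth_table
)

if __name__ == "__main__":
    print("dual networks:", networks_are_dual)
    for bits, value in truth_table.items():
        print(bits, value)
```

With four inputs the gate uses 2N = 8 transistors, four in each network.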

Constructing a Complex Gate

OAI22 Logic Graph
$X = \overline{(A + B) \cdot (C + D)}$
[Figure: OAI22 schematic and its logic graph with inputs A, B, C, D]
XOR, XNOR?

Properties of Complementary CMOS Gates: Snapshot
• High noise margins: VOH and VOL are at VDD and GND, respectively.
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
• Comparable rise and fall times (under appropriate sizing conditions).

Complementary MOS Properties
• Full rail-to-rail swing; high noise margins
• Logic levels do not depend on the relative device sizes; ratioless
• Always a path to VDD or GND in steady state; low output impedance
• Extremely high input resistance; nearly zero steady-state input current
• No direct path between power and ground in steady state; no static power dissipation
• Propagation delay is a function of the load capacitance and the resistance of the transistors

Voltage Transfer Characteristics • Multi-dimensional plot (can be obtained using DC sweep analysis)

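Switch Delay Model
[Figure: equivalent switch-level RC models of INV, NAND2, and NOR2. Each conducting transistor is replaced by a resistance Req (Rp for PMOS, Rn for NMOS); CL is the load capacitance and Cint the internal node capacitance of a series stack.]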

Input Pattern Effects on Delay
• Delay is dependent on the pattern of inputs
• Low-to-high transition
  • both inputs go low: delay is 0.69 (Rp/2) CL
  • one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  • both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with two parallel Rp, two series Rn, internal capacitance Cint, and load CL]
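The three cases above can be evaluated numerically. The sketch below is a first-order estimate that ignores Cint; the resistance values are hypothetical placeholders chosen only to make the arithmetic visible, and are not taken from the slides.

```python
def nand2_switch_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds, ignoring the internal node Cint."""
    return {
        "tpLH_both_inputs_low": 0.69 * (rp / 2) * cl,   # two PMOS in parallel
        "tpLH_one_input_low": 0.69 * rp * cl,           # a single PMOS conducts
        "tpHL_both_inputs_high": 0.69 * (2 * rn) * cl,  # two NMOS in series
    }

# Hypothetical device resistances (ohms) and a 100 fF load.
delays = nand2_switch_delays(rp=10e3, rn=5e3, cl=100e-15)

if __name__ == "__main__":
    for name, t in delays.items():
        print(f"{name}: {t * 1e12:.0f} ps")
```

The low-to-high delay differs by a factor of two between the two input patterns, purely because of how many PMOS devices conduct in parallel.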

Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1→0, B = 1; and A = 1, B = 1→0]
NMOS = 0.5 μm/0.25 μm, PMOS = 0.75 μm/0.25 μm, CL = 100 fF
The difference between the latter two cases of H-L has to do with the internal node capacitance charging.

Transistor Sizing
• The goal is to size the gate so that it has approximately the same delay (mostly worst-case delay) as a minimum-size inverter (9λ/2λ, 3λ/2λ)
• Assumes Rp = Rn
• NAND2: PMOS (parallel) 1, 1; NMOS (series) 2, 2
• NOR2: PMOS (series) 2, 2; NMOS (parallel) 1, 1
• First-order estimate, neglecting velocity saturation effects (smaller for stacked transistors) and self-loading

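Transistor Sizing a Complex CMOS Gate
$\mathrm{OUT} = \overline{D + A \cdot (B + C)}$
• PMOS (pull-up): A, B, C, D each sized 3 (27λ/2λ)
• NMOS (pull-down): A = 2 (6λ/2λ), B = 2, C = 2, D = 1
Note: each PMOS number is relative to the PMOS in the minimum-size inverter, and each NMOS number is relative to the NMOS in the minimum-size inverter.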
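The sizing rule can be checked by computing the equivalent width of each worst-case conducting path. This sketch assumes on-resistance is inversely proportional to width and that widths are expressed relative to the same-type device of the minimum-size inverter. The size assignments (1 and 2 for NAND2/NOR2; 3 for every PMOS and 2, 2, 2, 1 for the NMOS of the complex gate) are one reading of the figure annotations, so treat them as assumptions.

```python
def series_width(*widths):
    """Equivalent width of devices in series (on-resistances add, R ~ 1/W)."""
    return 1.0 / sum(1.0 / w for w in widths)

# Widths are relative to the same-type device in the minimum-size inverter.
worst_case_paths = {
    "NAND2 pull-up (one PMOS of two in parallel)": series_width(1),
    "NAND2 pull-down (two NMOS in series)": series_width(2, 2),
    "NOR2 pull-up (two PMOS in series)": series_width(2, 2),
    "NOR2 pull-down (one NMOS of two in parallel)": series_width(1),
    "Complex gate pull-up (D, B, C in series)": series_width(3, 3, 3),
    "Complex gate pull-down (A in series with B or C)": series_width(2, 2),
    "Complex gate pull-down (D alone)": series_width(1),
}

if __name__ == "__main__":
    for path, width in worst_case_paths.items():
        print(f"{path}: equivalent width {width:.2f}")
```

Every worst-case path comes out with an equivalent width of about 1, which is what "same delay as the minimum-size inverter" means under this first-order model.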

Fan-In Considerations
[Figure: 4-input NAND with inputs A, B, C, D, internal node capacitances C1, C2, C3, and load CL]
• Distributed RC model (Elmore delay): tpHL = 0.69 Reqn (C1 + 2C2 + 3C3 + 4CL)
• Propagation delay deteriorates rapidly as a function of fan-in, quadratically in the worst case.
• What contributes to C1, C2, C3, and CL? For example, C1: CdbD, CsbC, 2CgdD, 2CgsC

The Elmore Delay: RC Chain
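For an RC chain, the Elmore time constant sums, for every node, the node capacitance times the total resistance between that node and the source. The sketch below is a minimal illustration with hypothetical function names; with four equal resistances it reproduces the weighting C1 + 2C2 + 3C3 + 4CL used for the 4-input NAND pull-down path.

```python
def elmore_time_constant(resistances, capacitances):
    """Elmore time constant of an RC chain.

    resistances[i] connects node i to the previous node (or to the source
    for i = 0); capacitances[i] is the capacitance from node i to ground.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance per node capacitance")
    tau = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance from the source to this node
        tau += upstream_r * c    # each capacitor sees all upstream resistance
    return tau

def nand_tphl(r_eqn, node_caps):
    """tpHL of a series NMOS stack; node_caps runs from the node nearest ground to CL."""
    return 0.69 * elmore_time_constant([r_eqn] * len(node_caps), node_caps)

if __name__ == "__main__":
    # Unit values chosen only to expose the 1, 2, 3, 4 weighting.
    print(elmore_time_constant([1, 1, 1, 1], [1, 1, 1, 1]))
```

Because both the number of resistances and the number of capacitances grow with fan-in, the sum grows roughly quadratically with the stack height.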

tp as a Function of Fan-In for NAND
[Figure: tp (psec) versus fan-in, assuming fixed fan-out; tpHL grows quadratically and tpLH linearly]
Gates with a fan-in greater than 4 should be avoided.

tp as a Function of Fan-Out
[Figure: tp (psec) versus effective fan-out for tpINV, tpNAND2, and tpNOR2]
All gates are sized using the switch delay model. (Why is the delay of NOR2 and NAND2 still larger? Intrinsic capacitance.)

tp as a Function of Fan-In and Fan-Out
• Fan-in: quadratic due to increasing resistance and capacitance
• Fan-out: each additional fan-out gate adds two gate capacitances to CL
tp = a1·FI + a2·FI² + a3·FO

Fast Complex Gates: Design Technique 1
• Transistor sizing
  • as long as fan-out capacitance dominates
• Progressive sizing (non-uniform sizing)
  • Distributed RC line: M1 > M2 > M3 > … > MN (the transistor closest to the output is the smallest)
  • Can reduce delay by more than 20%; decreasing gains as technology shrinks (due to layout)
[Figure: series NMOS stack from M1 (input In1, next to ground) to MN (input InN, next to the output), with node capacitances C1, C2, C3 and load CL]

Fast Complex Gates: Design Technique 2
• An input signal is called critical if it is the last of all inputs to assume a stable value
• The path through the logic that determines the ultimate speed of the structure is called the critical path
• Putting the critical-path transistors closer to the output of the gate can result in a speed-up

Fast Complex Gates: Design Technique 2
• Transistor ordering
• Critical input In1 (0→1) on M1, farthest from the output, with In2 = In3 = 1: C1 and C2 are charged, so the delay is determined by the time to discharge CL, C1, and C2
• Critical input In1 (0→1) on M3, closest to the output, with In2 = In3 = 1: C1 and C2 are already discharged, so the delay is determined by the time to discharge CL

Fast Complex Gates: Design Technique 3
• Alternative logic structures
F = ABCDEFGH

Fast Complex Gates: Design Technique 4
• Isolating fan-in from fan-out using buffer insertion (inverter chains)

Sizing Logic Paths for Speed
• Frequently, the input capacitance of a logic path is constrained
• The logic also has to drive some capacitance
• Example: the ALU load in an Intel microprocessor is 0.5 pF
• How do we size the ALU datapath to achieve maximum speed?
• We have already solved this for the inverter chain. Can we generalize it for any type of logic?

Buffer Example
[Figure: chain of N inverters (stages 1, 2, …, N) from In to Out, driving CL]
• For given N: Ci+1/Ci = Ci/Ci-1
• To find N: Ci+1/Ci ≈ 4
• How to generalize this to any logic path?

Apply to Inverter Chain
[Figure: chain of N inverters (stages 1, 2, …, N) from In to Out, driving CL]
tp = tp1 + tp2 + … + tpN

Optimal Tapering for Given N
• The delay equation has N − 1 unknowns, Cgin,2 … Cgin,N
• To minimize the delay, set the N − 1 partial derivatives to zero
• Result: Cgin,j+1/Cgin,j = Cgin,j/Cgin,j-1
• The size of each stage is the geometric mean of its two neighbors
  • each stage has the same effective fanout (Cout/Cin)
  • each stage has the same delay
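The geometric-mean result translates directly into a sizing recipe: every stage is larger than the previous one by the same factor. The sketch below is illustrative; the function name and the normalized capacitance values are assumptions, not values from the slides.

```python
def taper_chain(c_in, c_load, n_stages):
    """Input capacitance of each stage for an evenly tapered inverter chain.

    Returns (stage_caps, f), where f is the effective fanout shared by
    all stages and stage_caps[0] == c_in.
    """
    if n_stages < 1:
        raise ValueError("need at least one stage")
    f = (c_load / c_in) ** (1.0 / n_stages)
    stage_caps = [c_in * f ** j for j in range(n_stages)]
    return stage_caps, f

if __name__ == "__main__":
    # Hypothetical example: load 64 times the first-stage input capacitance.
    caps, f = taper_chain(c_in=1.0, c_load=64.0, n_stages=3)
    print("fanout per stage:", f)
    print("stage input capacitances:", caps)
```

Each interior stage in the returned list is the geometric mean of its neighbors, and the last stage drives the load with the same fanout as every other stage.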

Optimum Delay and Number of Stages
• When each stage is sized by f and has the same effective fanout f: f^N = F = CL/Cgin,1
• Effective fanout of each stage: f = F^(1/N)
• Minimum path delay: tp = N·tp0·(1 + F^(1/N)/γ)
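To see how the number of stages trades off against the per-stage fanout, the sketch below evaluates the path delay for several N. It assumes the standard textbook form tp = N·tp0·(1 + F^(1/N)/γ), where tp0 (the intrinsic inverter delay) and γ (the ratio of self-loading to input capacitance) are symbols brought in for this illustration; γ = 1 and F = 64 are arbitrary example values.

```python
def chain_delay(total_fanout, n_stages, gamma=1.0):
    """Path delay in units of tp0 for N equally tapered stages."""
    stage_fanout = total_fanout ** (1.0 / n_stages)
    return n_stages * (1.0 + stage_fanout / gamma)

def best_stage_count(total_fanout, gamma=1.0, max_stages=20):
    """Integer number of stages with the smallest delay (brute-force search)."""
    return min(
        range(1, max_stages + 1),
        key=lambda n: chain_delay(total_fanout, n, gamma),
    )

if __name__ == "__main__":
    F = 64.0  # hypothetical overall fanout CL / Cgin,1
    for n in range(1, 7):
        print(n, round(chain_delay(F, n), 2))
    print("best N:", best_stage_count(F))
```

For this example the search lands on a per-stage fanout near 4, in line with the rule of thumb Ci+1/Ci ≈ 4 from the buffer example.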

Generalized Logic Path
How to size this generalized logic path?

Logical Effort
Logical effort is the ratio of the input capacitance of a gate to the input capacitance of a minimum-size inverter with the same output current (considering the worst case).
• Inverter: g = 1
• NAND2: g = 4/3
• NOR2: g = 5/3
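The values g = 1, 4/3, and 5/3 follow from counting input capacitance per input. The sketch below assumes the common convention behind those numbers: a reference inverter with a PMOS twice as wide as its NMOS (widths 2 and 1, total 3), and gates sized to match that inverter's worst-case drive. That 2:1 ratio is an assumption of this example and differs from the 9λ/3λ inverter quoted on the sizing slide.

```python
from fractions import Fraction

INVERTER_INPUT_WIDTH = 2 + 1  # PMOS width 2 + NMOS width 1 (assumed 2:1 inverter)

def nand_logical_effort(n_inputs):
    # n NMOS in series: each widened to n; PMOS in parallel: each stays at 2.
    return Fraction(n_inputs + 2, INVERTER_INPUT_WIDTH)

def nor_logical_effort(n_inputs):
    # n PMOS in series: each widened to 2n; NMOS in parallel: each stays at 1.
    return Fraction(2 * n_inputs + 1, INVERTER_INPUT_WIDTH)

if __name__ == "__main__":
    print("INV  :", nand_logical_effort(1))
    print("NAND2:", nand_logical_effort(2))
    print("NOR2 :", nor_logical_effort(2))
```

NOR gates come out worse than NAND gates of the same fan-in because the series devices that must be widened are the already-wide PMOS transistors.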

Normalizing the Delay to the Inverter
Now normalize everything to a minimum-size inverter (assume g = 1). Then:
(1) What is the load capacitance in each case for an effective fan-out of 2 (with respect to one input), for example?
(2) What is the intrinsic delay in each case?
(3) What is the external delay?
[Figure annotations: 6, 8, 10]

Delay in a Logic Gate with Minimum Size
• So, if we normalize everything to a minimum-size inverter with ginv = 1, pinv = 1 (everything is measured in unit delays tinv)
• And assume g = 1
• Then we can introduce
  • p: intrinsic delay factor
  • g: logical effort
  • f: effective fanout

Logical Effort
• Gate delay: d = h + p (h: effort delay, p: intrinsic delay)
• Effort delay: h = g f (g: logical effort, f: effective fanout of each stage = Cout/Cin)
• Effective fanout (electrical effort) is a function of load/gate size

Logical Effort
• The inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
• Logical effort represents the fact that, for a given load, a complex gate has to work harder (in terms of transistor sizes) than an inverter to get a similar delay
• Put another way, a complex gate sized to match the inverter's drive presents more load capacitance to the previous gate
• How much harder? How to measure it?
• Logical effort for a complex gate can be computed from the ratio of its input capacitance to the inverter's input capacitance when sized to deliver the same current (logical effort is a function of topology, independent of sizing)
• Logical effort increases with gate complexity

Logical Effort
Reference: Sutherland, Sproull, Harris, “Logical Effort,” Morgan Kaufmann, 1999.

Logical Effort of Gates
[Figure: normalized delay (d) versus fan-out (h), axis ticks 1 to 7, for tpNAND and tpINV; F(fan-in)]
• tpNAND: g = ?, p = ?, d = ?
• tpINV: g = ?, p = ?, d = ?

Logical Effort of Gates
[Figure: normalized delay (d) versus fan-out (h), axis ticks 1 to 7, for tpNAND and tpINV; F(fan-in)]
• tpNAND: g = 4/3, p = 2, d = (4/3)f + 2
• tpINV: g = 1, p = 1, d = f + 1
The intrinsic delay is doubled because the intrinsic capacitance is two times larger.
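The two delay lines can be compared numerically with d = g·f + p, using the g and p values listed for the inverter and the 2-input NAND. The dictionary layout and function name below are illustrative choices; delays are in units of the normalized inverter delay.

```python
from fractions import Fraction

# Logical effort g and intrinsic delay p as listed for each gate.
GATES = {
    "INV": {"g": Fraction(1), "p": Fraction(1)},
    "NAND2": {"g": Fraction(4, 3), "p": Fraction(2)},
}

def gate_delay(gate, fanout):
    """Normalized gate delay d = g * f + p."""
    params = GATES[gate]
    return params["g"] * fanout + params["p"]

if __name__ == "__main__":
    for f in range(1, 8):
        print(f, gate_delay("INV", f), gate_delay("NAND2", f))
```

The NAND2 line starts higher (larger p) and rises faster (larger g), so the gap to the inverter widens as the fan-out grows.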

Logical Effort of Gates
