A 240ps 64b Carry-Lookahead Adder in 90nm CMOS

Faezeh Montazeri
fmontazeri@ece.ut.ac.ir
Advanced VLSI Course Presentation, University of Tehran, December 2006
Based on: A 240ps 64b Carry-Lookahead Adder in 90nm CMOS, Sean Kao, Radu Zlatanovici, Borivoje Nikolić, University of California, Berkeley

What Is an Optimal Adder?
[Figure: 64-bit Adders on IEEE Xplore 1995-2005 [1]]
Optimal adder:
• Minimum delay for a given energy
• Minimum energy for a given delay

This Work
Multi-issue 64-bit microprocessor environment:
• Optimize a set of representative 64-bit adders in the energy-delay space
• Analyze the design tradeoffs
• Implement the optimal adder in 1.0 V 90nm GP CMOS

Outline
• Energy-delay optimization
• Design tradeoffs for 64-bit adders
• Test chip implementation
• Measured results
• Summary

Energy-Delay Optimization
[Figure: energy vs. delay for a domino CLA adder and a static CLA adder [1]]
• Goal: obtain the energy-delay optimal adder
• CAD tool: optimizes custom digital circuits in the energy-delay space [3]
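An energy-delay optimal curve is the lower-left boundary of all achievable designs: for each delay, no design uses less energy, and for each energy, no design is faster. The sketch below, a minimal illustration rather than the CAD tool of [3], extracts that boundary from a list of already-evaluated (delay, energy) points. The function name `pareto_front` and the design points are hypothetical, in arbitrary units, and are not data from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (delay, energy)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the energy-delay optimal subset of (delay, energy) points.

    A point survives only if no other point is at least as fast AND at least
    as energy-efficient (and strictly better in one of the two)."""
    front: List[Point] = []
    best_energy = float("inf")
    # Visit designs from fastest to slowest (ties: lowest energy first).
    # A slower design is worth keeping only if it saves energy compared with
    # every faster design seen so far.
    for delay, energy in sorted(points):
        if energy < best_energy:
            front.append((delay, energy))
            best_energy = energy
    return front


# Hypothetical design points in arbitrary units (not data from the paper).
designs = [(10, 5), (8, 9), (12, 4), (9, 9), (11, 6)]
print(pareto_front(designs))  # [(8, 9), (10, 5), (12, 4)]
```

Sorting by delay makes the check a single pass: each surviving point must beat the lowest energy found among all faster designs.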

Circuit Optimization Framework [1]
[Figure: models, netlist, optimization goal and variables feed an optimization core in which an optimizer (Matlab) adjusts the design variables and a static timer (C++) returns delay and energy; the output is the optimal design]

Adder Optimization Setup [1]
Minimize DELAY subject to a maximum ENERGY constraint
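The same "minimize delay under an energy cap" formulation can be shown on a much smaller problem. This is a toy stand-in, not the authors' C++ timer or Matlab optimizer: `chain_delay` plays the static timer with a logical-effort-style delay model of a three-stage inverter chain, `chain_energy` assumes energy proportional to total gate width, and `minimize_delay` is an exhaustive-search optimizer. The load, parasitic delay, candidate widths and both models are assumptions chosen only for illustration.

```python
import itertools

# Hypothetical toy parameters, normalized to the first gate's input capacitance.
C_LOAD = 64.0     # load driven by the last stage
PARASITIC = 1.0   # parasitic delay per stage


def chain_delay(widths):
    """Toy 'static timer': logical-effort-style delay of an inverter chain."""
    loads = list(widths[1:]) + [C_LOAD]
    return sum(PARASITIC + load / w for w, load in zip(widths, loads))


def chain_energy(widths):
    """Toy energy model: switched gate capacitance, i.e. total width."""
    return sum(widths)


def minimize_delay(max_energy, candidates=range(1, 33)):
    """Toy 'optimizer': minimize delay subject to energy <= max_energy.

    Stage 1 is fixed at unit width; stages 2 and 3 are the design variables,
    searched exhaustively over integer widths."""
    best = None
    for w2, w3 in itertools.product(candidates, repeat=2):
        widths = (1.0, float(w2), float(w3))
        if chain_energy(widths) > max_energy:
            continue  # design violates the energy constraint
        delay = chain_delay(widths)
        if best is None or delay < best[0]:
            best = (delay, widths)
    return best


print(minimize_delay(1000.0))  # (15.0, (1.0, 4.0, 16.0))
print(minimize_delay(10.0))    # tighter budget: smaller gates, longer delay
```

With an effectively unlimited budget the search recovers geometric sizing (1, 4, 16) with a stage effort of 4; tightening `max_energy` forces smaller gates and a longer delay, so sweeping the budget traces out an energy-delay curve.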

CLA: Full Tree Comparison [1]
Radix-2:
• 6 stages
• Moderate branching
Radix-4:
• 3 stages
• Larger branching
Radix-4 is closer to the optimum number of stages.
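The stage counts follow from log_r(64): 6 stages for radix 2 and 3 for radix 4. The sketch below is a bit-level behavioral model of a full (non-sparse) radix-r prefix tree over generate/transmit pairs; the Kogge-Stone-style topology is an assumption made for simplicity, since the slide does not specify it, and the function names are hypothetical. Each node merges up to r groups per stage, so a higher radix trades fewer stages for wider gates.

```python
def prefix_carries(a, b, n=64, radix=2):
    """Radix-r parallel-prefix (Kogge-Stone-style) carry computation.

    Returns (carries, levels): carries[i] is the carry out of bit i with a
    carry-in of 0; levels is the number of prefix stages in the tree."""
    G = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate:  a AND b
    T = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transmit:  a OR b
    span, levels = 1, 0
    while span < n:
        new_G, new_T = G[:], T[:]
        for i in range(n):
            gi, ti = G[i], T[i]
            # Merge up to radix-1 lower groups, each `span` bits further down.
            for k in range(1, radix):
                j = i - k * span
                if j < 0:
                    break
                gi, ti = gi | (ti & G[j]), ti & T[j]
            new_G[i], new_T[i] = gi, ti
        G, T = new_G, new_T
        span *= radix
        levels += 1
    return G, levels


def prefix_add(a, b, n=64, radix=2):
    """Add two n-bit numbers using the carries from the prefix tree."""
    carries, levels = prefix_carries(a, b, n, radix)
    carry_word = sum(c << i for i, c in enumerate(carries))
    # s_i = a_i XOR b_i XOR c_(i-1): shift carries up by one position.
    total = (a ^ b ^ (carry_word << 1)) & ((1 << n) - 1)
    return total, levels


a, b = 0xFFFF_FFFF_FFFF_FFFF, 1
for r in (2, 4):
    total, levels = prefix_add(a, b, radix=r)
    print(f"radix-{r}: {levels} stages, sum correct: {total == (a + b) % 2**64}")
# radix-2: 6 stages, sum correct: True
# radix-4: 3 stages, sum correct: True
```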

CLA vs. Ling [1]
Conventional CLA:
• Higher stack in first stage
• Simple sum precompute
Ling CLA [2]:
• Lower stack in first stage
• Complex sum precompute
• Higher speed
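Ling's formulation [2] replaces the carry c_i = g_i + t_i c_(i-1) with a pseudo-carry H_i = g_i + c_(i-1), so that c_i = t_i H_i. Because g_i implies t_i, the first two-bit term becomes H_1 = g_1 + g_0 instead of G_1 = g_1 + t_1 g_0, which is the lower first-stage stack on the slide; the price is that each sum bit needs H and t from the bit below. The sketch below is a minimal bit-level check of these equations in ripple form (not the tree or the circuit); `ling_add` is a hypothetical name.

```python
def ling_add(a, b, n=64):
    """Bit-level model of Ling's adder equations (ripple form, for clarity).

      H_i = g_i + t_(i-1) * H_(i-1),  H_0 = g_0   (Ling pseudo-carry)
      c_i = t_i * H_i                             (true carry out of bit i)
      s_i = p_i XOR c_(i-1) = H_(i-1) ? (p_i XOR t_(i-1)) : p_i
    Returns (sum, carries) so the carries can be checked separately."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate:  a AND b
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # transmit:  a OR b
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # half sum:  a XOR b
    H = [g[0]] + [0] * (n - 1)
    for i in range(1, n):
        H[i] = g[i] | (t[i - 1] & H[i - 1])
    carries = [t[i] & H[i] for i in range(n)]
    s = p[0]  # no carry into bit 0
    for i in range(1, n):
        # The sum bit needs H and t from the bit below: the extra t_(i-1)
        # term is what makes Ling's sum precompute more complex.
        s |= ((p[i] ^ t[i - 1]) if H[i - 1] else p[i]) << i
    return s, carries


print(ling_add(3, 1)[0])  # 4
```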

Full vs. Sparse Comparison
[Figure: full Ling CLA vs. sparse-2 (SP2) Ling CLA [1]]

Full vs. Sparse Comparison
[Figure: full Ling CLA vs. sparse-4 (SP4) Ling CLA [1]]
Sparseness benefits adders with large carry trees.

Optimal Adder [1]
• Ling’s equations
• Radix-4 sparse-2
• Domino carry tree
• Static sum precompute
• Delay of fastest adder: 7.3 FO4

Radix-4 Sparse-2 Carry Tree [1]
• Computes every other Ling pseudo-carry: H0, H2, H4 …
• Each output selects two sums
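A behavioral sketch of the sparse-2 idea, under stated assumptions: each even pseudo-carry H_k is taken to select the two sum bits k+1 and k+2, which is one reading of "each output selects two sums" rather than a detail confirmed by the slide. Expanding Ling's recurrence one step gives H_k = g_k + g_(k-1) + t_(k-1) t_(k-2) H_(k-2), so only even pseudo-carries are needed. The model evaluates that recurrence as a ripple, whereas the adder uses a radix-4 domino tree; `sparse2_ling_add` and `_bits` are hypothetical names.

```python
def _bits(x, n):
    """List of the n least-significant bits of x, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def sparse2_ling_add(a, b, n=64):
    """Sparse-2 Ling adder model: only even pseudo-carries H_0, H_2, ... are
    computed, and each H_k selects precomputed sums for bits k+1 and k+2."""
    A, B = _bits(a, n), _bits(b, n)
    g = [x & y for x, y in zip(A, B)]   # generate
    t = [x | y for x, y in zip(A, B)]   # transmit
    p = [x ^ y for x, y in zip(A, B)]   # half sum
    # Even pseudo-carries only, via the two-bit Ling step
    #   H_k = g_k + g_(k-1) + t_(k-1) t_(k-2) H_(k-2)
    # (evaluated as a ripple here; the adder uses a radix-4 tree instead).
    H = {0: g[0]}
    for k in range(2, n, 2):
        H[k] = g[k] | g[k - 1] | (t[k - 1] & t[k - 2] & H[k - 2])
    s = p[0]  # bit 0 has no carry-in
    for k in range(0, n - 1, 2):
        # Sum precompute: candidate sums for bits k+1, k+2, one per value of H_k.
        candidates = []
        for h in (0, 1):
            c_k = t[k] & h                          # carry out of bit k
            s_lo = p[k + 1] ^ c_k
            s_hi = 0
            if k + 2 < n:
                c_k1 = g[k + 1] | (t[k + 1] & c_k)  # carry out of bit k+1
                s_hi = p[k + 2] ^ c_k1
            candidates.append((s_lo, s_hi))
        s_lo, s_hi = candidates[H[k]]               # H_k acts as the mux select
        s |= (s_lo << (k + 1)) | (s_hi << (k + 2))
    return s


print(sparse2_ling_add(0xFFFF_FFFF_FFFF_FFFF, 1))  # 0
```

The carry tree only has to deliver half as many signals, and the sum logic absorbs the remaining bit of carry propagation, which is why the precompute can fill the space the sparse tree frees up.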

Adder Core Block Diagram [1]
• Critical paths implemented in clock-delayed domino
• Non-critical paths implemented in static
• At-speed BIST

Timing Diagram [1]
• 20 ps margin on all edges; adjustable hard edges
• Delay spread places the precharge in the critical path

Layout Floorplan [1]
• Bitslice height: 24 metal tracks
• Aligned clock lines
• Sum precompute occupies the space freed by the sparse carry tree

90 nm Test Chip [1]
• 90 nm GP 7M 1P
• SVT transistors
• VDD = 1 V
• 8 adder cores + test circuitry
• Core 1: this work
• Cores 2-8: supply noise measurements and supply grid experiments [4]
• Adder core size: 417 µm × 75 µm
[Figure: die photo, 1.6 mm × 1.7 mm]


Chip Packaging [1]
[Figure: packaging comparison, panels labeled "Advance Program" and "Digest"]
Chip-on-board:
• Bond wires 60% shorter
• Cleaner supply
• 10 ps shorter delays

Measured Results: Delay [1]
Chip-on-board:
• VDD = 1 V: average 240 ps, fastest 226 ps; Davg = 7.5 FO4
• VDD = 1.3 V: average 180 ps
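The delay and FO4 figures can be cross-checked against each other. Assuming the 7.5 FO4 average refers to the 240 ps measurement at VDD = 1 V, the implied fanout-of-4 inverter delay is:

```latex
\mathrm{FO4} \approx \frac{D_{\mathrm{avg}}}{7.5} = \frac{240\,\mathrm{ps}}{7.5} = 32\,\mathrm{ps}
```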

Measured Results: Power [1]
[Figure: power breakdown into leakage, adder core, BIST and clock generation]
• VDD = 1 V: Pmax = 260 mW
• VDD = 1.3 V: Pmax = 606 mW

Conclusion
• 90 nm GP 7M 1P
• SVT transistors
• VDD = 1 V
• 8 adder cores + test circuitry
• Adder core size: 417 µm × 75 µm

Summary
• Ling radix-4 sparse-2 domino carry tree
• 90nm GP CMOS: 240 ps, 260 mW @ 1 V
[Figure: 64-bit Adders on IEEE Xplore 1995-2005 [1]]

References
• [1] S. Kao, R. Zlatanovici, B. Nikolić, “A 240ps 64b Carry-Lookahead Adder in 90nm CMOS,” ISSCC 2006, Feb. 2006.
• [2] H. Ling, “High Speed Binary Adder,” IBM J. R&D, vol. 25, no. 3, pp. 156-166, May 1981.
• [3] R. Zlatanovici, B. Nikolić, “Power-Performance Optimization for Custom Digital Circuits,” Proc. PATMOS, pp. 404-414, Sept. 2005.
• [4] V. Abramzon, E. Alon, M. Horowitz, Stanford University.

Thank you
fmontazeri@ece.ut.ac.ir
