A Comprehensive Look at System Level Modeling
Ken Rose, Bibiche Geuskens, Ramon Mangaser, Christopher Mark
Center for Integrated Electronics and Electronics Manufacturing, Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590
rosek@rpi.edu, 518.276.2981
RIPE: Rensselaer Interconnect Performance Estimator. RIPE 3.0 models are described in ‘Modeling Microprocessor Performance’ by B. Geuskens and K. Rose, Kluwer, 1998. RIPE is available for online use at http://latte.cie.rpi.edu/ripe.html. RIPE was developed with partial support from IBM and SRC.
Co-Authors:
• Bibiche Geuskens (RIPE 1.0, 2.0, 3.0), Ph.D. June 1997, Intel Corporation, Hillsboro, Oregon
• Ramon Mangaser (RIPE 3.1, 4.0, 4.1), Ph.D. Nov. 1999, Sun Microsystems, Chelmsford, Massachusetts
• Christopher Mark (RIPE 4.2), Ph.D. Sep. 2000, Intel Corporation, Hillsboro, Oregon
RIPE Genesis:
• H. B. Bakoglu, ‘Circuits, Interconnections, and Packaging for VLSI’, Addison-Wesley, 1990. The SUSPENS model was coded in RIPE 1.0.
• G. A. Sai-Halasz, Proc. IEEE, 83/1, p. 20, 1995. Basis for RIPE 2.0.
RIPE 3.0 Inputs and Outputs (figure)
• Inputs: System Description; Device/Technology Description; Interconnect Description
• Outputs: area, wiring density, wireability, interconnect resistance, capacitance and RC delay, crosstalk, cycle time, performance, power dissipation, electromigration, yield, reliability
RIPE 3.0 Sample Benchmark (DEC Alpha 21164): RIPE Inputs
System Parameters: Chip Area [cm²]: 2.99; Number of Transistors [M]: 9.3; SRAM [KBytes]: 112; Signal I/O: 294; Logic Depth: 14, 15
Technology Parameters: Feature Size [μm]: 0.5; Number of Wire Levels: 4; Power Supply [V]: 3.3
Interconnect Parameters (wire levels 1 to 4):
Pitch [μm]: 1.125, 1.125, 3.0, 3.0
rint [Ω/cm]: 1440, 1440, 178, 178
cint [pF/cm]: 2.0, 2.0, 2.0, 2.0
Data: W. J. Bowhill et al., Dig. Tech. Journal; ISSCC 1996
Cycle Time Estimation Model (Ch. 7) (figure; after Sai-Halasz (1995) and Sakurai (1993))
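Interconnect Parameters (Ch. 3)
Interconnect Resistance (3.1): $R = \rho_{eff}\, l_{int} / (A\, w_{int}^2)$, where A is the wire aspect ratio.
Interconnect Capacitance (3.2): $C = 2(C_V + C_L) \approx 2\varepsilon_{eff}\varepsilon_0\, l_{int} w_{int} (1/T_{ILD} + A/S_{wire})$, where $C_V$ and $C_L$ are the vertical (interlevel) and lateral (wire-to-wire) components, $T_{ILD}$ is the thickness of the interlevel dielectric, and $S_{wire}$ is the spacing between wires.
(Figure labels: C_L, C_V, RC; after Yang (1998).)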
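A minimal Python sketch of the two Ch. 3 formulas, assuming lengths in cm and resistivity in Ω·cm so that R comes out in ohms and C in farads. The function names and any numbers passed to them are illustrative assumptions, not RIPE's own code.

```python
EPS0_F_PER_CM = 8.854e-14  # vacuum permittivity in F/cm


def wire_resistance(rho_eff, l_int, w_int, aspect_ratio):
    """R = rho_eff * l_int / (A * w_int**2); the wire thickness is A * w_int."""
    return rho_eff * l_int / (aspect_ratio * w_int ** 2)


def wire_capacitance(eps_eff, l_int, w_int, aspect_ratio, t_ild, s_wire):
    """C = 2 (C_V + C_L) ~= 2 eps_eff eps0 l_int w_int (1/T_ILD + A/S_wire)."""
    # Vertical component: plate of width w_int facing the level above/below.
    c_v = eps_eff * EPS0_F_PER_CM * l_int * w_int / t_ild
    # Lateral component: sidewall of height A * w_int facing one neighbour.
    c_l = eps_eff * EPS0_F_PER_CM * l_int * aspect_ratio * w_int / s_wire
    return 2.0 * (c_v + c_l)


# Hypothetical 1 cm wire; for l_int = 1 cm the result is also the value per cm.
print(wire_capacitance(2.0, 1.0, 0.5e-4, 2.0, 0.8e-4, 0.5e-4))
```

Keeping $C_V$ and $C_L$ as separate terms makes it easy to see which one dominates as the aspect ratio grows.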
Transistor Count and Area Models (Ch. 4)
Processor logic, memory, and I/O buffers are treated separately.
                  Transistors   Area
Alpha 21164       9.3 M         299 mm²
Memory            6.7 M         101 mm²
I/O               --            17 mm²
Random Logic      2.6 M         181 mm²
(Figure labels: # Gates, # Transistors, Average Logic Gate Size, Logic Area)
Logic Wireability (Ch. 5)
$R(N_g, p)$ = average interconnect length in gate pitches, based on Rent’s rule for the number of pins, $N_p = K_p N_g^{\,p}$.
$l_w$ = long wire length = $2\sqrt{A_{logic}}$
$N_w$ = number of long wires = $[f_g/(f_g+1)]\, N_{p,total}$, where $N_{p,total}$ is the total number of pins for functional blocks and $f_g$ is the average logic gate fanout.
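The three wireability relations are simple enough to evaluate directly. This sketch assumes nothing beyond the formulas on the slide; the Rent parameters used in the example call are illustrative values, not RIPE defaults.

```python
import math


def rent_pins(n_gates, k_p, p):
    """Rent's rule: N_p = K_p * N_g**p."""
    return k_p * n_gates ** p


def long_wire_length(a_logic):
    """l_w = 2 * sqrt(A_logic); the length unit follows the area unit."""
    return 2.0 * math.sqrt(a_logic)


def number_of_long_wires(fanout, n_pins_total):
    """N_w = [f_g / (f_g + 1)] * N_p,total."""
    return fanout / (fanout + 1.0) * n_pins_total


# Illustrative: the 1.81 cm² random-logic area of the Alpha 21164 gives l_w of about 2.69 cm.
print(long_wire_length(1.81))
print(rent_pins(n_gates=1e5, k_p=0.82, p=0.45))  # hypothetical K_p and p
```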
Device Parameters (Ch. 6)
We need values for the transistor resistance and capacitance, $R_{dr}$ and $C_{dr}$. These have been superseded in RIPE 4.0.
Cycle Time Estimation Model (Ch. 7)
$T_{cycle} = (f_{ld} - 1)\, T_{g,avg} + 2\, T_{g,inv} + \text{time\_of\_flight}$, where $f_{ld}$ is the logic depth.
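The Ch. 7 cycle-time expression reduces to a one-line calculation. In this sketch the delay values are hypothetical, picked only to show the arithmetic, and the conversion to MHz assumes the cycle time is given in ns.

```python
def cycle_time_ripe3(f_ld, t_g_avg, t_g_inv, time_of_flight):
    """RIPE 3.0 form: T_cycle = (f_ld - 1) * T_g,avg + 2 * T_g,inv + time_of_flight."""
    return (f_ld - 1) * t_g_avg + 2.0 * t_g_inv + time_of_flight


def clock_frequency_mhz(t_cycle_ns):
    """Convert a cycle time in ns to a clock frequency in MHz."""
    return 1000.0 / t_cycle_ns


# Hypothetical delays in ns for a logic depth of 14.
t = cycle_time_ripe3(f_ld=14, t_g_avg=0.2, t_g_inv=0.1, time_of_flight=0.05)
print(f"T_cycle = {t:.2f} ns -> {clock_frequency_mhz(t):.0f} MHz")
```

With these placeholder numbers the script prints a cycle time of 2.85 ns, i.e. about 351 MHz.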
Power Dissipation (Ch. 8)
$P_{tot} = f_d C_{tot} V_{dd} V_{swing} f_c + I_{sc} V_{dd} + I_{leak} V_{dd} \approx \sum_i (f_{d,i} C_{sw,i})\, V_{dd}^2 f_c$
where $f_d$ is the activity factor. The switched-capacitance terms are:
1. random logic: $f_d C_{sw,rl}$
2. clock distribution: $f_{d,clk} C_{sw,clk}$
3. memory: $f_d C_{sw,mem}$
4. interconnections: $f_d C_{sw,int}$
5. off-chip drivers: $f_d C_{sw,dr}$
For the Alpha 21164, $f_{d,clk} = 0.75$ and $f_d = 0.15$, based on published details.
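A sketch of both forms of the power equation. The activity factors are the Alpha 21164 values from the slide; the switched capacitances are hypothetical placeholders, so the printed number is not a RIPE result.

```python
def total_power(f_d, c_tot, vdd, v_swing, f_c, i_sc, i_leak):
    """P_tot = f_d C_tot Vdd Vswing f_c + I_sc Vdd + I_leak Vdd."""
    return f_d * c_tot * vdd * v_swing * f_c + i_sc * vdd + i_leak * vdd


def switching_power(components, vdd, f_c):
    """P ~= sum_i (f_d,i * C_sw,i) * Vdd**2 * f_c.

    components maps a block name to (activity factor f_d,i, switched capacitance C_sw,i in F).
    """
    return sum(f_d * c_sw for f_d, c_sw in components.values()) * vdd ** 2 * f_c


# Activity factors from the slide; capacitances are hypothetical.
example = {
    "random logic": (0.15, 10e-9),
    "clock distribution": (0.75, 5e-9),
    "memory": (0.15, 5e-9),
    "interconnections": (0.15, 10e-9),
    "off-chip drivers": (0.15, 2e-9),
}
print(f"{switching_power(example, vdd=3.3, f_c=300e6):.1f} W")
```

The approximation follows from the full expression when $V_{swing} = V_{dd}$, the short-circuit and leakage currents are neglected, and $f_d C_{tot}$ is split into per-block terms.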
RIPE 3.0 Sample Benchmark (DEC Alpha 21164): RIPE Results
                             RIPE (Al/SiO2)   RIPE (Cu/SiO2)   Actual
Memory transistors           6.73 M           7.2 M            6.73 M
Memory area                  1.01 cm²         1.02 cm²         1.01 cm²
Pad ring area                0.16 cm²         0.17 cm²         0.16 cm²
Clock frequency              291 MHz          300 MHz          373 MHz
Power dissipation            52 W             50 W             66 W
Clock distribution power     21 W             20 W             27 W
RIPE Simulation Modes: RIPE 3.0 to RIPE 4.0 (figure)
• RIPE 3.0 Performance Estimator (-n and -d modes): wiring strategy in; clock frequency, power, and wireability out
• RIPE 4.0 Wiring Allocator (-aw mode): clock frequency in; wiring strategy out
Intel Wiring Distribution Model
• $\Delta \#Nets / \Delta L_{nets} \approx B\, L_{nets}^{\,b}$, with $b = -1.65$
• $\#Nets \approx A\, (\#Transistors)$, with $A \approx 0.25$
S. Yang, MRS Symposium on Advanced Interconnects, April 1998.
$\#Nets = [B/(b+1)]\, [L_{max}^{\,b+1} - L_{min}^{\,b+1}]$
$Demand = [B/(b+2)]\, [L_{max}^{\,b+2} - L_{min}^{\,b+2}]$
We have taken $L_{max} = 2\sqrt{Logic\_Area}$ and solve the above equations for $B$ and $L_{min}$.
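One way to solve the two equations for $B$ and $L_{min}$: $B$ cancels in Demand/#Nets (the average net length), which rises monotonically with $L_{min}$, so $L_{min}$ can be found by bisection and $B$ recovered afterwards. This is a sketch under that observation, not necessarily how RIPE solves them; Demand is treated as a known input here.

```python
import math


def nets_between(B, b, l_min, l_max):
    """#Nets = [B/(b+1)] * (L_max^(b+1) - L_min^(b+1))."""
    return B / (b + 1) * (l_max ** (b + 1) - l_min ** (b + 1))


def demand_between(B, b, l_min, l_max):
    """Demand (total wire length) = [B/(b+2)] * (L_max^(b+2) - L_min^(b+2))."""
    return B / (b + 2) * (l_max ** (b + 2) - l_min ** (b + 2))


def solve_B_and_lmin(n_nets, demand, l_max, b=-1.65, iters=200):
    """Return (B, L_min) that reproduce both n_nets and demand for a given L_max."""
    def avg_length(l_min):
        # B cancels in this ratio, leaving a function of L_min only.
        return demand_between(1.0, b, l_min, l_max) / nets_between(1.0, b, l_min, l_max)

    target = demand / n_nets
    lo, hi = l_max * 1e-12, l_max
    if not avg_length(lo) < target < l_max:
        raise ValueError("average net length is out of range for this L_max and b")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: L_min can span many decades
        if avg_length(mid) < target:
            lo = mid
        else:
            hi = mid
    l_min = math.sqrt(lo * hi)
    return n_nets / nets_between(1.0, b, l_min, l_max), l_min
```

In use, `n_nets` would come from $\#Nets \approx 0.25 \times \#Transistors$ and `l_max` from $2\sqrt{Logic\_Area}$.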
Algorithm for RIPE 4.0 Cycle-Time Based Wiring Allocation
• Set the input clock frequency and logic depth.
• Use RIPE’s critical path model to estimate the total average delay, including gate and wire delay.
• Determine the maximum allowable long-wire delay by subtracting the total average delay from the target cycle time.
• Allocate wires using this maximum long-wire delay as a constraint, allowing up to a maximum number of repeaters.
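The first three steps amount to a delay budget. This sketch computes that budget and, as a deliberate simplification that is not part of RIPE, the longest unrepeated wire whose distributed RC term $0.377\, r_{int} c_{int} L^2$ alone fits it (driver, receiver, and repeater delays are ignored).

```python
import math


def long_wire_budget(f_clk_hz, total_avg_delay_s):
    """Maximum allowable long-wire delay = target cycle time - total average delay."""
    budget = 1.0 / f_clk_hz - total_avg_delay_s
    if budget <= 0:
        raise ValueError("target clock frequency cannot be met even without long wires")
    return budget


def max_unrepeated_length(budget_s, r_int, c_int):
    """Longest wire with 0.377 * r * c * L**2 <= budget (simplified assumption).

    r_int in ohm/cm and c_int in F/cm give a length in cm.
    """
    return math.sqrt(budget_s / (0.377 * r_int * c_int))


# Hypothetical 0.5 ns budget on a level with Katmai level-5 values (158 ohm/cm, 2.34 pF/cm).
print(max_unrepeated_length(0.5e-9, 158.0, 2.34e-12))
```

The example call returns about 1.89 cm; repeaters would extend the allowable length.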
Modifying the Cycle-Time Model for RIPE 4.0
$T_{cycle} = f_{ld}\, T_{avg} + T_{long} + \text{time\_of\_flight}$
$T_{avg} = 0.377\, r_{int} c_{int} l_{int}^2 + 0.693\{R_{gout}(C_{gout} + f_g C_{gin}) + R_{gout}[(f_g+1)/2]\, c_{int} l_{int} + r_{int}[(f_g+1)/2]\, l_{int} C_{gin}\}$
$T_{long} = 0.377\, r_{int} c_{int} l_{long}^2 + 0.693[R'_{gout}(C'_{gout} + C'_{gin}) + R'_{gout} c_{int} l_{long} + r_{int} l_{long} C'_{gin}]$
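A direct transcription of the RIPE 4.0 cycle-time equations. The primed quantities are passed in as separate arguments; reading them as the long-wire driver and receiver values is an assumption, and all inputs must use consistent units (for example Ω, F, Ω/cm, F/cm, cm).

```python
def t_avg(r_int, c_int, l_int, r_gout, c_gout, c_gin, f_g):
    """T_avg from the RIPE 4.0 cycle-time model."""
    k = (f_g + 1.0) / 2.0  # the [(f_g + 1)/2] factor
    return (0.377 * r_int * c_int * l_int ** 2
            + 0.693 * (r_gout * (c_gout + f_g * c_gin)
                       + r_gout * k * c_int * l_int
                       + r_int * k * l_int * c_gin))


def t_long(r_int, c_int, l_long, r_gout_p, c_gout_p, c_gin_p):
    """T_long; the *_p arguments are the primed quantities R'_gout, C'_gout, C'_gin."""
    return (0.377 * r_int * c_int * l_long ** 2
            + 0.693 * (r_gout_p * (c_gout_p + c_gin_p)
                       + r_gout_p * c_int * l_long
                       + r_int * l_long * c_gin_p))


def t_cycle(f_ld, t_avg_s, t_long_s, time_of_flight_s):
    """T_cycle = f_ld * T_avg + T_long + time_of_flight."""
    return f_ld * t_avg_s + t_long_s + time_of_flight_s
```

Setting `l_int = 0` leaves only the lumped gate term $0.693\, R_{gout}(C_{gout} + f_g C_{gin})$, which is a quick sanity check on the transcription.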
Katmai Wiring Strategy Calculated by RIPE 4.0
Level   Pitch [×0.64 μm]   rint [Ω/cm]   cint [pF/cm]   Lmax [mm]
1       1.0                3451          2.37           0.006
2-3     1.45               891           2.61           4.4
4       2.5                365           2.40           12.3
5       4.0                158           2.34           20.5

Level   Repeaters for Lmax   Level Wiring Efficiency   Total Wiring Efficiency
1       0                    0.02                      0.02
2-3     0                    0.30                      0.18
4       2                    0.50                      0.23
5       3                    0.52                      0.25
RIPE Inclusions • BEOL Yield • Signal Integrity • Electromigration • Cache Memory Performance • Repeater Insertion • Interconnect Inductance • Accurate MOSFET Models
BEOL Yield in RIPE
• Critical area
• Cube-law distribution of defect sizes
• Poisson distribution of faults
• $Y_{total} = e^{-\lambda_{open}}\, e^{-\lambda_{short}}$
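A sketch tying the three ingredients together, under stated assumptions rather than RIPE's exact formulation: the defect-size density is taken as the commonly used cube-law tail $f(x) = x_0^2/x^3$ for $x > x_0$, the spacing (or width, for opens) is at least $x_0$, and line-end effects are ignored. Averaging the critical area $L(x - s)$ over that density gives $L x_0^2/(2s)$, and the mean fault count is $\lambda = D_0 \bar{A}_c$.

```python
import math


def mean_critical_area(length, spacing, x0):
    """Average critical area L * x0**2 / (2 * s) for two parallel lines at spacing s.

    Assumes the cube-law density f(x) = x0**2 / x**3 above x0 and s >= x0.
    For opens, pass the line width as `spacing`.
    """
    if spacing < x0:
        raise ValueError("closed form assumes spacing >= x0")
    return length * x0 ** 2 / (2.0 * spacing)


def beol_yield(lambda_open, lambda_short):
    """Poisson fault statistics: Y_total = exp(-lambda_open) * exp(-lambda_short)."""
    return math.exp(-lambda_open) * math.exp(-lambda_short)


# Hypothetical level: 1000 cm of wire, 0.5 um spacing and width, x0 = 0.1 um, D0 = 0.5 /cm².
d0 = 0.5
lam_short = d0 * mean_critical_area(1000.0, 0.5e-4, 0.1e-4)
lam_open = d0 * mean_critical_area(1000.0, 0.5e-4, 0.1e-4)
print(beol_yield(lam_open, lam_short))
```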
Katmai (250 nm Pentium III) Transition to 180 nm Technology
• Katmai Shrink (Katmai-180)
   • number of transistors: 9.5 M
   • chip size: 1.23 → 0.62 cm²
   • clock frequency: 600 → 850 MHz
   • metal layers: 5 → 6
   • 4 wiring domains
• Katmai Shrink and Doubling (Katmai2)
   • number of transistors: 19 M
   • chip size: 1.24 cm²
   • clock frequency: 850 MHz
   • metal layers: 10
   • 9 wiring domains
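A quick arithmetic check on the shrink, assuming ideal scaling in which die area goes with the square of the feature-size ratio (a simplification, not a RIPE model).

```python
def ideal_shrink_area(area, old_node_nm, new_node_nm):
    """Die area under an ideal linear shrink: area * (new / old)**2."""
    return area * (new_node_nm / old_node_nm) ** 2


print(ideal_shrink_area(1.23, 250, 180))  # cm²
```

The ideal value, about 0.64 cm², is close to the 0.62 cm² listed for Katmai-180, and the 1.24 cm² of Katmai2 is twice that 0.62 cm².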
Contributions of Different Metal Levels to Random Defect Yields for Katmai and Katmai2
Signal Integrity Limits (figure; after Sakurai (1993); labels: Ccint, Cpint, fraction of victim wire parallel to attacker)
Vp Comparison between SPICE, Sakurai Model, and the Modified HP Model for Deschutes (250 nm Pentium II)
Cache Memory Performance
We assume that the cycle time is defined by the logic subsystem. RIPE flags and reports calculated cache access times greater than this cycle time, and then assumes that the cache requires multiple clock cycles for proper operation. RIPE 4.1 implements the model of Wada et al. (1992), IEEE JSSC, 27, p. 1147. It can be linked to the more accurate CACTI model of Wilton and Jouppi (1996), IEEE JSSC, 31, p. 677.
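The flag-and-extend rule can be sketched as follows. Rounding the access time up to a whole number of logic cycles is an assumption about how "multiple clock cycles" is counted; the function names are hypothetical.

```python
import math


def flag_slow_cache(t_access, t_cycle):
    """True when the cache access time exceeds the logic-defined cycle time."""
    return t_access > t_cycle


def cache_cycles(t_access, t_cycle):
    """Whole clock cycles needed for one cache access (at least one)."""
    return max(1, math.ceil(t_access / t_cycle))


# Hypothetical: a 2.5 ns access on a 1.0 ns logic cycle is flagged and takes 3 cycles.
print(flag_slow_cache(2.5, 1.0), cache_cycles(2.5, 1.0))
```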
Inductance in RIPE 4.2
• RIPE has good estimates of wire capacitance per unit length [Geuskens and Rose, 98; Mangaser (Ph.D. Thesis), 99]
• Estimate wire inductance from wire capacitance
   • Assume a homogeneous medium and TEM-mode propagation
• Inductance analysis is performed in two steps:
   1. Identification of wiring levels with significant inductance effects
      • Incorporate Ismail’s formulas for an inductance figure of merit (FOM) to define upper and lower bounds for wire lengths that are susceptible to inductance effects on each wiring level
      • Use constant RC values to estimate the rise times needed in the FOM
   2. Optimization of inductance-susceptible levels
      • Revert to the wire pitch of the last previous wiring level without inductance effects
      • Given the long-wire delay constraint, use Ismail’s RLC-based formulas to determine the maximum wire length (per level)
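Under the slide's assumptions (homogeneous, non-magnetic medium, TEM mode), inductance and capacitance per unit length satisfy $l\,c = \varepsilon_r / c_0^2$. The length window uses the commonly cited form of the Ismail-Friedman figure-of-merit bounds, $t_r/(2\sqrt{lc}) < \text{length} < (2/r)\sqrt{l/c}$; treat this as a sketch, not RIPE's implementation.

```python
import math

C0_CM_PER_S = 2.99792458e10  # speed of light in vacuum, cm/s


def inductance_from_capacitance(c_per_cm, eps_r):
    """TEM, homogeneous non-magnetic medium: l = eps_r / (c0**2 * c), in H/cm."""
    return eps_r / (C0_CM_PER_S ** 2 * c_per_cm)


def inductance_length_window(r, l, c, t_rise):
    """(lower, upper) wire lengths in cm where inductance effects matter.

    r in ohm/cm, l in H/cm, c in F/cm, t_rise in s.  An empty window
    (lower >= upper) means the level is not inductance-susceptible.
    """
    lower = t_rise / (2.0 * math.sqrt(l * c))
    upper = (2.0 / r) * math.sqrt(l / c)
    return lower, upper


# Hypothetical level: k = 2 dielectric, 2 pF/cm, 20 ohm/cm, 100 ps rise time.
l_est = inductance_from_capacitance(2e-12, 2.0)
print(l_est, inductance_length_window(20.0, l_est, 2e-12, 100e-12))
```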
RIPE 4.2 wire level projections using Cu/low-k (k = 2)
• Using ITRS’99 scaling trends
• Using RPI and Bohr scaling trends with ITRS’99 clock frequencies
• ITRS’99 scaling trends for MOSFETs, chip size, and transistor counts are overly aggressive!!
A Constant RC Input-Signal-Transition-Inherent (CRISTI) gate delay model: constant RC model of an inverter chain
(Figure: supply Vdd; pull-up resistances Rpu1, Rpu2, Rpu3; pull-down resistances Rpd1, Rpd2, Rpd3; node capacitances Cnode1, Cnode2, Cnode3; delay for inverter 2, assuming rf)
Previous approaches to estimating constant RC values
• Resistance (1), (2)
Two general methods of determining constant RC values
• Method 1
   - Given a full set of SPICE parameters, determine R and C from SPICE simulations of inverter chains
   - Use actual gates, not step or ramp inputs, to drive the inverters under investigation, giving a better characterization of RC values
   - Use a constant RC input-signal-transition-inherent gate delay model for inverters
• Method 2
   - Given limited MOSFET information, determine R and C from the “CV/I” metric
   - Use this method to project RC values for deep sub-micron CMOS technologies
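For Method 2, the CV/I metric is the intrinsic delay $\tau = C V_{dd} / I_{on}$. One simple way to turn it into a constant resistance is $R \approx V_{dd}/I_{on}$, so that $\tau = RC$; that mapping and the numbers below are assumptions for illustration, not the procedure used in RIPE.

```python
def cv_over_i_delay(c_load, vdd, i_on):
    """CV/I delay metric: tau = C * Vdd / I_on (seconds for F, V, A)."""
    return c_load * vdd / i_on


def equivalent_resistance(vdd, i_on):
    """Hypothetical constant-R estimate R ~= Vdd / I_on, so that tau = R * C."""
    return vdd / i_on


# Hypothetical device: 1 fF load, 1.8 V supply, 600 uA on-current.
print(cv_over_i_delay(1e-15, 1.8, 600e-6), equivalent_resistance(1.8, 600e-6))
```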
C-IRSIM
• The CRISTI model for inverters was extended to multi-transistor (>2) logic gates
   • 3-input NAND gates were used initially
• Focus placed on transistors in series stacks
   • Relative topological position and relative turn-on order
   • These combined features determine the appropriate R and C values for each transistor in a series stack
   • Ignoring these features leads to significant errors in delay estimation relative to SPICE
• Elmore delay terms included with the RC term to account for distributed RC effects in complex gates
• CRISTI incorporated into IRSIM, yielding C-IRSIM
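The Elmore delay of a series stack is the sum, over each node, of its capacitance times the resistance between that node and the driving node. This is a generic Elmore computation for an RC ladder, not C-IRSIM's position- and turn-on-order-dependent R and C selection; the equal values in the example are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] links node i to node i+1, starting at the driving node
    (ground for a pull-down stack); capacitances[i] sits on node i+1.
    tau = sum_i C_i * (R_1 + ... + R_i).
    """
    tau, r_upstream = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_upstream += r       # resistance between this node and the driving node
        tau += c * r_upstream
    return tau


# Hypothetical 3-input NAND pull-down, ordered from ground up to the output node.
print(elmore_delay([2e3, 2e3, 2e3], [2e-15, 2e-15, 5e-15]))
```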
C-IRSIM simulation examples
• 1056-transistor, 6-bit Dadda multiplier circuit in 0.18 μm technology
Significance of good device models
• Selected cycle-time components from RIPE 4.2
• The fraction of cycle time consumed by total logic delay can be relatively large (0.5 to 0.66)!! Devices cannot be neglected altogether.
• A small change in device delay can potentially cause a big change in the total number of wiring levels.
Conclusions • Reasonable estimates can be made of microprocessor performance on the basis of limited information. • Models should be robust with a limited number of arbitrary fitting parameters. • Interconnect limitations constrain design and manufacture.
RIPE 4.0 Sample Benchmark: Intel’s Deschutes (Pentium II) processor
RIPE Inputs
System Parameters: Circuit Area (cm²): 1.31; Number of Transistors (M): 7.5; SRAM cells (μm²): 10.26; SRAM (Kbytes): 32; Signal I/O: 242
Technology Parameters: Technology Generation (μm): 0.25; LGATE (μm): 0.18; Num. of wire levels: 5 (Aluminum); Core Supply (V): 1.8
Wire Parameters:
Pitch (μm): 0.64, 0.93, 0.93, 1.60, 2.56
rint (Ω/cm): 3451, 891, 891, 365, 158
cint (pF/cm): 2.4, 2.6, 2.6, 2.4, 2.3
RIPE Results vs. Actual
                         RIPE    Actual
Clock Frequency (MHz)    459     450
Power Dissipation (W)    18.7    18.9
Wiring strategy results from RIPE 4.1 for a 100 nm, Cu/low-k (k = 2) technology using RPI/Bohr/ITRS’99 scaling
• No inductance analysis
• Repeaters chosen to maximize chip wireability
Wiring strategy results from RIPE 4.2 for a 100 nm, Cu/low-k (k = 2) technology using RPI/Bohr/ITRS’99 scaling
• Inductance analysis performed
• Repeaters again chosen to maximize chip wireability
• Compromise between maximizing chip wireability and minimizing RLC delay
• Wire inductance reduces the effect of wire resistance
• Smaller wire pitches but longer wire lengths
• Reduction in total number of wire levels