
A Comprehensive Look at System Level Modeling

This research paper discusses the RIPE 3.0 models for interconnect performance estimation and their applications in microprocessor performance modeling. It also provides information on the genesis of RIPE and its co-authors.


Presentation Transcript


  1. A Comprehensive Look at System Level Modeling. Ken Rose, Bibiche Geuskens, Ramon Mangaser, Christopher Mark. Center for Integrated Electronics and Electronics Manufacturing, Department of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590. rosek@rpi.edu, 518.276.2981

  2. RIPE: Rensselaer Interconnect Performance Estimator. RIPE 3.0 models are described in ‘Modeling Microprocessor Performance’ by B. Geuskens and K. Rose, Kluwer, 1998. RIPE is available for online use (http://latte.cie.rpi.edu/ripe.html). RIPE was developed with partial support from IBM and SRC.

  3. Co-Authors: • Bibiche Geuskens (RIPE 1.0, 2.0, 3.0), Ph.D. June 1997, Intel Corporation, Hillsboro, Oregon • Ramon Mangaser (RIPE 3.1, 4.0, 4.1), Ph.D. Nov. 1999, Sun Microsystems, Chelmsford, Massachusetts • Christopher Mark (RIPE 4.2), Ph.D. Sep. 2000, Intel Corporation, Hillsboro, Oregon

  4. RIPE Genesis: • H.B. Bakoglu, ‘Circuits, Interconnections, and Packaging for VLSI’, Addison-Wesley, 1990: SUSPENS model coded in RIPE 1.0 • G. A. Sai-Halasz, Proc. IEEE, 83/1, p. 20, 1995: basis for RIPE 2.0

  5. RIPE 3.0 Inputs and Outputs [block diagram]. Inputs: System Description, Device/Technology Description, Interconnect Description. Other diagram labels: wiring density, system/area, interconnect RC delay, cycle time, capacitance, resistance, crosstalk, wireability, performance, power dissipation, electromigration, yield, reliability.

  6. RIPE 3.0 Sample Benchmark (DEC Alpha 21164): RIPE Inputs
     System Parameters: Chip Area [cm²]: 2.99; Number of Transistors [M]: 9.3; SRAM [KBytes]: 112; Signal I/O: 294 (Logic Depth: 14, 15)
     Technology Parameters: Feature Size [µm]: 0.5; Number of Wire Levels: 4; Power Supply [V]: 3.3
     Interconnect Parameters (per wire level): Pitch [µm]: 1.125, 1.125, 3.0, 3.0; Rint [Ω/cm]: 1440, 1440, 178, 178; Cint [pF/cm]: 2.0, 2.0, 2.0, 2.0
     Data: W.J. Bowhill et al., Dig. Tech. Journal; ISSCC 1996

  7. Cycle Time Estimation Model (Ch. 7) Sai-Halasz (1995) Sakurai (1993)

  8. RC Interconnect Parameters (Ch. 3) [figure labels: CL, CV]
     Interconnect Resistance (3.1): $R = \rho_{eff}\, l_{int} / (A\, w_{int}^2)$, where $A$ = aspect ratio
     Interconnect Capacitance (3.2): $C = 2(C_V + C_L) \approx 2\, \varepsilon_{eff}\, \varepsilon_0\, l_{int}\, w_{int}\, (1/T_{ILD} + A/S_{wire})$, where $T_{ILD}$ = thickness of interlevel dielectric and $S_{wire}$ = spacing between wires
     Yang (1998)
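The two expressions on this slide can be evaluated directly. The following is a minimal sketch, not RIPE's own code: the function names and all numeric inputs are hypothetical, and it assumes lengths in cm, resistivity in Ω·cm, and a wire thickness equal to the aspect ratio times the width.

```python
EPS0_F_PER_CM = 8.854e-14  # vacuum permittivity in F/cm


def wire_resistance(rho_eff, length, width, aspect_ratio):
    """R = rho_eff * l_int / (A * w_int^2), with wire thickness = A * w_int."""
    return rho_eff * length / (aspect_ratio * width ** 2)


def wire_capacitance(eps_eff, length, width, aspect_ratio, t_ild, spacing):
    """C = 2 (C_V + C_L) ~ 2 eps_eff eps0 l_int w_int (1/T_ILD + A/S_wire)."""
    return (2.0 * eps_eff * EPS0_F_PER_CM * length * width
            * (1.0 / t_ild + aspect_ratio / spacing))


# Hypothetical 1 cm wire: 1 um wide, aspect ratio 1, 1 um ILD and spacing.
r_wire = wire_resistance(rho_eff=3.0e-6, length=1.0, width=1.0e-4, aspect_ratio=1.0)
c_wire = wire_capacitance(eps_eff=3.9, length=1.0, width=1.0e-4,
                          aspect_ratio=1.0, t_ild=1.0e-4, spacing=1.0e-4)
print(f"R = {r_wire:.0f} ohm, C = {c_wire * 1e12:.2f} pF")
```

With these illustrative inputs the wire comes out at about 300 Ω and about 1.4 pF per cm, the same order of magnitude as the per-level values quoted in the benchmark slides.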

  9. Transistor Count and Area Models (Ch. 4). Processor logic, memory, and I/O buffers are treated separately.
                      Transistors   Area
     Alpha 21164      9.3 M         299 mm²
     Memory           6.7 M         101 mm²
     I/O              --            17 mm²
     Random Logic     2.6 M         181 mm²
     [Diagram labels: # Gates, # Transistors, Average Logic Gate Size, Logic Area]

  10. Logic Wireability (Ch. 5). $R(N_g, p)$ = average interconnect length in gate pitches, based on Rent’s rule for the number of pins, $N_p = K_p\, N_g^{\,p}$. $l_w$ = long wire length = $2\sqrt{A_{logic}}$. $N_w$ = number of long wires = $[f_g/(f_g+1)]\, N_{p,total}$, where $N_{p,total}$ is the total number of pins for functional blocks and $f_g$ is the average logic gate fanout.
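A short sketch of the three wireability quantities on this slide. The function names and the Rent parameters, fanout, and pin count used in the example are hypothetical; only the 1.81 cm² logic area is taken from the Alpha 21164 figures in the transcript.

```python
import math


def rent_pins(k_p, n_gates, p):
    """Rent's rule: N_p = K_p * N_g^p."""
    return k_p * n_gates ** p


def long_wire_length(logic_area):
    """l_w = 2 * sqrt(A_logic); result is in the linear unit of the area."""
    return 2.0 * math.sqrt(logic_area)


def long_wire_count(fanout, total_pins):
    """N_w = [f_g / (f_g + 1)] * N_p,total."""
    return fanout / (fanout + 1.0) * total_pins


# Illustrative values only (K_p, p, fanout, and pin count are assumptions).
print(rent_pins(k_p=2.5, n_gates=10_000, p=0.5))
print(long_wire_length(logic_area=1.81))  # logic area in cm^2 -> length in cm
print(long_wire_count(fanout=3.0, total_pins=1000.0))
```

For a 1.81 cm² logic block the long wire length is roughly 2.7 cm, about twice the side of the block.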

  11. Device Parameters (Ch. 6): We need values for the transistor resistances and capacitances, $R_{dr}$ and $C_{dr}$. These have been superseded in RIPE 4.0.
      Cycle Time Estimation Model (Ch. 7): $T_{cycle} = (f_{ld} - 1)\, T_{g,avg} + 2\, T_{g,inv} + \text{time\_of\_flight}$, where $f_{ld}$ is the logic depth.
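As a worked illustration of the RIPE 3.0 cycle-time expression, here is a minimal sketch. The gate delays and time of flight are made-up numbers in picoseconds; only the logic depth of 14 comes from the Alpha 21164 inputs listed earlier.

```python
def cycle_time(logic_depth, t_gate_avg, t_gate_inv, time_of_flight):
    """T_cycle = (f_ld - 1) * T_g,avg + 2 * T_g,inv + time_of_flight."""
    return (logic_depth - 1) * t_gate_avg + 2.0 * t_gate_inv + time_of_flight


# Hypothetical delays in ps; logic depth 14 as in the sample benchmark inputs.
t_cycle_ps = cycle_time(logic_depth=14, t_gate_avg=200.0,
                        t_gate_inv=100.0, time_of_flight=50.0)
f_clk_mhz = 1.0e6 / t_cycle_ps  # 1 / (ps) expressed in MHz
print(t_cycle_ps, round(f_clk_mhz, 1))
```

With these assumed delays the cycle time is 2850 ps, which corresponds to a clock of roughly 351 MHz.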

  12. Power Dissipation (Ch. 8) • $P_{tot} = f_d\, C_{tot}\, V_{dd}\, V_{swing}\, f_c + I_{sc} V_{dd} + I_{leak} V_{dd} \approx \sum_i (f_{d,i}\, C_{sw,i})\, V_{dd}^2\, f_c$, where $f_d$ is the activity factor. • 1. random logic: $f_d\, C_{sw,rl}$ • 2. clock distribution: $f_{d,clk}\, C_{sw,clk}$ • 3. memory: $f_d\, C_{sw,mem}$ • 4. interconnections: $f_d\, C_{sw,int}$ • 5. off-chip drivers: $f_d\, C_{sw,dr}$ • For the Alpha 21164, $f_{d,clk} = 0.75$ and $f_d = 0.15$, based on published details.
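The summed dynamic-power approximation can be sketched as below. The activity factors, supply voltage, and clock frequency are the Alpha 21164 values quoted in the transcript; the switched capacitances are invented purely for illustration, and short-circuit and leakage terms are ignored.

```python
def dynamic_power(components, vdd, f_clk):
    """P ~ sum_i(f_d,i * C_sw,i) * Vdd^2 * f_c.

    components: iterable of (activity_factor, switched_capacitance_in_farads).
    """
    return sum(fd * c_sw for fd, c_sw in components) * vdd ** 2 * f_clk


# Capacitances below are hypothetical placeholders, not published data.
components = [
    (0.75, 4.0e-9),   # clock distribution, f_d,clk = 0.75
    (0.15, 10.0e-9),  # everything else lumped together, f_d = 0.15
]
print(round(dynamic_power(components, vdd=3.3, f_clk=300.0e6), 2), "W")
```

Because the clock network switches with a much higher activity factor, a comparatively small clock capacitance can still account for a large share of the total.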

  13. RIPE 3.0 Sample Benchmark (DEC Alpha 21164)
                                  RIPE Results (Al/SiO2)   Actual     RIPE Results (Cu/SiO2)
      Memory transistors:         6.73 M                   7.2 M      6.73 M
      Area memory:                1.01 cm²                 1.02 cm²   1.01 cm²
      Pad ring area:              0.16 cm²                 0.17 cm²   0.16 cm²
      Clock frequency:            291 MHz                  300 MHz    373 MHz
      Power dissipation:          52 W                     50 W       66 W
      Power, clock distribution:  21 W                     20 W       27 W

  14. RIPE 3.0 Benchmark Results

  15. RIPE Simulation Modes: RIPE 3.0 to RIPE 4.0 • Performance Estimator (-n and -d modes, RIPE 3.0): Wiring Strategy → Clock Frequency, Power, Wireability • Wiring Allocator (-aw mode, RIPE 4.0): Clock Frequency → Wiring Strategy

  16. Intel Wiring Distribution Model • $d(\#Nets)/dL_{nets} \approx B\, L_{nets}^{\,b}$, $b = -1.65$ • $\#Nets \approx A\,(\#Transistors)$, $A \approx 0.25$ (S. Yang, MRS Symposium on Advanced Interconnects, April 1998) • $\#Nets = [B/(b+1)]\,[L_{max}^{\,b+1} - L_{min}^{\,b+1}]$ • $Demand = [B/(b+2)]\,[L_{max}^{\,b+2} - L_{min}^{\,b+2}]$ • We have taken $L_{max} = 2\sqrt{\text{Logic\_Area}}$ and solve the above equations for $B$ and $L_{min}$.
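One way to solve the two closed-form equations for B and Lmin is sketched here. This is an assumption about the numerical method, not a description of how RIPE does it: the ratio Demand/#Nets depends only on Lmin, so Lmin is found by bisection and B then follows from the net count. Function names and the test values are hypothetical.

```python
def net_count(B, b, l_min, l_max):
    """#Nets = [B/(b+1)] * (Lmax^(b+1) - Lmin^(b+1)); requires b != -1."""
    return B / (b + 1.0) * (l_max ** (b + 1.0) - l_min ** (b + 1.0))


def wiring_demand(B, b, l_min, l_max):
    """Demand = [B/(b+2)] * (Lmax^(b+2) - Lmin^(b+2)); requires b != -2."""
    return B / (b + 2.0) * (l_max ** (b + 2.0) - l_min ** (b + 2.0))


def solve_b_coeff_and_lmin(n_nets, demand, l_max, b=-1.65, iterations=200):
    """Return (B, Lmin) that reproduce the given net count and demand."""
    target_mean = demand / n_nets  # mean net length, independent of B
    if not 0.0 < target_mean < l_max:
        raise ValueError("mean net length must lie between 0 and Lmax")
    lo, hi = l_max * 1e-12, l_max  # assumed search bracket for Lmin
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5  # bisect on a logarithmic scale
        if mid <= lo or mid >= hi:
            break  # bracket has collapsed to floating-point resolution
        mean = wiring_demand(1.0, b, mid, l_max) / net_count(1.0, b, mid, l_max)
        if mean < target_mean:
            lo = mid  # mean length grows with Lmin, so move the bound up
        else:
            hi = mid
    l_min = (lo * hi) ** 0.5
    return n_nets / net_count(1.0, b, l_min, l_max), l_min


# Round trip with made-up values: build #Nets and Demand, then recover B, Lmin.
n = net_count(1000.0, -1.65, 0.01, 2.0)
d = wiring_demand(1000.0, -1.65, 0.01, 2.0)
print(solve_b_coeff_and_lmin(n, d, l_max=2.0))
```

The bisection relies on the mean net length increasing monotonically with Lmin, which holds for a power-law density truncated to the interval from Lmin to Lmax.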

  17. Algorithm for RIPE 4.0 Cycle-Time Based Wiring Allocation • Set the input clock frequency and logic depth. • Use RIPE’s critical path model to estimate total average delay, including gate and wire delay. • Determine the maximum allowable long wire delay by subtracting the total average delay from the target cycle time. • Allocate wires using this maximum total long wire delay as a constraint, but allowing a maximum number of repeaters.
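The third step of the algorithm, computing the long-wire delay budget, can be sketched as follows. It assumes the total average delay is the logic depth times the average stage delay, and that time of flight is also deducted from the cycle; the function name and example values are hypothetical.

```python
def max_long_wire_delay(f_clk, logic_depth, t_avg, time_of_flight=0.0):
    """Delay left for the long wire once the average-path delay is removed."""
    budget = 1.0 / f_clk - logic_depth * t_avg - time_of_flight
    if budget <= 0.0:
        raise ValueError("target clock leaves no time for a long wire")
    return budget


# Hypothetical target: 450 MHz, logic depth 12, 150 ps average stage delay.
budget = max_long_wire_delay(f_clk=450.0e6, logic_depth=12, t_avg=150.0e-12)
print(f"{budget * 1e12:.0f} ps available for the long wire")
```

The wire allocator would then treat this budget as the constraint when choosing pitch and repeater count per level; that allocation step is not shown here.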

  18. Modifying the Cycle-Time Model for RIPE 4.0
      $T_{cycle} = f_{ld}\, T_{avg} + T_{long} + \text{time\_of\_flight}$
      $T_{avg} = 0.377\,(r_{int}\, c_{int}\, l_{int}^2) + 0.693\,\{R_{gout}(C_{gout} + f_g C_{gin}) + R_{gout}[(f_g+1)/2]\, c_{int}\, l_{int} + r_{int}[(f_g+1)/2]\, l_{int}\, C_{gin}\}$
      $T_{long} = 0.377\,(r_{int}\, c_{int}\, l_{long}^2) + 0.693\,[R'_{gout}(C'_{gout} + C'_{gin}) + R'_{gout}\, c_{int}\, l_{long} + r_{int}\, l_{long}\, C'_{gin}]$
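The RIPE 4.0 cycle-time expressions translate almost term for term into code. This is a sketch under the assumption that all quantities are in consistent units (for example Ω/cm, F/cm, cm, Ω, F, s); the function and parameter names are illustrative, with the primed driver quantities written as `r_drv`, `c_drv_out`, and `c_drv_in`.

```python
def t_avg(r_int, c_int, l_int, r_gout, c_gout, c_gin, fanout):
    """Average stage delay: distributed wire term plus lumped gate terms."""
    half_load = (fanout + 1.0) / 2.0
    wire = 0.377 * r_int * c_int * l_int ** 2
    gate = 0.693 * (r_gout * (c_gout + fanout * c_gin)
                    + r_gout * half_load * c_int * l_int
                    + r_int * half_load * l_int * c_gin)
    return wire + gate


def t_long(r_int, c_int, l_long, r_drv, c_drv_out, c_drv_in):
    """Long-wire delay with its own (primed) driver and receiver values."""
    return (0.377 * r_int * c_int * l_long ** 2
            + 0.693 * (r_drv * (c_drv_out + c_drv_in)
                       + r_drv * c_int * l_long
                       + r_int * l_long * c_drv_in))


def cycle_time(logic_depth, t_average, t_long_wire, time_of_flight):
    """T_cycle = f_ld * T_avg + T_long + time_of_flight."""
    return logic_depth * t_average + t_long_wire + time_of_flight


# Unit-valued inputs make the coefficients easy to check by hand.
print(t_avg(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, fanout=3.0))
print(t_long(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

The 0.377 and 0.693 coefficients weight the distributed RC term and the lumped RC terms respectively, so the quadratic wire term dominates once the wire length grows.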

  19. RIPE 4.0 Benchmark Results

  20. Katmai Wiring Strategy Calculated by RIPE 4.0
      Level   Pitch [×0.64 µm]   rint [Ω/cm]   cint [pF/cm]   Lmax [mm]
      1       1.0                3451          2.37           0.006
      2-3     1.45               891           2.61           4.4
      4       2.5                365           2.40           12.3
      5       4.0                158           2.34           20.5

      Level   Repeaters for Lmax   Level Wiring Efficiency   Total Wiring Efficiency
      1       0                    0.02                      0.02
      2-3     0                    0.30                      0.18
      4       2                    0.50                      0.23
      5       3                    0.52                      0.25

  21. RIPE Inclusions • BEOL Yield • Signal Integrity • Electromigration • Cache Memory Performance • Repeater Insertion • Interconnect Inductance • Accurate MOSFET Models

  22. BEOL Yield in RIPE • Critical Area • Cube law distribution of defect sizes • Poisson distribution of faults • $Y_{total} = e^{-\lambda_{open}}\, e^{-\lambda_{short}}$
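The Poisson yield expression is simple to evaluate once the average fault counts are known. In this sketch the fault counts per metal level are made-up numbers, and summing them across levels is an assumption that faults on different levels are independent.

```python
import math


def beol_yield(lambda_open, lambda_short):
    """Y_total = exp(-lambda_open) * exp(-lambda_short)."""
    return math.exp(-lambda_open) * math.exp(-lambda_short)


def beol_yield_by_level(levels):
    """Combine per-level (lambda_open, lambda_short) pairs into one yield."""
    total_open = sum(l_open for l_open, _ in levels)
    total_short = sum(l_short for _, l_short in levels)
    return beol_yield(total_open, total_short)


# Hypothetical average fault counts for two metal levels.
print(round(beol_yield(0.05, 0.10), 4))
print(round(beol_yield_by_level([(0.02, 0.04), (0.03, 0.06)]), 4))
```

Both calls give the same result here because the per-level fault counts add up to the same totals.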

  23. Katmai (250 nm Pentium III) Transition to 180 nm Technology • Katmai Shrink (Katmai-180) • number of transistors: 9.5M • chip size: 1.23 → 0.62 cm² • clock frequency: 600 → 850 MHz • metal layers: 5 → 6 • 4 wiring domains • Katmai Shrink and Doubling (Katmai2) • number of transistors: 19M • chip size: 1.24 cm² • clock frequency: 850 MHz • metal layers: 10 • 9 wiring domains

  24. Contributions of Different Metal Levels to Random Defect Yields for Katmai and Katmai2

  25. Signal Integrity Limits [figure labels: Ccint, Cpint, fraction of victim wire parallel to attacker]. Sakurai (1993)

  26. Vp Comparison between SPICE, Sakurai Model, and the Modified HP Model for Deschutes (250 nm Pentium II)

  27. Cache Memory Performance We assume that the cycle time is defined by the logic subsystem. Calculated cache access times greater than this cycle time will be flagged and reported by RIPE. RIPE will then assume that the cache requires multiple clock cycles for proper operation. RIPE 4.1 implements the model of Wada et al. (1992) IEEE JSSC,27, p. 1147. It can be linked to the more accurate CACTI model of Wilton and Jouppi (1996) IEEE JSSC, 31, p. 677.
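The multi-cycle cache rule described on this slide amounts to rounding the access time up to a whole number of logic cycles. A minimal sketch, with a hypothetical function name and example times:

```python
import math


def cache_latency_cycles(access_time, cycle_time):
    """Whole clock cycles needed when the logic subsystem sets the cycle time."""
    return max(1, math.ceil(access_time / cycle_time))


# Hypothetical times in seconds: a 3.1 ns cache access against a 2.2 ns cycle.
cycles = cache_latency_cycles(access_time=3.1e-9, cycle_time=2.2e-9)
if cycles > 1:
    print(f"cache access flagged: needs {cycles} clock cycles")
```

An access time that is an exact multiple of the cycle time may round up by one cycle because of floating-point division, so a small tolerance could be added if that case matters.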

  28. Inductance in RIPE 4.2 • RIPE has good estimates of wire capacitance (per unit length) [Geuskens and Rose, 98, Mangaser (Ph.D. Thesis), 99] • Estimate wire inductance from wire capacitance • Assume homogeneous medium and TEM mode propagation • Inductance analysis performed in two steps • Identification of wiring levels with significant inductance effects • Incorporate Ismail’s formulas for an inductance figure of merit (FOM) to define upper and lower bounds for wire lengths that are susceptible to inductance effects on each wiring level • Use constant RC values to estimate rise times needed in FOM • Optimization of inductance-susceptible levels • Revert to wire pitch from the last, previous wiring level without inductance effects • Given long-wire delay constraint, use Ismail’s RLC-based formulas to determine maximum wire length (per level)
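The first two ideas on this slide can be sketched numerically. For TEM propagation in a homogeneous, non-magnetic dielectric the per-unit-length inductance and capacitance satisfy L·C = εr/c², which gives L from C. The length window below uses the commonly cited form of the Ismail, Friedman, and Neves figure of merit, t_r/(2√(LC)) < l < (2/R)√(L/C); whether RIPE 4.2 uses exactly this form is an assumption, and the function names and example inputs are illustrative.

```python
import math

C_LIGHT_CM_PER_S = 2.998e10  # speed of light in vacuum, cm/s


def inductance_from_capacitance(c_per_cm, eps_r):
    """L = eps_r / (c^2 * C) for TEM mode in a homogeneous dielectric (H/cm)."""
    return eps_r / (C_LIGHT_CM_PER_S ** 2 * c_per_cm)


def inductance_length_window(r_per_cm, l_per_cm, c_per_cm, rise_time):
    """Return (lower, upper) wire lengths in cm where inductance matters."""
    lower = rise_time / (2.0 * math.sqrt(l_per_cm * c_per_cm))
    upper = (2.0 / r_per_cm) * math.sqrt(l_per_cm / c_per_cm)
    return lower, upper


# Illustrative inputs: 2 pF/cm in SiO2-like dielectric, 158 ohm/cm, 50 ps rise.
l_wire = inductance_from_capacitance(c_per_cm=2.0e-12, eps_r=3.9)
lower, upper = inductance_length_window(158.0, l_wire, 2.0e-12, 50.0e-12)
print(f"L = {l_wire * 1e9:.2f} nH/cm, window = {lower:.2f} to {upper:.2f} cm")
```

For these inputs the inductance is roughly 2.2 nH/cm and the susceptible window is narrow, about 0.38 to 0.42 cm. If the lower bound exceeds the upper bound, no length on that level is susceptible and the level can be treated as RC-only.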

  29. RIPE 4.2 wire level projections using Cu/low-K(=2) • Using ITRS’99 scaling trends • Using RPI and Bohr scaling trends with ITRS’99 clock frequencies • ITRS’99 scaling trends for MOSFETs, chip size and transistor counts are overly aggressive !!

  30. A Constant RC Input-Signal-Transition-Inherent (CRISTI) gate delay model: constant RC model of an inverter chain [circuit diagram labels: Vdd; Rpu1, Rpu2, Rpu3; Rpd1, Rpd2, Rpd3; Cnode1, Cnode2, Cnode3]. For Inverter 2 (assuming rf)

  31. Previous approaches to estimating constant RC values • Resistance (1) , (2)

  32. Two general methods of determining constant RC values • Method 1: Given a full set of SPICE parameters, determine R and C from SPICE simulations of inverter chains - Use actual gates, not step or ramp inputs, to drive inverters under investigation → better characterization of RC values - Use a constant RC input-signal-transition-inherent gate delay model for inverters • Method 2: Given limited MOSFET information, determine R and C from the “CV/I” metric - Use this method to project RC values for deep sub-micron CMOS technologies

  33. C-IRSIM • CRISTI model for inverters was extended to multi-transistor (>2) logic gates • 3-input NAND gates used initially • Focus placed on transistors in series stacks • Relative topological position and relative turn-on order • These combined features determine the appropriate R and C value for each transistor in a series stack • Ignoring these features leads to significant errors in delay estimation relative to SPICE • Elmore delay terms included with RC term to account for distributed RC effects in complex gates • CRISTI incorporated into IRSIM → C-IRSIM

  34. C-IRSIM simulation examples • 1056-transistor, 6-bit Dadda multiplier circuit in 0.18 µm technology

  35. Significance of good device models • Selected cycle-time components from RIPE 4.2 • Fraction of cycle time consumed by total logic delay can be relatively large (0.5-0.66) → devices cannot be neglected altogether • Small change in device delay → potentially big change in total wiring levels

  36. Conclusions • Reasonable estimates can be made of microprocessor performance on the basis of limited information. • Models should be robust with a limited number of arbitrary fitting parameters. • Interconnect limitations constrain design and manufacture.

  37. RIPE 4.0 Sample Benchmark: Intel’s Deschutes (Pentium II) processor
      RIPE INPUTS
      System Parameters: Circuit Area (cm²): 1.31; Number of Transistors (M): 7.5; SRAM cells (µm²): 10.26; SRAM (Kbytes): 32; Signal I/O: 242
      Technology Parameters: Technology Generation (µm): 0.25; LGATE (µm): 0.18; Num. of wire levels: 5 (Aluminum); Core Supply (V): 1.8
      Wire Parameters (per wire level): Pitch (µm): 0.64, 0.93, 0.93, 1.60, 2.56; rint (Ω/cm): 3451, 891, 891, 365, 158; cint (pF/cm): 2.4, 2.6, 2.6, 2.4, 2.3
                                RIPE RESULTS   ACTUAL
      Clock Frequency (MHz)     459            450
      Power Dissipation (W)     18.7           18.9

  38. Wiring strategy results from RIPE 4.1 for a 100nm, Cu/low-K(=2) technology using RPI/Bohr/ITRS’99 scaling • No inductance analysis • Repeaters chosen to maximize chip wireability

  39. Wiring strategy results from RIPE 4.2 for a 100nm, Cu/low-K(=2) technology using RPI/Bohr/ITRS’99 scaling • Inductance analysis performed • Repeaters again chosen to maximize chip wireability • Compromise between maximizing chip wireability and minimizing RLC delay • Wire inductance reduces the effect of wire resistance • Smaller wire pitches but longer wire lengths • Reduction in total number of wire levels
