Interconnect Modeling for Improved System-Level Design Optimization. Luca Carloni § Andrew B. Kahng ¶ Swamy Muddu ¶ Alessandro Pinto ‡ Kambiz Samadi ¶ Puneet Sharma ¶ § Columbia University ¶ University of California, San Diego ‡ University of California, Berkeley January 22, 2008.
Outline • Motivation • System-Level Communication Synthesis • Buffered Interconnect Model • Interconnect Optimization • Validation and Significance Assessment • Conclusions
Motivation • Focus of the design process is shifting from “computation” to “communication” • Device and interconnect performance scaling mismatches cause breakdown of traditional across-chip communication • System-level designers require accurate, yet simple models to bridge planning and implementation stages • Today’s system-level performance and power modeling suffers from: • Ad hoc selection of models • Poor balance between accuracy and simplicity • Poor definition of inputs • Lack of model extensibility across future technology nodes • Inability to explore different implementation styles • Our goal: develop accurate models that system-level designers can easily use early in the design cycle
Previous Interconnect Delay Models • Existing models miss aspects required for accurate delay estimation • They do not consider input slew change, which impacts effective drive resistance and consequently cell delay • They do not consider scattering, which impacts metal resistivity and consequently metal resistance • Bakoglu90 • No crosstalk impact; assumes driver on-resistance Rd and gate input capacitance Cg vary linearly with device size; uses the Elmore delay model • Pamunuwa03 • Similar to Bakoglu90 but adds crosstalk impact • CongPan99 (IPEM) • Multiple delay models under certain optimization schemes • Use of second-order RC model for gate delay (e.g., Shao03) • Does not address gate loading during model construction
Other Limitations of Previous Work • Design style and buffering schemes • Design-level degrees of freedom: wire width, spacing, shielding • Practical buffer sizing • Previous work considers only delay as the optimization objective, which is inadequate • Analytic solutions yield large buffer sizes (100X-400X) that are not in any realistic cell library • Model inputs and technology capture • Previous models do not have well-defined pathways to capture necessary technology and device parameters • They collect inputs from ad hoc sources, which often leads to misleading conclusions
Communication Synthesis for Network-on-Chip • Given • An input specification as a set of communication constraints • A library of communication components • An objective function (e.g., power, area, delay) • Find • A network-on-chip implementation as a composition of library components that • Satisfies the specification • Minimizes the objective function • Communication Synthesis Infrastructure (COSI) • Based on the Platform-Based Design methodology • Takes specification and library descriptions in XML format • Produces a variety of outputs, including a cycle-accurate SystemC implementation of the optimal network-on-chip
Constraint-Driven Communication Synthesis [Figure: synthesis flow from a point-to-point specification and an on-chip communication library to the synthesis result. Labels: Application, Implementation, Perf. / Cost Abstractions, Constraints Propagation, Synthesis]
Communication Synthesis Key Elements • Specification of input constraints • Set of IP cores: area and interface • End-to-end communication requirements between pairs of IP cores: latency and throughput • Characterization of library of components • Interface types, max number of ports • Max capacities: bandwidth, latency, max distance • Performance and cost model • Component instantiation and parallel composition • Rename, set parameters of library components • Composition based on algebra on quantities (including type compatibility)
Communication Synthesis Example • Synthesis of optimal network-on-chip • Return a valid composition that meets the input constraints and minimizes the objective function (e.g., power dissipation) [Figure: original specification, Platform Instance 1, Platform Instance 2]
COSI: Communication Synthesis Infrastructure • COSI is a public-domain software package for NoC synthesis http://embedded.eecs.berkeley.edu/cosi/
Proposed Model Features
[Figure: block diagram of the proposed model, with three groups of inputs and the estimated outputs]
• Inputs, technology characteristics: # metal layers; min. width, spacing, thickness; dielectric thickness, constant; device drive resistance, capacitance, leakage
• Inputs, design style: width/spacing configurations; buffering scheme; shielding; signaling scheme
• Inputs, bus attributes: length, # bits, layer, switching
• Outputs: delay, area, leakage power, dynamic power; max. unclocked length, # pipelines, latency, throughput
• Improved accuracy with respect to well-known models
• Modeling of nanoscale-era effects: crosstalk, scattering, barrier thickness, dependence of delay on slews, etc.
• Single-digit percentage accuracy relative to gate-level analyses
Model Technology Inputs • Inputs for repeater delay calculation • Delay and slew values for a set of input slew and load capacitance values (obtained from Liberty / Timing Library Format (TLF) / SPICE) • Input capacitance for different repeater sizes (Liberty, Predictive Technology Models (PTM)) • Inputs for wire delay calculation • Wire dimensions (ITRS/PTM, LEF, ITF) • Inter-wire spacings for global and intermediate layers (ITRS/PTM, LEF, ITF) • Inputs for power calculation • Input capacitance (Liberty, PTM) • Wire parasitics (computed in wire delay calculation) • Inputs for area calculation • Wire dimensions used above • Repeater area is available from Liberty; for future technologies, ITRS A-factors or proposed area models can be used
Buffered Interconnect Model • Buffered interconnect model for delay, power, and area • Constructed from: buffer (repeater) and wire delay models • Accounts for coupling capacitances, slew dependence and UDSM effects (e.g., scattering-dependent wire resistance changes) • Calibrated against SPICE • Components: • Repeater delay model • Separate models for intrinsic delay, output slew, input capacitance • Wire delay model • Accounts for coupling capacitance impact on wire delay • Repeater power model • Accounts for sub-threshold and gate leakages • Repeater area model • Derived from existing cell layouts (can be extrapolated) • Wire area model • Derived from wire width and spacing (can be extrapolated)
Repeater Delay Model
• Repeater delay can be decomposed into load-independent ($i$) and load-dependent ($r_d c_l$) components: $d = i + r_d c_l$, with $i(s_i) = \alpha_0 + \alpha_1 s_i + \alpha_2 s_i^2$
• $s_i$ denotes input slew; $\alpha_0$, $\alpha_1$ and $\alpha_2$ are coefficients obtained by quadratic regression
• Drive resistance is nearly linear in input slew; both the intercept and the slope vary with repeater size: $r_d = r_{d0} + r_{d1} s_i$
• Output slew depends on load capacitance; the slope is independent of input slew, while the intercept depends linearly on it: $s_o(c_l, s_i) = s_{o0} + s_{o1} s_i + s_{o2} c_l$
• $s_o$ is the output slew, and $s_{o0}$, $s_{o1}$ and $s_{o2}$ are the fitting coefficients from linear regression
• Input capacitance: $c_i = \eta (w_p + w_n)$, where $c_i$ is the input capacitance, $w_p$ and $w_n$ are the PMOS and NMOS widths respectively, and $\eta$ is a coefficient derived using linear regression with zero intercept
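The repeater equations above can be evaluated directly once the coefficients have been fitted. The following is a minimal sketch, not the authors' implementation: the `RepeaterCoeffs` container and the function names are hypothetical, and any coefficient values passed in are placeholders that would in practice come from regression against Liberty or SPICE data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeaterCoeffs:
    """Fitted coefficients for one repeater size (hypothetical container)."""
    a0: float   # alpha_0: intrinsic-delay intercept
    a1: float   # alpha_1: intrinsic-delay linear slew term
    a2: float   # alpha_2: intrinsic-delay quadratic slew term
    rd0: float  # drive-resistance intercept
    rd1: float  # drive-resistance slope versus input slew
    so0: float  # output-slew intercept
    so1: float  # output-slew dependence on input slew
    so2: float  # output-slew dependence on load capacitance
    eta: float  # input capacitance per unit device width


def intrinsic_delay(c, s_i):
    """i(s_i) = a0 + a1*s_i + a2*s_i^2 (load-independent part)."""
    return c.a0 + c.a1 * s_i + c.a2 * s_i ** 2


def drive_resistance(c, s_i):
    """r_d = rd0 + rd1*s_i."""
    return c.rd0 + c.rd1 * s_i


def repeater_delay(c, s_i, c_l):
    """d = i(s_i) + r_d(s_i) * c_l."""
    return intrinsic_delay(c, s_i) + drive_resistance(c, s_i) * c_l


def output_slew(c, s_i, c_l):
    """s_o = so0 + so1*s_i + so2*c_l."""
    return c.so0 + c.so1 * s_i + c.so2 * c_l


def input_capacitance(c, w_p, w_n):
    """c_i = eta * (w_p + w_n)."""
    return c.eta * (w_p + w_n)
```

Because the output slew of one stage is the input slew of the next, these functions can be chained stage by stage along a buffered line.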
Wire Delay Model
• For wire delay we use the model proposed by Pamunuwa et al. (cf. TVLSI03), which accounts for crosstalk: $d_w = r_w \left(0.4\, c_g + \frac{\lambda_i}{2}\, c_c + 0.7\, c_i\right)$
• $d_w$, $r_w$, $c_g$, $c_c$, and $c_i$ respectively denote wire delay, wire resistance, ground capacitance, coupling capacitance and input capacitance of the next-stage repeater
• $\lambda_i$ is a coefficient, obtained from SPICE simulation, that captures the switching patterns of the neighboring wires
• We enhance the quality of the wire delay model by considering two other important factors that change wire resistance:
• Scattering-aware resistivity (cf. Shi et al. ASPDAC06): $\rho(w_w) = \rho_B + K_\rho / w_w$
• $w_w$ is the wire width, $\rho_B = 2.202\ \mu\Omega\cdot\mathrm{cm}$, and $K_\rho = 1.030 \times 10^{-15}\ \Omega\cdot\mathrm{m}^2$
• Interconnect barrier (cf. Mai et al. IEEE01): $r_w = \frac{\rho\, l_w}{(t_m - t_b)(w_w - 2 t_b)}$
• $t_m$ and $t_b$ respectively are the metal and barrier thicknesses, $l_w$ is the length of the wire, and $\rho$ is computed using the resistivity equation above
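As a worked illustration of the wire equations above, the sketch below evaluates the scattering-aware resistivity, the barrier-aware resistance, and the crosstalk-aware delay in SI units. The function names are hypothetical, the conversion of $\rho_B$ from µΩ·cm to Ω·m is ours, and the wire dimensions in the usage note are illustrative placeholders rather than values from any technology file.

```python
RHO_B = 2.202e-8   # bulk resistivity: 2.202 micro-ohm*cm expressed in ohm*m
K_RHO = 1.030e-15  # scattering coefficient in ohm*m^2


def resistivity(w_w, rho_b=RHO_B, k_rho=K_RHO):
    """Scattering-aware resistivity rho(w_w) = rho_B + K_rho / w_w (w_w in m)."""
    return rho_b + k_rho / w_w


def wire_resistance(l_w, w_w, t_m, t_b):
    """r_w = rho * l_w / ((t_m - t_b) * (w_w - 2*t_b)); all lengths in m."""
    rho = resistivity(w_w)
    return rho * l_w / ((t_m - t_b) * (w_w - 2.0 * t_b))


def wire_delay(r_w, c_g, c_c, c_i, lam):
    """d_w = r_w * (0.4*c_g + (lam/2)*c_c + 0.7*c_i)."""
    return r_w * (0.4 * c_g + 0.5 * lam * c_c + 0.7 * c_i)
```

For example, `wire_resistance(1e-3, 1e-7, 2e-7, 1e-8)` gives the resistance of a hypothetical 1 mm wire that is 100 nm wide and 200 nm thick with a 10 nm barrier; setting `t_b` to zero shows how much the barrier alone raises the resistance.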
Repeater and Wire Delay Models • Model coefficients are fit from data extracted from Liberty/LEF/technology files and other extrapolatable sources (e.g., PTM and ITRS) • delay = i(slew_in) + r(slew_in) * C_L • r = f(size, slew_in) • slew_out = f(slew_in, C_L) • wire delay = Elmore [Figure panels: Intrinsic Delay Model i(slew_in); Drive Resistance Model r(slew_in); Output Slew Model o(slew_in, C_L)]
Repeater and Wire Power Models
• Power is an important design objective and must be accounted for early in the design flow
• Today, leakage and dynamic power are the primary forms of power dissipation
• Leakage has two main components: (1) sub-threshold leakage, and (2) gate-tunneling current
• Both components depend linearly on device size: $p_s = (p_{sn} + p_{sp}) / 2$, with $p_{sn} = k_{0n} + k_{1n} w_n$ and $p_{sp} = k_{0p} + k_{1p} w_p$
• Dynamic power can be calculated as: $p_d = a\, c_l\, v_{dd}^2\, f$, with $c_l = c_i + c_g + c_c$
• $p_d$, $a$, $c_l$, $v_{dd}$ and $f$ are dynamic power, activity factor, load capacitance, supply voltage and frequency, respectively
• Load capacitance is composed of the input capacitance of the next repeater ($c_i$) and the ground ($c_g$) and coupling ($c_c$) capacitances of the wire driven
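The two power expressions above translate into a few lines of code. This is a minimal sketch with hypothetical function names; the `k` coefficients are assumed to come from a linear fit against characterized cells, and no real values are implied.

```python
def leakage_power(w_n, w_p, k0n, k1n, k0p, k1p):
    """p_s = (p_sn + p_sp) / 2 with p_sn, p_sp linear in NMOS/PMOS width."""
    p_sn = k0n + k1n * w_n  # NMOS contribution
    p_sp = k0p + k1p * w_p  # PMOS contribution
    return (p_sn + p_sp) / 2


def dynamic_power(a, c_i, c_g, c_c, vdd, f):
    """p_d = a * c_l * vdd^2 * f with c_l = c_i + c_g + c_c."""
    c_l = c_i + c_g + c_c  # next-stage input + wire ground + wire coupling
    return a * c_l * vdd ** 2 * f
```

Summing `leakage_power` and `dynamic_power` over all repeaters and wire segments of a bus gives a per-bus estimate under these assumptions.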
Repeater and Wire Area Models
• For existing technologies, the area of a repeater can be calculated as: $a_r = \tau_0 + \tau_1 w_n$
• $a_r$ denotes repeater area; $\tau_0$ and $\tau_1$ are coefficients obtained using linear regression; $w_n$ and $w_p$ are the widths of NMOS and PMOS, respectively
• For future technologies, feature size ($F$), contacted pitch ($CP$), row height ($RH$), and row width ($RW$) can be used to estimate the area: $N_F = (w_p + w_n + 2F) / RH$, $RW = N_F \times (F + CP) + CP$, $a_r = RH \times RW$
• Wiring area can be calculated as: $a_w = n \times (w_w + s_w) + s_w$
• $a_w$ denotes wire area, $n$ is the bit width of the bus, and $w_w$ and $s_w$ are wire width and spacing
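A short sketch of the three area expressions follows. The function names are hypothetical. Reading $N_F$ as a finger count is our assumption, since the slide does not define it, and the sketch keeps it as a real number because the slide does not say whether it should be rounded up. As written, the wiring expression returns the quantity the slide calls wire area, which has the units of `w_w` and `s_w`.

```python
def repeater_area_existing(w_n, tau0, tau1):
    """a_r = tau0 + tau1 * w_n, with tau0, tau1 fitted from cell layouts."""
    return tau0 + tau1 * w_n


def repeater_area_future(w_p, w_n, feature, cp, rh):
    """Estimate a_r from feature size F, contacted pitch CP and row height RH."""
    n_f = (w_p + w_n + 2 * feature) / rh   # N_F (assumed: number of fingers)
    rw = n_f * (feature + cp) + cp         # row width RW
    return rh * rw                         # a_r = RH * RW


def wire_area(n_bits, w_w, s_w):
    """a_w = n * (w_w + s_w) + s_w for an n-bit bus."""
    return n_bits * (w_w + s_w) + s_w
```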
Repeater Power and Area Models • Repeater area and power models are fit from simulation data points • Area and leakage power are linear over the range of implementable repeater sizes (larger repeater sizes imply higher leakage power)
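Since the area and leakage models are linear fits, the coefficients can be obtained with ordinary least squares. The sketch below uses only the standard library; the function names are hypothetical. `fit_through_origin` corresponds to a regression with zero intercept, as used for the input-capacitance coefficient η.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_through_origin(xs, ys):
    """Least-squares fit y = slope * x (zero intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

In use, `xs` would hold device widths of characterized repeaters and `ys` the corresponding area, leakage, or input capacitance taken from library or simulation data.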
Interconnect Optimization: Buffering • Conventional delay-optimal buffering leads to unrealistic buffer sizes and high dynamic / leakage power, and is therefore suboptimal • Our approach: iterative optimization of a hybrid objective (power + delay) • Search for the optimal number and size of repeaters • Can be extended to other interconnect optimizations (e.g., wire sizing and driver sizing) [Figure: Pareto-optimal frontier of the power-delay tradeoff of a 5mm interconnect in 90nm / 65nm]
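One simple way to realize a search over repeater count and size is an exhaustive sweep with a weighted, normalized objective. The sketch below is not the authors' iterative procedure, which the slide does not detail. The function name, the `weight` parameter, and the normalization by the best delay and best power among the candidates are all assumptions made for illustration. Restricting `sizes` to the drive strengths actually present in a cell library keeps the result implementable.

```python
from itertools import product


def optimize_buffering(delay_fn, power_fn, counts, sizes, weight=0.5):
    """Pick (count, size) minimizing a weighted sum of normalized delay and power.

    delay_fn(n, s) and power_fn(n, s) are caller-supplied models returning
    positive values; weight=1.0 optimizes delay only, weight=0.0 power only.
    """
    candidates = [
        (n, s, delay_fn(n, s), power_fn(n, s))
        for n, s in product(counts, sizes)
    ]
    # Normalize each metric by its best achievable value so the two terms
    # are dimensionless and comparable (an assumption of this sketch).
    d_min = min(c[2] for c in candidates)
    p_min = min(c[3] for c in candidates)

    def cost(c):
        return weight * c[2] / d_min + (1.0 - weight) * c[3] / p_min

    n, s, _, _ = min(candidates, key=cost)
    return n, s
```

Sweeping `weight` from 0 to 1 and recording the delay and power of each winning configuration traces out points on a power-delay tradeoff curve.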
Model Validation • Model comparison with results from physical implementation • {5mm wire} × {90nm, 65nm} × {wiring layers} × {design styles} • Model-predicted delays compared with delays from PrimeTime • Deviation of the proposed model from PrimeTime delays is < 15%
Impact on System-Level Design • Testcases • VPROC: video processor with 42 cores and 128-bit datawidth • dVOPD: dual video object plane decoder with 26 cores and 128-bit datawidth • The original model (Orig.) underestimates power compared to the proposed model (Prop.) • The original model is very optimistic in delay (i.e., the synthesis result may actually be infeasible) • This could become more critical as technology scales and the chip size becomes larger than the critical sequential length
Conclusions and Future Directions • Accurate models can drive effective system-level exploration • Inaccurate models can lead to misleading design targets • Reproducible methodology for extracting inputs to models from reliable sources • More realistic buffering scheme, where power and area are considered in addition to delay • Modeling of NoC components besides wires • Across future nanometer technologies (45nm and beyond) • At different levels of abstraction • protocol encapsulation (e.g., hand-shaking for AMBA bus allocation) • buses, pipelined rings (e.g., EIB in IBM Cell) • routers, network interfaces • FIFOs, queues, crossbar switches (where ORION left off) • from high-level analytical models to low-level executable models • Extending to other metrics • Reliability estimation (i.e., error probability of transmission over wires)