
System Roadmap

System Roadmap. Andrew B. Kahng Core Pillar September 29, 2006 swamy@vlsicad.ucsd.edu sharma@vlsicad.ucsd.edu kambiz@vlsicad.ucsd.edu abk@ucsd.edu. Modeling Requirements for System-Level Living Roadmap. Core Pillar Requirements (ASV).


Presentation Transcript


  1. System Roadmap Andrew B. Kahng Core Pillar September 29, 2006 swamy@vlsicad.ucsd.edu sharma@vlsicad.ucsd.edu kambiz@vlsicad.ucsd.edu abk@ucsd.edu

  2. Modeling Requirements for System-Level Living Roadmap

  3. Core Pillar Requirements (ASV) • Benefits of technology scaling can be sustained by migrating the design process to a system scaling paradigm → design elements are IP blocks/processor cores as opposed to devices and standard cells • New system synthesis paradigms rely on accurate yet simple models of delay/area/power/cost trade-offs for parameterized design elements • Models of block-level design metrics should account for • Cost and impact of design techniques to cope with variability • Cost of hardware (in terms of design metrics) required for adaptivity/resiliency • Goal: Synthesize and abstract the impact of low-level technology parameters (and their variabilities) on design metrics of system-level blocks

  4. BEOL Stack Optimization (Nagaraj, TI) • Quality assessment of BEOL interconnect stacks • Inputs: Technology parameters (resistivity, ILD thickness, etc.), geometry parameters (wire widths, pitches), Rent parameters • Stack quality assessment is required for blocks (instead of individual wires) such as data path elements, SoC components and processor cores • Outputs: Reports of trade-offs and models of stack performance metrics for system-level exploration • Existing wire length distribution models and interconnect performance metrics used to optimize stacks do not have a system-level view • Stack parameter exploration and optimization should be driven by “design-level” throughput and power considerations • E.g., area-normalized throughput and power density

  5. Interconnect Library Modeling (Carloni) • Focus of design process is shifting from “computation” to “communication” • Mismatches between device scaling and interconnect performance scaling are causing a breakdown of traditional across-chip communication mechanisms • New techniques: wave pipelining, stateful repeaters → communication- and network-centric approach for future designs • Communication-driven design synthesis • System-level design requirements translated to communication mechanisms between computational blocks → analogous to the classic synthesis process (design requirements translated to computational blocks) • Mapping stage associates communication apparatus (links, repeaters, buses, routers, etc.) with the high-level synthesis solution, similar to technology mapping of standard cells to a generic netlist • New synthesis methodology requires a “characterized” interconnect library composed of links, repeaters, routers, etc. • Modeling/metrics SIG can provide models of latency, bandwidth, throughput, power (high-level metrics), derived from thorough characterization of library elements against device and process technology roadmaps

  6. Concurrent Theme Requirements (Keutzer) • Current technology extrapolation framework doesn’t allow study of impact of design choices on high-level parameters • E.g., what if a vector unit is added? What if local memory size is increased? → what is the impact of architectural design choices on chip-level attributes? • Architectural exploration work requires models of design metrics that are accurate to within ~20% • There is a significant gap between numbers in ITRS and technology extrapolation frameworks (BACPAC/GTX) • Design space/architectural exploration based on rule/inference chains will run out of steam → require models for higher-level design blocks • Specific questions: • What will be the size of an economical die in future nodes? • # RISC processors that can be implemented • # clock / power regimes (i.e., voltage islands) • Clock frequencies in future nodes • Power implications / trade-offs

  7. Other Guidance • Questions from Intel mentors • How to model the reliability and the error rate of SRAM • How to embed technological variability and reliability issues into the system diagnosis • How to identify the ‘hot spots’ of a design • How to efficiently validate the design under variations • Other • What are the impacts of variability on NoC? • NBTI power-law modeling (Purdue-TI)

  8. Macromodeling

  9. The Challenge of System Projection and Design • What is the impact of new technology on system macro parameters? • Execution speed, power consumed, latency, reliability, cost, … • What macromodeling will enable system-level optimization? • System optimization : large block :: logic optimization : standard cell • “Large block” = microprocessor, memory, network, bus, … • Logic cell abstraction through 65nm WAS: size, power, delay • Block abstraction beyond 65nm MUST BE: much more • Cost and resource tradeoffs especially in the face of variability and reliability • From latency and bandwidth to flexibility and resilience • Scaling of future systems will be dominated by non-determinism • → GSRC Modeling SIG: Toward System Scaling Theory

  10. Towards Parameterized Scalable Macromodels • Low-level (device- or gate-level) models are accurate but unusable for system-level exploration • Macromodels: • Estimate metrics such as delay, power, area, power/performance variability, reliability for higher-level blocks • Are scalable to novel technologies • Are scalable to different design styles, Vdd, Vth, etc. • Are parameterized by architectural parameters of higher-level blocks • Allow designers to: • Speculatively achieve highest performance given area, power budget • Explore reliability tradeoffs with area and power • Assess system-level resiliency requirements • Develop robust designs

  11. Use Model: Facilitate System-Level Exploration • Optimizations enabled: • Evaluation for future technologies • Area-performance tradeoff • Power-performance tradeoff • Resilience requirements due to reliability and/or variability [Figure labels: System-Level Design; Instruction-Set or Cycle-Accurate Simulator; Performance; Delay Macromodels → Cycle Time; Power Macromodels → Power; Area Macromodels → Area; Reliability Macromodels → Vulnerable System Components; Variability Macromodels → Yield-Determining Components]

  12. Challenges in Macromodeling • Lots of high-level blocks, algorithms and design styles • Some identified blocks (cf. Gajski “Architecture SC” request): • Array structures: single- and multiple-port SRAMs, content-addressable memories, register files, reservation stations, renaming units, issue queues, branch target buffers, etc. • Complex logic blocks: adders, multipliers, dividers, vector blocks, normalization, rounding, etc. • IP blocks: encryption/decryption, JPEG/MPEG compression/decompression, CRC, etc. • On-chip communication: buses, NoCs (Polaris) • Clocking network • Lack of robust reliability and variability prediction

  13. Parametric Yield Estimation and Optimization [Figure: block diagram; labels in extraction order: Variability Data, Technology / Circuit Data, Fmax Variability, SER Macromodeling, Statistical Clock Skew]

  14. Example: Carry-Lookahead Adder • Parameters: bit width, lookahead stages • Design styles: dynamic, static, pass-gate • Delay: carry generation for MSB is slowest → based on bit width and lookahead, calculate hierarchy levels → identify critical path → project delay from gate-level delay projections (ITRS + BPTM) • Power: calculated using bit width and lookahead stages in terms of gates, projected using gate-level power • Area: similar to power • Reliability and variability projections from iTunes • All metrics calibrated with implementations for a few parameters and technologies
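The delay recipe on this slide (bit width and lookahead → hierarchy levels → critical path → gate-level delay projection) can be sketched as below. This is a minimal illustration, not the GSRC macromodel: the names `cla_levels`, `cla_delay_ps`, `group_size`, and `calibration` are hypothetical, the critical-path stage count is an assumed textbook-style approximation, and the per-gate delay is a placeholder that would in practice come from a technology projection.

```python
def cla_levels(bit_width: int, group_size: int) -> int:
    """Lookahead hierarchy levels needed so one top-level group spans all bits."""
    if bit_width < 1 or group_size < 2:
        raise ValueError("bit_width must be >= 1 and group_size >= 2")
    levels, span = 1, group_size
    while span < bit_width:
        span *= group_size
        levels += 1
    return levels


def cla_delay_ps(bit_width: int, group_size: int,
                 gate_delay_ps: float, calibration: float = 1.0) -> float:
    """Project adder delay from a single gate-level delay number.

    Assumed critical path, counted in gate stages (an illustration only):
      1              bitwise propagate/generate
      2 * levels     group P/G signals moving up the lookahead tree
      2 * (levels-1) carries moving back down toward the MSB group
      1              final sum XOR
    `calibration` stands in for fitting against real implementations.
    """
    levels = cla_levels(bit_width, group_size)
    stages = 1 + 2 * levels + 2 * (levels - 1) + 1
    return calibration * stages * gate_delay_ps
```

Under these assumptions, a 64-bit adder with 4-bit lookahead groups has three hierarchy levels, and the delay grows with the logarithm of the bit width rather than linearly.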

  15. Example: Memory Array • Parameters: #bitlines, #wordlines, #ECC bits, etc. • Design styles: memory cell design, layouts, drive strength ratios, etc. • Delay: addr decoder delay + memory cell read/write delay + bitline mux delay → project delay from gate-level delay projections • Power: CACTI, IDAP (uses wordline cap, bitline cap, precharge device cap/memory cell cap, #bit flips, etc.) • Area: memory cell area dominated, easy to predict & project • Reliability and variability projections from iTunes along with #ECC bits [Figure labels: Write Column logic; Addr. Decoder; Memory Core; Precharge, Read Column logic; 6T Memory Cell; Memory Array]
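The additive delay model on this slide (address decoder + memory cell + bitline mux) can be sketched as follows. The function names, the assumption of one gate stage per address bit and per mux select bit, and the numeric inputs are all hypothetical; a real macromodel would be calibrated against tools such as CACTI.

```python
def ceil_log2(n: int) -> int:
    """Smallest k such that 2**k >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def memory_access_delay_ps(n_wordlines: int, mux_ways: int,
                           gate_delay_ps: float, cell_delay_ps: float) -> dict:
    """Sum the three delay components named on the slide.

    Assumption: decoder and column-mux depth each grow by one gate stage
    per address/select bit; the cell read/write delay is supplied directly.
    """
    decoder_ps = ceil_log2(n_wordlines) * gate_delay_ps
    mux_ps = ceil_log2(mux_ways) * gate_delay_ps
    return {
        "decoder_ps": decoder_ps,
        "cell_ps": cell_delay_ps,
        "mux_ps": mux_ps,
        "total_ps": decoder_ps + cell_delay_ps + mux_ps,
    }
```

Returning the breakdown rather than only the total makes it easier to see which component dominates as the array dimensions change.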

  16. Interconnect Stack Optimization

  17. Why New Models? • Classic scaling laws are not aware of the implications of scaling • Models of scaling do not represent the system constraint-driven design of the future • Hardware overheads for resiliency, power, adaptability and tuning go against scaling → performance implications • Models of design infrastructure in future nodes should understand implications of circuit and interconnect unreliability • Static variations – process variations, NBTI • Dynamic variations – temperature, SEU, EM • Existing models are too low-level to be usable in a system design scenario, even with inference chain analysis (e.g., GTX)

  18. Technology Scaling: Interconnect Implications • Vdd scaling slowing → Delay scaling slowing down • Subthreshold slope limit • Vt scaling has Ioff consequences • Power concerns push Vdd down • Scaling interconnect dimensions • Wire delays become worse • Huge performance penalties (because devices also are not as fast) • Global wires are the worst victims • Repeaters are of limited help → Significant area and power penalty • Global communication → a costly overhead [Image source: Prof. Saraswat, Stanford Univ.]
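The claim that global wires suffer most, and that repeaters help only at a cost, can be illustrated with a first-order distributed-RC estimate. This sketch is not from the slides: it uses the standard 0.38·r·c·L² approximation for the 50% delay of a distributed RC line, ignores driver resistance and load capacitance, and the function names and numeric values are placeholders.

```python
def unrepeated_wire_delay_ps(r_kohm_per_mm: float, c_ff_per_mm: float,
                             length_mm: float) -> float:
    """50% delay of a distributed RC wire; kOhm * fF gives picoseconds."""
    return 0.38 * r_kohm_per_mm * c_ff_per_mm * length_mm ** 2


def repeated_wire_delay_ps(r_kohm_per_mm: float, c_ff_per_mm: float,
                           length_mm: float, n_segments: int,
                           repeater_delay_ps: float) -> float:
    """Delay when the wire is cut into equal segments, each driven by a repeater."""
    seg_mm = length_mm / n_segments
    seg_ps = unrepeated_wire_delay_ps(r_kohm_per_mm, c_ff_per_mm, seg_mm)
    return n_segments * (repeater_delay_ps + seg_ps)


def best_segment_count(r_kohm_per_mm: float, c_ff_per_mm: float,
                       length_mm: float, repeater_delay_ps: float,
                       max_segments: int = 64) -> int:
    """Exhaustively pick the segment count with the lowest total delay."""
    return min(
        range(1, max_segments + 1),
        key=lambda k: repeated_wire_delay_ps(
            r_kohm_per_mm, c_ff_per_mm, length_mm, k, repeater_delay_ps),
    )
```

With the placeholder values r = 0.5 kΩ/mm, c = 200 fF/mm, a 10 mm wire, and 50 ps per repeater, the unrepeated delay grows quadratically with length, while the repeated wire is much faster but needs several repeaters, each of which adds area and power.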

  19. Design Impact of Interconnect (non) Scaling • Repeater-driven interconnect is energy-, congestion-, and performance-limited • Maximum reachable distance in a clock cycle = ? (low-swing, differential, …) • Bandwidth vs. latency envelope = ? (encoding, power, signal reliability, …) • Latency is not the only problem: temperature, power density and EM • Temperature of global interconnect rises with low-k → performance impact • Future NoC interconnections should address performance/thermal/reliability issues at the fabric design and design optimization phases • This work → search for optimal NoC interconnect stack parameters

  20. New Directions for System-Level Interconnects • Wire pipelining, state-aware repeaters • Methodologies? • Globally asynchronous, locally synchronous • Latency insensitive • Design paradigm shift from “computation-driven” to “communication-driven” • → Computation is no longer the bottleneck • Computation is cheap → exploit computation infrastructure to develop efficient communication mechanisms • Designs transforming into distributed systems • Interconnection network performance is key for system performance → power, bandwidth, and throughput envelope constrained by elements in the system

  21. Design-Centric Modeling of Interconnects • Traditional modeling techniques consider individual wires for characterization / optimization of interconnect performance metrics • → no notion of design specificity • Multi-core / NoC / communication-system design exploration and synthesis methodologies should consider interconnect fabric in the context of design → design-centric modeling of interconnects • Modeling design fabrics: • Design fabrics for communication-based design: nodes, interconnect • Global interconnect of data path elements, processor cores • Point-to-point/broadcast buses, links, switch/multiplexer interfaces, and routers • Metrics for performance analysis of design fabrics • Conventional design metrics: performance, power, area • New metrics needed • Should reflect system-level power/performance characteristics

  22. Interconnect Metrics (IM) • Traditional interconnect performance metrics • Signaling: Delay, power, bandwidth, noise, crosstalk, area • Clocking: Skew/jitter, power, slew rate, area • Power distribution: supply fidelity • Reliability: electromigration • Some recent metrics • Interconnect architecture rank: inclusive metric combining delay, routability, area • Bandwidth/Energy: signifies throughput as well as energy spent in signaling with a specific bandwidth • Problems with existing metrics: • No notion of design specificity → interconnect stack performance is heavily dependent on a design’s wire length distribution • IM optimization based on canonical test structures is not valid for all wiring topologies → sub-optimal results

  23. BEOL Stack Metrics • Design-centric BEOL interconnect stack architectures → global interconnect topologies for NoC • Macro-block configurations may vary in # of wires, geometric parameters (width, spacing) and link structure • Stack metrics: • Traditional metrics can be adapted to macro-blocks • New metrics: area-normalized throughput, power density [Figure: macro blocks → interconnect library. Bus – (1); Curves – (2); Cross w/o contacts – (3); Cross w/ contacts – (4)-(6). Source: Addino et al.]
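The two new stack metrics named on this slide can be written down for a simple bus macro-block as follows. The slides do not give formulas, so the definitions here are assumptions: footprint is taken as wires × pitch × length, throughput as wires × per-wire bit rate, and power as throughput × energy per bit with every wire switching. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusBlock:
    """Hypothetical parameterization of a bus macro-block."""
    n_wires: int
    width_um: float
    spacing_um: float
    length_um: float
    bit_rate_gbps: float       # per wire
    energy_pj_per_bit: float   # per transferred bit

    @property
    def area_um2(self) -> float:
        # Footprint = number of wires * pitch (width + spacing) * length.
        return self.n_wires * (self.width_um + self.spacing_um) * self.length_um

    @property
    def throughput_gbps(self) -> float:
        return self.n_wires * self.bit_rate_gbps


def area_normalized_throughput(block: BusBlock) -> float:
    """Gb/s per square micron of routing area."""
    return block.throughput_gbps / block.area_um2


def power_density(block: BusBlock) -> float:
    """mW per square micron; (Gb/s) * (pJ/bit) equals mW."""
    power_mw = block.throughput_gbps * block.energy_pj_per_bit
    return power_mw / block.area_um2
```

Narrowing the pitch raises area-normalized throughput but, for the same energy per bit, also raises power density, which is the kind of conflict a stack-level search has to resolve.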

  24. Recall: TI Request for BEOL Stack Optimization • WANTED: BEOL stack optimization tool (Nagaraj, TI) • Inputs: • Stack options: thickness, pitch, dielectric materials, process variations • Class of representative designs at RT-level: logic-only, logic+memory, datapaths, CPU cores, SoC • Cell and IP library • Outputs: • Concise summary of tradeoffs for different BEOL stack options • Area, clock and power distribution (on die and package), performance, reliability, cost

  25. BEOL Stack Optimization • Stack optimization: search for the set of macro-block parameters that yields the best values of the performance metrics • Methodology • Step 1: Construction of interconnect library • Step 2: Electrical characterization of library elements for different choices of geometric, user-specified parameters (Addino et al., PATMOS’03) • Step 3: Computation of performance metrics • Bit transfer rate across cross-section of the elements • Power density per unit area • “Traditional metrics” – latency, bandwidth, noise, etc. • Best solutions for different performance metrics may be mutually conflicting → intelligent search in parameter space to obtain optimal solution
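Because the best stack for one metric may be poor for another, one simple way to present the trade-off is to keep only the non-dominated candidates. The sketch below is an exhaustive Pareto filter, substituted here for the "intelligent search" the slide calls for; the tuple layout, function name, and example numbers are hypothetical.

```python
from typing import List, Tuple

# (name, throughput_per_area, power_density)
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate beats on both metrics.

    Throughput per area is maximized and power density is minimized.
    A candidate is dominated if another is at least as good on both
    metrics and strictly better on at least one.
    """
    front = []
    for cand in candidates:
        _, tp, pd = cand
        dominated = any(
            o_tp >= tp and o_pd <= pd and (o_tp > tp or o_pd < pd)
            for _, o_tp, o_pd in candidates
        )
        if not dominated:
            front.append(cand)
    return front
```

For example, with stacks A (5.0, 3.0), B (4.0, 2.0), C (3.0, 2.5) and D (5.0, 3.5), only A and B survive: C is beaten by B on both metrics, and D matches A on throughput but has higher power density. The filter is quadratic in the number of candidates, so it suits small characterized libraries rather than large parameter sweeps.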

  26. Interconnect Characterization for Communication-based Design • BEOL stack exploration: initial step toward interconnect fabric design and optimization for MPSoC, CMPs and heterogeneous systems → best interconnect stack for a specific communication objective • Novel interconnect metrics • Capture technology scaling • Capture system scaling (design constraints): consider impact of memory hierarchy, interface timing, power, signal swing levels • Interconnect characterization: create models of performance metrics for interconnect structures • E.g., which structure gives the best throughput per area for given performance constraints? • How does power density change with bus parameters, power constraints? • Probabilistic, continuum/hierarchy of models • Dial effort/information vs. accuracy • “N+1 → N+2 shrink”; “Side + Ngate + Rent p”; “run Architecture Compiler”; … • Dial guardband vs. certainty

  27. Conclusions • Existing BEOL stack analysis/optimization is oblivious to system design constraints • Individual wires are no longer sufficient for performance analysis → move to higher levels of abstraction • Communication-driven design synthesis paradigm drives system-level interconnect analysis • Standalone metrics (e.g., delay, power, bandwidth) cannot give a complete picture of performance • New metrics: area-normalized throughput, power density • Explore parameter space to efficiently obtain stack parameters for optimum performance