High Performance, Multi-CPU Power Signoff for Mega Designs

High Performance, Multi-CPU Power Signoff for Mega Designs. Patrick Sproule, Director of Engineering, VLSI Methodology. Nvidia Power Analysis Requirements: static and dynamic full-chip power analysis.

Presentation Transcript


  1. High Performance, Multi-CPU Power Signoff for Mega Designs. Patrick Sproule, Director of Engineering, VLSI Methodology

  2. Nvidia Power Analysis Requirements
     • Static and dynamic full-chip power analysis
       • The tool implementation must handle either sub-chip or full-die analysis in a single session.
       • Ideally, it provides full-domain analysis in a single run for full accuracy.
     • Design size scalability
       • Full flat analysis must handle both small designs and the largest production designs on existing, available compute resources.
     • Runtime predictability
       • Designs keep getting larger, but the schedule time for power analysis must stay constant or shrink, so closed-ended runtime estimates are required.
     • Clear reporting
       • Large amounts of analysis data must be condensed into clear reports.

  3. Power Analysis Challenges
     • Device counts have grown by four orders of magnitude in less than 10 years.
     • More metal layers and more modelled devices make the computation grow faster than tool capability and compute resources.
     • The result is long runtimes and/or inefficient subdivision of designs.
     • Designs have also become highly replicated at many levels of hierarchy.
     • Data handling and integration within the tools become complex.
     • Many engineers run analyses at different hierarchy levels.
     • Recreating the database and duplicating analyses costs schedule time.
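A quick worked check of what "four orders of magnitude in less than 10 years" implies, assuming smooth compound growth (the slide gives only the endpoints): device count must grow on average by at least about 2.5x per year, which means doubling roughly every nine months or faster.

```latex
r \;\ge\; \left(10^{4}\right)^{1/10} = 10^{0.4} \approx 2.51\ \text{per year},
\qquad
t_{2\times} = \frac{\log_{10} 2}{\log_{10} r} \;\le\; \frac{0.301}{0.4} \approx 0.75\ \text{years}
```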

  4. Current Rail Analysis Methodology
     • A partition-based hierarchical methodology is planned and executed by a large design team at many levels.
     • Unique design technologies, especially for low power:
       • multiple power domains, power-gating switches, …
     [Figure: design hierarchy. Full Chip (Full Chip Integration) → Chiplet (Chiplet Owners) → Partition (Partition Owners)]

  5. Typical Extraction and Rail Analysis
     • Rail analysis
       • Power-Grid-View (PGV): physical modeling of IP
       • Current signatures
     • Extraction
       • Rail: RC, current, geometry
     [Figure: flow diagram. Physical Database → RC Extraction → PGDB; PGDB, Primitive PGV and Current Signatures → Rail Analysis → IR Drop Results/Plots]
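To make the rail-analysis step more concrete: at its core, static IR-drop analysis solves Kirchhoff's current law on the extracted resistive grid, G·v = i, with pads held at the supply voltage and the current signatures acting as loads. The sketch below is a toy dense solver for illustration only. It is not how Voltus or any production tool is implemented, and `static_ir_drop`, `solve_linear` and their argument layout are hypothetical names.

```python
"""Toy static IR-drop solver: nodal analysis G*v = i on a small resistive rail.

Illustration only; function names and data layout are hypothetical, not a tool API.
"""
from typing import Dict, List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b with Gaussian elimination and partial pivoting (dense, small n)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def static_ir_drop(
    num_nodes: int,
    resistors: List[Tuple[int, int, float]],  # (node_a, node_b, ohms)
    sinks: Dict[int, float],                  # node -> current drawn by the load (A)
    pads: Dict[int, float],                   # node -> fixed pad voltage (V)
) -> List[float]:
    """Return the IR drop (supply minus node voltage) at every node.

    Assumes a single supply net whose pads all sit at the same voltage.
    """
    free = [n for n in range(num_nodes) if n not in pads]
    index = {n: k for k, n in enumerate(free)}
    g = [[0.0] * len(free) for _ in free]
    rhs = [-sinks.get(n, 0.0) for n in free]  # loads pull current out of a node
    for a, b, ohms in resistors:
        cond = 1.0 / ohms
        for p, q in ((a, b), (b, a)):
            if p not in index:
                continue
            g[index[p]][index[p]] += cond
            if q in index:
                g[index[p]][index[q]] -= cond
            else:
                rhs[index[p]] += cond * pads[q]  # known pad voltage moves to the RHS
    v_free = solve_linear(g, rhs)
    supply = max(pads.values())
    volts = [0.0] * num_nodes
    for n, v in pads.items():
        volts[n] = v
    for n, k in index.items():
        volts[n] = v_free[k]
    return [supply - v for v in volts]


drops = static_ir_drop(
    4, [(0, 1, 0.1), (1, 2, 0.1), (2, 3, 0.1)], {1: 1.0, 2: 1.0, 3: 1.0}, {0: 1.0}
)
```

For this three-segment rail (0.1 Ω segments, one 1.0 V pad, 1 A drawn at each downstream node), the drops come out at about 0, 0.3, 0.5 and 0.6 V, because each segment carries the total current of everything beyond it. Real grids are vastly larger and need sparse, iterative or distributed solvers rather than dense elimination.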

  6. Hierarchical Rail Analysis Method (H-PGV)
     [Figure: flow diagram. Top-Level Database → RC Extraction → PGDB; Partition 1 … Partition N, each through RC Extraction → H-PGV 1 … H-PGV N; PGDB, H-PGVs, Primitive PGV and Current Signatures → Rail Analysis → IR Drop Results/Plots]
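The slides do not say how an H-PGV is built internally. One standard way to replace a partition's grid with a compact model that is exact at its boundary is Kron reduction (a Schur complement of the conductance matrix): interior nodes are eliminated, leaving an equivalent conductance matrix and current injection at the ports, and interior voltages can be recovered once the top level has solved for the port voltages. The sketch below illustrates that technique under this assumption; `kron_reduce` and `recover_interior` are hypothetical names.

```python
"""Sketch: reduce a partition's power grid to a boundary-port model (Kron reduction).

G_red = G_PP - G_PI * G_II^-1 * G_IP
i_red = i_P  - G_PI * G_II^-1 * i_I
Hypothetical illustration; not the H-PGV format or any tool API.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a*X = b for X (b may have several columns) by Gauss-Jordan with pivoting."""
    n = len(a)
    m = [a[r][:] + b[r][:] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def kron_reduce(g: Matrix, i: Sequence[float], ports: Sequence[int]) -> Tuple[Matrix, List[float]]:
    """Eliminate interior nodes; return (G_red, i_red) as seen at the ports.

    g is the partition's conductance matrix and i its current-injection vector
    (loads are negative), so the partition obeys g*v = i.
    """
    inner = [k for k in range(len(g)) if k not in ports]
    g_pp = [[g[p][q] for q in ports] for p in ports]
    g_pi = [[g[p][k] for k in inner] for p in ports]
    g_ii = [[g[j][k] for k in inner] for j in inner]
    # Columns of G_IP plus the interior currents as one extra right-hand side.
    rhs = [[g[j][p] for p in ports] + [i[j]] for j in inner]
    x = solve(g_ii, rhs)  # x = G_II^-1 [G_IP | i_I]
    g_red = [
        [g_pp[a][b] - sum(g_pi[a][k] * x[k][b] for k in range(len(inner)))
         for b in range(len(ports))]
        for a in range(len(ports))
    ]
    i_red = [
        i[p] - sum(g_pi[a][k] * x[k][-1] for k in range(len(inner)))
        for a, p in enumerate(ports)
    ]
    return g_red, i_red


def recover_interior(
    g: Matrix, i: Sequence[float], ports: Sequence[int], v_ports: Sequence[float]
) -> List[float]:
    """Once the top level fixes the port voltages, solve for the interior voltages."""
    inner = [k for k in range(len(g)) if k not in ports]
    g_ii = [[g[j][k] for k in inner] for j in inner]
    rhs = [[i[j] - sum(g[j][p] * v for p, v in zip(ports, v_ports))] for j in inner]
    return [row[0] for row in solve(g_ii, rhs)]
```

For a partition made of two 0.1 Ω segments with ports at both ends and a 1 A load at each of its three nodes, the reduction yields the 0.2 Ω series path between the ports (a 5 S conductance) and a 1.5 A load at each port, i.e. the middle load split evenly. This only shows why a boundary model can keep full accuracy while shrinking the top-level problem; it makes no claim about how H-PGVs are actually generated.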

  7. H-PGV Advantages
     • H-PGV generation runtime is minimal compared with full-chip database setup for IR-drop analysis.
     • H-PGVs can be generated in parallel.
     • The hierarchical methodology supports both bottom-up and top-down rail analysis.
     • H-PGV boundary conditions can be captured for ECOs at the partition level (top-down push).
     • Full-chip and sub-chip analysis times are greatly improved with the same accuracy.
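Because each H-PGV depends only on its own partition, generation parallelizes naturally, and a replicated partition plausibly needs its model built only once (an assumption; the slides do not describe how replicas are handled). A minimal sketch of that scheduling idea follows: `build_boundary_model` is a hypothetical stand-in for a real per-partition job, and a thread pool stands in for what would in practice be separate tool runs on a compute farm.

```python
"""Sketch: build one boundary model per unique partition, in parallel, and reuse it."""
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def build_boundary_model(master: str) -> str:
    """Hypothetical stand-in for one per-partition model-generation job."""
    return f"model<{master}>"


def models_for_instances(
    instances: List[Tuple[str, str]],  # (instance name, partition master)
    max_workers: int = 4,
) -> Tuple[Dict[str, str], int]:
    """Map every instance to its master's model; also return the number of jobs run."""
    masters = sorted({master for _, master in instances})  # one job per unique master
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        built = dict(zip(masters, pool.map(build_boundary_model, masters)))
    return {inst: built[master] for inst, master in instances}, len(masters)


# Hypothetical chiplet: master A placed 4 times, B twice, C once.
instances = (
    [(f"a{k}", "A") for k in range(4)] + [(f"b{k}", "B") for k in range(2)] + [("c0", "C")]
)
mapping, jobs = models_for_instances(instances)
```

Here three jobs cover all seven instances, and every copy of a master points at the same model.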

  8. Flat vs. Hierarchical Correlation
     • Example analysis: sub-chip level
       • 14.4M total primitive instances (modelled cells)
         • 8.9M regular logic and memory cells
         • 5.5M filler, tap and decap cells
       • 18 total partitions in the chiplet
         • 7 unique partitions
         • 3 partitions replicated 4 times each
     • H-PGV run metrics:
       • Runtime: 18 to 32 minutes
       • Memory: 40 to 45 GB
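The instance counts in this example are self-consistent, and a little arithmetic shows where the hierarchical method saves work. Roughly 38% of the modelled instances are filler, tap and decap cells. If one H-PGV is built per unique partition and reused for its copies (an assumption; the slide does not say so), 7 model builds cover 18 partition instances.

```latex
\underbrace{8.9\,\text{M}}_{\text{logic + memory}} + \underbrace{5.5\,\text{M}}_{\text{filler, tap, decap}} = 14.4\,\text{M},
\qquad
\frac{5.5}{14.4} \approx 0.38,
\qquad
\frac{18\ \text{partition instances}}{7\ \text{unique partitions}} \approx 2.6
```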

  9. Rail Analysis at Full Chip Level

  10. Nvidia Scale and Runtime Issues
     • Design size growth is outpacing tool and resource capability.

  11. Voltus on Kepler
     • ~380M instances, flat analysis, TSMC 28 nm
     • Main resource:
       • ~725 GB of memory on a 1 TB, 32-CPU machine
     • Static and dynamic signoff power analysis at VDD and VSS (done as parallel runs)
     • 21-hour runtime per analysis domain
     • ~8x runtime improvement over the previous method with equivalent accuracy
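Reading the figures on this slide together (taking "~8x" at face value and assuming the memory figures are in gigabytes and terabytes), the previous method would have needed on the order of a week per analysis domain, and the flat run uses roughly 1.9 kB of memory per instance, or about 70% of the machine's memory:

```latex
T_{\text{prev}} \approx 8 \times 21\ \text{h} = 168\ \text{h} = 7\ \text{days},
\qquad
\frac{725 \times 10^{9}\ \text{B}}{380 \times 10^{6}\ \text{instances}} \approx 1.9\ \text{kB per instance},
\qquad
\frac{725\ \text{GB}}{1024\ \text{GB}} \approx 0.71
```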

  12. Rail Analysis at Full Chip Level

  13. Nvidia Scale and Runtime Issues
     • Memory requirement

  14. Summary
     • Voltus meets our rail-analysis needs for both accuracy and runtime, with runtimes far shorter than expected.
     • Further testing showed that the combined VDD-GND domain can be run in a single pass in 50 hours using the multi-threaded and distributed capabilities.
     • The ability to run both multi-threaded and distributed gives us flexibility in managing schedule and resource requirements.
     • Congratulations to the Voltus team on delivering a disruptive runtime improvement.

  15. Q&A
