High Performance, Multi-CPU Power Signoff for Mega Designs

Patrick Sproule, Director of Engineering, VLSI Methodology

Nvidia Power Analysis Requirements
• Static and Dynamic Full Chip Power Analysis
  • Tool implementation must handle both sub-chip and full-die analysis in a single session.
  • Ideally provide full-domain analysis for full accuracy in a single run.
• Design Size Scalability
  • Full flat design analysis to handle both small and the largest production designs on existing/available compute resources.
• Runtime Predictability
  • Designs get larger, but the schedule time for power analysis must stay constant or shrink. Closed-ended runtime estimates are required.
• Clear Reporting
  • Large amounts of analysis data must be condensed into clear reports.

Power Analysis Challenges
• Designs have seen device count grow by 4 orders of magnitude in less than 10 years.
  • Increased number of metal layers and modelled device count cause calculation to expand faster than tools and compute resources.
  • Large runtimes and/or inefficient subdivision of designs required.
• Designs have also become highly replicated at a multitude of hierarchy levels.
  • Complexity of data handling and integration within the tools.
  • Many engineers run analysis at different hierarchy levels.
  • Recreation of db and duplication of analysis costs schedule.
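As a rough illustration of what "4 orders of magnitude in less than 10 years" implies, the sketch below converts that figure into an average annual growth factor and a doubling time. It assumes a full 10-year window, so the result is a lower bound on the growth rate; the variable names are illustrative and not from the presentation.

```python
import math

# Figures taken from the slide; the 10-year window is an upper bound,
# so the computed rate is a lower bound on the real growth.
orders_of_magnitude = 4
years = 10

# Average multiplicative growth per year: (10^4)^(1/10) = 10^0.4
annual_factor = 10 ** (orders_of_magnitude / years)

# Time for device count to double at that average rate, in months
doubling_time_months = 12 * math.log(2) / math.log(annual_factor)

print(f"Average growth per year: at least {annual_factor:.2f}x")
print(f"Doubling time: at most {doubling_time_months:.1f} months")
```

Under these assumptions, device count grows by about 2.5x per year, which means it doubles roughly every 9 months.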

Current Rail Analysis Methodology
• Partition-based hierarchical methodology is planned and executed within a large design team at many levels.
• Unique design technologies, especially in low power
  • Multi-power domains, power gating switches, …
[Diagram: ownership by hierarchy level. Full Chip Integration: full chip; Chiplet Owners: chiplet; Partition Owners: partition]

Typical Extraction and Rail Analysis
• Rail Analysis
  • Power-Grid-View (PGV): physical modeling of IP
  • Current Signatures
• Extraction
  • Rail: RC, current, geometry
[Flow diagram labels: Primitive PGV, RC Extraction, PGDB, Physical Database, Rail Analysis, Current Signatures, IR Drop Results/Plots]

Hierarchical Rail Analysis Method (H-PGV)
[Flow diagram labels: RC Extraction, PGDB, Top-Level Database, Partition 1 … Partition N, H-PGV 1 … H-PGV N, RC Extraction, Current Signatures, Primitive PGV, Rail Analysis, IR Drop Results/Plots]

H-PGV Advantages
• H-PGV generation runtime is minimal compared to full-chip database setup for IR-drop analysis.
• H-PGVs can be generated in parallel.
• Hierarchical methodology supports bottom-up and top-down rail analysis.
  • Capturing H-PGV boundary conditions for ECO at partition level (top-down push)
• Full-chip and sub-chip level analysis time greatly improved with the same accuracy.
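The point that H-PGVs can be generated in parallel can be sketched as a simple fan-out over independent partition jobs. This is only an illustration of the scheduling pattern, not the actual tool flow: `generate_hpgv`, `generate_all`, and the partition names are hypothetical placeholders, and a real flow would launch a tool run where the placeholder returns a string.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_hpgv(partition_name: str) -> str:
    """Hypothetical stand-in for launching one partition-level H-PGV job."""
    # A real flow would submit a tool run here; this placeholder only
    # returns the name of the model it would produce.
    return f"{partition_name}.hpgv"


def generate_all(partitions: list[str], max_workers: int = 4) -> dict[str, str]:
    """Run one independent H-PGV job per partition and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with partitions
        models = list(pool.map(generate_hpgv, partitions))
    return dict(zip(partitions, models))


hpgv_models = generate_all(["partition_1", "partition_2", "partition_3"])
print(hpgv_models)
```

Because each partition job is independent of the others, the wall-clock time of this step is bounded by the slowest partition rather than by the sum of all partitions, provided enough workers are available.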

Flat vs. Hierarchical Correlation
• Example analysis: sub-chip level
  • 14.4M total primitive instance count (modelled cells)
    • 8.9M regular logic and memory cells
    • 5.5M filler, tap, decap cells
  • 18 total partitions in chiplet
    • 7 unique partitions
    • 3 partitions replicated 4 times each
• H-PGV run metrics:
  • Runtime: 18~32 minutes
  • Memory: 40~45G
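The instance counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below confirms that the two cell categories sum to the stated total and computes what share of the modelled cells are filler, tap, and decap cells; the variable names are illustrative.

```python
# Instance counts from the slide, in millions of cells
logic_and_memory_m = 8.9
filler_tap_decap_m = 5.5

# The two categories should add up to the stated 14.4M total
total_m = logic_and_memory_m + filler_tap_decap_m

# Share of modelled instances that are filler, tap, or decap cells
non_logic_share = filler_tap_decap_m / total_m

print(f"Total instances: {total_m:.1f}M")
print(f"Filler/tap/decap share: {non_logic_share:.1%}")
```

Roughly 38% of the modelled instances are filler, tap, and decap cells, so these cells account for a substantial part of the analysis database size.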

Rail Analysis at Full Chip Level

Nvidia Scale and Runtime Issues • Design Size Growth outpacing tool and resource capability.

Voltus on Kepler
• ~380M instances, flat analysis, TSMC 28nm
• Main resource:
  • ~725 GB memory on a 1 TB, 32-CPU machine
• Static and dynamic signoff power analysis at VDD & VSS (done as parallel runs)
• 21-hour runtime per analysis domain
• ~8x runtime improvement over the previous method with equivalent accuracy
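The figures on this slide allow a few back-of-the-envelope estimates. The sketch below reads the memory figures as gigabytes and terabytes and takes 1 TB as 1024 GB, both of which are assumptions. The previous-method runtime is inferred from the approximate 8x figure and is not a number stated in the presentation.

```python
# Figures from the slide (all approximate)
instances_m = 380          # millions of instances in the flat analysis
runtime_hours = 21         # per analysis domain
speedup = 8                # stated improvement over the previous method
memory_used_gb = 725

# Assumption: the 1 TB machine is taken as 1024 GB
machine_memory_gb = 1024

# Inferred, not stated: runtime of the previous method per domain
previous_hours = runtime_hours * speedup
previous_days = previous_hours / 24

# Fraction of machine memory consumed by the run
memory_utilization = memory_used_gb / machine_memory_gb

# Memory cost per million instances
gb_per_million_instances = memory_used_gb / instances_m

print(f"Estimated previous runtime: ~{previous_hours} h (~{previous_days:.0f} days)")
print(f"Memory utilization: ~{memory_utilization:.0%}")
print(f"Memory per million instances: ~{gb_per_million_instances:.1f} GB")
```

Under these assumptions, the earlier method would have taken about a week per domain, and the run uses roughly 70% of the machine's memory at about 1.9 GB per million instances.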

Rail Analysis at Full Chip Level

Nvidia Scale and Runtime Issues
[Figure: memory requirement]

Summary
• Voltus meets our needs for rail analysis in both accuracy and runtime, with runtimes far lower than expected.
• Further testing proved it possible to run the combined VDD-GND domain in a single pass in 50 hrs, using multi-threaded and distributed capabilities.
• The capability to run both multi-threaded and distributed gives us the flexibility to manage schedule and resource requirements.
• Congratulations to the Voltus team on delivering a disruptive runtime improvement.

Q&A
