Inductor Design for Global Resonant Clock Distribution in a 28-nm CMOS Processor
Visvesh Sathe³, Padelis Papadopoulos², Alvin Loke³, Tarek Khan¹, Anand Raman², Gerry Vandevalk³, Nikolas Provatas², Vincent Ross¹
¹ Advanced Micro Devices, Inc. ² Helic, Inc. ³ Formerly at Advanced Micro Devices, Inc.

Outline
• Resonant Clock Distribution
• Inductor Design and Analysis Challenges
• Helic VeloceRaptor/X
• Inductor Extraction using VeloceRaptor/X
• Silicon Correlation
• Conclusion

Processor Global Clock Distribution
AMD “Piledriver”
• Typical core power consumption breakdown: macros 18%, flops 18%, gaters 16%, standard cells 19%, bus 5%, clocking 24%
• Significant global clock loading
• 7-ps clock skew target across > 20 mm² core area
• Constrained clock latency from grid to timing elements

Basic Resonant Clocking Operation
• Relies on resonance between Ltank and Cclk near ω0
• Efficient operation around ω0
• Driving the clock at much lower frequencies → reduced efficiency, warped clock waveform

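The frequency dependence described on this slide can be sketched numerically. The snippet below is a minimal illustration that assumes an ideal, lossless parallel LC tank; the component values and the function names are placeholders chosen for the example and do not come from the slides.

```python
import math


def resonant_frequency(l_tank, c_clk):
    """Natural frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal LC tank, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank * c_clk))


def residual_reactive_fraction(f, f0):
    """Fraction of the capacitive load current the driver still has to supply.

    For an ideal parallel LC tank the net susceptance is w*C - 1/(w*L),
    which equals w*C * (1 - (f0/f)**2). A value of 0 means the inductor
    supplies all of the reactive current; a value above 1 means the tank
    draws more reactive current than the bare capacitor would.
    """
    return abs(1.0 - (f0 / f) ** 2)


# Illustrative values only (assumptions, not taken from the slides).
L_TANK = 1e-9  # 1 nH
C_CLK = 1e-9   # 1 nF

f0 = resonant_frequency(L_TANK, C_CLK)
print(f"f0 = {f0 / 1e6:.1f} MHz")
for ratio in (1.0, 0.8, 0.5):
    frac = residual_reactive_fraction(ratio * f0, f0)
    print(f"f = {ratio:.1f} * f0: residual reactive fraction = {frac:.2f}")
```

In this idealized model the residual fraction is zero at f0 and rises to 3 at half the resonant frequency, which is one way to see why driving the clock far below ω0 reduces efficiency. A real tank has finite Q, so the numbers are indicative only.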
AMD Resonant Clocking
• 90 inductors distributed over custom power grid, signal wires, and core circuitry

Inductor Design
• Clock macro and bump pitch constrain inductor size
• Metal sharing with existing power → cut-aways
• Centered power straps, HCK tree for mutual inductance

Inductor and Grid Problem Summary
• 87 × 65 μm spiral over 113 × 126 μm custom grid
• 12 metal layers (2 thick)
• Width: 0.13 to 5.7 μm
• Thickness: 0.1 to 1.2 μm
• > 5 μm/μm² interconnect length to be extracted!

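To give a sense of scale for these figures, the arithmetic can be worked through as below. This is a rough sketch that assumes the quoted density applies uniformly over the whole 113 × 126 μm grid window, which the slide does not state explicitly; the result is therefore only a lower-bound estimate.

```python
# Dimensions quoted on the slide, in micrometres.
GRID_W_UM, GRID_H_UM = 113.0, 126.0
SPIRAL_W_UM, SPIRAL_H_UM = 87.0, 65.0

# The slide states "> 5 um/um^2", so 5.0 is used here as a lower bound.
MIN_DENSITY_UM_PER_UM2 = 5.0


def area_um2(width_um, height_um):
    """Area of a rectangle in square micrometres."""
    return width_um * height_um


grid_area = area_um2(GRID_W_UM, GRID_H_UM)
spiral_area = area_um2(SPIRAL_W_UM, SPIRAL_H_UM)

# Assumption: the density applies over the full grid window.
min_length_um = MIN_DENSITY_UM_PER_UM2 * grid_area

print(f"grid area:   {grid_area:.0f} um^2")
print(f"spiral area: {spiral_area:.0f} um^2")
print(f"interconnect to extract: > {min_length_um / 1000:.1f} mm")
```

Under that assumption a single inductor site involves more than 71 mm of interconnect, which helps explain why extraction capacity and runtime dominate the methodology.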
Inductor Design Methodology
• Goal: achieve the desired L with maximum Q on a highly customized inductor
• Available design variables:
  • Winding width, outer spacing, inner spacing (NESW)
  • Winding height, winding width
• Running multiple extractions within a reasonable time is vital
• Per-metal extraction customization is crucial:
  • Top metal layers dominate magnetic interaction; lower-level metals have minimal interaction
  • Per-metal extraction/merging mode selection (R/C/RC/RLC/RLCk)
• Process-aware, temperature-sensitive extraction

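The two figures of merit named in the goal, L and Q, are commonly read off a one-port impedance as L = Im(Z)/ω and Q = Im(Z)/Re(Z). The sketch below uses that textbook definition, which is valid well below the inductor's self-resonance; it is not taken from the slides, and the series R and L values and the frequency are hypothetical.

```python
import math


def inductance_and_q(z, freq_hz):
    """Effective inductance (H) and quality factor from a one-port impedance.

    Uses L = Im(Z) / w and Q = Im(Z) / Re(Z), which is meaningful only
    well below the self-resonant frequency of the inductor.
    """
    omega = 2.0 * math.pi * freq_hz
    return z.imag / omega, z.imag / z.real


# Hypothetical series R-L model, used only to exercise the function.
R_SERIES_OHM = 1.0
L_SERIES_H = 1e-9
FREQ_HZ = 4e9

z = complex(R_SERIES_OHM, 2.0 * math.pi * FREQ_HZ * L_SERIES_H)
l_eff, q = inductance_and_q(z, FREQ_HZ)
print(f"L = {l_eff * 1e9:.3f} nH, Q = {q:.2f}")
```

Each candidate geometry produces one such (L, Q) pair per extraction, so the number of extractions that fit in the available time bounds how much of the design space can be explored.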
What is VeloceRaptor/X?
Rapid, high-capacity multi-GHz EM extraction
• Maxwell equations-based RLCk model per metal segment
• Inductance calculations based on magnetic vector potential
• Skin and proximity effects, substrate losses, capacitive and magnetic coupling
• Silicon-proven accuracy
• Use model:
  • In situ selection of nets and pin definition
  • Netlist and symbol creation for the marked nets
  • Model annotation and simulation

VeloceRaptor/X Offers…
• High capacity and speed
• Multithreading support
• S-parameters and RLCk netlist output
• Temperature-aware model
• Mixed-mode R/C/RC/RLC/RLCk extraction per net, per layer
• Layout-dependent effects captured
• Direct GDS extraction
• Batch-mode support
• Numerical network reduction

Inductor-over-Grid Model Validation
Best tradeoff between model accuracy and runtime/memory requirements
• Adding RLCk layers: increasing interconnect density, runtime, memory requirement
• No improvement in model accuracy when adding more RLCk layers
• Mixed-mode extraction per net layer:
  • M11 to Mx: RLCk
  • M(x-1) to M3: RC
• RLCk extraction below M07 has negligible impact

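The per-layer mode split above can be written down as a small lookup table. The sketch below is purely illustrative: `assign_extraction_modes` is a hypothetical helper, not VeloceRaptor/X syntax, and the slide leaves the cut-off layer x symbolic. The example call uses 7 only because of the remark about M07, which is an assumption.

```python
def assign_extraction_modes(rlck_floor, top=11, bottom=3):
    """Map metal layer names to an extraction mode.

    Layers from M<top> down to M<rlck_floor> get full RLCk extraction;
    layers from M<rlck_floor - 1> down to M<bottom> get RC only.
    Hypothetical helper for illustration, not a tool API.
    """
    if not bottom <= rlck_floor <= top:
        raise ValueError("rlck_floor must lie between bottom and top")
    modes = {}
    for n in range(top, bottom - 1, -1):
        modes[f"M{n}"] = "RLCk" if n >= rlck_floor else "RC"
    return modes


# Assumption: x = 7, suggested by the slide's remark about M07.
modes = assign_extraction_modes(rlck_floor=7)
for layer, mode in modes.items():
    print(f"{layer}: {mode}")
```

Sweeping `rlck_floor` and comparing the extracted L and Q against runtime is one way to reproduce the kind of accuracy-versus-cost study the slide summarizes.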
Test Chip Silicon Validation
• Very good agreement between measured and extracted L and Q

Conclusions
• Resonant clocking reduces global clock distribution power
• Use of multiple distributed on-chip inductors poses a significant challenge to inductor extraction:
  • Metal-rich extraction environment
  • Significant mutual inductance with underlying and adjacent circuits and power grids
• Exploiting design structure and VeloceRaptor/X capabilities enabled efficient inductor optimization:
  • Batch mode and per-metal, per-net extraction produced a model with sufficient detail to accurately model silicon behavior