
The Scaling Challenge: Can Correct-by-Construction Design Help?

Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick. Intel Labs (CAD Research), Hillsboro, OR. International Symposium on Physical Design, Monterey, CA, Apr 16, 2003.


Presentation Transcript


  1. The Scaling Challenge: Can Correct-by-Construction Design Help? • Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick • Intel Labs (CAD Research), Hillsboro, OR • International Symposium on Physical Design, Monterey, CA, Apr 16, 2003

  2. Repeaters, which are already a full-chip headache, will become critical at the block level also

  3. Outline • Some scaling experiments • Spice simulations • Implications for post-RTL design • Correct-by-Construction (CbC) design • What’s the promise? What’s missing?

  4. A Scaling Primer • Process scaling: • Devices shrink 0.7x, delay 0.7x • Wires shrink 0.7x • R/μm increases 2x, C/μm unchanged • So, (delay/scaled μm) increases 1.4x • Block area often stays same • # cells, # nets double • Wiring histogram shape invariant
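
The 2x figures on this slide can be checked with first-order arithmetic. The sketch below is not the authors' model: it assumes that wire width and thickness both shrink by the same 0.7x factor (so resistance per unit length grows with the inverse of the cross-section) and that a constant-area block holds cells that shrink in both dimensions. The function name `primer_factors` and its dictionary keys are hypothetical.

```python
# First-order scaling arithmetic for one process generation.
# Illustrative sketch only; the 0.7x shrink comes from the slide,
# everything else is a stated assumption.

SHRINK = 0.7  # linear shrink per generation


def primer_factors(shrink: float = SHRINK) -> dict:
    """Return per-generation scaling factors under ideal-shrink assumptions."""
    # Assumption: wire width and thickness both shrink, so the wire
    # cross-section shrinks by shrink**2 and resistance per unit length
    # rises by the inverse.
    r_per_length = 1.0 / shrink**2
    # Per the slide, capacitance per unit length is unchanged.
    c_per_length = 1.0
    # Assumption: block area is constant and cell area shrinks by
    # shrink**2, so the cell (and net) count rises by the inverse.
    cells_per_block = 1.0 / shrink**2
    return {
        "r_per_length": r_per_length,
        "c_per_length": c_per_length,
        "cells_per_block": cells_per_block,
    }


if __name__ == "__main__":
    for name, factor in primer_factors().items():
        print(f"{name}: {factor:.2f}x")
```

Both the resistance and the cell-count factors come out at roughly 2.04x, which the slide rounds to 2x.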

  5. Critical Repeater Lengths • Optimally-sized uniformly for min delay • Min distance at which inserting a repeater speeds up the line • Scaling 0.57x, in line with scaling theory • “Ideally shrunk” circuit requires additional repeaters (0.7x vs 0.57x)

  6. Critical Sequential Lengths • Optimized for max distance in one clock period • Scaling 0.43x • Assumes: • 2x frequency scaling, 5GHz on 90nm • Ignores setup, hold, skew • “Ideally shrunk” circuit: • Requires much new wire pipelining (0.7x vs 0.43x) • Ratio of regular to clocked repeaters decreasing (0.75x)
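
The 0.57x and 0.43x factors reported on the last two slides come from Spice simulations, but a textbook first-order model of an optimally repeated line lands close to them. The sketch below assumes that model, which is not necessarily what the authors used: repeater spacing proportional to sqrt(device delay / (r·c)) and delay per unit length proportional to sqrt(device delay · r·c), where r and c are wire resistance and capacitance per unit length. The function name and parameter names are hypothetical.

```python
import math


def critical_length_factors(device_delay: float = 0.7,
                            r_per_length: float = 2.0,
                            c_per_length: float = 1.0,
                            frequency: float = 2.0) -> tuple:
    """Per-generation scaling of critical repeater and sequential lengths.

    All arguments are scaling factors relative to the previous process
    generation, taken from the scaling primer slide.
    """
    rc = r_per_length * c_per_length
    # Assumed first-order model: repeater spacing ~ sqrt(device_delay / rc).
    repeater_length = math.sqrt(device_delay / rc)
    # Assumed first-order model: delay per unit length of an optimally
    # repeated line ~ sqrt(device_delay * rc).
    delay_per_length = math.sqrt(device_delay * rc)
    # Distance covered in one clock period shrinks with both the shorter
    # period (1 / frequency) and the slower line.
    sequential_length = 1.0 / (frequency * delay_per_length)
    return repeater_length, sequential_length


if __name__ == "__main__":
    rep, seq = critical_length_factors()
    print(f"critical repeater length:   {rep:.2f}x")  # 0.59x
    print(f"critical sequential length: {seq:.2f}x")  # 0.42x
```

Under these assumptions the model gives about 0.59x and 0.42x, against the simulated 0.57x and 0.43x. Dividing the two simulated factors, 0.43 / 0.57 ≈ 0.75, reproduces the 0.75x quoted for the ratio of regular to clocked repeaters.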

  7. Block Wiring Histogram and Critical Repeater Lengths • [Figure: # wires (90nm, log scale) vs. normalized wirelength, with critical repeater lengths marked for metals M3 and M6 at the 90nm, 65nm, 45nm and 32nm processes] • Critical lengths migrating rapidly to the left… (zoomed view coming up)

  8. Block Wiring Histogram: Zoomed View • [Figure: # wires (90nm, log scale) vs. normalized wirelength from 0 to 19, with critical repeater lengths marked for metals M3 and M6 at the 90nm, 65nm, 45nm and 32nm processes] • Increasingly steep slope of curve (log scale) => # impacted nets exploding!

  9. Block Wiring Histogram and Critical Sequential Lengths • [Figure: PSC/bus1p wiring histogram, # wires (90nm, log scale) vs. normalized wirelength, with critical sequential distances marked for metals M3 and M6 at the 90nm, 65nm, 45nm and 32nm processes] • # pipelined nets growing from negligible (90nm) to substantial (32nm)

  10. Repeated Block-level Nets • Ever-increasing percentage of block-level nets requires repeaters • Even the rate of growth is accelerating! … especially for clocked repeaters

  11. Total Repeater Count • Ever-increasing fractions of total cell count will be repeaters • 70% in 32nm (and this omits FC repeaters within block!) • Total repeater count is independent of frequency scaling assumptions
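
To see how a shrinking critical length turns into a growing repeater count, the sketch below counts repeaters over a wiring histogram. It is an illustration under stated assumptions, not the authors' method: a wire is assumed to need ceil(length / critical length) - 1 repeaters, the function name `repeater_stats` is hypothetical, and the histogram `toy_histogram` is made-up data that only mimics the steep log-scale shape of the plots.

```python
import math


def repeater_stats(histogram: dict, critical_length: float) -> tuple:
    """Return (number of repeated nets, total repeaters) for a histogram.

    histogram maps a normalized wirelength to the number of wires of that
    length; critical_length is in the same normalized units.
    """
    repeated_nets = 0
    total_repeaters = 0
    for length, count in histogram.items():
        # Assumption: split each wire into segments no longer than the
        # critical length; n segments need n - 1 repeaters.
        per_wire = max(0, math.ceil(length / critical_length) - 1)
        if per_wire > 0:
            repeated_nets += count
        total_repeaters += per_wire * count
    return repeated_nets, total_repeaters


# Hypothetical histogram: many short wires, few long ones.
toy_histogram = {1: 10000, 5: 1000, 10: 100, 20: 10}

if __name__ == "__main__":
    for crit in (12.0, 6.0, 3.0):
        nets, repeaters = repeater_stats(toy_histogram, crit)
        print(f"critical length {crit}: {nets} repeated nets, "
              f"{repeaters} repeaters")
```

With this toy data, halving the critical length twice takes the repeater total from 10 to 130 to 1360, because each step to the left sweeps in a much larger bucket of shorter wires.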

  12. So, what’s changing? • Interconnects scaling worse than devices … in spite of optimal (re-)buffering • # repeaters increasing exponentially • Interconnect repeaters will comprise significant fraction of cells in block • Even block-level nets will need to be pipelined

  13. Implications on Synthesis • Literal/Gate count and fanout metrics misleading • Major delay contribution from communication • Fanouts often isolated by repeaters • Area often wire-limited • Sizing often determined by (predictable) repeater load • Pre-layout sizing wasted

  14. Implications on Synthesis • Less logic per pipeline stage • Combinational synthesis: max benefit shrinking • Synthesis across sequential boundaries • Methodological support for retiming

  15. Implications on Synthesis • Bandwidth ceiling • Hard to move data around for computation • Logic replication • Encourage low fans • Dense encodings • Distribution of computation across channel

  16. Implications on Layout • Routing • Must understand repeater insertion • Fine power grid => templated routing? • Placement with repeaters • Intra-block nets: # repeaters depends on routing • OTH routes: fixed obstructions • Add buffering into placement core … as opposed to ECO postprocessing

  17. Implications on Layout • [Figure labels: 90nm, 32nm] • Latency-constrained placement • march sub-optimality • Hard constraint per stage (unlike delay) OR • Post-RTL latency optimization • Methodological nightmare • Delay insensitive design?

  18. Implications on FC Assembly What if we reduce block area to avoid wire effects? Many of the new physical synthesis problems go away BUT # blocks triples! (and block assembly is the hardest part of chip design!) • Flat assembly (Fragmentation of paths across blocks) OR • Increased hierarchy (Lack of visibility across hierarchy levels)

  19. The CbC Link Process scaling => worsening predictability Predictability => CbC design But current CbC approaches too rigid Can we still apply them?

  20. Principles of CbC Design • More predictability • Reduced estimation error improves high-level optimizations • Break the design-verification loop • Sequence of small, guaranteed-correct transformations • No unexpected deterioration of secondary metrics • Avoid micro-engineering • Design productivity gap

  21. Abstract Fabrics • Structural fabrics: too resource-intensive e.g. DWF: 50% routing tracks • Use algorithmic fabrics instead • Prune to subspace with desirable CbC properties e.g. Non-uniform power grid using “min power pitch” (ISPD’02) Guaranteed throughput bus design (ICCAD’02) • CbC rules-of-thumb e.g. Bound on max adjacent runs of signals Performance with predictability

  22. CbC Block Construction • [Figure labels: RTL, Synth/mapped netlist, Placed/buffered netlist, GR/track-assigned layout] • “Vertical” partitioning and successive refinement • Coarse layout of unsynthesized design • Successive refinement of “vertical” partitions • Critical partitions first • Different partitions exist at different levels of refinement • Hierarchical engines • Enables early repeater prediction

  23. CbC Full Chip Assembly • Latency prediction for full-chip interconnects • Preferential routing for performance-critical nets • Flip-flop staging on non-critical nets • Performance prediction with cycle latency ranges • Block area mis-prediction tolerance • Move blocks without re-implementation • Global communication grids

  24. Summing Up • Repeaters becoming critical at the block level • Most post-RTL design problems changing fundamentally • Combination of algorithmic and methodological advances required • CbC approaches viable, but at the abstract level • Current structural fabrics too resource intensive • Achieve predictability through algorithmic fabrics

  25. Backup Slides

  26. PIE (Process Independent Exploration) Models • To provide an easier way to study interconnect structures and their trends in future CMOS processes • To be used in place of fudged process files • Analytical models directly correlating to device and interconnect physics • Device models based on BSIM3 equations including major 2nd order effects • Accurate mobility and velocity saturation models, DIBL and channel length modulation approximation • Continuous from weak to strong inversion • Interconnect models with 2D fringe capacitance approximation • Scattering not accounted for • Entire process expressed by small set of physically meaningful process parameters (e.g. Tox, Vth, kild, etc.) in PEF (Process Exploration File) files • 16 for devices • 6 each metal layer • Test cases simulated as SPICE netlists • PIE models implemented as behavioral sources • Calibrated against existing process files
