Valuation and Values in Application-Driven Algorithmics: Case Studies from VLSI CAD • Andrew B. Kahng, UCLA Computer Science Dept. • June 2, 2000 • abk@cs.ucla.edu, http://vlsicad.cs.ucla.edu
My Research • Applied algorithmics • demonstrably useful solutions for real problems • “best known” solutions • “classic” (well-studied) : Steiner, partition, placement, TSP,... • toolkits: discrete algorithms, global optimization, mathematical programming, approximation frameworks, new-age metaheuristics, engineering • “Ground truths” • anatomies • limits
Anatomies • Technologies • semiconductor process roadmap, design-manufacturing I/F • design technology: methodology, flows, design process • interconnect modeling/analysis: delay/noise est, compact models • Problems • structural theory of large-scale global optimizations • Heuristics • hypergraph partitioning and clustering • wirelength- and timing-driven placement • single/multiple topology synthesis (length, delay, skew, buffering,...) • TSP, ..., IP protection, ..., combinatorial exchange/auction, ... • Cultures • contexts and infrastructure for research and technology transfer
Bounds • Exact methods • Provable approximations • Technology extrapolation • achievable envelope of system implementation w.r.t. cost, speed, power, reliability, ... • ideally, should drive and be driven by system architectures, design and implementation methodologies
Today’s Talk • “Demonstrably useful solutions for real problems” • “Valuation”: What problems require attention ? • technology extrapolation • automatic layout of phase-shifting masks • “Values”: How do we advance the leading edge ? • anatomy of FM-based hypergraph partitioning heuristics • culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”
Technology Extrapolation • Evaluates impact of • design technology • process technology • Evaluates impact on • achievable design • associated design problems • What matters, when? • Sets new requirements for CAD tools and methodologies, capital and R&D investment, ... right tech at the right time • Roadmaps (SIA ITRS): familiar and influential example • Example questions: What is the most power-efficient noise management strategy? How and when do L, SOI, SER, etc. matter? Will layout tools need to perform process simulation to effectively address cross-die and cross-wafer manufacturing variation?
GTX: GSRC Technology Extrapolation System • GTX is a framework for technology extrapolation • [Diagram labels: Knowledge (user inputs, pre-packaged): parameters (data), rules (models), rule chain (study); Implementation: engine (derivation), GUI (presentation)]
Graphical User Interface (GUI) • Provides user interaction • Visualization (plotting, printing, saving to file) • 4 views: • Parameters • Rules • Rule chain • Values in chain
GTX: Open, “Living Roadmap” • Openness in grammar, parameters and rules • easy sharing of data, models in research environment • contributions of best known models from anywhere • Allows development of proprietary models • separation between supplied (shared) and user-defined parameters / rules • usability behind firewalls • functionality for sharing results instead of data • Multi-platform (SUN Solaris, Windows, Linux) • http://vlsicad.cs.ucla.edu/GSRC/GTX/
GTX Activity • Models implemented • Cycle-time models of SUSPENS (with extension by Takahashi), BACPAC (Sylvester, Berkeley), Fisher (ITRS) • Currently adding • GENESYS (with help from Georgia Tech) • RIPE (with help from RPI) • New device and power modules (Synopsys / Berkeley) • New SOI device model (Synopsys / Berkeley) • Inductance extraction (Silicon Graphics / Berkeley / Synopsys) • Studies performed in GTX • Modeling and parameter sensitivity analyses • Design optimization studies: global interconnects, layer stack • Routability estimation, via impact models, ...
Today’s Talk • “Demonstrably useful solutions for real problems” • “Valuation”: What problems require attention ? • technology extrapolation • automatic layout of phase-shifting masks • “Values”: How do we advance the leading edge ? • anatomy of FM-based hypergraph partitioning heuristics • culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”
Subwavelength Optical Lithography • Subwavelength gap since 0.35 µm • EUV, X-rays, E-beams all > 10 years out • huge investment in > 30 years of optical litho infrastructure
Mask Types • Bright field: opaque features, transparent background • Dark field: transparent features, opaque background • [Figure legend: clear areas; opaque (chrome) areas]
Phase Shifting Masks • [Figure: conventional mask vs. phase-shifting mask (glass, chrome, phase shifter), with plots of E at mask, E at wafer, and I at wafer]
Impact of PSM • PSM enables smaller transistor gate lengths Leff • “critical” polysilicon features only (gate Leff) • faster device switching → faster circuits • better critical dimension (CD) control → improved parametric yield • all features on polysilicon layer, local interconnect layers • smaller die area → more $/wafer (“full-chip PSM” == BIG win) • Alternative: build a $10B fab with equipment that won’t exist for 5+ years • Data points • exponential increase in price of CAD technology for PSM • Numerical Technologies market cap 3x that of Avant! • 25 nm gates (!!!) manufactured with 248 nm DUV steppers (NTI + MIT Lincoln Labs, announced 2 days ago); 90 nm gates in production at Motorola, Lucent (since late 1999)
Double-Exposure Bright-Field PSM • [Figure: two mask exposures combined (+, =); phase regions labeled 0 and 180]
The Phase Assignment Problem • Assign 0 and 180 phase regions such that every critical feature with width (bright field) or separation (dark field) < B is induced by adjacent phase regions with opposite phases • [Figure: phase regions labeled 180, 0, 180, 0]
Key: Global 2-Colorability • If there is an odd cycle of “phase implications” → layout cannot be manufactured • layout verification becomes a global, not local, issue • [Figure: cycle of phase regions labeled 0 and 180, with one region marked “?”]
Critical features: F1, F2, F3, F4 • [Figure: layout with features F1-F4]
Opposite-Phase Shifters (0, 180) • [Figure: features F1-F4, each flanked by a pair of opposite-phase shifters]
Shifters: S1-S8 • PROPER Phase Assignment: • Opposite phases for opposite shifters • Same phase for overlapping shifters • [Figure: features F1-F4 with shifters S1-S8]
Phase Conflict • Proper Phase Assignment is IMPOSSIBLE • [Figure: features F1-F4 with shifters S1-S8]
Phase Conflict Resolution • feature shifting to remove overlap • [Figure: features F1-F4 with shifters S1-S8; phase conflict marked]
Phase Conflict Resolution • feature widening to turn conflict into non-conflict • [Figure: features F1-F4 with shifters S1-S4, S7, S8; phase conflict marked]
How will VLSI CAD deal with PSM ? • UCLA: first comprehensive methodology for PSM-aware layout design • currently being integrated by Cadence, Numerical Technologies • Approach: partition responsibility for phase-assignability • good layout practices (local geometry) • (open) problem: is there a set of “design rules” that guarantees phase-assignability of layout ? (no T’s, no doglegs, even fingers...) • automatic phase conflict resolution / bipartization (global colorability) • enabling reuse of layout (free composability) • problem: how can we guarantee reusability of phase-assigned layouts, such that no odd cycles can occur when the layouts are composed together in a larger layout ?
Compaction-Oriented Approach • Analyze input layout • Find min-cost set of perturbations needed to eliminate all “odd cycles” • Induce constraints for output layout • i.e., PSM-induced (shape, spacing) constraints • Compact to get phase-assignable layout • Key: Minimize the set of new constraints, i.e., break all odd cycles in conflict graph by deleting a minimum number of edges.
Conflict Graph • Dark Field: build graph over feature regions • edge between two features whose separation is < B • Bright Field: build graph over shifter regions • shifters for features whose width is < B • two edge types • adjacency edge between overlapping phase regions : endpoints must have same phase • conflict edge between shifters on opposite side of critical feature: endpoints must have opposite phase
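The two edge types above can be checked with a breadth-first bicoloring: an adjacency edge forces equal phases, a conflict edge forces opposite phases, and any contradiction means the graph contains an odd cycle of phase implications. The following is a minimal sketch under assumed conventions, not the tool flow described in the talk: shifters are numbered 0..n-1, and `assign_phases` is a hypothetical helper name.

```python
from collections import deque

def assign_phases(num_nodes, adjacency_edges, conflict_edges):
    """Return a list of phases (0 or 180), or None if no proper assignment exists.

    Hypothetical helper: nodes are phase regions numbered 0..num_nodes-1.
    """
    # parity 0: endpoints must share a phase; parity 1: endpoints must differ
    graph = [[] for _ in range(num_nodes)]
    for u, v in adjacency_edges:
        graph[u].append((v, 0))
        graph[v].append((u, 0))
    for u, v in conflict_edges:
        graph[u].append((v, 1))
        graph[v].append((u, 1))

    phase = [None] * num_nodes
    for start in range(num_nodes):
        if phase[start] is not None:
            continue
        phase[start] = 0                      # arbitrary choice per component
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, parity in graph[u]:
                wanted = phase[u] ^ parity
                if phase[v] is None:
                    phase[v] = wanted
                    queue.append(v)
                elif phase[v] != wanted:      # odd cycle of implications
                    return None
    return [180 * p for p in phase]

# Two features with shifter pairs (0,1) and (2,3); shifters 1 and 2 overlap.
print(assign_phases(4, [(1, 2)], [(0, 1), (2, 3)]))      # [0, 180, 180, 0]
# Three mutually conflicting regions form an odd cycle.
print(assign_phases(3, [], [(0, 1), (1, 2), (0, 2)]))    # None
```

A `None` result only reports that a conflict exists; choosing which edges to break at minimum cost is the separate optimization problem.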
Conflict Graph G • [Figure legend: green = feature; pink = conflict] • Dark Field: conflict graph G • Bright Field: conflict graph G with conflict edges and adjacency edges
Optimal Odd Cycle Elimination • [Figure legend: dark green = feature; pink = conflict] • conflict graph G • dual graph D • T-join of odd-degree nodes in D
Optimal Odd Cycle Elimination • [Figure legend: dark green = feature; pink = conflict] • T-join of odd-degree nodes in D corresponds to broken edges in original conflict graph • assign phases: dark green and purple • remaining pink conflicts correctly handled
The T-join Problem • How to delete minimum-cost set of edges from conflict graph G to eliminate odd cycles? • Construct geometric dual graph D = dual(G) • Find odd-degree vertices T in D • Solve the T-join problem in D: • find min-weight edge set J in D such that • all T-vertices have odd degree • all other vertices have even degree • Solution J corresponds to desired min-cost edge set in conflict graph G
Solving T-join in Sparse Graphs • Reduction to matching • construct a complete graph T(G) • vertices = T-vertices • edge costs = shortest-path cost • find minimum-cost perfect matching • Typical example = sparse (not always planar) graph • note that conflict graphs are sparse • #vertices = 1,000,000 • #edges ~ 5 × #vertices • #T-vertices ~ 10% of #vertices = 100,000 • Drawback: finding all-pairs shortest paths (APSP) is too slow and memory-consuming • #vertices in T(G) = 100,000 → #edges in T(G) ≈ 5,000,000,000
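The shortest-path reduction above can be sketched on a toy graph. This is an illustration only, under several assumptions: the graph is simple, connected, and has nonnegative edge costs; |T| is even; and the matching step is brute force, which is usable only for a handful of T-vertices. The function names are hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths; returns (dist, prev) dictionaries."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def min_perfect_matching(nodes, cost):
    """Brute-force min-cost perfect matching on the complete graph T(G)."""
    if not nodes:
        return 0, []
    first, rest = nodes[0], nodes[1:]
    best = (float("inf"), [])
    for i, partner in enumerate(rest):
        sub_cost, sub_pairs = min_perfect_matching(rest[:i] + rest[i + 1:], cost)
        total = cost[first].get(partner, float("inf")) + sub_cost
        if total < best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best

def t_join(num_nodes, edges, t_vertices):
    """Return a min-cost T-join as a set of frozenset({u, v}) edges."""
    adj = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    paths = {t: dijkstra(adj, t) for t in t_vertices}   # shortest paths from T-vertices
    cost = {t: paths[t][0] for t in t_vertices}
    _, pairs = min_perfect_matching(sorted(t_vertices), cost)
    join = set()
    for s, t in pairs:
        prev, node = paths[s][1], t
        while node != s:                  # walk the shortest path back to s
            join ^= {frozenset((node, prev[node]))}     # symmetric difference
            node = prev[node]
    return join

edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (0, 3, 5)]
print(sorted(sorted(e) for e in t_join(4, edges, [0, 3])))        # [[0, 1], [1, 2], [2, 3]]
print(sorted(sorted(e) for e in t_join(4, edges, [0, 1, 2, 3])))  # [[0, 1], [2, 3]]
```

Taking the symmetric difference of the matched shortest paths keeps exactly the T-vertices at odd degree. The cost table built here is the complete graph T(G), which is what becomes prohibitive at the instance sizes quoted above.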
Solving T-join: Reduction to Matching • Desirable properties of reduction to matching: • exact (i.e., optimal) • not much memory (say, 2-3X more) • leads to very fast solution • Solution: gadgets! • replace each edge/vertex with gadgets s.t. matching all vertices in gadgeted graph ⇔ T-join in original graph
T-join Problem: Reduction to Matching • replace each vertex with a chain of triangles • one more edge for T-vertices • in graph D: m = #edges, n = #vertices, t = #T • in gadgeted graph: 4m-2n-t vertices, 7m-5n-t edges • cost of red edges = original dual edge costs • cost of (black) edges in triangles = 0 • [Figure: gadgets for a vertex ∈ T and a vertex ∉ T]
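To see why the gadget construction matters, the size formulas above can be compared against the complete graph on T-vertices used by the shortest-path reduction. The sketch below plugs in the typical instance sizes quoted for sparse graphs; treating those sizes as the m, n, t of graph D, and taking #edges as exactly 5 × #vertices, are assumptions made here for illustration.

```python
def complete_graph_edges(t):
    """Number of edges in the complete graph on t T-vertices."""
    return t * (t - 1) // 2

def gadget_graph_size(m, n, t):
    """(vertices, edges) of the gadgeted graph, using the formulas on the slide."""
    return 4 * m - 2 * n - t, 7 * m - 5 * n - t

n = 1_000_000          # vertices (assumed instance size)
m = 5 * n              # edges, assuming exactly 5 per vertex
t = n // 10            # T-vertices, assuming 10% of the vertices

print(complete_graph_edges(t))        # 4999950000
print(gadget_graph_size(m, n, t))     # (17900000, 29900000)
```

Under these assumptions the gadgeted graph has roughly 30 million edges, against roughly 5 billion for the complete graph on T-vertices.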
Example of Gadgeted Graph • [Figure: dual graph and gadgeted graph] • black + red edges == min-cost perfect matching
Results • Runtimes in CPU seconds on Sun Ultra-10 • Greedy = breadth-first-search bicoloring • GW = Goemans/Williamson95 heuristic • Cook/Rohe98 for perfect matching • Integration w/compactor: saves 9+% layout area vs. GW
Can distinguish between use of shifting, widening DOFs • [Figure: features F1-F4 with shifters S1-S8]
Bipartization Problem: delete min # of nodes (or edges) to make graph bipartite • [Figure legend: black points = features; blue = shifter overlap; red = extra nodes to distinguish opposite shifters] • blue nodes: shifting • red nodes: widening • Bipartization by node deletion is NP-hard (GW98: 9/4-approx)
Summary • New fast, optimal algorithms for edge-deletion bipartization • Fast T-join using gadgets • applicable to any AltPSM phase conflict graphs • Approximate solution for node-deletion bipartization • Goemans-Williamson98 9/4-approximation • If node-deletion cost < 1.5 edge deletion, GW is better than edge deletion • Comprehensive integration w/NTI, Cadence tools
Today’s Talk • “Demonstrably useful solutions for real problems” • “Valuation”: What problems require attention ? • technology extrapolation • automatic layout of phase-shifting masks • “Values”: How do we advance the leading edge ? • anatomy of FM-based hypergraph partitioning heuristics • culture change: restoring time-to-market and QOR in applied algorithmics via “IP reuse”
Applied Algorithmics R&D • Heuristics for hard problems • Problems have practical context • Choices dominated by engineering tradeoffs • QOR vs. resource usage, accessibility, adoptability • How do you know/show that your approach is good?
Hypergraphs in VLSI CAD • Circuit netlist represented by hypergraph
Hypergraph Partitioning in VLSI • Variants • directed/undirected hypergraphs • weighted/unweighted vertices, edges • constraints, objectives, … • Human-designed instances • Benchmarks • up to 4,000,000 vertices • sparse (vertex degree ≈ 4, hyperedge size ≈ 4) • small number of very large hyperedges • Efficiency, flexibility: KL-FM style preferred
Context: Top-Down Placement • Speed • 6,000 cells/minute to final detailed placement • partitioning used only in top-down global placement • implied partitioning runtime: 1 second for 25,000 cells, < 30 seconds for 750,000 cells • Structure • tight balance constraint on total cell areas in partitions • widely varying cell areas • fixed terminals (pads, terminal propagation, etc.)
Fiduccia-Mattheyses (FM) Approach • Pass: • start with all vertices free to move (unlocked) • label each possible move with immediate change in cost that it causes (gain) • iteratively select and execute a move with highest gain, lock the moving vertex (i.e., cannot move again during the pass), and update affected gains • best solution seen during the pass is adopted as starting solution for next pass • FM: • start with some initial solution • perform passes until a pass fails to improve solution quality
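The pass structure above can be sketched as follows. This is a simplified illustration, not a production FM implementation: gains are recomputed from scratch instead of being kept in gain buckets, all vertices and nets have unit weight, ties go to the lowest-numbered vertex, and balance is a simple minimum part size (`min_size`). All of these choices and the function names are assumptions.

```python
def cut_size(nets, side):
    """Number of nets with vertices on both sides."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)

def gain(nets_of, nets, side, v):
    """Immediate decrease in cut if v moves to the other side."""
    g = 0
    for e in nets_of[v]:
        same = sum(1 for u in nets[e] if side[u] == side[v])
        other = len(nets[e]) - same
        if same == 1 and other > 0:
            g += 1          # v is alone on its side: net becomes uncut
        elif other == 0 and len(nets[e]) > 1:
            g -= 1          # net is uncut now and becomes cut
    return g

def fm_pass(num_vertices, nets, side, min_size):
    """One pass: move each vertex at most once, return best solution seen."""
    nets_of = [[] for _ in range(num_vertices)]
    for e, net in enumerate(nets):
        for v in net:
            nets_of[v].append(e)
    side = list(side)
    sizes = [side.count(0), side.count(1)]
    locked = [False] * num_vertices
    cut = cut_size(nets, side)
    best_side, best_cut = list(side), cut
    for _ in range(num_vertices):
        movable = [v for v in range(num_vertices)
                   if not locked[v] and sizes[side[v]] - 1 >= min_size]
        if not movable:
            break
        v = max(movable, key=lambda x: gain(nets_of, nets, side, x))
        cut -= gain(nets_of, nets, side, v)     # may be negative (uphill move)
        sizes[side[v]] -= 1
        side[v] ^= 1
        sizes[side[v]] += 1
        locked[v] = True                        # cannot move again this pass
        if cut < best_cut:
            best_side, best_cut = list(side), cut
    return best_side, best_cut

def fm(num_vertices, nets, side, min_size):
    """Repeat passes until a pass fails to improve the cut."""
    best_cut = cut_size(nets, side)
    while True:
        new_side, new_cut = fm_pass(num_vertices, nets, side, min_size)
        if new_cut >= best_cut:
            return side, best_cut
        side, best_cut = new_side, new_cut

# Two triangles joined by one net, started from an alternating partition.
nets = [[0, 1], [1, 2], [0, 2], [3, 4], [4, 5], [3, 5], [2, 3]]
side, cut = fm(6, nets, [0, 1, 0, 1, 0, 1], min_size=2)
print(cut)   # 1
```

Because moves with negative gain are still executed, a pass can climb out of a local minimum, and only the best prefix of moves is kept.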
Cut During One Pass (Bipartitioning) • [Plot: cut vs. moves]
Multilevel Partitioning • [Figure: clustering and refinement phases]