Finding and Sharing Brick Walls • CANDE, September 22, 2001 • Andrew B. Kahng, UCSD CSE & ECE Departments • email: abk@ucsd.edu • URL: http://vlsicad.ucsd.edu
1999 ITRS Design Technology Metrics and Red Bricks
Metric                                                           1999    2000    2001    2002    2003    2004    2005
Technology node                                                  180 nm  -       -       130 nm  -       -       100 nm
MPU new design cycle (months)                                    36      36      36      32      32      32      30
MPU transistors per designer-month (300-person team) (thousand)  2       3       4       7       10      15      20
ASIC new design cycle (months)                                   12      12      12      12      12      12      12
ASIC transistors per designer-month (50-person team) (million)   0.3     0.4     0.5     0.7     1.0     1.3     1.8
Portion of verification by formal methods                        15%     15%     15%     20%     20%     20%     30%
Portion of test covered by BIST                                  20%     20%     20%     30%     30%     30%     40%
Legend (cell shading): Solutions Exist • Solutions Being Pursued • No Known Solutions
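One way to read the productivity rows is to multiply them out into an implied design size. The sketch below does that under an assumption the slide does not state: that the full team works for the full design cycle. The function name `implied_design_size` is hypothetical.

```python
# Rows copied from the 1999 ITRS table (columns 1999..2005).
YEARS = [1999, 2000, 2001, 2002, 2003, 2004, 2005]
MPU_CYCLE_MONTHS = [36, 36, 36, 32, 32, 32, 30]
MPU_KTR_PER_DESIGNER_MONTH = [2, 3, 4, 7, 10, 15, 20]  # thousand transistors
ASIC_CYCLE_MONTHS = [12] * 7
# The table gives ASIC productivity in millions; stored here in thousands.
ASIC_KTR_PER_DESIGNER_MONTH = [300, 400, 500, 700, 1000, 1300, 1800]
MPU_TEAM, ASIC_TEAM = 300, 50


def implied_design_size(ktr_per_designer_month, team_size, cycle_months):
    """Transistors delivered if the whole team works the whole cycle (assumption)."""
    return ktr_per_designer_month * 1000 * team_size * cycle_months


mpu_sizes = [
    implied_design_size(k, MPU_TEAM, m)
    for k, m in zip(MPU_KTR_PER_DESIGNER_MONTH, MPU_CYCLE_MONTHS)
]
asic_sizes = [
    implied_design_size(k, ASIC_TEAM, m)
    for k, m in zip(ASIC_KTR_PER_DESIGNER_MONTH, ASIC_CYCLE_MONTHS)
]

for year, mpu, asic in zip(YEARS, mpu_sizes, asic_sizes):
    print(f"{year}: MPU ~{mpu / 1e6:.1f}M transistors, ASIC ~{asic / 1e6:.0f}M transistors")
```

Under that assumption the 1999 MPU row works out to 21.6M transistors and the 2005 row to 180M.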
Hold These Thoughts… • ITRS is created by SIA companies and top semi/system houses worldwide – all star customers • EDA has one chapter out of 12 • EDA is just another part of SISA (semiconductor industry supplier association) • EDA is small: 6000 R&D worldwide, $4B market • Hold this thought: Dataquest 3.9% annual growth in tools $ spent per designer; integration costs > tool costs • Hold this thought: “small industry with poor perceived ROI will stay small” = vicious cycle • Hold this thought: How do we turn a vicious cycle into a virtuous cycle?
Six Riffs • Riff #1: ITRS acceleration, silicon technology, and system drivers • Riff #2: A big picture on red bricks • Riff #3: A Dark Riff on D and DT productivity • Riff #4: On the design-manufacturing handoff • Riff #5: On cost, variability and value • Riff #6: It’s lunchtime
Riff #1: ITRS Acceleration, Silicon Technology, and System Drivers
Roadmap Acceleration Since 2000 • Major accelerations continue • E.g., 90nm node is in 2004, with physical gate length at 45nm • MPU/ASIC half-pitch were separate, now unified • ASIC is at the same process node as MPU • 2-year cycles b/w MPU/ASIC generations through 2004 • Node = 0.7x multiplier of half-pitch or minimum feature size, generally allowing 2x the transistors on the same size die • “Normal” pace = 3-year cycle • MPU/ASIC half-pitch converges w/DRAM HP in 2004 • Previous ITRS (2000): convergence predicted for 2015 • Extremely aggressive scaling for density, cost improvement and competitive positioning
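The node arithmetic above can be sketched in a few lines: a 0.7x linear shrink gives roughly 2x the transistors on the same die, and the cycle length sets the annual pace. The function names are hypothetical, and the model ignores everything except the linear shrink.

```python
def density_gain_per_node(linear_shrink=0.7):
    """Area scales with the square of the linear dimension, so density scales as 1/shrink^2."""
    return 1.0 / (linear_shrink ** 2)


def annual_density_growth(years_per_node, linear_shrink=0.7):
    """Equivalent compound annual density growth for a given node cadence."""
    return density_gain_per_node(linear_shrink) ** (1.0 / years_per_node)


print(f"density gain per node: {density_gain_per_node():.2f}x")
print(f"2-year cycle: {annual_density_growth(2):.2f}x per year")
print(f"3-year cycle: {annual_density_growth(3):.2f}x per year")
```

The 2-year cadence corresponds to roughly 43% density growth per year, against roughly 27% for the "normal" 3-year cadence.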
System Drivers • Define IC products that drive mfg, design technologies • ORTCs + SDs = “consistent framework for tech requirements” • Four system drivers • MPU – traditional processor core • SOC (focus on “ASIC-LP”, + high-pins, high-signaling network driver) • AM/S – four basic circuits and FOMs • DRAM • Each driver section • Nature, evolution, formal definition of this driver • What market forces apply to this driver ? • What technology elements (process, device, design) does this drive? • Key figures of merit, and roadmap
MPU Driver • Old MPU model – 3 flavors • New MPU model – 2 flavors • Cost-performance at production (CP) • 140 mm² die, “desktop” • High-performance at production (HP) • 310 mm² die, “server” • Both have multiple cores (“helper engines”), on-board L3 cache, … • Multi-cores == more dedicated, less general-purpose logic; driven by power and reuse considerations; reflect convergence of MPU and SOC • Doubling of transistor counts is per node, NOT per 18 months • Clock frequencies stop doubling with each node
Example Supporting Analyses (MPU) • Diminishing returns • Pollack’s Rule: In a given process technology, new microarchitecture takes 2-3x area of previous generation one, and provides only 50% more performance • Corroboration: SPECint/MHz, SPECfp/MHz, SPECint/Watt all decreasing • Power knob running out • Speed == Power • Large switching currents, large power surges on wakeup, IR drop control issues all limited by A&P roadmap (e.g., improvement in bump pitch, package power) • Power management: 2500% improvement needed by 2016 • Speed knob running out (new clock frequency model) • Historically, 2x clock frequency every node • 1.4x/node from device scaling but running into tox, other limits (PIDS) • 1.4x/node from fewer logic stages (from 40-100 down to around 14 FO4 INV delays) • Clocks cannot be generated with period < 6-8 FO4 INV delays • Pipelining overhead (1-1.5 FO4 INV delay for pulse-mode latch, 2-3 for FF) • Around 16 FO4 INV delays is limit for clock period in core (L1 $ access, 64b add) • Cannot continue 2x frequency per node trend in ITRS
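The clock frequency model above lends itself to a quick back-of-the-envelope check. The sketch below multiplies the two 1.4x/node contributions and estimates how many nodes the logic-depth contribution could last before the clock period reaches the roughly 16 FO4 floor. The function names are hypothetical, and treating logic depth as shrinking by exactly 1.4x per node is a simplifying assumption.

```python
import math

DEVICE_GAIN_PER_NODE = 1.4       # from device scaling
LOGIC_DEPTH_GAIN_PER_NODE = 1.4  # from fewer logic stages per clock


def historical_frequency_gain():
    """Combined per-node clock gain when both knobs are available."""
    return DEVICE_GAIN_PER_NODE * LOGIC_DEPTH_GAIN_PER_NODE


def nodes_until_depth_floor(start_fo4, floor_fo4=16.0, gain=LOGIC_DEPTH_GAIN_PER_NODE):
    """Nodes of 1.4x logic-depth reduction left before hitting the FO4 floor."""
    return math.log(start_fo4 / floor_fo4) / math.log(gain)


print(f"historical gain per node: {historical_frequency_gain():.2f}x")
for start in (100, 40):
    print(f"from {start} FO4: about {nodes_until_depth_floor(start):.1f} nodes to reach 16 FO4")
```

Once the logic-depth knob is exhausted, only the device-scaling factor remains, which is why the 2x per node trend cannot continue.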
SOC-LP Driver • Power gap • Must reduce dynamic and static power to avoid “zero logic content limit” • Hits low-power SOC before hits MPU • SOC degree of freedom: low-power (not high-perf) process • SOC-LP model drives ASIC-LP (PIDS) device model • Lgate lags high-performance devices by 2 years, but layout density same • Accompanying device parameter changes • Vth higher, Vdd higher • Ig, Ioff start at 100pA/um (LOP, low operating power), 1pA/um (LSTP, low standby power) • Tox higher • Slower devices (larger CV/I) • Even with four LP device flavors, Design still faces large static power management challenge, and must handle multi (Vt,tox,Vdd) • SOC-LP driver: low-power PDA • Composition: CPU cores, embedded cores, SRAM/eDRAM • Roadmap for IO bandwidth, processing power, GOPS/mW efficiency • Die size grows at 20% per node
SOC-LP Driver Model • Required performance trend of SOC-LP PDA driver • Drives PIDS/FEP LP device roadmap, Design power management challenges
LP Device Roadmap
Parameter     Type   99     00     01     02     03     04     05     06     07     10     13     16
Tox (nm)      MPU    3.00   2.30   2.20   2.20   2.00   1.80   1.70   1.70   1.30   1.10   1.00   0.90
Tox (nm)      LOP    3.20   3.00   2.2    2.0    1.8    1.6    1.4    1.3    1.2    1.0    0.9    0.8
Tox (nm)      LSTP   3.20   3.00   2.6    2.4    2.2    2.0    1.8    1.6    1.4    1.1    1.0    0.9
Vdd (V)       MPU    1.5    1.3    1.2    1.1    1.0    1.0    0.9    0.9    0.7    0.6    0.5    0.4
Vdd (V)       LOP    XXX    XXX    1.2    1.2    1.1    1.1    1.0    1.0    0.9    0.8    0.7    0.6
Vdd (V)       LSTP   XXX    XXX    1.2    1.2    1.2    1.2    1.2    1.2    1.1    1.0    0.9    0.9
Vth (V)       MPU    0.21   0.19   0.19   0.15   0.13   0.12   0.09   0.06   0.05   0.021  0.003  0.003
Vth (V)       LOP    0.34   0.34   0.34   0.35   0.36   0.32   0.33   0.34   0.29   0.29   0.25   0.22
Vth (V)       LSTP   0.51   0.51   0.51   0.52   0.53   0.53   0.54   0.55   0.52   0.49   0.45   0.45
Ion (uA/um)   MPU    1041   1022   926    959    967    954    924    960    1091   1250   1492   1507
Ion (uA/um)   LOP    636    591    600    600    600    600    600    600    700    700    800    900
Ion (uA/um)   LSTP   300    300    300    300    400    400    400    400    500    500    600    800
CV/I (ps)     MPU    2.00   1.64   1.63   1.34   1.16   0.99   0.86   0.79   0.66   0.39   0.23   0.16
CV/I (ps)     LOP    3.50   2.87   2.55   2.45   2.02   1.84   1.58   1.41   1.14   0.85   0.56   0.35
CV/I (ps)     LSTP   4.21   3.46   4.61   4.41   2.96   2.68   2.51   2.32   1.81   1.43   0.91   0.57
Ioff (uA/um)  MPU    0.00   0.01   0.01   0.03   0.07   0.10   0.30   0.70   1.00   3      7      10
Ioff (uA/um)  LOP    1e-4   1e-4   1e-4   1e-4   1e-4   3e-4   3e-4   3e-4   7e-4   1e-3   3e-3   1e-2
Ioff (uA/um)  LSTP   1e-6   1e-6   1e-6   1e-6   1e-6   1e-6   1e-6   1e-6   1e-6   3e-6   7e-6   1e-5
Gate L (nm)   MPU    100    70     65     53     45     37     32     30     25     18     13     9
Gate L (nm)   L(*)P  110    100    90     80     65     53     45     37     32     22     16     11
Power Management Gap (x) (with utterly optimistic device assumptions...)
Gap                                2001    2004    2007    2010    2013    2016
Total LOP Dynamic Power Gap (x)    -0.06   0.59    1.03    2.04    6.43    23.34
Total LSTP Dynamic Power Gap (x)   -0.19   0.55    1.35    2.57    5.81    14.00
Total LOP Standby Power Gap (x)    0.85    5.25    14.55   30.18   148.76  828.71
Total LSTP Standby Power Gap (x)   -0.98   -0.98   -0.97   -0.88   -0.55   0.24
Big Picture • ITRS takes Moore’s Law as a constraint • Problem: ITRS signed up for the “wrong” Moore’s Law • 2x frequency, 2x xtors, bits every node → power, utility contradictions • Each increment of performance is more and more costly • Compounding problems • no architecture awareness • no application awareness (e.g., low-power networked-embedded SOC) • planar CMOS-centric (no DGFET, FinFET in requirements) • uneven acknowledgment of cost (mask NRE cost, design NRE cost, cost of technology development, manufacturing cost, manufacturing test …) • New in 2001: Can Design help solve it? • PIDS: 17%/year improvement in CV/I metric → punt Ioff, Rds, … • A&P: bump pitch improves slowly → punt IR drop, power, signaling; impacts Test as well • Interconnect, Litho, PIDS/FEP: what variability can Designers tolerate?
DT Integration With Other Technologies • Problem: Design has always been “metric-free” • Metric → “red brick wall” → requirement for R&D investment • EDA Goal 1: show red bricks in Design Technology • EDA Goal 2: shift red bricks from other supporting technologies • e.g., lithography CD variability requirement solved by new Design techniques that can better handle variability • e.g., mask data volume requirement solved by Design/Mfg interfaces and flows that pass functional requirements, verification knowledge to mask writing and inspection • e.g., Simplex “X initiative” as much impact as copper? • It’s an ROI issue !!! • Need metrics of design cost, design quality/value → DT ROI • Need serious validation/participation from EDA community before we can expect help from system, ASIC companies
Dielectric Permittivity: Near Term Years • Bulk and effective dielectric constants described • Porous low-k requires alternative planarization solutions • Cu at all nodes - conformal barriers • C. Case, BOC Edwards – ITRS-2001 preliminary
Effect Of Line Width On Cu Resistivity • Conductor resistivity increases expected to appear around 100 nm linewidth - will impact intermediate wiring first - ~2006 • [Figure: Cu resistivity vs. line width, with markers for the 100nm and 70nm ITRS requirements WITH Cu barrier] • Courtesy of SEMATECH • C. Case, BOC Edwards – ITRS-2001 preliminary
Device Roadmap Changes • Process Integration, Devices and Structures (PIDS) • CV/I delay metric: historically decreases by 17%/year • Since frequency improvement from shorter pipelines no longer available, perhaps we do need to keep scaling CV/I … • Bottom line: PIDS is running up against limits of planar CMOS, and is shifting at least some of the pain to “design/architecture improvements” • Continuing CV/I trend necessitates huge growth in Ioff • Subthreshold Ioff at room temperature increases from 0.01 uA/um in 2001 to 10 uA/um at end of ITRS (22nm node) • Ioff increases by at least order of magnitude at ~100 deg C operating temps (40x difference between 25 deg C and 125 deg C) • Static power becomes a huge problem: multi-Vt, multi-Vdd, substrate biasing, constant-throughput power minimization, etc. must be coherently and simultaneously applied/optimized by automatic tools • Also necessitates aggressive reduction in tox • Physical tox thickness hovers at < 1.4nm (down to 1.0nm) starting in 2001, even assuming arrival of high-k gate dielectrics starting in 2004 • Implies huge variability mitigation challenges for Design Technology: “10%” < one monolayer…
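The growth rates quoted above can be turned into compound annual figures. The sketch below assumes the 15-year span 2001 to 2016, taken from the LP Device Roadmap table columns rather than from this slide; the function names are hypothetical.

```python
import math


def implied_annual_growth(start, end, years):
    """Compound annual growth factor that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years)


def halving_time_years(annual_reduction):
    """Years for a quantity to halve if it shrinks by `annual_reduction` each year."""
    return math.log(0.5) / math.log(1.0 - annual_reduction)


# Subthreshold Ioff: 0.01 uA/um (2001) to 10 uA/um (end of roadmap, assumed 2016).
ioff_growth = implied_annual_growth(0.01, 10.0, 15)
print(f"Ioff grows about {100 * (ioff_growth - 1):.0f}% per year")

# CV/I delay metric: historically decreases by 17% per year.
print(f"CV/I halves about every {halving_time_years(0.17):.1f} years")
```

A 1000x rise in Ioff over 15 years is roughly 58% per year compounded, while CV/I halving every 3.7 years is what that leakage growth is buying.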
Assembly/Packaging Roadmap • MPU pad counts flat from 2001-2005; chip current draw increases 64% • Effective bump pitch roughly constant at 350 µm • Bump/pad counts scale with chip area only, do not increase with technology demands (IR drop, L*di/dt) • metal resources needed to control <10% IR drop skyrocket since Ichip and wiring resistance increase → challenge for DT • Later technologies (30-40nm) also have too few bumps to carry maximum current draw (e.g., 1250 Vdd pads at 30nm with bump pitch of 250 µm can each carry 150mA → 187.5A max capability, but Ichip/Vdd > 300A) • A&P Rationale: cost control (puts pain onto Design) • Design Rationalization: must add power constraints • ITRS2001 will have strong power-constrained focus • Cost of liquid cooling, refrigeration, etc. impractical anyway (???) • 30-50 W/cm2 limit for forced-air cooling with fins • MPU power dissipation capped at 200W; MPU chip area held constant (more area can’t be used well within 150W power budget)
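The bump-current example above is simple arithmetic, sketched here with the slide's own numbers (1250 Vdd pads, 150 mA per pad, more than 300 A required). The function names are hypothetical.

```python
import math


def max_deliverable_current_a(n_pads, ma_per_pad):
    """Total supply current in amperes if every pad carries its rated current."""
    return n_pads * ma_per_pad / 1000.0


def pads_needed(required_a, ma_per_pad):
    """Minimum pad count to carry `required_a` amperes at `ma_per_pad` mA each."""
    return math.ceil(required_a * 1000 / ma_per_pad)


capability = max_deliverable_current_a(1250, 150)
print(f"capability: {capability} A")
print(f"pads needed for 300 A: {pads_needed(300, 150)}")
print(f"shortfall factor: {300 / capability:.1f}x")
```

With these figures the pad count would have to grow from 1250 to at least 2000 to carry 300 A.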
Design Technology and the ITRS • Cost = biggest hole in ITRS and in DT • Manufacturing cost, NRE cost (design, mask, …), technology development cost (= who should have/solve red brick walls?) • Challenges for DT (with respect to ITRS) • Circuit/layout optimizations in the face of manufacturing variability • System cost-driven design technology • Holistic analysis, management of power (both dynamic and static) • Circuit- and methodology-level IP: global signaling and synchronization, off-chip IO; power delivery and management • Metrics, needs roadmap for quality/cost/ROI of design and design process • Verification and test (else cost of mfg test soon exceeds cost of mfg) • Software
The Productivity Gap • Potential Design Complexity and Designer Productivity • Complexity growth rate (Logic Tr./Chip): 68%/Yr compounded • Productivity growth rate (Tr./S.M.): 21%/Yr compounded • Equivalent Added Complexity • “How many gates can I get for $N?” ($10, $3, $1)
3 Yr. Design
Year       Technology  Chip Complexity  Frequency  Staff  Staff Cost*
[missing]  250 nm      13 M Tr.         400 MHz    210    90 M
[missing]  250 nm      20 M Tr.         500 MHz    270    120 M
[missing]  180 nm      32 M Tr.         600 MHz    360    160 M
2002       130 nm      130 M Tr.        800 MHz    800    360 M
Source: SEMATECH
* @ $150 k / Staff Yr. (In 1997 Dollars)
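The staff-cost column and the two growth rates can be cross-checked with a short sketch. It assumes the cost is simply staff x $150 k per staff-year x 3 years (the footnote and the "3 Yr. Design" label), and that the gap is the ratio of the two compounded rates; the function names are hypothetical.

```python
STAFF_COST_PER_YEAR = 150_000  # "$150 k / Staff Yr." footnote
DESIGN_YEARS = 3               # "3 Yr. Design" label

# (technology, staff, listed staff cost in $M) from the table.
ROWS = [("250 nm", 210, 90), ("250 nm", 270, 120), ("180 nm", 360, 160), ("130 nm", 800, 360)]


def staff_cost(staff):
    """Staff cost in dollars for a 3-year design at the footnote rate."""
    return staff * STAFF_COST_PER_YEAR * DESIGN_YEARS


def gap_factor(years, complexity_growth=0.68, productivity_growth=0.21):
    """How far complexity outruns productivity after `years` of compounding."""
    return ((1 + complexity_growth) / (1 + productivity_growth)) ** years


for tech, staff, listed in ROWS:
    print(f"{tech}: computed ${staff_cost(staff) / 1e6:.1f}M vs listed ${listed}M")
print(f"gap after 10 years: about {gap_factor(10):.0f}x")
```

The computed costs (94.5, 121.5, 162.0 and 360.0 $M) match the listed values to within rounding, which suggests the table was built this way.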
Mask Cost • O(25 mask levels) ~ “$1M mask set” in 130nm • But: average only 500 wafers per mask set!
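The mask-cost point becomes concrete when the mask set is amortized per wafer. The sketch below uses the slide's $1M and 500-wafer figures; the larger volumes are illustrative assumptions, and the function name is hypothetical.

```python
MASK_SET_COST = 1_000_000  # "$1M mask set" at 130nm
MASK_LEVELS = 25           # O(25 mask levels)


def mask_cost_per_wafer(mask_set_cost, wafers_per_mask_set):
    """Mask NRE amortized evenly over every wafer printed with the set."""
    return mask_set_cost / wafers_per_mask_set


print(f"average cost per mask level: ${MASK_SET_COST / MASK_LEVELS:,.0f}")
# 500 is the slide's average; 5,000 and 50,000 are hypothetical higher volumes.
for wafers in (500, 5_000, 50_000):
    print(f"{wafers} wafers: ${mask_cost_per_wafer(MASK_SET_COST, wafers):,.0f} of mask cost per wafer")
```

At the average of 500 wafers, each wafer carries $2,000 of mask cost before any processing cost is counted.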
“Keep the Fabs Full” • Design technology must keep manufacturing facilities fully utilized with: • high-volume parts • high-margin parts • Foundry capital cost > $2B • How much value of new designs is needed to fill the fab ???
Design Productivity Need + DSM = 2 EDA Trends • [Figure: level of abstraction vs. effort/value, today and tomorrow; levels shown: Application / Behavior, SW/HW, gate-level “platform”, RTL, Mask; design entry level and implementation gap marked] • source: MARCO GSRC
Close the Implementation Gap • [Figure: level of abstraction vs. effort/value; levels shown: Application, SW/HW, “platform”, RTL, Mask; design entry level, hand-off and fab amortization marked] • source: MARCO GSRC
Design Productivity Gap → Low-Value Designs? • [Figure: percent of die area that must be occupied by memory to maintain SOC design productivity] • Source: Japanese system-LSI industry
Reduce Back-End Effort? • Example: repeating dense wiring fabric pattern at minimum pitch • Eliminates signal integrity, delay uncertainty concerns • But has at least 60% - 80% density cost • [Figure: wiring tracks labeled V, S and G in a repeating pattern] • source: MARCO GSRC
Improve IP Reuse Productivity? • [Figure: processes P1-P7 connected by Communication Channels; Pearls (the IP Processes), MicroShells (the IP Requirements), MacroShells (the Protocol Interface)] • source: MARCO GSRC
QUALITY Problem: > 1000x Energy-Flexibility Gap • [Figure: Energy Efficiency, MOPS/mW (or MIPS/mW), on a log axis from 0.1 to 1000, vs. Flexibility (Coverage)] • Dedicated HW: 100-200 MOPS/mW • Reconfigurable Processor/Logic: 10-50 MOPS/mW • ASIPs, DSPs: 1 V DSP, 3 MOPS/mW • Embedded µProcessors: LP ARM, 0.5-2 MIPS/mW • Source: Prof. Jan Rabaey, UC Berkeley
“Keep the Fabs Full” • Design technology must keep manufacturing facilities fully utilized with: • high-volume parts • high-margin parts • What happens when design technology “fails” ? • not enough high-value designs • the semiconductor industry will find a “workaround” • reconfigurable logic • platform-based design • extract value somewhere other than silicon differentiation
Dark Riff Conclusions • Design productivity gap threatens design quality → design starts, business models at risk • TAT achieved at cost of QOR • low QOR → low silicon value • electronics industry chooses reprogrammable, platform-based “workarounds” • We need to understand cost and quality/value
Two CANDE-01 Non-Predictions • Jim Sproch, Synopsys: • “Summary: Rising NRE will force semiconductor manufacturers to produce primarily high-volume, general purpose components such as memory, FPGAs, and standard processors. New EDA tools will then have an impact on only a smaller fraction of the semiconductor industry, and research funding will evaporate, leaving only the service and support functions, which don’t need to be centralized. • Prediction: EDA industry is reduced to a service role as semiconductor design starts decline. • Prediction: Design for Cost EDA tools will reach the marketplace by 2006.”
Optical Proximity Correction (OPC) • Corrective modifications to improve process control • improve yield (process window) • improve device performance • [Figure: original layout and OPC corrections; printed result with OPC vs. no OPC]
Phase Shifting Masks (PSM) • [Figure: conventional mask vs. phase shifting mask (glass, chrome, phase shifter), with plots of E at mask, E at wafer and I at wafer for each]
Field-Dependent Aberration • Field-dependent aberrations cause placement errors and distortions • [Figure: lens and wafer plane; center: minimal aberrations; edge: high aberrations] • R. Pack, Cadence
Optical Lithography (it’s not going away…) • Process window and yield enhancement: forbidden width-spacing combinations (defocus window sensitivities), generally complex “local DRCs” • Lithography equipment choices: forbidden configurations such as wrong-way critical-width doglegs, or diagonal features • Notch rules, critical-feature rules on local metal due to OPC (subresolution assist features, especially) Numerical Technologies, Inc.
RET Roadmap • [Figure: RET adoption by technology node (0.25 um, 0.18 um, 0.13 um, 0.10 um, 0.07 um) and exposure wavelength (248 nm, 248/193 nm, 193 nm); techniques shown: Rule-based OPC, Model-based OPC, Scattering Bars, AA-PSM, Weak PSM, Rule-based Tiling, Optimization-driven MB Tiling; rows labeled Litho and CMP] • Number Of Affected Layers Increases / Generation • W. Grobman, Motorola – DAC-2001
About Mask Data and $1M Mask NRE • Format proliferation • Most tools have unique data format • Raster-VSB conversion, reverse can be inefficient • Real-time manufacturing tool switch, multiple qualified tools duplicate fractures to avoid delays if tool switch required • Data volume • OPC drives figure count acceleration • MEBES format is flat • ALTA machines slow down with > 1GB data • Burden on globally distributed mfg resources • Inefficient refractures • Refractures!? • Mask industry historically never touched mask data: unwilling to take risk, not enough margin or reason • Today, 90% of mask data files manipulated / refractured: process bias sizing (iso-dense, loading effects, linearity, …), mask write optimization, multiple tool formats, …
P. Buck, Dupont Photomasks – ISMT Mask-EDA Workshop July 2001
Out-of-control mask flow P. Buck, Dupont Photomasks – ISMT Mask-EDA Workshop July 2001
DT Needs for RET and Mask NRE • WYSIWYG broken → (mask) verification bottleneck • Need function- and cost-aware RET • RET insertion is for predictable circuit performance, function • RET tool must understand functional intent • make only corrections that win $$$, reduce performance variation • make only corrections that can be manufactured and verified (including mask inspection) • understand (data volume, verification) costs of breaking hierarchy • Understand flow issues • e.g., avoid making same corrections 3x (library, router, PV tool) • Handoff to manufacturing: MUCH more than GDSII • Includes sensitivities to patterning variation/error • Bidirectional pipe: functionally robust layout performed w.r.t. models of manufacturing errors and electrical implications • Mask verification driven by functional sensitivity information • Mask and ASIC folks aren’t asleep on this, either
Another CANDE-01 Non-Prediction • Prediction: GDSII, in its present form, will no longer be the handoff from design to manufacturing.
Design is Also Part of NRE Cost • Design cost model (Gary Smith/Dataquest, 2001) • engineer cost per year increases 5% per year ($181,568 in 1990) • EDA tool cost per year (per engineer) increases 3.9% per year ($99,301 in 1990) (+ separate term for interoperability) • Productivity due to 8 major Design Technology innovations (3.5 of which are still unavailable) : RTL methodology; In-house P&R; Tall-thin engineer; Small-block reuse; Large-block reuse; IC implementation suite; Intelligent testbench; ES-level methodology • Matched up against SOC-LP PDA content: • SOC-LP PDA design cost = $15M in 2001 • Would have been $342M without EDA innovations and the resulting improvements in design productivity • (Is this an effective message?)
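The cost-escalation terms of this model are easy to evaluate. The sketch below compounds the two 1990 base figures forward to 2001 and computes the ratio between the two SOC-LP PDA design-cost numbers. It covers only the terms stated on the slide (the interoperability term and the productivity model are not reproduced), and the function names are hypothetical.

```python
BASE_YEAR = 1990
ENGINEER_COST_1990 = 181_568  # dollars per engineer per year
TOOL_COST_1990 = 99_301       # dollars of EDA tools per engineer per year


def escalate(base, annual_rate, year):
    """Compound `base` from BASE_YEAR to `year` at `annual_rate`."""
    return base * (1 + annual_rate) ** (year - BASE_YEAR)


def engineer_cost(year):
    return escalate(ENGINEER_COST_1990, 0.05, year)


def tool_cost(year):
    return escalate(TOOL_COST_1990, 0.039, year)


print(f"2001 engineer cost: ${engineer_cost(2001):,.0f}")
print(f"2001 tool cost per engineer: ${tool_cost(2001):,.0f}")

# SOC-LP PDA design cost: $15M with the innovations, $342M without.
innovation_factor = 342 / 15
print(f"design cost reduction attributed to DT innovations: {innovation_factor:.1f}x")
```

On these numbers, tool spending per engineer stays at about half of the engineer's own cost while the claimed productivity benefit is more than 20x, which is the ROI message the slide is testing.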