3D Interconnect: Architectural Challenges and Opportunities Tim Sherwood UC SANTA BARBARA
The Role of Architecture [Figure: layer stack from software to hardware: Applications, Runtime System, Architecture, 3D Integration, Circuit, Device, Package. Demands come from the SW side (Battery Life, Performance, Programmability); Constraints come from the HW side (Noise, Thermal, Yield).]
Lab Overview • Adaptive Hardware Profiling Engines • Integrated On-Chip Intrusion Detection and Prevention • Software Defined Wireless Access Point • High Speed Programmable Routers • Prototype Acceleration Primitives • High Throughput MEMS controllers • Reconfigurable Security on FPGAs [Figure labels: Processor Core; Caches, etc.; Memory Hierarchy; Intrusion Detection System; Server Farm; a state-machine diagram with states b0 to b9]
Potential for Impact from 3D [Figure: the project map from the Lab Overview slide, annotated with 3D opportunities: 3D Bandwidth, 3D Specialization, 3D Integration for Latency, 3D Integration for Mixed Signal, 3D Integration for Mixed Technology]
Presented Works • Shashidhar Mysore, Banit Agrawal, Sheng-Chih Lin, Navin Srivastava, Kaustav Banerjee and Timothy Sherwood. Introspective 3D Chips, Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006. San Jose, CA • Gian Luca Loi, Banit Agrawal, Navin Srivastava, Sheng-Chih Lin, Timothy Sherwood, Kaustav Banerjee. A Thermally-Aware Performance Analysis of Vertically Integrated (3-D) Processor-Memory Hierarchy, Proceedings of the 43rd Design Automation Conference (DAC), June 2006. San Francisco, CA
Two Specific Opportunities 1) 3D Integration for Performance • Bring Memory Closer to those that use it • More Bandwidth and Lower Latency • Tricky System Level Tradeoffs 2) 3D Integration for Specialization • Integration offers unique specialization opportunity • Decouple commodity from niche The ramifications of any radical change require a careful evaluation that considers all the parameters
A Simple Performance “Ecosystem” [Figure: feedback graph linking package, temp, total power, dynamic power, leakage, V, freq, communication, utilized area, parallelism, OS or runtime, and app performance] No multicore, no spatial variance, no temporal variance, no metrics of cost or error or yield
Basic Savings in 3D • Area: 4, Dist: √8 ≈ 2.8, BW: √8 ≈ 2.8 • Area: 2, Dist: √4 ≈ 2 + 1L, BW: 2√4 ≈ 4 • Area: 1, Dist: √2 ≈ 1.4 + 3L, BW: 4√2 ≈ 5.6 • On-chip Latency improved, Bandwidth could improve more • What about real wires? What about apps? What about temp?
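The figures on this slide can be reproduced with a short calculation. The sketch below assumes a square die of total area 4 split evenly into 1, 2, or 4 square layers, takes "Dist" as the corner-to-corner diagonal within one layer, and takes "BW" as the number of layers times that diagonal. That last relation is inferred from the numbers on the slide rather than stated there, and the function name `stacked_metrics` is hypothetical.

```python
import math

def stacked_metrics(total_area: float, layers: int) -> dict:
    """Split a square die of `total_area` into `layers` equal square layers."""
    per_layer_area = total_area / layers
    # Corner-to-corner distance inside one square layer: side * sqrt(2).
    diagonal = math.sqrt(2.0 * per_layer_area)
    return {
        "area": per_layer_area,
        "dist": diagonal,
        "vertical_hops": layers - 1,  # the "+ 1L" and "+ 3L" terms on the slide
        "bw": layers * diagonal,      # assumption: bandwidth scales as layers x diagonal
    }

for n in (1, 2, 4):
    m = stacked_metrics(4.0, n)
    print(n, m["area"], round(m["dist"], 2), m["vertical_hops"], round(m["bw"], 2))
```

Under these assumptions the in-plane distance shrinks as the square root of the layer count while the bandwidth estimate grows by the same factor, which is the slide's point that bandwidth could improve more than latency.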
Example Technology Node Banerjee et al. IEEE 2001
3D Wire Delay [Figure: distributed RC delay (sec, ×10^-11, 0 to 1.4) versus wire length L (um, 160 to 800); curves: vertical via model (vertical wire length) and horizontal line model (horizontal wire length)]
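As a rough illustration of why the horizontal-line curve grows quickly with length, the sketch below uses the standard 50% delay approximation for a distributed RC line, about 0.38·(rL)·(cL). It covers only the horizontal distributed RC model, not the vertical via model. The per-unit-length resistance and capacitance are placeholders chosen for illustration and are not taken from the slide or from Banerjee et al.

```python
def distributed_rc_delay(length_um: float, r_per_um: float, c_per_um: float) -> float:
    """Approximate 50% delay (seconds) of a distributed RC line of the given length."""
    total_r = r_per_um * length_um  # total wire resistance, ohms
    total_c = c_per_um * length_um  # total wire capacitance, farads
    return 0.38 * total_r * total_c

# Placeholder values (hypothetical, for illustration only).
R_PER_UM = 0.1      # ohms per um
C_PER_UM = 0.2e-15  # farads per um

for length in (160, 320, 640):
    print(length, distributed_rc_delay(length, R_PER_UM, C_PER_UM))
```

Because both total resistance and total capacitance scale with length, doubling the wire length quadruples the delay in this model, whatever constants are used.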
A “Typical” 2D System Design [Figure: CPU core with L1 I-Cache, L1 D-Cache, and L2 Unified Cache; Memory Controller; L2 to Main Memory External Bus connecting to DRAM chips on the Board; annotation: Memory Bottleneck]
A 3D Memory System [Figure: Layer 1: CPU core with L1 I-Cache and L1 D-Cache; Layer 2: L2 Unified Cache; Layer 3 to 18: stacked three dimensional main memory; L1 to L2 vertical interlayer bus; L2 to Main Memory vertical interlayer bus; bus parameters shown: 8 bytes to 128 bytes, 200 MHz to 2 GHz]
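To put the bus parameters on this slide in perspective, the sketch below computes peak bandwidth at the two extremes of the stated ranges. It assumes one transfer per bus cycle, which the slide does not state, and the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_sec(bus_width_bytes: int, bus_freq_hz: float,
                                 transfers_per_cycle: int = 1) -> float:
    """Peak bus bandwidth; assumes every cycle carries a full-width transfer."""
    return bus_width_bytes * bus_freq_hz * transfers_per_cycle

low = peak_bandwidth_bytes_per_sec(8, 200e6)   # narrowest, slowest bus on the slide
high = peak_bandwidth_bytes_per_sec(128, 2e9)  # widest, fastest bus on the slide
print(low / 1e9, "GB/s to", high / 1e9, "GB/s")
```

Under that assumption the range spans a factor of 160 in peak bandwidth (16x from width, 10x from frequency).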
System-Level Simulation • Simulator: Sim-Alpha • Processor: Alpha 21264 • Benchmarks: mcf, parser, twolf with MinneSPEC reduced inputs • % main memory access per instruction: mcf 1.7 %, parser 0.258472 %, twolf 0.00062 %
Effect of Bus Width and Frequency [Figure: mcf execution time (sec, 0 to 7) versus L2 cache size in KBytes (10 to 10000); series: 8 bytes bus width (2-D), and 8, 16, 32, 64, and 128 bytes bus width (3-D)] Only a few vias required
Self-consistent Thermal Modeling 1) Insert the initial values of leakage and dynamic power for each layer 2) Calculate the first thermal profile 3) Based on the previous thermal profile, calculate the new power dissipation, considering the Ion decrease with temperature and the Ileakage increase with temperature 4) Calculate the new temperature profile 5) Is it convergent? If no, return to step 3; if yes, finish
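The iteration on this slide is a fixed-point loop between power and temperature. The following is a minimal sketch under strong simplifying assumptions: a single lumped thermal node instead of a per-layer profile, leakage growing exponentially with temperature, and the Ion decrease modeled as a linear drop in dynamic power. Every constant and function name here is hypothetical and chosen only so the loop converges.

```python
import math

T_AMBIENT_K = 318.0  # hypothetical ambient / reference temperature, K
R_THERMAL = 0.5      # hypothetical lumped thermal resistance, K per W
P_DYNAMIC_0 = 50.0   # hypothetical dynamic power at the reference temperature, W
P_LEAKAGE_0 = 10.0   # hypothetical leakage power at the reference temperature, W
ION_SLOPE = 0.001    # hypothetical fractional drop in dynamic power per K
LEAK_EXP = 0.01      # hypothetical exponential leakage growth rate per K

def power_at(temp_k: float) -> float:
    """Total power at a given temperature (Ion decrease, leakage increase)."""
    dt = temp_k - T_AMBIENT_K
    dynamic = P_DYNAMIC_0 * (1.0 - ION_SLOPE * dt)
    leakage = P_LEAKAGE_0 * math.exp(LEAK_EXP * dt)
    return dynamic + leakage

def temperature_for(power_w: float) -> float:
    """Lumped thermal model: temperature rise proportional to dissipated power."""
    return T_AMBIENT_K + R_THERMAL * power_w

def self_consistent_temperature(tol_k: float = 1e-6, max_iters: int = 100) -> float:
    # Steps 1-2: initial power values give the first thermal profile.
    temp = temperature_for(P_DYNAMIC_0 + P_LEAKAGE_0)
    for _ in range(max_iters):
        # Steps 3-4: new power from the previous temperature, then a new temperature.
        new_temp = temperature_for(power_at(temp))
        # Step 5: convergence check.
        if abs(new_temp - temp) < tol_k:
            return new_temp
        temp = new_temp
    raise RuntimeError("thermal iteration did not converge")

print(self_consistent_temperature())
```

With these placeholder constants the converged temperature sits slightly above the first estimate, because the extra leakage outweighs the drop in dynamic power. A real model would replace `temperature_for` with a per-layer thermal solver.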
3D Thermally-aware Performance Analysis: mcf [Figure: execution time per instruction (left axis, 1 to 3) and temperature in K (right axis, 330 to 400); curves: 2-D max chip temperature and 3-D max chip temperature; annotations: temperature constraint, min execution time in 2-D, min execution time in 3-D]
3D Thermally-aware Performance Analysis: twolf [Figure: execution time per instruction (left axis, 0.3 to 1.1) and temperature in K (right axis, 330 to 390) versus frequency in MHz (600 to 3000); curves: 2-D max chip temperature and 3-D max chip temperature; annotations: temperature constraint, maximum frequency allowed due to temperature constraint, min execution time in 2-D, min execution time in 3-D]
3D Memory Integration • Many Unaccounted For Effects • Effect of Multiple Cores and Memory Banks • Spatial Variation • Temporal Variation (thermal load balancing) • All of these are intimately tied to the integration method and packaging • How to Manage • Architecture and Software will be increasingly involved • Exposing Variation to higher levels • Huge demand for “models”, “sensors”, and “knobs” • Thermal, Packaging, Application, Architecture all tangled • Need to build models that capture all of these aspects • Models need to be “self consistent”
3D Integration for Introspection • Complex interactions across levels of abstraction make debugging, optimizing, securing, and analysis in general difficult • The first requirement: visibility • Not just data capture: we need the ability to put together a cohesive picture of system interactions and correlate between them in a sound and non-intrusive manner • The hardware/software boundary is uniquely situated • Piece together from low level events • What would the programmer wish list look like?
What programmers want [Figure: processor floorplan with blocks L1_BPU, L2_BPU, Decode, Bus Control, Trace Cache Top, Trace Cache Bottom, MOB, ITLB, DTLB, L1 Cache Top, L1 Cache Bottom, L2 Cache, UROM, FP Exec, FP Reg, Int Exec, Int Reg, MemCtl, Alloc, Retire, Rename, InstrQ1, InstrQ2, Sched, all routed “To Integrated Monitoring Hardware”; the diagram also carries numeric annotations (320, 790, and others) and per-signal multipliers (4x, 3x)] • 32 bit Memory Address • 32 bit Memory Value • 10 bit Opcodes • 2, 5 bit Register Names • 2, 32 bit Register Values • 10 bits of “status” • Everything. 1892 bits per cycle ≈ 1 terabyte/sec @ 4 GHz
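The headline number on this slide follows directly from the per-cycle trace width and the clock rate. A quick check, using only the two figures given on the slide (the function name is hypothetical):

```python
def trace_bandwidth_bytes_per_sec(bits_per_cycle: int, clock_hz: float) -> float:
    """Raw monitoring bandwidth if every traced bit is captured on every cycle."""
    return bits_per_cycle * clock_hz / 8

bandwidth = trace_bandwidth_bytes_per_sec(1892, 4e9)  # 1892 bits per cycle at 4 GHz
print(bandwidth / 1e12, "TB/s")
```

This works out to roughly 0.95 TB/s, which the slide rounds to 1 terabyte per second.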
Why programmers can't have it [Figure: the same processor floorplan as the previous slide, routed to integrated monitoring hardware] • Interconnect is not free • Huge cross chip busses • OptBuf 285um • 20,000 buffers • Analysis is not free • Significant processing required • Extra cost of added heat • $15 budget for cooling • Used by developers
Cake + Eating It Too • Need a way to provide cheap (or high margin) HW to the masses • No paying for developer functionality • Get developers the powerful analysis they crave • See everything at executable rate • Provide “snap-on” functionality for developers • Separate chip for analysis engine • Only hook it onto “developer” systems • Idea is not limited to development systems • Security, Error Correction, Confidentiality, Accelerators, … • 3D Integration offers the potential
Conclusion: Opportunities + Challenges 3D Integration for Performance • Bring Memory Closer to those that use it • More Bandwidth and Lower Latency • Requires few vias for big impact • Tricky System Level Tradeoffs 3D Integration for Specialization • Integration offers unique specialization opportunity • Requires rethinking of integration process • Decouple commodity from niche Challenges • Cross layer models: from app to package • Cross layer optimization: both static and dynamic • Thermal Management is everybody's problem
http://www.cs.ucsb.edu/~arch/ NSF CNS 0524771, NSF CCF 0702798, NSF CCF 0448654