
Dynamically Heterogeneous Cores Through 3D Resource Pooling

Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, Dean M. Tullsen. Speaker: Houman Homayoun, National Science Foundation CI Fellow, University of California San Diego.


Presentation Transcript


  1. Dynamically Heterogeneous Cores Through 3D Resource Pooling • Houman Homayoun, Vasileios Kontorinis, Amirali Shayan, Ta-Wei Lin, Dean M. Tullsen • Speaker: Houman Homayoun, National Science Foundation CI Fellow, University of California San Diego

  2. Why Heterogeneity? • Existing General Purpose CMP designs use only homogeneous cores • A general purpose one-size-fits-all core is not necessarily the most efficient • One processor optimized for each application! Core 2 Core 1

  3. Static vs. Dynamic Heterogeneity • Prior proposals (e.g., Kumar 2003) propose static heterogeneity. • Increases the chance of finding an appropriate core • Does not guarantee a perfect match • Others have proposed solutions for dynamic heterogeneity (Core Fusion, TFlex). • Because sharing resources at a fine granularity is difficult, they enable only coarse-grain sharing. • Big (combined) cores or small cores.

  4. Outline • Resource Pooling • Why 3D? • Design Solutions • Adaptive Policies • Results • Conclusion

  5. Application Resource Utilization

  6. Application Resource Utilization LDSQ ROB RF IQ

  7. Application Resource Utilization Dual-Core Machine Application 1 LDSQ ROB RF IQ Application 2 LDSQ ROB RF IQ underutilized

  8. Dynamic Heterogeneity Through Resource Pooling Register File Register File ROB ROB Core 2 Core 1 Dynamic vs. Static Heterogeneity

  9. Outline • Need for Heterogeneity • Why 3D? • Design Solutions • Adaptive Policies • Results • Conclusion

  10. Why NOT Sharing in 2D? • Long wire delay in 2D • In 2D, it is not efficient [Figure labels: 5 nsec, 500 psec, Demanding]

  11. Our Solution: 3D

  12. Our Solution: 3D • Fast interconnection network: Through-Silicon Vias (TSVs) • Minimize the communication latency • As fast as a few ps, three orders of magnitude smaller than in 2D (5 psec vs. 5000 psec) • A principal advantage • No change to the fundamental pipeline design of 2D architectures, yet it still exploits 3D to provide greater energy proportionality and core customization

  13. Outline • Need for Heterogeneity • Why 3D? • Design Solutions • Adaptive Policies • Results • Conclusion

  14. Stackable Structures for Resource Pooling • Performance-bottleneck and power-hungry resources • Reorder Buffer and Register File (SRAM) • Instruction Queue and Load and Store Queue (CAM+SRAM) • Our goal: • Share units across multiple cores with minimal impact on design spec (latency, number of ports, and power) • Use a previously proposed modular design • Each partition is a self-standing and independently usable unit • Effective in reducing power and access delay [Figure: Register File divided into independent partitions, Part 1 to Part 4]

  15. Example of Resource Sharing • Additional logic to decide whether partition is empty • Additional logic to route the signal to the right partition Register File in Core 1 Free Free TSV Register File in Core 0 Decoder MUX Partition
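The two pieces of extra logic on this slide (checking whether a partition is free, and routing an access to the right partition) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' circuit: the class `PooledRegisterFile`, the constant `PARTITION_SIZE`, and the `(layer, slot)` naming are hypothetical, and the model assumes one core per stacked layer so that any access to a different layer crosses a TSV.

```python
PARTITION_SIZE = 32  # entries per partition (assumed value, not from the slides)


class PooledRegisterFile:
    """Toy model of partition ownership and decoder/MUX routing."""

    def __init__(self, num_cores, parts_per_core):
        # owner[(layer, slot)] is the core using that partition, or None if free.
        self.owner = {
            (layer, slot): None
            for layer in range(num_cores)
            for slot in range(parts_per_core)
        }
        # mapping[core] lists, in order, the physical partitions that back
        # the core's logical register index space.
        self.mapping = {core: [] for core in range(num_cores)}

    def is_free(self, layer, slot):
        """Model of the 'is this partition empty?' check."""
        return self.owner[(layer, slot)] is None

    def attach(self, core, layer, slot):
        """Give a free partition to a core (possibly on another layer)."""
        if not self.is_free(layer, slot):
            raise ValueError("partition already in use")
        self.owner[(layer, slot)] = core
        self.mapping[core].append((layer, slot))

    def route(self, core, reg_index):
        """Model of the decoder + MUX: pick the partition for an access.

        Returns (layer, slot, offset, crosses_tsv).
        """
        logical_part, offset = divmod(reg_index, PARTITION_SIZE)
        layer, slot = self.mapping[core][logical_part]
        return layer, slot, offset, layer != core


rf = PooledRegisterFile(num_cores=2, parts_per_core=4)
rf.attach(0, 0, 0)   # core 0 uses two of its own partitions
rf.attach(0, 0, 1)
rf.attach(0, 1, 3)   # and borrows a free partition from core 1's layer
local_access = rf.route(0, 5)
remote_access = rf.route(0, 70)
```

In this model, `local_access` stays on core 0's own layer, while `remote_access` lands in the borrowed partition and is flagged as crossing a TSV.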

  16. Outline • Need for Heterogeneity • Why 3D? • Design Solutions • Adaptive Policies • Results • Conclusion

  17. Adaptive Policies for Resource Pooling • Several issues need to be considered • Ownership • Fast releasing • Fast reallocation • Cycle-by-cycle adaptation • Preventing starvation • A simple adaptive policy specification (MinMax policy) • Set limits on the size of each resource: how far it can grow (MAX) and how far it can shrink (MIN) • Use a free list • Use central arbitration
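The MinMax policy above can be sketched as a small allocator. This is a minimal illustration, assuming details the slides do not give: the class name `MinMaxPool`, the one-grant-per-core-per-round rule, and the "fewest partitions first" arbitration order are all hypothetical choices. Only the MIN and MAX limits, the free list, and the central arbiter come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class MinMaxPool:
    """Toy model of one pooled resource (e.g. register-file partitions)."""
    num_cores: int
    total_partitions: int
    min_parts: int  # MIN: partitions a core always keeps
    max_parts: int  # MAX: upper bound on partitions a core may hold
    owned: list = field(init=False)
    free_list: list = field(init=False)

    def __post_init__(self):
        if self.num_cores * self.min_parts > self.total_partitions:
            raise ValueError("not enough partitions to give every core MIN")
        ids = list(range(self.total_partitions))
        # Every core starts with MIN partitions; the rest go on the free list.
        self.owned = [
            [ids.pop() for _ in range(self.min_parts)]
            for _ in range(self.num_cores)
        ]
        self.free_list = ids

    def arbitrate(self, requests):
        """Central arbitration for one round.

        Grants at most one free partition per requesting core and returns
        a dict mapping core id to the granted partition id.
        """
        grants = {}
        # Assumed tie-break: serve the cores holding the fewest partitions first.
        for core in sorted(set(requests), key=lambda c: len(self.owned[c])):
            if not self.free_list:
                break
            if len(self.owned[core]) >= self.max_parts:
                continue  # core is already at MAX
            part = self.free_list.pop()
            self.owned[core].append(part)
            grants[core] = part
        return grants

    def release(self, core):
        """Return one partition to the free list, never going below MIN."""
        if len(self.owned[core]) <= self.min_parts:
            return None
        part = self.owned[core].pop()
        self.free_list.append(part)
        return part


pool = MinMaxPool(num_cores=4, total_partitions=16, min_parts=2, max_parts=6)
for _ in range(5):
    pool.arbitrate([0])          # core 0 keeps asking for more
denied = pool.arbitrate([0])     # core 0 is at MAX, so nothing is granted
kept = pool.release(3)           # core 3 is at MIN, so nothing is released
```

The MIN bound is what prevents starvation in this sketch: no core can be shrunk below it, regardless of how aggressively the others request partitions.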

  18. MinMax Policy Example Application 3 Application 4 Application 1 Application 2 Core 1 Core 4 Core 2 Core 3 Register File MIN Free List Arbitration Unit

  19. MinMax Policy Example Application 3 Application 4 Application 1 Application 2 Core 1 Core 4 Core 2 Core 3 Register File MIN Free List Arbitration Unit

  20. MinMax Policy Example Application 3 Application 4 Application 1 Application 2 Core 1 Core 4 Core 2 Core 3 Register File MIN Free List Arbitration Unit

  21. MinMax Policy Example Application 3 Application 4 Application 1 Application 2 Core 1 Core 4 Core 2 Core 3 Register File MIN Free List Arbitration Unit

  22. Outline • Need for Heterogeneity • Why 3D? • Design Solutions • Adaptive Policies • Results • Conclusion

  23. Baseline Architecture (1) (2) • Processor Model • High-end architecture, four OoO cores with issue width of 4 • Medium-end architecture, four OoO cores with issue width of 2 • 3D Floorplans (different performance, flexibility, and temperature tradeoff) • (1) Conventional (Thermal-Optimized Design) • (2) Proposed (Performance-Optimized Design)

  24. Evaluation • Metrics: Power, Performance, Temperature, Energy-Delay • Workloads: 1 Thread, 2 Thread, 4 Thread [Figure legend: Core 1 to Core 4, Active core, Idle core, Link]

  25. Single Thread Performance • Single benchmark (3 out of 4 cores are idle) • Standard SPEC2K and SPEC2006 benchmarks • Average speedup: 45% in medium-end, 26% in high-end

  26. Multi-Thread Performance • 2Thr: 2 idle cores + underutilized resources in the active cores • 4Thr: No idle cores, only underutilized resources • Gains are dramatic when some cores are idle [Axis: Normalized Weighted Speedup (%)]

  27. Medium-end vs High-end • Resource pooling makes the medium core significantly more competitive with the high-end. Normalized Weighted Speedup (%) Only 3%! 28% 14% 0 Idle Core 2 Idle Core 3 Idle Core Increase Resource Sharing

  28. Power • Pooling pays a small price in power • Because of the enhanced throughput. • Large speedups on low-IPC threads and high average speedup, but a smaller increase in total instruction throughput and thus a smaller increase in power [Axis: power (Watt); figure labels: 4X, 3X]

  29. Temperature • Interestingly, the temperature of the medium resource-pooling core is comparable to the high-end core temperature (Celsius)

  30. Efficiency • Even so, at equal temperature, the more modest cores have a significant advantage in energy efficiency, measured in MIPS^2/W (MIPS^2/W is the inverse of the energy-delay product) [Axis: Normalized; figure label: 2X]
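The claim that MIPS^2/W is the inverse of the energy-delay product can be checked with a short derivation. The symbols are introduced here for illustration and do not appear on the slide: N is the instruction count, T the execution time, E the energy consumed, and W = E/T the average power.

```latex
\[
\frac{\mathrm{MIPS}^2}{W}
\;\propto\; \frac{(N/T)^2}{E/T}
\;=\; \frac{N^2}{E\,T}
\quad\Longrightarrow\quad
\frac{\mathrm{MIPS}^2}{W} \;\propto\; \frac{1}{E \cdot T} \;=\; \frac{1}{\mathrm{EDP}}
\quad \text{for a fixed } N .
\]
```

So for a fixed amount of work, a higher MIPS^2/W corresponds directly to a lower energy-delay product.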

  31. Conclusions • Homogeneous cores are inherently inefficient for a diverse workload. • Cores are typically overprovisioned as a result • 3D stacking of cores enables fine-grain sharing (pooling) of resources not possible in 2D designs. • Our dynamically heterogeneous 3D architecture allows the processor to construct the right core for each application dynamically, maximizing energy efficiency. • Our 3D pooling architecture • Leverages our experience in 2D pipeline design, yet still gains significant benefit from 3D • Adapts to the specific demands of an application within a few cycles. • Reduces reliance on overprovisioned cores, instead grabbing larger resources only when needed.

  32. End of presentation
