1000 Core Processors - Myth or Reality?

Challenges in Scaling Operating Systems to 1,000 Cores and Beyond. Data Intelligence Forum 2011, SAIT Tech Fair. Presented by Dr. Daniel Waddington, SISA Computer Science Lab. 1000 Core Processors - Myth or Reality? Do we expect to see this in the next decade?


Presentation Transcript


  1. Challenges in Scaling Operating Systems to 1,000 Cores and Beyond. Data Intelligence Forum 2011, SAIT Tech Fair. Presented by Dr. Daniel Waddington, SISA Computer Science Lab

  2. 1000 Core Processors - Myth or Reality? • Do we expect to see this in the next decade?

  3. 1000 Core Processors - Myth or Reality?
     • 100-core processors are commercially viable now
        • Tilera TILE-Gx series
     • 1000-core processors were demonstrated in 2011
        • University of Glasgow / U. Mass Lowell
     • Manycore remains a key direction for major vendors
        • Intel Knights Ferry (championed at IDF 2011)
        • AMD
     • Moore's law is sustainable (for a little longer)
        • 3D tri-gate transistor technology (Intel Ivy Bridge) puts 10nm within reach by around 2015
        • moving from 32nm to 22nm gives approximately 2x higher transistor density
        • more transistors = more cores (ILP gains are flat), likely heterogeneous
     • Different domains will have different driving forces
        • power
        • performance
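The "approximately 2x" density figure follows from simple geometry if one assumes (an idealization) that both linear dimensions shrink in proportion to the node name. The small helper below, `density_gain` (a hypothetical name), computes that idealized ratio; node names are only loosely tied to physical feature sizes, so treat it as a back-of-the-envelope check rather than a process model.

```python
def density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Idealized areal transistor-density gain for a process shrink.

    Assumes both linear dimensions shrink in proportion to the node name,
    so density scales with the square of the ratio. Real node names no
    longer map exactly onto feature sizes, so this is only a rough estimate.
    """
    return (old_node_nm / new_node_nm) ** 2


if __name__ == "__main__":
    # (32/22)^2 is a little above 2, matching the "approximately 2x" on the slide
    print(f"32nm -> 22nm: ~{density_gain(32, 22):.2f}x density")
```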

  4. 1000 Core Processors - Myth or Reality?
     What “killer” apps do we expect to see for 1000-core processors?
     • Vision and Image processing
        • 3D enhancement, object recognition, gesture recognition, augmented reality, video and media production
     • Audio and Speech processing
        • dialects and accents, translation, command interface
     • Information Processing and Retrieval
        • encryption/decryption, search, pattern matching
     • Sensor Data Processing and Aggregation
     • Vehicles and Transportation
     • Gaming
     • Biological
        • protein folding, personal DNA sequencing
     • ...
     Replace your System-on-Chip solutions with software!

  5. ...but Software is Still in Trouble
     • The best sample (Sudoku) reaches only ~50% efficiency at 1000 cores
     • Most stop scaling at around 20 cores

  6. Amdahl’s Law
     • Amdahl’s law defines the relationship between potential speedup and the amount of serial (non-parallel) execution:
       $$T(N) = \underbrace{f\,t}_{\text{serial part}} + \underbrace{\frac{(1-f)\,t}{N}}_{\text{parallel part}}, \qquad S(N) = \frac{t}{T(N)} = \frac{1}{f + \frac{1-f}{N}}$$
       where $f$ is the serialization factor, $t$ is the single-processor execution time, and $N$ is the total number of processors; the speedup $S(N)$ follows by factoring out $t$
     • This law is optimistic: data sharing (e.g., accumulation) and overhead (e.g., scheduling and migration) all make things worse
     • The best sample (Sudoku) only scales to ~50% efficiency at 1000 cores
     • Our goal is 80% efficiency at 1000 cores for our “killer” applications
     • THE SERIALIZATION FACTOR IS STATIC W.R.T. N
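To make the 80% target concrete, the sketch below evaluates the standard Amdahl speedup S(N) = 1/(f + (1 - f)/N) and solves it for the largest serial fraction f that still yields a chosen efficiency S(N)/N. The function names are illustrative only. Under Amdahl's (optimistic) assumptions, 80% efficiency at N = 1000 leaves room for a serial fraction of only about 0.025%.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl speedup S(N) = 1 / (f + (1 - f) / N), where f is the serial fraction."""
    return 1.0 / (f + (1.0 - f) / n)


def efficiency(speedup: float, n: int) -> float:
    """Parallel efficiency: achieved speedup divided by the number of processors."""
    return speedup / n


def max_serial_fraction(target_efficiency: float, n: int) -> float:
    """Largest serial fraction f for which S(n) / n still reaches target_efficiency.

    Setting 1 / (f + (1 - f) / n) = target_efficiency * n and solving for f gives
    f = (1 / (target_efficiency * n) - 1 / n) / (1 - 1 / n).
    """
    return (1.0 / (target_efficiency * n) - 1.0 / n) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    n = 1000
    limit = max_serial_fraction(0.80, n)
    print(f"serial fraction allowed for 80% efficiency at {n} cores: {limit:.6%}")
    for f in (0.01, 0.001, 0.0001):
        eff = efficiency(amdahl_speedup(f, n), n)
        print(f"f = {f}: efficiency at {n} cores = {eff:.1%}")
```

Even a 1% serial fraction caps efficiency at 1000 cores below 10% in this model, which is why the slide stresses that the serialization factor does not shrink as N grows.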

  7. Enhancing Amdahl’s Equations
     • A practical implementation of a parallel program requires access to shared resources (e.g., accumulation, notification) that are protected by locks (critical sections):
       $$T(N) = \underbrace{f_d(N)\,t}_{\text{dynamic serial part}} + \underbrace{f_s\,t}_{\text{static serial part}} + \underbrace{\frac{(1-f_s)\,t}{N}}_{\text{parallel part}}$$
       where $f_s$ is the static serialization factor, $f_d$ is the dynamic serialization factor (which, unlike $f_s$, varies with $N$), $t$ is time, and $N$ is the total number of processors
     • Example: effect on scaling of accumulation through a critical section where $f_{crit}$ = 1/1,000,000 (disclaimer: pipelining can help this problem)
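The critical-section example can be explored numerically under an assumed model (not necessarily the exact equation on the original slide): normalized run time $T(N)/t = f_s + N f_{crit} + (1 - f_s)/N$, i.e. each of the N cores enters one global critical section once during accumulation, so the dynamic serial part grows linearly with N. The helper names below are hypothetical.

```python
def normalized_time(n: int, f_static: float, f_crit: float) -> float:
    """Run time on n cores divided by single-core time t (assumed model).

    static serial part + dynamic serial part + parallel part, where the dynamic
    part assumes every core enters one global critical section once, so it
    grows as n * f_crit.
    """
    f_dynamic = n * f_crit
    return f_static + f_dynamic + (1.0 - f_static) / n


def crit_efficiency(n: int, f_static: float, f_crit: float) -> float:
    """Speedup (1 / normalized time) divided by the number of cores."""
    return (1.0 / normalized_time(n, f_static, f_crit)) / n


if __name__ == "__main__":
    f_crit = 1e-6  # the value used in the slide's example
    for n in (10, 100, 1000, 10000):
        print(f"N = {n:>5}: efficiency = {crit_efficiency(n, 0.0, f_crit):.1%}")
```

With $f_s = 0$ and $f_{crit} = 10^{-6}$, this model's speedup peaks near $N = 1/\sqrt{f_{crit}} = 1000$, where it reaches 500x (50% efficiency), and declines beyond that point: adding cores makes the program slower.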

  8. Critical Sections in the Operating System
     • Synchronization on (global) critical sections significantly affects scalability
     • Critical sections exist in both the application and the Operating System
     • Even if you get the application right, you have to get the OS right too!
     [Graphs: measurements of kernel lock activity as the number of cores scales; data from Linux 2.6.32 kernel locking (lockstat)]

  9. Eliminating Global Synchronization
     • Critical sections on global shared resources are clearly bad
     • Hierarchical arrangements can be used to break global dependencies:
       $$T(N) = f_{crit}\,A\,t + f_s\,t + \frac{(1-f_s)\,t}{N}$$
       where $f_s$ is the static serialization factor, $f_{crit}$ is the fraction of time in critical sections, $A$ is the number of accumulation steps, $t$ is time, and $N$ is the total number of processors
     • Using this pattern to accumulate across 1024 cores: 1024→256→64→16→4→1, so A = 4 × 5 = 20
     [Graph: example where A = 8]
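The 1024-core accumulation tree can be checked with a short sketch. `accumulation_steps` counts the reduction levels for a given fan-in and takes A as fan-in times levels, as the slide does (A = 4 × 5 = 20); `tree_reduce` performs the same level-by-level combination sequentially. Both names are hypothetical, and in a real system each group at a level would be combined on a different core in parallel. With a fan-in equal to the core count (a single global accumulator), the same function gives A = 1024, the flat case this pattern avoids.

```python
from typing import Iterable, Tuple


def accumulation_steps(n_cores: int, fan_in: int) -> Tuple[int, int]:
    """Return (levels, A) for a tree reduction over n_cores partial results.

    Each level combines groups of `fan_in` values, and each group's combiner
    performs up to `fan_in` serialized accumulations, so A = fan_in * levels.
    For non-power sizes the last group may be smaller, making A an upper bound.
    """
    if n_cores < 1 or fan_in < 2:
        raise ValueError("need n_cores >= 1 and fan_in >= 2")
    levels = 0
    remaining = n_cores
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division
        levels += 1
    return levels, fan_in * levels


def tree_reduce(values: Iterable[int], fan_in: int = 4) -> int:
    """Sum values level by level, combining groups of `fan_in` partial sums.

    Runs sequentially here; on a manycore system each group at a level would be
    combined by a different core, so only `fan_in` additions per level serialize.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    while len(level) > 1:
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


if __name__ == "__main__":
    print(accumulation_steps(1024, 4))     # hierarchical: 5 levels, A = 20
    print(accumulation_steps(1024, 1024))  # flat: 1 level, A = 1024
    print(tree_reduce(range(1024)) == sum(range(1024)))
```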

  10. SISA Omni-OS Project
     • New project (2011-) in collaboration with SAIT Future IT, Intelligent Computing Lab to develop broader “Future OS” technology
     • As part of this effort, Omni-OS is focusing on scalability to 1000 cores
     • Our goal is to achieve 80% efficiency at 1000 cores by 2015

  11. Omni-OS Approach

  12. Underlying Re-design
     • Omni-OS is based on the Fiasco.OC L4 secure microkernel (TU Dresden)
     • As part of the project, we are developing a new (user-level) personality that is explicitly designed to eliminate all points of global synchronization
     • Our concept is to use delegation to distribute resource management (including admission control) and OS services across cores
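The slide does not detail the delegation mechanism, so the following is only a hypothetical sketch of the general idea, not Omni-OS or Fiasco.OC code: each core owns a private pool of resource IDs managed by its own manager, requests are served locally, and only when a local pool runs dry is the request delegated to a peer manager. No global lock exists; each manager's lock stands in for what a microkernel design would more likely implement as IPC to a server on the owning core. All class and method names are invented for illustration.

```python
import threading
from collections import deque
from typing import Iterable, List, Optional


class CoreLocalManager:
    """Hypothetical per-core manager that owns a private pool of resource IDs."""

    def __init__(self, core_id: int, resources: Iterable[int]) -> None:
        self.core_id = core_id
        self._free = deque(resources)
        # Only contended when a peer core delegates a request here; never global.
        self._lock = threading.Lock()

    def allocate(self) -> Optional[int]:
        with self._lock:
            return self._free.popleft() if self._free else None

    def release(self, resource: int) -> None:
        with self._lock:
            self._free.append(resource)


class DelegatingAllocator:
    """Serves each request from the caller's local manager; delegates to peers only when it runs dry."""

    def __init__(self, n_cores: int, per_core: int) -> None:
        self.per_core = per_core
        self.managers: List[CoreLocalManager] = [
            CoreLocalManager(c, range(c * per_core, (c + 1) * per_core))
            for c in range(n_cores)
        ]

    def allocate(self, core_id: int) -> Optional[int]:
        n = len(self.managers)
        # Local manager first (step 0), then peers in ring order.
        for step in range(n):
            resource = self.managers[(core_id + step) % n].allocate()
            if resource is not None:
                return resource
        return None  # every pool is exhausted

    def release(self, resource: int) -> None:
        # Return the resource to the manager that owns it.
        self.managers[resource // self.per_core].release(resource)


if __name__ == "__main__":
    alloc = DelegatingAllocator(n_cores=4, per_core=2)
    print([alloc.allocate(0) for _ in range(3)])  # [0, 1, 2]: the third request is delegated to core 1
```

The common case (local allocation) touches only core-local state; cross-core traffic appears only under imbalance, which is the property the slide's "no global synchronization" goal is after.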

  13. Conclusions
     • We believe 1000 cores will come...
        • ...probably heterogeneous (GPU-CPU convergence)
        • ...probably starting in the data center domain
        • ...probably commercialized in 5-10 years
     • Building software for processors with this order of magnitude of cores requires eliminating all global synchronization points
        • we expect hardware to follow similar trends (local cache coherency and shared memories)
     • Monolithic OS designs cannot survive without significant re-architecting
        • move to microkernel designs to help separation and distribution (e.g., GNU Hurd)
        • they will transform eventually
     • Scaling software to 1000+ cores requires a complete solution throughout the software stack: application, compiler/runtime, and operating system

  14. Questions?

  15. Appendix

  16. Transistor Count Continues to Increase

  17. Parallel Core Scaling: continuing shift to multicore and manycore

  18. Broadening our Business
     • Health: Connected Healthcare, Robotic Surgery
     • Transportation: Near-zero Traffic Congestion, Smart Highway
     • Vehicle: Smart Car
     • Building: Smart Building
     • Robotics: Human Augmentation, Home Robot
     • Power: Smart Grid
     • Defense: Defense System
     • Agriculture: Smart Gardening
