Challenges in Scaling Operating Systems to 1,000 Cores and Beyond - Data Intelligence Forum 2011 SAIT Tech Fair - Presented by Dr. Daniel Waddington - SISA Computer Science Lab
1000 Core Processors - Myth or Reality? • Do we expect to see this in the next decade?
1000 Core Processors - Myth or Reality? • 100-core processors are commercially viable now • Tilera TILE-Gx series • 1000-core processors were demonstrated in 2011 • University of Glasgow / U. Mass Lowell • Manycore remains a key direction for major vendors • Intel Knights Ferry - championed at IDF 2011 • AMD • Moore's law is sustainable (for a little longer) • 3D tri-gate transistor technology (introduced with Intel Ivy Bridge) brings 10nm within reach by around 2015 • moving from 32nm to 22nm gives approximately 2x higher transistor density • more transistors = more cores (ILP is flat) - likely heterogeneous • Different domains will have different driving forces • power • performance
1000 Core Processors - Myth or Reality? What "killer" apps do we expect to see for 1000 core processors...? • Vision and Image Processing • 3D enhancement, object recognition, gesture recognition, augmented reality, video and media production • Audio and Speech Processing • dialects and accents, translation, command interfaces • Information Processing and Retrieval • encryption/decryption, search, pattern matching • Sensor Data Processing and Aggregation • Vehicles and Transportation • Gaming • Biology • protein folding, personal DNA sequencing • .... Move functions out of dedicated System-on-Chip hardware and into software!
...but Software is Still in Trouble • Best sample (Sudoku) only scales to ~50% efficiency at 1000 cores • Most stop scaling at around 20 cores
Amdahl's Law • Amdahl's law defines the relationship between the potential speed-up and the amount of serial (non-parallel) execution: $T(N) = f\,t + \frac{(1-f)\,t}{N}$ (serial part + parallel part); factoring out $t$ gives the speed-up $S(N) = \frac{1}{f + (1-f)/N}$, where $f$ is the serialization factor, $t$ is time, and $N$ is the total number of processors • This law is optimistic - data sharing (e.g., accumulation) and overhead (e.g., scheduling and migration) all make things worse • Best sample (Sudoku) only scales to ~50% at 1000 cores • Our goal is 80% efficiency at 1000 cores for our "killer" applications • THE SERIALIZATION FACTOR IS STATIC W.R.T. N
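A quick worked check of what these targets imply (derived from the formula above, not stated on the slides): efficiency at $N$ cores is $E(N) = S(N)/N = \frac{1}{Nf + 1 - f}$, so the 80% goal at 1,000 cores bounds the serial fraction:

\[
\frac{1}{1000f + 1 - f} \ge 0.8 \;\Rightarrow\; 999\,f \le 0.25 \;\Rightarrow\; f \le 2.5\times10^{-4}
\]

i.e. less than about 0.025% of the execution may be serial, and even $f \approx 1/1000$ already caps efficiency near 50% at 1,000 cores.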
Enhancing Amdahl's Equations • Practical implementation of a parallel program requires access to shared resources (e.g., accumulation, notification) that are protected by locks (critical sections): $T(N) = f_s\,t + f_d\,t + \frac{(1 - f_s - f_d)\,t}{N}$ (static serial part + dynamic serial part + parallel part), where $f_s$ is the static serialization factor, $f_d$ is the dynamic serialization factor (it grows with contention, i.e. with $N$), $t$ is time, and $N$ is the total number of processors • Example: effect on scaling of accumulation on a critical section where $f_{crit} = 1/1{,}000{,}000$ (disclaimer: pipelining can help this problem); see the sketch below
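To make the accumulation example concrete, here is a minimal sketch in C. The linear contention model $f_d(N) = N \cdot f_{crit}$, the zero static serial fraction, and all names are illustrative assumptions, not the presenter's exact model; only $f_{crit} = 10^{-6}$ comes from the slide.

```c
/* Hedged sketch: model speedup when every worker must serially pass
 * through one critical section. Assumes the dynamic serial fraction
 * grows linearly with core count, f_d(N) = N * f_crit; the slide only
 * states that f_d depends on N, so this linear model is illustrative. */
#include <stdio.h>

static double speedup(double fs, double fcrit, double n)
{
    double fd = n * fcrit;              /* assumed contention model   */
    double serial = fs + fd;            /* static + dynamic serial    */
    if (serial > 1.0) serial = 1.0;     /* cannot exceed total work   */
    return 1.0 / (serial + (1.0 - serial) / n);
}

int main(void)
{
    const double fs    = 0.0;   /* assume no other serial work        */
    const double fcrit = 1e-6;  /* value from the slide               */
    for (int n = 1; n <= 1024; n *= 4)
        printf("N=%4d  speedup=%7.1f  efficiency=%5.1f%%\n",
               n, speedup(fs, fcrit, n), 100.0 * speedup(fs, fcrit, n) / n);
    return 0;
}
```

Under these assumptions the output shows efficiency falling to roughly 50% near 1,000 cores even though each individual critical section is only a millionth of the total work.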
Critical Sections in the Operating System • Synchronization on (global) critical sections significantly affects scalability • Critical sections exist in both the application and the operating system • Even if you get the application right, you have to get the OS right too! (Figure: kernel lock activity vs. number of cores, measured with lockstat on the Linux 2.6.32 kernel)
Eliminating Global Synchronization • Critical sections on global shared resources are clearly bad • Hierarchical arrangements can be used to break global dependencies: $T(N) = f_s\,t + A\,f_{crit}\,t + \frac{(1 - f_s - A\,f_{crit})\,t}{N}$, where $f_s$ is the static serialization factor, $f_{crit}$ is the fraction of time in critical sections, $A$ is the number of accumulation steps, $t$ is time, and $N$ is the total number of processors (graph: effect on scaling where A = 8) • Using this pattern to accumulate across 1024 cores - 1024→256→64→16→4→1 - gives A = 4*5 = 20; see the sketch below
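A minimal sketch of this hierarchical accumulation pattern, assuming a fan-in of 4 and pthreads, and scaled down to 64 workers (3 levels, A = 4*3 = 12) to keep it short. The names (node_t, combine, FANIN, etc.) are illustrative only and are not taken from Omni-OS.

```c
/* Hedged sketch of hierarchical (tree) accumulation: instead of all cores
 * contending on one global counter, each group of FANIN workers combines
 * into a group-local sum, and one representative per group carries it up
 * a level (e.g. 1024 -> 256 -> 64 -> 16 -> 4 -> 1 in the slide). */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 64          /* scaled-down stand-in for 1024 cores */
#define FANIN     4          /* A's fan-in in the slide's example   */

typedef struct {
    pthread_mutex_t lock;
    long            sum;     /* partial sum for this tree node      */
    int             arrived; /* how many children have combined     */
} node_t;

static node_t nodes[2 * NTHREADS];   /* enough nodes for all levels */
static long   global_result;

/* Combine 'value' into 'node'; return 1 if the caller is the last
 * arriver and must carry the node's total one level up. */
static int combine(node_t *node, long value, long *total)
{
    int last;
    pthread_mutex_lock(&node->lock);  /* contended by at most FANIN threads */
    node->sum += value;
    node->arrived++;
    last = (node->arrived == FANIN);
    *total = node->sum;
    pthread_mutex_unlock(&node->lock);
    return last;
}

static void *worker(void *arg)
{
    long id = (long)arg;
    long value = id;                  /* each core contributes its own id  */
    long index = id;
    int  level_base = 0, level_size = NTHREADS;

    /* Walk up the tree; only the last arriver at each node continues. */
    while (level_size >= FANIN) {
        node_t *node = &nodes[level_base + index / FANIN];
        long total;
        if (!combine(node, value, &total))
            return NULL;              /* another thread carries the sum up */
        value       = total;
        level_base += level_size / FANIN;
        level_size /= FANIN;
        index      /= FANIN;
    }
    global_result = value;            /* single surviving thread at root   */
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];
    for (int i = 0; i < 2 * NTHREADS; i++) {
        pthread_mutex_init(&nodes[i].lock, NULL);
        nodes[i].sum = 0;
        nodes[i].arrived = 0;
    }
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    printf("sum 0..%d = %ld (expected %d)\n",
           NTHREADS - 1, global_result, (NTHREADS - 1) * NTHREADS / 2);
    return 0;
}
```

Built with `cc -pthread`, each lock is contended by at most FANIN threads and the result crosses only log₄(N) levels, which is what bounds the serial accumulation steps to A = fan-in × levels (20 for 1024 cores) rather than N.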
SISA Omni-OS Project • New project (2011-) in collaboration with SAIT Future IT, Intelligent Computing Lab to develop broader “Future OS” technology • As part of this effort Omni-OS is focusing on scalability to 1000 cores • Our goal is to achieve 80% efficiency at 1000 cores by 2015
Underlying Re-design • Omni-OS is based on the Fiasco.OC L4 secure microkernel (TU Dresden) • As part of the project we are developing a new (user-level) personality that is explicitly designed to eliminate all points of global synchronization • Our concept is to use delegation to distribute resource management (including admission control) and OS services across cores
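To illustrate the delegation idea, here is a toy sketch under my own assumptions, not Omni-OS code: each resource has a single owning core, and other cores ask the owner to act on their behalf through a small mailbox instead of taking a shared lock on the resource itself. A real implementation would use per-core, lock-free message queues; the single pthread mailbox here only keeps the example short.

```c
/* Hedged sketch of delegation: one "home" core owns the resource outright;
 * other cores delegate operations to it via a mailbox. All types and names
 * (mailbox_t, request_t, owner_loop, delegate) are illustrative only. */
#include <pthread.h>
#include <stdio.h>
#include <stdbool.h>

typedef struct {            /* one delegated operation                    */
    int  delta;             /* e.g. "grant this many pages"               */
    int  reply;             /* filled in by the owning core               */
    bool done;
} request_t;

typedef struct {            /* single-slot mailbox, enough for a sketch   */
    pthread_mutex_t lock;   /* protects the mailbox, NOT the resource     */
    pthread_cond_t  cond;
    request_t      *req;
} mailbox_t;

static mailbox_t mbox = { PTHREAD_MUTEX_INITIALIZER,
                          PTHREAD_COND_INITIALIZER, NULL };
static int  free_pages = 1024;     /* owned exclusively by the owner core */
static bool shutting_down = false;

static void *owner_loop(void *arg)     /* runs on the resource's home core */
{
    (void)arg;
    pthread_mutex_lock(&mbox.lock);
    while (!shutting_down) {
        while (mbox.req == NULL && !shutting_down)
            pthread_cond_wait(&mbox.cond, &mbox.lock);
        if (mbox.req) {
            free_pages -= mbox.req->delta;  /* no lock needed: sole owner */
            mbox.req->reply = free_pages;
            mbox.req->done  = true;
            mbox.req = NULL;
            pthread_cond_broadcast(&mbox.cond);
        }
    }
    pthread_mutex_unlock(&mbox.lock);
    return NULL;
}

static int delegate(int delta)          /* called from any other core      */
{
    request_t r = { .delta = delta, .done = false };
    pthread_mutex_lock(&mbox.lock);
    while (mbox.req != NULL)            /* wait for a free mailbox slot    */
        pthread_cond_wait(&mbox.cond, &mbox.lock);
    mbox.req = &r;
    pthread_cond_broadcast(&mbox.cond);
    while (!r.done)
        pthread_cond_wait(&mbox.cond, &mbox.lock);
    pthread_mutex_unlock(&mbox.lock);
    return r.reply;
}

int main(void)
{
    pthread_t owner;
    pthread_create(&owner, NULL, owner_loop, NULL);
    printf("pages left after grant of 16: %d\n", delegate(16));
    printf("pages left after grant of 32: %d\n", delegate(32));
    pthread_mutex_lock(&mbox.lock);
    shutting_down = true;
    pthread_cond_broadcast(&mbox.cond);
    pthread_mutex_unlock(&mbox.lock);
    pthread_join(owner, NULL);
    return 0;
}
```

The design point this is meant to convey: contention is confined to the owner's mailbox rather than a lock on the resource itself, so resource management (including admission control) can be distributed across cores by giving each core its own set of owned resources.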
Conclusions • We believe 1000 cores will come... ..probably heterogeneous (GPU-CPU convergence) ..probably starting in the data center domain ..probably commercialized in 5-10 years • Building software for processors that support this order-of-magnitude in cores requires elimination of all global synchronization points • we expect hardware to follow similar trends (local cache coherency and shared memories) • Monolithic OS designs cannot survive without significant re-architecting • move to microkernel designs to help separation and distribution (e.g., GNU Hurd) • they will transform eventually • Scaling software to 1000+ cores requires a complete solution throughout the software stack - application, compiler/runtime and operating system
Parallel Core Scaling • Continuing shift to multicore and manycore
Broadening our Business • Health: Connected Healthcare, Robotic Surgery • Transportation/Vehicle: Smart Car, Near-zero Traffic Congestion, Smart Highway • Building: Smart Building • Robotics: Human Augmentation, Home Robot • Power: Smart Grid • Defense: Defense System • Agriculture: Smart Gardening