Scalability-Based Manycore Partitioning. PACT 2012. Hiroshi Sasaki Kyushu University Koji Inoue Kyushu University. Teruo Tanimoto The University of Tokyo Hiroshi Nakamura The University of Tokyo. Presented by Kim, Jong-yul 2013. 7. 31. Contents. Motivation SBMP Scheduler
Scalability-Based Manycore Partitioning • PACT 2012 • Hiroshi Sasaki (Kyushu University), Koji Inoue (Kyushu University), Teruo Tanimoto (The University of Tokyo), Hiroshi Nakamura (The University of Tokyo) • Presented by Kim, Jong-yul, 2013. 7. 31
Contents • Motivation • SBMP Scheduler • Scalability Prediction • Core Partitioning • Core Donation • Phase Change Detection • Evaluation Results • Conclusions
Prospects • Limitation of increasing F • ILP, power wall, transistor scaling • Multi-core, many-core system • Multi-threaded multiprogramming [Figure: one system running APP1, APP2, APP3, …]
Problem • A traditional OS assigns equal CPU to all running apps • Programs have different scalability • Linux: 2.04 • Best Partitioning: 1.38 • Normalized Turnaround Time = (clock cycles when multiprogrammed with others) / (clock cycles when solo-run) [Figure: performance per workload, with average]
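The normalized turnaround time metric on this slide can be sketched in a few lines. This is only an illustration: the function names and the cycle counts used in the usage note are hypothetical, not taken from the paper.

```python
def normalized_turnaround_time(cycles_multiprogrammed, cycles_solo):
    """Slowdown of one program: cycles when co-scheduled / cycles when run alone."""
    if cycles_solo <= 0:
        raise ValueError("solo-run cycle count must be positive")
    return cycles_multiprogrammed / cycles_solo


def average_ntt(runs):
    """Arithmetic mean of the per-program NTT over (multiprogrammed, solo) pairs."""
    if not runs:
        raise ValueError("need at least one measurement")
    return sum(normalized_turnaround_time(m, s) for m, s in runs) / len(runs)
```

For example, a program that needs 204 cycles when co-scheduled and 100 cycles alone has an NTT of 2.04; lower is better, and 1.0 means no slowdown.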
Experimental System [Figure: experimental system, with the allocation unit marked]
SBMP Scheduler • Scalability Prediction • Core Partitioning • Core Donation • Phase Change Detection
Overview • Assign cores considering the scalability of applications • SBMP: Scalability-Based Manycore Partitioning scheduler [Figure: scheduler stages: Detect, Scalability Prediction, Core Partitioning, Core Donation, Steady]
Scalability Prediction (1/2) • Cumulative retired instructions per second (IPS) • Little effect from # of cores on the total # of instructions [Figure: total # of instructions per workload; annotation: 8%]
Scalability Prediction (2/2) • If obtained directly… • Warm up branch prediction & cache system • Need 8 allocations (6, 12, 18, …, 48) • Over 3 seconds • Simple model • 3 coefficients (α, β, γ) • 3 samplings: 1 single core + 2 different configurations [Equation: performance model; terms annotated "Amdahl's law" and "Overhead caused by additional core"]
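The idea of fitting three coefficients from three samples can be sketched as follows. The slide's actual equation did not survive, so the model form here is an assumption: execution time per unit of work is taken as α + β/n + γ·(n − 1), that is, a serial part, an Amdahl-style parallel part, and a per-additional-core overhead. The function names and the sample core counts are hypothetical.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_model(samples):
    """samples: three (cores, ips) pairs -> (alpha, beta, gamma).

    Assumed model (not from the slides): 1 / ips = alpha + beta / n + gamma * (n - 1).
    """
    A = [[1.0, 1.0 / n, float(n - 1)] for n, _ in samples]
    b = [1.0 / ips for _, ips in samples]
    return tuple(solve3(A, b))


def predict_ips(coeffs, n):
    """Predicted IPS for n cores under the assumed model."""
    alpha, beta, gamma = coeffs
    return 1.0 / (alpha + beta / n + gamma * (n - 1))
```

With one single-core sample and two other configurations (say 6 and 12 cores), `fit_model` yields coefficients from which `predict_ips` can fill in the remaining allocations up to 48 cores without measuring each one.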
Core Partitioning (1/2) [Figure: relative performance vs. # of cores for three scalability classes: High, Medium, Low]
Core Partitioning (2/2) • Scalability table for each program • Key-value • Key: # of cores • Value: performance with [key] cores • Goal [Equation: objective; terms annotated "Multiprogrammed" and "Single-run"] • Hill climbing algorithm • Near optimal assignment
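A minimal sketch of hill climbing over scalability tables is given below. Several details are assumptions, not confirmed by the slides: the objective is taken to be the average slowdown relative to a solo run on all cores, the search starts from an equal split, and each move shifts one allocation unit (6 cores) from one program to another. All names are hypothetical.

```python
def avg_slowdown(tables, alloc):
    """Assumed objective: mean of (solo performance / performance with alloc cores).

    tables[i] maps a core count to the predicted performance of program i.
    The solo run is assumed to use the largest core count in the table.
    """
    total = 0.0
    for table, cores in zip(tables, alloc):
        total += table[max(table)] / table[cores]
    return total / len(tables)


def hill_climb(tables, total_units, unit=6):
    """Return (cores per program, cost) reached by single-unit moves."""
    n = len(tables)
    # Start from an equal split; any remainder goes to the first programs.
    units = [total_units // n + (1 if i < total_units % n else 0) for i in range(n)]

    def cost(u):
        return avg_slowdown(tables, [x * unit for x in u])

    best = cost(units)
    improved = True
    while improved:
        improved = False
        for src in range(n):
            for dst in range(n):
                # Every program keeps at least one allocation unit.
                if src == dst or units[src] <= 1:
                    continue
                cand = units[:]
                cand[src] -= 1
                cand[dst] += 1
                c = cost(cand)
                if c < best - 1e-12:
                    units, best, improved = cand, c, True
    return [x * unit for x in units], best
```

Because every accepted move strictly lowers the cost and the number of partitions is finite, the loop terminates. Like any hill climb, it can stop at a local optimum, which is why the result is described as near optimal rather than optimal.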
Core Donation • 1 program for each processor die • CPU utilization • Donor: CPU utilization ratio < threshold (70%) • Donee: most beneficial one • Utilization, scalability • Priority: Donee < Donor • Finer granularity • Processor die (6 cores) [Figure: timeline of Core1 and Core2; donor Program1 on Core1, donee Program2 on Core2 and also on Core1]
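The donor and donee selection can be sketched as below. The 70% threshold comes from the slide; everything else is an assumption for illustration, in particular the `gain_per_core` field and the benefit score (utilization times scalability gain), which stand in for whatever "most beneficial" means in the paper.

```python
from dataclasses import dataclass

THRESHOLD = 0.70  # CPU utilization ratio below which a program donates (from the slide)


@dataclass
class Program:
    name: str
    utilization: float    # measured CPU utilization ratio, 0.0 to 1.0
    gain_per_core: float  # hypothetical: predicted speedup from one extra core


def pick_donations(programs):
    """Return (donor, donee) name pairs; empty if nobody donates or can receive."""
    donors = [p for p in programs if p.utilization < THRESHOLD]
    candidates = [p for p in programs if p.utilization >= THRESHOLD]
    if not donors or not candidates:
        return []
    # Assumed benefit score: busy programs that scale well gain the most.
    donee = max(candidates, key=lambda p: p.utilization * p.gain_per_core)
    return [(d.name, donee.name) for d in donors]
```

In this sketch the donee would run on the donor's cores at a lower priority than the donor, so the donor takes its cores back whenever it has work to do.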
Detection (1/2) • Creation or termination of a program • Phase transition detected in any of the programs [Figure: performance plot]
Detection (2/2) – Phase Prediction • SBMP scheduler monitors performance every epoch (2.5 s) • Threshold ( > or < [Figure: scheduler stages: Scalability Prediction, Detect, Core Partitioning, Core Donation, Steady]
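Per-epoch monitoring against a threshold could look like the sketch below. The threshold values on the slide are cut off, so the 10% bounds here are placeholders chosen only for illustration, and resetting the baseline after each detected change is likewise an assumption.

```python
def detect_phase_changes(ips_per_epoch, upper=1.10, lower=0.90):
    """Return the epochs whose IPS leaves [lower, upper] times the baseline.

    upper and lower are hypothetical bounds; the slide's values are not recoverable.
    """
    changes = []
    baseline = None
    for epoch, ips in enumerate(ips_per_epoch):
        if baseline is None:
            baseline = ips  # the first epoch defines the initial phase
            continue
        ratio = ips / baseline
        if ratio > upper or ratio < lower:
            changes.append(epoch)
            baseline = ips  # assumed: the new phase becomes the reference
    return changes
```

With one sample per 2.5 s epoch, a detected change would send the scheduler back to scalability prediction for the affected program.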
Evaluation • Core Partitioning • Phase Prediction • Core Donation • Overall Performance
Experimental System • PARSEC benchmark suite 2.1
Core Partitioning • SBMP-base • Scalability Prediction + Core Partitioning • Single-phase applications (2 Medium + 2 Low) • Linux: 1.88 • SBMP-base: 1.54 [Figure: performance per workload, with average]
Phase Prediction • SBMP-PP (Phase Prediction) • SBMP-base + Phase Prediction • Multiple-phase applications • Linux: 1.89 • SBMP-base: 2.09 • SBMP-PP: 1.77 [Figure: performance per workload]
Core Donation • SBMP-CD (Core Donation) • SBMP-PP + Core Donation • 2 low CPU utilization + 2 normal • Linux: 2.06 • SBMP-PP: 1.68 • SBMP-CD: 1.60 [Figure: performance per workload]
Overall Results • All programs (72 workloads) • Linux: 1.83 • SBMP-base: 1.99 • SBMP-PP: 1.70 (8%) • SBMP-CD: 1.65 (11%)
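The percentages next to SBMP-PP and SBMP-CD are consistent with a ratio of turnaround times against Linux. The slides do not state the formula, so the helper below is an assumption that happens to reproduce both figures after rounding.

```python
def improvement_percent(baseline_ntt, new_ntt):
    """Assumed formula: (baseline / new - 1) * 100, with lower NTT being better."""
    return (baseline_ntt / new_ntt - 1.0) * 100.0
```

Under this reading, 1.83 / 1.70 is about 1.076 (roughly 8%) and 1.83 / 1.65 is about 1.109 (roughly 11%), while SBMP-base at 1.99 comes out negative, that is, slower than Linux on the full workload set.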
Conclusions • OS scheduling on manycore systems • Multiple multi-threaded applications • SBMP Scheduler • Dynamic scalability prediction + Core partitioning • Phase recognition • Core Donation • 11% over Linux
Hill Climbing Algorithm • Finds a near optimal solution • Start with an arbitrary solution • Incrementally change a single element