Resource Aware Scheduler – Initial Results
Tomer Morad, Noam Shalev, Avinoam Kolodny, Idit Keidar, Uri Weiser
May 8, 2013
Main Message: Balance Systems to Avoid Bottlenecks
• Motivation
  • Different programs have different resource requirements: number of cores, cache, memory bandwidth, energy, branch prediction, etc.
  • Hence, no single computing system can be balanced for every workload
  • Heterogeneous systems are even more unbalanced
  • Contention for resources wastes energy and usually degrades performance (for example, cache contention)
• Proposal: dynamically tune the workload to the (dynamically tuned) hardware, balancing the system to minimize contention for resources
  • The OS scheduler can do this
CMP Shared Resource Effects
• Examples of shared resources: last-level cache, memory bus, network bandwidth, disk bandwidth, etc.
• Three effects are observed when several threads access a shared resource:
  • Wasted Peripheral Energy (⬆ Energy)
    • Observed when threads are added in the presence of a bottleneck
    • For example: many floating point programs running in parallel on a Niagara processor (many cores with a shared floating point unit)
  • Collisions (⬆ Energy, ⬇ Throughput)
    • Observed when several threads access a shared resource and their requests are queued
    • In the Niagara example above, requests to the shared floating point unit are served more slowly
  • Destructive Interference (⬆ Energy, ⬇ Throughput)
    • Observed when threads evict each other's data from a shared cache
Resource Aware OS Scheduler
• Main components:
  • Sampling: sample the resource usage of the tasks that have run, so that the information is available for the prediction stage
  • Prediction: predict each task's resource usage based on its past resource usage
  • Scheduling: schedule only tasks that the system has enough resources to run (idle cores are OK)
• Implemented in Linux 3.2.0
• Uses performance counters for sampling
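The three components above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the Linux 3.2.0 implementation: the slides do not say how prediction works, so the exponential moving average, the `Task` class, `pick_runnable`, the GB/s unit, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical task record; predicted_bw is in GB/s (assumed unit)."""
    name: str
    predicted_bw: Optional[float] = None  # None until the first sample arrives

    def record_sample(self, measured_bw: float, alpha: float = 0.5) -> None:
        # Sampling + prediction: the first sample seeds the prediction,
        # later samples are folded in with an exponential moving average
        # (an assumed predictor, chosen only for illustration).
        if self.predicted_bw is None:
            self.predicted_bw = measured_bw
        else:
            self.predicted_bw = alpha * measured_bw + (1 - alpha) * self.predicted_bw


def pick_runnable(tasks: List[Task], num_cores: int, bw_budget: float) -> List[Task]:
    """Scheduling: admit tasks in queue order while cores and bandwidth remain.

    Tasks that do not fit are skipped, so some cores may stay idle.
    """
    chosen: List[Task] = []
    used_bw = 0.0
    for task in tasks:
        if len(chosen) == num_cores:
            break
        demand = task.predicted_bw or 0.0  # unsampled tasks are assumed cheap
        if used_bw + demand <= bw_budget:
            chosen.append(task)
            used_bw += demand
    return chosen


# Four bandwidth-hungry tasks, each measured at 8 GB/s, on a 4-core machine
# with a (hypothetical) 10 GB/s bandwidth budget.
demo_tasks = [Task(n) for n in ("a", "b", "c", "d")]
for t in demo_tasks:
    t.record_sample(8.0)
admitted = pick_runnable(demo_tasks, num_cores=4, bw_budget=10.0)
print([t.name for t in admitted])
```

With these made-up numbers only one task is admitted and three cores stay idle, which mirrors the "idle cores are OK" policy: running fewer tasks can be preferable to saturating the shared resource.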
Memory Bandwidth – An Example
• Core count is increasing
• Core frequency does not decrease
• Pin count is not increasing
• Chip bandwidth demand is increasing, but chip bandwidth to memory is not
• We are approaching the memory bandwidth wall!
• No real remedies in the near future
SPEC-CPU2006 on the baseline scheduler
[Figure: results plotted per number of concurrently running instances; plot data not recoverable]
BW-Hungry Program – Initial Results
• Implemented a resource aware scheduler in Linux 3.2.0
• BW-hungry program, single run
  • 5.58 sec, 132 Joules
• Run 4 times sequentially
  • 22.3 sec, 526 Joules
• Run 4 times in parallel (4-core i5-2500)
  • 27.86 sec (+25%), 1368 Joules (+160%), relative to sequential
• Run 4 times using the new scheduler with memory bandwidth limit enforcement
  • 23.71 sec (+6%), 569 Joules (+8%), relative to sequential
• Baseline scheduler vs. resource aware scheduler
  • 17.5% speedup, 58% energy reduction
• Disclaimers: (a) initial results; (b) energy was sampled using the MSR_PKG_ENERGY_STATUS counter, which reports the energy consumed by the processor package. Results are consistent with Wattsup measurements
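The percentages on this slide follow directly from the measured times and energies. The short check below recomputes them; the helper name `pct_change` and the variable names are illustrative, and "speedup" is assumed here to mean baseline time divided by new time, minus one.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# (seconds, joules) as reported on the slide
sequential = (22.3, 526.0)       # 4 runs back to back
parallel = (27.86, 1368.0)       # 4 runs in parallel, baseline scheduler
resource_aware = (23.71, 569.0)  # 4 runs, bandwidth limit enforced

par_time_overhead = pct_change(parallel[0], sequential[0])
par_energy_overhead = pct_change(parallel[1], sequential[1])
ras_time_overhead = pct_change(resource_aware[0], sequential[0])
ras_energy_overhead = pct_change(resource_aware[1], sequential[1])

# Resource aware scheduler compared with the baseline parallel run
speedup = (parallel[0] / resource_aware[0] - 1.0) * 100.0
energy_reduction = (1.0 - resource_aware[1] / parallel[1]) * 100.0

print(f"parallel vs sequential:       {par_time_overhead:+.1f}% time, {par_energy_overhead:+.1f}% energy")
print(f"resource aware vs sequential: {ras_time_overhead:+.1f}% time, {ras_energy_overhead:+.1f}% energy")
print(f"resource aware vs baseline:   {speedup:.1f}% speedup, {energy_reduction:.1f}% less energy")
```

Note that the energy overhead of the baseline parallel run is far larger than its time overhead, which suggests the contention also raises average power rather than only stretching the run.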
SPEC-CPU2006 – Initial Results
• Each run included four instances of the same SPEC-CPU2006 benchmark
• Average: +3.3% throughput, -3.5% energy
• Notable results:
  • 429 (mcf): +106% throughput, -43% energy
  • 473 (astar): +3.3% throughput, -13% energy
• Out of 25 benchmarks:
  • 16 consumed less energy (9 consumed more)
  • 10 ran faster (11 ran slower)
• Other results:
  • Energy efficiency improved by 11% on average
  • For 15 benchmarks, energy efficiency improved by 20% on average
  • For 10 benchmarks, energy efficiency degraded by 3% on average
• A soft limit on the bandwidth is anticipated to improve the results
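The overall 11% energy-efficiency figure is consistent with the two subgroup averages, assuming it is an unweighted mean over all 25 benchmarks (the slide does not state how the average was formed). A quick check, with illustrative variable names:

```python
# (number of benchmarks, average energy-efficiency change in percent)
improved = (15, 20.0)
degraded = (10, -3.0)

total_benchmarks = improved[0] + degraded[0]
overall_change = (improved[0] * improved[1] + degraded[0] * degraded[1]) / total_benchmarks

print(f"{total_benchmarks} benchmarks, overall change: {overall_change:+.1f}%")
```

The combined value rounds to the 11% reported above.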