
Resource Aware Scheduler – Initial Results

Tomer Morad, Noam Shalev, Avinoam Kolodny, Idit Keidar, Uri Weiser. May 8, 2013.

Presentation Transcript


  1. Resource Aware Scheduler – Initial Results
     Tomer Morad, Noam Shalev, Avinoam Kolodny, Idit Keidar, Uri Weiser
     May 8, 2013

  2. Main Message: Balance Systems to Avoid Bottlenecks
     • Motivation
       • Different programs have different resource requirements: number of cores, cache, memory bandwidth, energy, branch prediction, etc.
       • Hence, no single computing system can be balanced for every program
       • Heterogeneous systems are even more unbalanced
       • Contention on shared resources (for example, the cache) wastes energy and usually degrades performance
     • Proposal: dynamically tune the workload to the (dynamically tuned) hardware so as to minimize resource contention by balancing the system
     • The OS scheduler can do this

  3. CMP Shared Resource Effects
     • Examples of shared resources: last-level cache, memory bus, network bandwidth, disk bandwidth, etc.
     • Three effects are observed when several threads access a shared resource:
       • Wasted Peripheral Energy (⬆ Energy)
         • Observed when additional threads are added in the presence of a bottleneck
         • For example: many floating-point programs running in parallel on a Niagara processor (many cores sharing a single floating-point unit)
       • Collisions (⬆ Energy, ⬇ Throughput)
         • Observed when several threads access a shared resource and their requests are queued
         • In the example above, each request is serviced more slowly
       • Destructive Interference (⬆ Energy, ⬇ Throughput)
         • Observed when threads evict each other's data from a shared cache
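
The collisions effect can be illustrated with a textbook queueing model. This sketch is not from the slides: it assumes the shared resource behaves like an M/M/1 queue (Poisson arrivals, exponential service times) and that every thread issues requests at the same hypothetical rate. Under those assumptions the mean time a request spends queued plus in service is 1 / (service_rate - arrival_rate), so it grows as threads are added even before the resource saturates. The function names and rates are illustrative only.

```python
"""Toy model of "collisions" on a shared resource (assumed M/M/1 queue)."""


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends waiting plus being served in an M/M/1 queue.

    Returns infinity when the resource is saturated (utilization >= 1).
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def response_times(threads: int, per_thread_rate: float, service_rate: float) -> list[float]:
    """Mean response time for 1..threads identical threads sharing one resource."""
    return [mm1_response_time(n * per_thread_rate, service_rate)
            for n in range(1, threads + 1)]


if __name__ == "__main__":
    # Hypothetical units: each thread issues 20 requests/us; the resource serves 100/us.
    for n, t in enumerate(response_times(4, 20.0, 100.0), start=1):
        print(f"{n} thread(s): mean response time {t:.4f} us")
```

With these hypothetical rates, going from one thread to four raises the mean response time from 0.0125 us to 0.05 us, a fourfold slowdown per request even though total demand (80 requests/us) is still below capacity (100 requests/us).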

  4. Resource Aware OS Scheduler
     • Main components:
       • Sampling: sample the resource usage of the tasks that have run, so that the data is available to the prediction stage
       • Prediction: predict each task's resource usage from its past resource usage
       • Scheduling: schedule only tasks that the system has enough resources to run (leaving cores idle is acceptable)
     • Implemented in Linux 3.2.0
     • Uses performance counters for sampling
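
The three stages can be sketched in a few lines of user-space Python. The slides do not say which predictor or admission policy the Linux 3.2.0 implementation uses, so this sketch assumes an exponentially weighted moving average (EWMA) for prediction, memory bandwidth as the only tracked resource, and a greedy first-fit admission rule. The names `Task`, `record_sample`, `pick_tasks`, `alpha`, and `bw_budget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A schedulable task with a predicted memory-bandwidth demand (hypothetical units: GB/s)."""
    name: str
    predicted_bw: float = 0.0
    samples: int = 0
    alpha: float = 0.5  # hypothetical EWMA weight given to the newest sample

    def record_sample(self, observed_bw: float) -> None:
        """Sampling + prediction: fold a measured bandwidth into an EWMA forecast."""
        if self.samples == 0:
            self.predicted_bw = observed_bw
        else:
            self.predicted_bw = (self.alpha * observed_bw
                                 + (1.0 - self.alpha) * self.predicted_bw)
        self.samples += 1


def pick_tasks(runqueue: list[Task], cores: int, bw_budget: float) -> list[Task]:
    """Scheduling: admit tasks in queue order while predicted bandwidth fits the budget.

    The first task is always admitted so a single bandwidth-hungry task cannot
    starve; remaining cores stay idle if no other task fits.
    """
    chosen: list[Task] = []
    used = 0.0
    for task in runqueue:
        if len(chosen) == cores:
            break
        if not chosen or used + task.predicted_bw <= bw_budget:
            chosen.append(task)
            used += task.predicted_bw
    return chosen


if __name__ == "__main__":
    tasks = [Task("A"), Task("B"), Task("C"), Task("D")]
    for task, bw in zip(tasks, [6.0, 6.0, 1.0, 1.0]):
        task.record_sample(bw)
    running = pick_tasks(tasks, cores=4, bw_budget=10.0)
    print([t.name for t in running])  # B is deferred; one core stays idle
```

With a hypothetical budget of 10 GB/s, tasks A, C, and D run while B waits for a later scheduling round, and the fourth core stays idle, in line with the slide's note that idle cores are acceptable. Always admitting the first queued task is a design choice made here to avoid starving a task whose predicted demand alone exceeds the budget.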

  5. Memory Bandwidth – An Example
     • Core count is increasing
     • Core frequency is not decreasing
     • Pin count is not increasing
     • Chip bandwidth demand is increasing, but
     • Chip bandwidth to memory is not increasing
     • We are approaching the memory bandwidth wall!
     • No real remedies in the near future
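
The argument on this slide is back-of-the-envelope arithmetic: per-core demand multiplied by a growing core count rises, while pin-limited supply stays flat. This sketch encodes that reasoning. The clock frequency, bytes per cycle, pin count, and per-pin rate are hypothetical values chosen only for illustration, not measurements from the presentation, and the function names are likewise hypothetical.

```python
import math


def per_core_demand_gbs(freq_ghz: float, bytes_per_cycle: float) -> float:
    """Memory traffic one core generates, in GB/s (GHz x bytes/cycle = GB/s)."""
    return freq_ghz * bytes_per_cycle


def pin_bandwidth_gbs(data_pins: int, gbit_per_pin: float) -> float:
    """Off-chip bandwidth the memory data pins can supply, in GB/s."""
    return data_pins * gbit_per_pin / 8.0


def cores_until_wall(freq_ghz: float, bytes_per_cycle: float,
                     data_pins: int, gbit_per_pin: float) -> int:
    """Largest core count whose combined demand still fits the pin bandwidth."""
    supply = pin_bandwidth_gbs(data_pins, gbit_per_pin)
    demand = per_core_demand_gbs(freq_ghz, bytes_per_cycle)
    return math.floor(supply / demand)


if __name__ == "__main__":
    # Hypothetical parameters: 3 GHz cores, 0.5 B/cycle each, 64 data pins at 1.6 Gb/s.
    per_core = per_core_demand_gbs(3.0, 0.5)
    supply = pin_bandwidth_gbs(64, 1.6)
    for cores in (4, 8, 16):
        print(f"{cores:2d} cores: demand {cores * per_core:5.1f} GB/s, supply {supply:.1f} GB/s")
    print("bandwidth wall beyond", cores_until_wall(3.0, 0.5, 64, 1.6), "cores")
```

Doubling the core count doubles demand, but the supply term changes only if the pin count or per-pin rate grows, which is the wall the slide describes.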

  6. Memory Bus Usage

  7. SPEC-CPU2006 on the baseline scheduler [Figure: results by number of running instances]

  8. BW-Hungry Program – Initial Results
     • Implemented a resource aware scheduler in Linux 3.2.0
     • BW-hungry program:
       • Single run: 5.58 sec, 132 Joules
       • Run 4 times sequentially: 22.3 sec, 526 Joules
       • Run 4 times in parallel (4-core Intel Core i5-2500): 27.86 sec (+25%), 1368 Joules (+160%) over sequential
       • Run 4 times in parallel using the new scheduler, which enforces a memory bandwidth limit: 23.71 sec (+6%), 569 Joules (+8%) over sequential
     • Baseline scheduler vs. Resource Aware Scheduler: 17.5% speedup, 58% energy reduction
     • Disclaimers: (a) initial results; (b) energy was sampled using the MSR_PKG_ENERGY_STATUS model-specific register, which reports the energy consumed by the whole package. The results were consistent with a Wattsup power meter.
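
Two pieces of arithmetic sit behind this slide: converting raw package-energy counter readings into joules, and deriving the percentages from the measured times and energies. This sketch assumes the RAPL register layout documented in Intel's Software Developer's Manual (energy status units in bits 12:8 of MSR_RAPL_POWER_UNIT, and a wrapping 32-bit energy count in bits 31:0 of MSR_PKG_ENERGY_STATUS). Reading the registers themselves, for example through the Linux msr driver, is left out, and the helper names are hypothetical. The percentage checks use only the figures reported on the slide.

```python
def rapl_energy_joules(start_raw: int, end_raw: int, power_unit_msr: int) -> float:
    """Energy between two MSR_PKG_ENERGY_STATUS readings, in joules.

    Energy status units (ESU) are in bits 12:8 of MSR_RAPL_POWER_UNIT;
    one counter tick is 1 / 2**ESU joules. The counter is 32 bits wide
    and wraps, so the difference is taken modulo 2**32.
    """
    esu = (power_unit_msr >> 8) & 0x1F
    ticks = (end_raw - start_raw) & 0xFFFFFFFF
    return ticks / (1 << esu)


def pct_change(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def speedup_pct(slow_time: float, fast_time: float) -> float:
    """How much faster `fast_time` is than `slow_time`, in percent."""
    return (slow_time / fast_time - 1.0) * 100.0


# (seconds, joules) as reported on the slide for 4 runs of the BW-hungry program.
SEQUENTIAL = (22.3, 526.0)
PARALLEL_BASELINE = (27.86, 1368.0)
PARALLEL_RESOURCE_AWARE = (23.71, 569.0)

if __name__ == "__main__":
    for label, (t, e) in [("baseline", PARALLEL_BASELINE),
                          ("resource aware", PARALLEL_RESOURCE_AWARE)]:
        print(f"{label}: time {pct_change(t, SEQUENTIAL[0]):+.0f}%, "
              f"energy {pct_change(e, SEQUENTIAL[1]):+.0f}% over sequential")
    print(f"speedup {speedup_pct(PARALLEL_BASELINE[0], PARALLEL_RESOURCE_AWARE[0]):.1f}%")
    print(f"energy reduction {-pct_change(PARALLEL_RESOURCE_AWARE[1], PARALLEL_BASELINE[1]):.0f}%")
```

The printed values reproduce the slide's +25%/+160%, +6%/+8%, 17.5%, and 58%. On the counter side, an ESU of 16 makes one tick 1/65536 J, about 15.3 microjoules; masking the difference to 32 bits keeps a measurement correct across a single counter wraparound.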

  9. SPEC-CPU2006 – Initial Results
     • Each run included four instances of the same SPEC-CPU2006 benchmark
     • Average: +3.3% throughput, -3.5% energy
     • Notable results:
       • 429 (mcf): +106% throughput, -43% energy
       • 473 (astar): +3.3% throughput, -13% energy
     • Out of 25 benchmarks:
       • 16 consumed less energy (9 consumed more)
       • 10 ran faster (11 slower)
     • Other results:
       • Energy efficiency improved by 11% on average
       • For 15 benchmarks, energy efficiency improved by 20% on average
       • For 10 benchmarks, energy efficiency degraded by 3% on average
     • A soft bandwidth limit is expected to improve the results further
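
The overall 11% efficiency figure is consistent with the two group averages if it is read as an unweighted mean over the 25 benchmarks. That reading is an assumption, since the slide does not say how the average was taken, and the helper name below is hypothetical.

```python
def pooled_mean(groups: list[tuple[int, float]]) -> float:
    """Unweighted per-item mean from (item_count, group_mean) pairs."""
    total_items = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total_items


if __name__ == "__main__":
    # 15 benchmarks improved by 20% on average; 10 degraded by 3% on average.
    overall = pooled_mean([(15, 20.0), (10, -3.0)])
    print(f"{overall:.1f}%")  # prints "10.8%"
```

The pooled value, (15 × 20 - 10 × 3) / 25 = 10.8%, rounds to the 11% reported on the slide.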
