
The Impact of Performance Asymmetry in Multicore Architectures

Saisanthosh Balakrishnan (UW-Madison); Ravi Rajwar, Michael Upton, and Konrad Lai (Intel Corp.). 32nd Annual International Symposium on Computer Architecture.


Presentation Transcript


  1. The Impact of Performance Asymmetry in Multicore Architectures
     Saisanthosh Balakrishnan (UW-Madison); Ravi Rajwar, Michael Upton, and Konrad Lai (Intel Corp.)
     32nd Annual International Symposium on Computer Architecture

  2. Performance asymmetry: a difference in the compute power of processors
     • Architectural differences
     • Micro-architectural parameters
     • Other, e.g. heat (thermal throttling)
     Why do we need asymmetry now?
     • CMPs / many cores as commodity systems
     • They run a variety of workloads
     • Good serial performance and high throughput
     • Optimal energy consumption
     Assume an asymmetric multicore system. [Diagram: processors F, F, S, S]

  3. Asymmetry & MT workloads
     Stable? N procs., same config.: performance over the same run repeated many times. Need predictable and robust performance.
     Scalable? N procs., different configs.: performance as compute power changes. Need to utilize asymmetry and perform better.

  4. The problems
     • Programmers: algorithm, correctness, thread partitioning. They don't reason about asymmetry.
     • Characteristics of threads: partitioning, synchronization barriers, interference, lifetime.
     • Scheduling of threads: OS kernel, library, application, DB/Web servers, managed runtime systems (Java, .NET).

  5. Contributions
     Asymmetry negatively affects applications
     - Studied many workloads on real hardware
     - Observed unpredictable workload behavior
     This can be fixed by
     - Evaluating how work is partitioned across threads
     - Scheduling threads with asymmetry in mind

  6. Outline
     - Asymmetry and Performance
     - Evaluation Methodology
     - Asymmetric Configurations
     - Workloads and Results

  7. Evaluation methodology
     Asymmetry in real hardware
     - Intel 4-way 3-GHz Xeon
     - Different cores run at different frequencies
     - Software controlled
     Benefits
     - Long real-time runs (no simulations)
     - Workloads are set up according to specs
     - Representative of other forms of asymmetry (communication, micro-architecture, etc.)

  8. Configurations
     Symmetric: all fast (F F F F), all slow (S S S S)
     Asymmetric: 1 slow (F F F S), 2 slow (F F S S), 3 slow (F S S S)
     F = full frequency
     S = one-eighth of full frequency (in talk and paper)
     S = one-fourth of full frequency (in paper)
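The five configurations can be compared by their aggregate compute power. The following is a minimal sketch, assuming compute power scales linearly with clock frequency; that is an idealization for illustration, not a claim from the talk, and the names `compute_power`, `CONFIGS`, and `power_table` are hypothetical.

```python
from fractions import Fraction

# (number of fast processors, number of slow processors) on the 4-way system
CONFIGS = {
    "all fast": (4, 0),
    "1 slow": (3, 1),
    "2 slow": (2, 2),
    "3 slow": (1, 3),
    "all slow": (0, 4),
}


def compute_power(n_fast, n_slow, slow_ratio=Fraction(1, 8)):
    """Aggregate compute power in units of one full-frequency processor.

    Assumption: a slow processor contributes exactly `slow_ratio` of a
    fast one (performance proportional to frequency).
    """
    return n_fast + n_slow * slow_ratio


def power_table(slow_ratio=Fraction(1, 8)):
    """Map each configuration name to its aggregate compute power."""
    return {
        name: float(compute_power(n_fast, n_slow, slow_ratio))
        for name, (n_fast, n_slow) in CONFIGS.items()
    }


if __name__ == "__main__":
    for name, power in power_table().items():
        print(f"{name:>8}: {power}")
```

With S at one-eighth of full frequency this gives 4.0, 3.125, 2.25, 1.375, and 0.5 for all fast, 1 slow, 2 slow, 3 slow, and all slow. Real workloads need not follow this line, for example when they are memory-bound.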

  9. Studying impact
     Scalability: [Plot: perf. metric across configurations all slow, 3 slow, 2 slow, 1 slow, all fast]
     Stability: [Plot: perf. metric over the same run repeated many times on an asymmetric configuration]
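The slides do not give a formula for stability. One common choice, assumed here purely for illustration, is the coefficient of variation of the performance metric across repeated runs of the same configuration: a value near zero means the runs agree. The function name and the sample numbers are hypothetical, not measurements from the talk.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Relative spread of a performance metric across repeated runs.

    Returns population standard deviation divided by the mean.
    Assumption: `samples` is non-empty and has a non-zero mean.
    """
    return pstdev(samples) / mean(samples)


if __name__ == "__main__":
    # Hypothetical throughput numbers for four runs of one configuration.
    steady_runs = [100.0, 100.0, 100.0, 100.0]
    erratic_runs = [50.0, 150.0, 50.0, 150.0]
    print(coefficient_of_variation(steady_runs))   # identical runs
    print(coefficient_of_variation(erratic_runs))  # runs that disagree
```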

  10. Workloads evaluated
     Workloads: SPECjbb, SPECjAppServer, Apache, Zeus, TPC-H, SPECOMP, H.264, PMake
     Categories: middle-tier business apps (throughput parallel); webservers (throughput parallel); task-based parallelization; embarrassingly parallel

  11. Impact of asymmetry
     [Table: columns Workloads, Scalable, Stable, Fix; one row each for SPECjbb, SPECjAppServer, Apache, Zeus, TPC-H, SPECOMP, H.264, PMake]

  12. Workloads
     - Managed runtime system (BEA JRockit & Sun HotSpot)
     - Windows 2003 and Linux
     - 2 GCs: parallel and generational concurrent. Only minor GC.
     - Up to 20 threads
     - Minimal communication

  13. SPECjbb
     Scalable? Yes. Stable? No.
     [Plot: stability (JRockit/Gencon GC) on 2 slow, 4 runs, including results with kernel fix]
     • Problem: Interference from the runtime system (JVM, GC)
     • Fix: Kernel scheduler moves jobs from a slow processor to a fast one if it is free
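The kernel fix (move a job from a slow processor to a fast one when the fast one is free) can be sketched as a single rebalancing pass. This is a toy model under stated assumptions, not the kernel change used in the study: the `Core` class, its fields, and `migrate_to_idle_fast` are hypothetical names, each processor runs at most one job, and migration is treated as free.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Core:
    name: str
    speed: float                # relative frequency, 1.0 = full frequency
    job: Optional[str] = None   # None means the core is idle


def migrate_to_idle_fast(cores: List[Core]) -> List[Tuple[str, str, str]]:
    """Move jobs from slower busy cores onto faster idle cores.

    Pairs the fastest idle core with the slowest busy core and stops as
    soon as a move would not land the job on a strictly faster core.
    Returns the list of (job, source core, destination core) moves.
    """
    idle = sorted((c for c in cores if c.job is None),
                  key=lambda c: c.speed, reverse=True)
    busy = sorted((c for c in cores if c.job is not None),
                  key=lambda c: c.speed)
    moves = []
    for dst, src in zip(idle, busy):
        if src.speed >= dst.speed:
            break  # remaining pairs can only be worse
        dst.job, src.job = src.job, None
        moves.append((dst.job, src.name, dst.name))
    return moves


if __name__ == "__main__":
    cores = [Core("F0", 1.0), Core("F1", 1.0, "a"),
             Core("S0", 0.125, "b"), Core("S1", 0.125)]
    print(migrate_to_idle_fast(cores))
```

In the example, job `b` moves from `S0` to the idle fast core `F0`, while job `a` stays on `F1` because the only remaining idle core is slower.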

  14. Workloads
     - Webserver on Linux
     - Thread-based vs. event-based model
     - ApacheBench
     - Raw perf. with static page
     - Light and heavy loads

  15. Apache: Scalability & Stability (light load)
     Scalable? Yes. Stable? No.
     • Problem: Under light load, threads can land on either fast or slow processors
     • No issues under heavy load
     • Fixes: Kernel scheduler, or shorter lifetime of threads

  16. Zeus: Scalability & Stability
     Scalable? No. Stable? No.
     • Unpredictable under both heavy and light loads
     • Superior perf. on symmetric configs.
     • Problem: Aggressive application-level scheduling

  17. Workloads
     - OMP: Scientific app. Loop-based parallelization. Intel Fortran, OpenMP on Linux.
     - H.264: Media encoding. OpenMP on Windows 2003.
     - PMake: Parallel Make of the Linux kernel.

  18. SPECOMP: Scalability
     Scalable? No. Stable? No.
     • OpenMP schedules tasks assuming processors of equal performance
     • Problem: Fast processors are held up by slow ones
     • Fix (application): Change scheduling of tasks to on-demand
     • Downside: Overheads
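The difference between up-front and on-demand task assignment can be illustrated with a small simulation. It is a sketch under stated assumptions, not the SPECOMP experiment: tasks are equal-sized, a processor's speed is its relative frequency, all threads meet at one barrier, and `overhead` is a hypothetical per-task dispatch cost standing in for the overheads the slide mentions.

```python
import heapq


def static_makespan(n_tasks, task_work, speeds):
    """Tasks are split evenly up front; the barrier waits for the slowest core."""
    n = len(speeds)
    per_core = [n_tasks // n + (1 if i < n_tasks % n else 0) for i in range(n)]
    return max(k * task_work / s for k, s in zip(per_core, speeds))


def on_demand_makespan(n_tasks, task_work, speeds, overhead=0.0):
    """Each core takes the next task as soon as it becomes free."""
    # Heap of (time the core becomes free, core index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        t += overhead + task_work / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    one_slow = [1.0, 1.0, 1.0, 0.125]  # "1 slow" with S at one-eighth frequency
    print(static_makespan(64, 1.0, one_slow))     # 128.0
    print(on_demand_makespan(64, 1.0, one_slow))  # 24.0
```

With 64 unit tasks on three fast processors and one at one-eighth speed, the up-front split finishes at 128 time units because the slow processor holds the barrier, while on-demand assignment finishes at 24. On a symmetric all-fast system both finish at 16, so the on-demand version only costs its dispatch overhead there.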

  19. H.264 & PMake
     Scalable? Yes. Stable? Yes.
     PMake
     • PMake is linearly scalable on all configurations
     H.264
     • H.264 slows down significantly with 1 slow proc.
     • Speeds up with 1 fast proc.

  20. Impact of asymmetry
     [Table: columns Workloads, Scalable, Stable, Fix (kernel fix or app. fix); one row each for SPECjbb, SPECjAppServer, Apache, Zeus, TPC-H, SPECOMP, H.264, PMake]
     Notes on the slide:
     • Interference from runtime system. Garbage collector dependent. Concurrent GC causes more problems.
     • Migrate tasks from slow to fast core if one is free. Inspect runtime software, interference between threads (GC).
     • Migrate tasks from slow to fast core if one is free. Or, handle few requests and recycle threads: high overhead, low perf.
     • Robust, multi-tier application. Feedback tunes the workload.
     • Very responsive to interference, small heaps, etc.
     • Query parallelization not aware of asymm. Intra-query parallelization worsens stability.
     • Superior perf. in symmetric system. Unpredictable on asymm. with heavy and light loads. Independent application scheduling.
     • OpenMP-based parallelization with sync. barriers. Fast cores held by slow.
     • Thread serves many requests to reduce overheads. Problems with light load. Threads can map to fast or slow proc.
     • Reconsider application scheduling.
     • Approx. application change by reducing degree of parallelization.
     • Fix application scheduler.
     • Consider asymm. in query optimization engine.
     • Robust application. Heavy utilization. Threads well-balanced and abundant.
     • Assign tasks on-demand instead of up-front. Make OpenMP understand asymm.
     • Multi-programming with several tasks.

  21. Conclusions
     Asymmetric systems
     - Good for energy and performance
     - But can introduce unpredictability
     Software needs to understand asymmetry
     - Evaluate the application's work partitioning
     - Scheduling of tasks. Mostly no other changes.
     - Maybe feedback-based
     Suitable asymmetry
     - Many slow & few fast processors

  22. Questions?
