Predicting Parallel Performance Introduction to Parallel Programming – Part 10
Review & Objectives • Previously: • Design and implementation of a task decomposition solution • At the end of this part you should be able to: • Define speedup and efficiency • Use Amdahl’s Law to predict maximum speedup
Speedup • Speedup is the ratio of sequential execution time to parallel execution time • For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X • (Figure: speedup vs. cores — speedup curves rise with the number of cores)
Efficiency • Efficiency • A measure of core utilization • Speedup divided by the number of cores • Example • Program achieves speedup of 3 on 4 cores • Efficiency is 3 / 4 = 75% • (Figure: efficiency vs. cores — efficiency curves fall as cores are added)
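The two definitions above fit in a pair of one-line helpers (a minimal sketch; the function and parameter names are my own):

```python
def speedup(serial_time, parallel_time):
    """Speedup: ratio of sequential to parallel execution time."""
    return serial_time / parallel_time

def efficiency(serial_time, parallel_time, cores):
    """Efficiency: speedup divided by the number of cores."""
    return speedup(serial_time, parallel_time) / cores

# The numbers from the slides:
print(speedup(6, 2))        # 6 s serial, 2 s parallel -> speedup of 3
print(efficiency(6, 2, 4))  # that speedup on 4 cores -> efficiency 0.75
```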
Speedup and Efficiency Speedup Example • Painting a picket fence • 30 minutes of preparation (serial) • One minute to paint a single picket • 30 minutes of cleanup (serial) • Thus, painting 300 pickets takes 30 + 300 + 30 = 360 minutes (serial time)
Speedup and Efficiency Computing Speedup • With p painters, preparation and cleanup remain serial, so the time is 30 + 300/p + 30 minutes • Speedup = 360 / (60 + 300/p) • Example: 2 painters finish in 60 + 150 = 210 minutes, a speedup of 360/210 ≈ 1.7
Speedup and Efficiency Efficiency Example • With 2 painters, speedup ≈ 1.7, so efficiency ≈ 1.7 / 2 ≈ 86% • With 10 painters, the time is 60 + 30 = 90 minutes, so speedup = 4 and efficiency = 40% • Adding painters increases speedup but lowers efficiency, because the 60 serial minutes never shrink
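The picket-fence numbers can be checked with a short script (a sketch; it assumes the pickets divide evenly among painters while prep and cleanup stay serial, per the example above):

```python
def fence_time(painters):
    """Minutes to paint a 300-picket fence with the given number of painters:
    30 min serial prep + pickets split evenly + 30 min serial cleanup."""
    return 30 + 300 / painters + 30

SERIAL_TIME = fence_time(1)  # 360 minutes

for p in (1, 2, 5, 10):
    s = SERIAL_TIME / fence_time(p)  # speedup
    e = s / p                        # efficiency
    print(f"{p:2d} painters: {fence_time(p):5.0f} min, "
          f"speedup {s:.2f}, efficiency {e:.0%}")
```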
Idea Behind Amdahl’s Law • A portion s of the computation must be performed sequentially; the remaining portion 1-s can be executed in parallel • (Figure: execution time vs. cores — the serial part s stays constant while the parallel part shrinks from 1-s to (1-s)/2, (1-s)/3, (1-s)/4, (1-s)/5)
Derivation of Amdahl’s Law • Speedup is the ratio of execution time on 1 core to execution time on p cores • Execution time on 1 core is s + (1-s) = 1 • Execution time on p cores is at least s + (1-s)/p • Therefore: Speedup ≤ 1 / (s + (1-s)/p)
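The derivation above reduces to a one-line function (a sketch; the name and parameter names are my own):

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup for serial fraction s running on p cores."""
    return 1.0 / (s + (1.0 - s) / p)

# A 10% serial fraction on increasing core counts:
for p in (1, 2, 4, 8, 1000):
    print(p, round(amdahl_speedup(0.1, p), 2))
```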
Amdahl’s Law Is Too Optimistic • Amdahl’s Law ignores parallel processing overhead • Examples of this overhead include time spent creating and terminating threads • Parallel processing overhead is usually an increasing function of the number of cores (threads)
Graph with Parallel Overhead Added • (Figure: execution time vs. cores, with the parallel overhead portion increasing as the number of cores grows)
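One way to see the effect of overhead is to extend the bound with a term that grows with the core count. The linear k*p overhead model below is an illustrative assumption, not part of Amdahl's Law:

```python
def speedup_with_overhead(s, p, k):
    """Speedup when each additional core adds k units of overhead.
    k*p is a simple illustrative model of thread creation/termination cost."""
    return 1.0 / (s + (1.0 - s) / p + k * p)

# With overhead, adding cores eventually reduces speedup:
for p in (1, 2, 4, 8, 16, 32):
    print(p, round(speedup_with_overhead(0.05, p, 0.002), 2))
```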
Other Optimistic Assumptions • Amdahl’s Law assumes that the computation divides evenly among the cores • In reality, the amount of work does not divide evenly among the cores • Core waiting time is another form of overhead • (Figure: per-core timeline from task start to task completion — each core spends part of the time working and part waiting)
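Workload imbalance can be modeled by letting the slowest core set the finish time while the others wait (a sketch with made-up per-core workloads):

```python
def imbalanced_speedup(per_core_work):
    """Parallel time is determined by the most-loaded core; lighter-loaded
    cores finish early and wait, which is lost time."""
    total_work = sum(per_core_work)          # serial execution time
    parallel_time = max(per_core_work)       # slowest core dominates
    return total_work / parallel_time

print(imbalanced_speedup([25, 25, 25, 25]))  # even split of 100 units: speedup 4.0
print(imbalanced_speedup([40, 20, 20, 20]))  # same total, uneven: speedup 2.5
```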
Graph with Workload Imbalance Added • (Figure: execution time vs. cores, showing additional time lost to workload imbalance on top of parallel overhead)
Illustration of the Amdahl Effect • (Figure: speedup vs. cores for problem sizes n = 1,000; 10,000; 100,000 — the larger the problem size, the closer the curve tracks the linear-speedup line)
Using Amdahl’s Law • Program executes in 5 seconds • Profile reveals 80% of time spent in function alpha, which we can execute in parallel • What would be maximum speedup on 2 cores? • Serial fraction s = 0.2, so Speedup ≤ 1 / (0.2 + 0.8/2) = 1 / 0.6 ≈ 1.67 • New execution time ≥ 5 sec / 1.67 = 3 seconds
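The 5-second example checks out numerically (a sketch applying Amdahl's Law with s = 0.2):

```python
serial_fraction = 0.2  # 80% of the 5-second run is parallelizable
cores = 2

max_speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)
min_time = 5.0 / max_speedup  # best-case parallel execution time

print(round(max_speedup, 2))  # about 1.67
print(round(min_time, 2))     # about 3.0 seconds
```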
Superlinear Speedup • According to our general speedup formula, the maximum speedup a program can achieve on p cores is p • Superlinear speedup is the situation where speedup is greater than the number of cores used • It means the computational rate of the cores is higher when the parallel program is executing • Superlinear speedup is usually caused by a higher cache hit rate: each core works on a smaller portion of the data, so more of it fits in cache
References • Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
Amdahl’s Law: Maximum Speedup • Assumes parallel work divides perfectly among available cores • Speedup ≤ 1 / (s + (1-s)/p) • As p → ∞, the term (1-s)/p goes to 0, so the maximum speedup is 1 / s
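The 1/s ceiling is easy to verify: however many cores are added, speedup never reaches 1/s (a sketch; the function restates the Amdahl bound):

```python
def amdahl(s, p):
    """Amdahl's Law: upper bound on speedup for serial fraction s on p cores."""
    return 1.0 / (s + (1.0 - s) / p)

s = 0.1  # a 10% serial fraction caps speedup at 1/0.1 = 10x
for p in (10, 100, 10_000, 1_000_000):
    print(p, round(amdahl(s, p), 4))  # approaches, but never passes, 10
```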
The Amdahl Effect • For many programs, the parallelizable portion of the computation grows faster with problem size n than the serial portion and the overhead • As n → ∞, these parallelizable terms dominate the execution time • Hence speedup is an increasing function of problem size
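The Amdahl effect can be illustrated with a hypothetical timing model in which the serial cost grows linearly with n while the parallelizable cost grows with n² (both exponents are illustrative assumptions, not taken from the slides):

```python
def speedup_for_size(n, p):
    """Hypothetical model: serial cost ~ n, parallelizable cost ~ n**2.
    The serial *fraction* shrinks as n grows, so speedup improves."""
    serial, parallel = n, n ** 2
    return (serial + parallel) / (serial + parallel / p)

# Larger problems get closer to the ideal speedup of p = 8:
for n in (1_000, 10_000, 100_000):
    print(n, round(speedup_for_size(n, 8), 3))
```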