Explore optimization in parallel programming considering factors like execution time, scalability, efficiency, and costs using mathematical performance models. Learn about Amdahl's Law limitations, speedup metrics, efficiency calculations, and performance evaluation methods in this comprehensive guide.
Performance Measurement • Assignment? • Timing: a wall-clock timer built on gettimeofday():

#include <sys/time.h>

/* Return the current wall-clock time in seconds. */
double When()
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return ((double)tp.tv_sec + (double)tp.tv_usec * 1e-6);
}
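A usage sketch (not from the slides): call When() before and after the region being measured and take the difference; heavy_computation() is a hypothetical stand-in for the code under test.

#include <stdio.h>

double When(void);               /* the timer defined above                      */
void heavy_computation(void);    /* hypothetical function representing the work  */

int main(void)
{
    double start = When();               /* timestamp before the measured region */
    heavy_computation();                 /* the region being timed               */
    double elapsed = When() - start;     /* elapsed wall-clock time in seconds   */
    printf("elapsed = %f seconds\n", elapsed);
    return 0;
}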
Paper Schedule • 22 Students • 6 Days • Look at the schedule and email me your preference. Quickly.
A Quantitative Basis for Design • Parallel programming is an optimization problem. • Must take into account several factors: execution time, scalability, efficiency. • Must also take into account the costs: memory requirements, implementation costs, maintenance costs, etc. • Mathematical performance models are used to assess these costs and predict performance.
Defining Performance • How do you define parallel performance? • What do you define it in terms of? • Consider • Distributed databases • Image processing pipeline • Nuclear weapons testbed
Amdahl's Law • Every algorithm has a sequential component. • The sequential component limits speedup: if s is the sequential fraction, the maximum speedup is 1/s.
Amdahl's Law • [Figure: maximum speedup plotted against the sequential fraction s]
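A minimal sketch (not from the slides) that tabulates Amdahl's bound S(P) = 1/(s + (1 - s)/P); the sequential fractions and processor counts below are illustrative assumptions.

#include <stdio.h>

/* Amdahl's Law: speedup on P processors when a fraction s of the work is serial. */
static double amdahl(double s, int P)
{
    return 1.0 / (s + (1.0 - s) / P);
}

int main(void)
{
    double fractions[] = { 0.01, 0.05, 0.10, 0.25 };   /* assumed sequential fractions */
    int procs[] = { 2, 4, 16, 1024 };                  /* assumed processor counts     */

    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            printf("s=%.2f  P=%4d  speedup=%6.2f  (limit 1/s = %.0f)\n",
                   fractions[i], procs[j], amdahl(fractions[i], procs[j]),
                   1.0 / fractions[i]);
    return 0;
}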
What's wrong? • Works fine for a given algorithm. • But what if we change the algorithm? • We may change algorithms to increase parallelism and thus eventually increase performance. • May introduce inefficiency
Metrics for Performance • Speedup • Efficiency • Scalability • Others …
Speedup • S = Speed_P / Speed_1, which for a fixed amount of work equals the serial execution time divided by the parallel execution time. • But what is "speed"? Which algorithm defines Speed_1? What work is performed, and how much?
Two kinds of Speedup • Relative: uses the parallel algorithm run on 1 processor as the baseline; most common. • Absolute: uses the best known serial algorithm as the baseline, which removes the parallel algorithm's overheads from the comparison.
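A worked illustration with assumed numbers: if the best serial algorithm takes 8 s, the parallel algorithm on one processor takes 10 s, and the parallel algorithm on 8 processors takes 2 s, then the relative speedup is 10/2 = 5 while the absolute speedup is 8/2 = 4.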
Speedup • Algorithm A • Serial execution time is 10 sec. • Parallel execution time is 2 sec. • Algorithm B • Serial execution time is 2 sec. • Parallel execution time is 1 sec. • What if I told you A = B?
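One reading of the puzzle, assuming A and B solve the same problem: B's 2 s serial time becomes the best known serial algorithm, so A's absolute speedup is only 2/2 = 1 even though its relative speedup is 10/2 = 5, and B's 1 s parallel run is the fastest of all. Relative speedup alone can flatter a slow algorithm.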
Efficiency • E = S / p • The fraction of time a processor spends doing useful work.
Cost (Processor-Time Product) • C = p · T_p, where p = number of processors and T_p = parallel execution time. • In terms of cost, efficiency is E = T_s / C, where T_s is the serial execution time.
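A small sketch (not from the slides) tying the metrics together, using Algorithm A's times from the earlier slide; the processor count of 8 is an assumption for the example.

#include <stdio.h>

int main(void)
{
    double Ts = 10.0;   /* serial execution time (s), from the Algorithm A slide   */
    double Tp = 2.0;    /* parallel execution time (s), from the Algorithm A slide */
    int    p  = 8;      /* processor count assumed for this example                */

    double S = Ts / Tp;     /* speedup                       */
    double E = S / p;       /* efficiency                    */
    double C = p * Tp;      /* cost: processor-time product  */

    printf("speedup S = %.2f, efficiency E = %.2f, cost C = %.1f processor-seconds\n", S, E, C);
    printf("check: Ts / C = %.2f (equals E)\n", Ts / C);
    return 0;
}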
Performance Measurement • Algorithm X achieved speedup of 10.8 on 12 processors. • What is wrong? • A single point of reference is not enough! • What about asymptotic analysis?
Performance Measurement • There is not a perfect way to measure and report performance. • Wall clock time seems to be the best. • But how much work do you do? • Best Bet: • Develop a model that fits experimental results.
Parallel Programming Steps • Develop algorithm • Develop a model to predict performance • If the performance looks ok then code • Check actual performance against model • Report the performance
Performance Evaluation • Identify the data: execution time; be sure to examine a range of data points. • Design the experiments to obtain the data: make sure each experiment measures what you intend to measure; remember that execution time is the maximum time taken across processors; repeat your experiments many times; validate the data by designing a model. • Report the data: report all information that affects execution; keep results separate from conclusions; present the data in an easily understandable format.
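Because execution time is the maximum over all processes, a minimal measurement sketch (assuming an MPI program, which the slides do not specify) collects the per-process times and reduces them with MPI_MAX.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double start = MPI_Wtime();
    /* ... the parallel work being measured would go here ... */
    double local = MPI_Wtime() - start;

    /* Execution time is the maximum time taken by any process. */
    double max_time;
    MPI_Reduce(&local, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("execution time = %f seconds (max over all processes)\n", max_time);

    MPI_Finalize();
    return 0;
}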
Finite Difference Example • Finite Difference Code • 512 x 512 x 5 Elements • 16 IBM RS6000 workstations • Connected via Ethernet
Finite Difference Model • Execution Time • ExTime = (Tcomp + Tcomm)/P • Communication Time • Tcomm = 2*lat + 4*bw*n*z • Computation Time • Estimate using some sample runs
What was wrong? • Ethernet is a shared medium, so processors contend for it and the effective bandwidth drops. • Change the computation of Tcomm so the transfer term grows with the processor count: • Tcomm = 2*lat + 4*bw*n*z*P/2
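A sketch (not from the slides) of the model as code. The latency, per-element transfer cost, and per-element computation cost below are placeholder assumptions to be calibrated from sample runs; bw is treated as a per-element transfer time since it multiplies the message size.

#include <stdio.h>

/* Original model from the slides: Tcomm = 2*lat + 4*bw*n*z */
static double tcomm_original(double lat, double bw, double n, double z)
{
    return 2.0 * lat + 4.0 * bw * n * z;
}

/* Ethernet-adjusted model: the transfer term grows with the processor count. */
static double tcomm_ethernet(double lat, double bw, double n, double z, int P)
{
    return 2.0 * lat + 4.0 * bw * n * z * P / 2.0;
}

int main(void)
{
    /* Placeholder parameters -- calibrate against sample runs. */
    double lat = 1e-3;               /* per-message latency in seconds (assumed)       */
    double bw  = 1e-6;               /* per-element transfer time in seconds (assumed) */
    double n   = 512.0, z = 5.0;     /* grid dimensions from the example               */
    double tcomp = 512.0 * 512.0 * 5.0 * 1e-7;   /* assumed per-element compute cost   */

    for (int P = 2; P <= 16; P *= 2) {
        /* ExTime = (Tcomp + Tcomm) / P, following the slide's model */
        double t_orig = (tcomp + tcomm_original(lat, bw, n, z)) / P;
        double t_eth  = (tcomp + tcomm_ethernet(lat, bw, n, z, P)) / P;
        printf("P=%2d  predicted=%f s  ethernet-adjusted=%f s\n", P, t_orig, t_eth);
    }
    return 0;
}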