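Performance Measurement • Assignment? • Timing:

    #include <stddef.h>    /* NULL */
    #include <sys/time.h>  /* gettimeofday(), struct timeval */

    /* Return the current wall-clock time in seconds, with microsecond resolution. */
    double When(void)
    {
        struct timeval tp;
        gettimeofday(&tp, NULL);
        return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
    }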
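A minimal sketch of how When() might be used to time a region of code: take one reading before the work and one after, then subtract. The workload harmonic() and the wrapper time_harmonic() are hypothetical stand-ins, not part of the course code, and When() is repeated so the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>    /* NULL */
#include <sys/time.h>  /* gettimeofday(), struct timeval */

/* Wall-clock time in seconds, same as the When() routine on the slide. */
double When(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
}

/* Hypothetical workload standing in for the code being measured:
   the n-th harmonic number 1 + 1/2 + ... + 1/n. */
double harmonic(long n)
{
    double sum = 0.0;
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Hypothetical wrapper: time one call of harmonic(n) and return the
   elapsed wall-clock seconds. The result is written through *result so
   the work is observable and cannot be optimized away. */
double time_harmonic(long n, double *result)
{
    assert(result != NULL);
    double start = When();
    *result = harmonic(n);
    return When() - start;
}
```

Calling time_harmonic(1000000L, &r) returns the elapsed seconds and leaves the sum (about 14.39) in r. Because gettimeofday() reports wall-clock time, any other load on the machine is included in the measurement.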
Paper Schedule • 22 Students • 6 Days • Look at the schedule and email me your preference. Quickly.
A Quantitative Basis for Design • Parallel programming is an optimization problem. • Must take into account several factors: execution time, scalability, and efficiency. • Must also take into account the costs: memory requirements, implementation costs, maintenance costs, etc. • Mathematical performance models are used to assess these costs and predict performance.
Defining Performance • How do you define parallel performance? • What do you define it in terms of? • Consider • Distributed databases • Image processing pipeline • Nuclear weapons testbed
Amdahl's Law • Every algorithm has a sequential component. • The sequential component limits speedup: if $s$ is the sequential fraction of the work, the maximum speedup is $1/s$.
[Figure: Amdahl's Law, speedup as a function of the sequential fraction s]
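The slide states only the limit $1/s$. A sketch of the usual finite-processor form of Amdahl's Law, $S(p) = 1/(s + (1 - s)/p)$, which approaches $1/s$ as $p$ grows; the function names are illustrative.

```c
#include <assert.h>

/* Amdahl's Law: predicted speedup on p processors when a fraction s
   (0 <= s <= 1) of the work must run sequentially. */
double amdahl_speedup(double s, int p)
{
    assert(s >= 0.0 && s <= 1.0 && p >= 1);
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Limit of amdahl_speedup(s, p) as p grows without bound: 1/s.
   Requires s > 0; with s == 0 the speedup is unbounded. */
double amdahl_limit(double s)
{
    assert(s > 0.0 && s <= 1.0);
    return 1.0 / s;
}
```

For example, with s = 0.1, ten processors give a predicted speedup of about 5.26, and no number of processors can push it past 10.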
What's wrong? • Amdahl's Law works fine for a given algorithm. • But what if we change the algorithm? • We may change algorithms to increase parallelism and thus eventually increase performance. • The new algorithm may introduce inefficiency.
Metrics for Performance • Speedup • Efficiency • Scalability • Others
Speedup • $S = \mathrm{Speed}_P / \mathrm{Speed}_1$ • What is speed? • What algorithm is used for $\mathrm{Speed}_1$? • What is the work performed? How much work?
Two kinds of Speedup • Relative • Uses parallel algorithm on 1 processor • Most common • Absolute • Uses best known serial algorithm • Eliminates overheads in calculation.
Speedup • Algorithm A • Serial execution time is 10 sec. • Parallel execution time is 2 sec. • Algorithm B • Serial execution time is 2 sec. • Parallel execution time is 1 sec. • What if I told you A = B?
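A sketch of the two kinds of speedup applied to the numbers on this slide, assuming A and B solve the same problem, so the fastest serial time measured (2 sec, from B) stands in for the best known serial algorithm. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Measured times (seconds) for one algorithm on the same problem. */
struct timing {
    double serial;    /* the algorithm run on 1 processor */
    double parallel;  /* the algorithm run on p processors */
};

/* Relative speedup: the algorithm's own 1-processor time over its parallel time. */
double relative_speedup(struct timing t)
{
    assert(t.serial > 0.0 && t.parallel > 0.0);
    return t.serial / t.parallel;
}

/* Absolute speedup: best serial time over this algorithm's parallel time.
   Assumption: the fastest serial time among the n measured algorithms
   approximates the best known serial algorithm. */
double absolute_speedup(struct timing t, const struct timing *all, size_t n)
{
    assert(n > 0 && t.parallel > 0.0);
    double best = all[0].serial;
    for (size_t i = 1; i < n; i++)
        if (all[i].serial < best)
            best = all[i].serial;
    return best / t.parallel;
}
```

Relative speedup favors A (5 versus 2), while absolute speedup favors B (2 versus 1): most of A's relative speedup comes from its slow serial version.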
Efficiency • $E = S / p$ • The fraction of time a processor spends doing useful work.
Cost (Processor-Time Product) • $C = p\,T_p = T_s / E$ • $p$ = number of processors • $T_p$ = execution time on $p$ processors • $T_s$ = serial execution time
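A small sketch of the efficiency and cost formulas; the timings used below ($T_s$ = 12 s, $T_p$ = 2 s, $p$ = 8) are made-up values chosen to keep the arithmetic easy to follow.

```c
#include <assert.h>

/* Efficiency E = S / p: fraction of time each processor does useful work. */
double efficiency(double speedup, int p)
{
    assert(p >= 1);
    return speedup / (double)p;
}

/* Cost C = p * T_p (processor-time product), in processor-seconds. */
double cost(int p, double t_parallel)
{
    assert(p >= 1 && t_parallel >= 0.0);
    return (double)p * t_parallel;
}
```

With those values S = 12 / 2 = 6, E = 6 / 8 = 0.75, and C = 8 × 2 = 16 processor-seconds, which equals $T_s / E$ = 12 / 0.75. Likewise, a speedup of 10.8 on 12 processors corresponds to an efficiency of 0.9.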
Performance Measurement • Algorithm X achieved speedup of 10.8 on 12 processors. • What is wrong? • A single point of reference is not enough! • What about asymptotic analysis?
Performance Measurement • There is not a perfect way to measure and report performance. • Wall clock time seems to be the best. • But how much work do you do? • Best Bet: • Develop a model that fits experimental results.
Parallel Programming Steps • Develop algorithm • Develop a model to predict performance • If the performance looks ok then code • Check actual performance against model • Report the performance
Performance Evaluation • Identify the data: execution time; be sure to examine a range of data points. • Design the experiments to obtain the data: make sure each experiment measures what you intend to measure; remember that execution time is the maximum time taken by any processor; repeat your experiments many times; validate the data by designing a model. • Report the data: report all information that affects execution; keep results separate from conclusions; present the data in an easily understandable format.
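A sketch of two bookkeeping rules from these slides: the execution time of one parallel run is the maximum of the per-process times, and a measurement should be repeated and summarized rather than taken from a single trial. The function and struct names are hypothetical; in an MPI program the per-process maximum is commonly gathered with MPI_Reduce using the MPI_MAX operation.

```c
#include <assert.h>
#include <stddef.h>

/* Execution time of one parallel run: the slowest process determines when
   the computation finishes, so take the maximum of the per-process times. */
double run_time(const double *per_process, size_t nprocs)
{
    assert(nprocs > 0);
    double t = per_process[0];
    for (size_t i = 1; i < nprocs; i++)
        if (per_process[i] > t)
            t = per_process[i];
    return t;
}

/* Summary of repeated trials of the same experiment. */
struct summary {
    double min, max, mean;
};

struct summary summarize(const double *trials, size_t n)
{
    assert(n > 0);
    struct summary s = { trials[0], trials[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (trials[i] < s.min) s.min = trials[i];
        if (trials[i] > s.max) s.max = trials[i];
        sum += trials[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

Reporting the minimum and maximum alongside the mean shows how much the runs varied, which a single measurement cannot reveal.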
Finite Difference Example • Finite Difference Code • 512 x 512 x 5 Elements • 16 IBM RS6000 workstations • Connected via Ethernet
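Finite Difference Model • Execution time: ExTime = Tcomp/P + Tcomm, where Tcomp is the total computation time and P is the number of workstations • Communication time per processor: Tcomm = 2*lat + 4*bw*n*z, where lat is the message latency, bw is the transfer time per element (the inverse of bandwidth), and the grid is n x n x z elements • Computation time: estimate Tcomp using some sample runs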
What was wrong? • Ethernet: the workstations share one network, so their messages compete for the same bandwidth • Change the computation of Tcomm to reduce the effective bandwidth: • Tcomm = 2*lat + 4*bw*n*z*P/2
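A sketch comparing the original and the Ethernet-corrected communication models. It reads the execution-time model as Tcomp/P + Tcomm, treating Tcomp as the total computation time and Tcomm as the per-processor communication time; that reading, the struct layout, and any parameter values plugged in are assumptions, not the author's measurements.

```c
#include <assert.h>

/* Parameters of the finite-difference model (times in seconds).
   Hypothetical layout; fill in values estimated from sample runs. */
struct fd_model {
    double t_comp;  /* total computation time */
    double lat;     /* per-message latency */
    double bw;      /* transfer time per element (inverse bandwidth) */
    double n, z;    /* grid is n x n x z elements */
};

/* Original model: no contention, Tcomm = 2*lat + 4*bw*n*z. */
double tcomm_ideal(const struct fd_model *m)
{
    return 2.0 * m->lat + 4.0 * m->bw * m->n * m->z;
}

/* Shared-Ethernet model: transfer term scaled by P/2. */
double tcomm_ethernet(const struct fd_model *m, int p)
{
    return 2.0 * m->lat + 4.0 * m->bw * m->n * m->z * (double)p / 2.0;
}

/* Predicted execution time: computation split across p processors
   plus per-processor communication time. */
double ex_time(const struct fd_model *m, int p, double tcomm)
{
    assert(p >= 1);
    return m->t_comp / (double)p + tcomm;
}
```

The two models agree at P = 2. In the revised model the transfer term grows linearly with P while the computation term shrinks as 1/P, so past some number of workstations the predicted execution time rises instead of falling.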