SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS
Sukhdeep Sodhi, Microsoft Corp
Jaspal Subhlok, University of Houston
Resource Selection for Network/Grid Applications
[Figure: an application with components Model, Data, GUI, Sim 1, Pre, and Stream is to be placed on a network. Where is the best performance?]
Current Approaches to Node Selection
1. Measure and model network properties, such as available bandwidth and CPU loads (with tools like NWS)
2. Find “best” nodes for execution based on network status
But expected application performance based on measured resource status may not be accurate:
• depends on application characteristics, which are hard to model
• translation, e.g., unused bandwidth vs. expected throughput
• data may be stale, as frequent measurements are expensive
Our Approach
Predict application performance by running a small program representative of the actual distributed application
[Figure: the application (Model, Data, GUI, Sim 1, Pre, Stream) and the network]
Performance Skeleton
A performance skeleton is a synthetic, short-running program whose execution characteristics mirror those of the application it represents.
An application and its skeleton have similar
• communication pattern
• CPU usage
• memory usage
• synchronization pattern
Goal: Performance of a skeleton is directly related to the performance of the application under any condition
• e.g., a skeleton executes in 0.1% of the time the application takes to execute on any part of a shared network
Central Contribution of This Paper
Framework for Automatic Construction of Performance Skeletons
[Figure: CREATE SKELETON maps the application (Model, Data, GUI, Sim 1, Pre, Stream) to a skeleton with the same components]
Automatic Construction of Skeletons
1. Record execution trace
2. Compress execution trace into execution signature
3. Construct skeleton program from execution signature
[Figure: CREATE SKELETON maps the application (Model, Data, GUI, Sim 1, Pre, Stream) to a skeleton with the same components]
Recording of Execution Trace
• Implemented for MPI applications
• Link MPI application with PMPI-based profiling library
  • no source code modification / analysis required
• Execute on a dedicated testbed
• Records all MPI function calls
  • Call name, start time, stop time, parameters passed
  • Timing done to microsecond granularity
• CPU busy = time between two consecutive MPI calls
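The last bullet can be illustrated with a minimal sketch. It assumes the trace has already been loaded as a list of records; the `MpiCall` class, its field names, and the sample values are hypothetical and not taken from the profiling library described on the slide.

```python
from dataclasses import dataclass


@dataclass
class MpiCall:
    """One recorded MPI call (hypothetical record layout)."""
    name: str        # MPI function name, e.g. "MPI_Send"
    start_us: int    # call entry time, microseconds
    stop_us: int     # call return time, microseconds
    nbytes: int = 0  # stand-in for the recorded call parameters


def cpu_busy_intervals(trace):
    """CPU busy time = gap between the end of one MPI call and the start of the next."""
    return [nxt.start_us - cur.stop_us for cur, nxt in zip(trace, trace[1:])]


# Illustrative values only.
trace = [
    MpiCall("MPI_Send", 100, 150, 4096),
    MpiCall("MPI_Recv", 900, 1000, 4096),
    MpiCall("MPI_Barrier", 1600, 1650),
]
busy = cpu_busy_intervals(trace)
print(busy)  # [750, 600]
```

Each gap would then be treated as a compute event sitting between the two surrounding communication events.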
Generation of Execution Signature …1
Application execution typically follows cyclic patterns
Goal: Determine cyclic patterns and form loop structure by identifying repeating execution behavior.
• Repeating patterns should be broadly similar
Step 1: Execution trace to symbol strings
• Cluster similar execution events
• Replace all events in cluster by average event
• Each cluster is then assigned a unique symbol
• Execution trace is replaced by string of symbols: ,,,,,,,,,,, , ,,, , ,,, …
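Step 1 might look roughly like the sketch below. The slides do not say which clustering method is used, so this assumes a simple greedy rule: an event joins a cluster when the call name matches and its size is within a relative `tolerance` of the cluster mean. The event layout `(name, nbytes)`, the function name, and the sample values are all hypothetical.

```python
import string


def cluster_events(events, tolerance):
    """Map (name, nbytes) events to a symbol string plus one average event per symbol.

    Assumed greedy rule, not the authors' method: join the first cluster with the
    same call name whose mean size is within `tolerance` (relative).
    """
    clusters = []  # each cluster: {"name": str, "sizes": [int, ...]}
    labels = []    # cluster index assigned to each event, in trace order
    for name, nbytes in events:
        for idx, c in enumerate(clusters):
            mean = sum(c["sizes"]) / len(c["sizes"])
            if c["name"] == name and abs(nbytes - mean) <= tolerance * max(mean, 1):
                c["sizes"].append(nbytes)
                labels.append(idx)
                break
        else:
            clusters.append({"name": name, "sizes": [nbytes]})
            labels.append(len(clusters) - 1)
    # One lowercase letter per cluster (this sketch supports at most 26 clusters).
    alphabet = string.ascii_lowercase
    symbols = "".join(alphabet[i] for i in labels)
    averages = {
        alphabet[i]: (c["name"], sum(c["sizes"]) / len(c["sizes"]))
        for i, c in enumerate(clusters)
    }
    return symbols, averages


# Illustrative values only.
events = [
    ("MPI_Send", 1000), ("MPI_Recv", 1000),
    ("MPI_Send", 1040), ("MPI_Recv", 990),
    ("MPI_Send", 8000),
]
symbols, averages = cluster_events(events, tolerance=0.1)
print(symbols)        # ababc
print(averages["a"])  # ('MPI_Send', 1020.0)
```

A larger `tolerance` merges more events into each cluster, which shortens the alphabet and makes repeated patterns easier to find.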
Generation of Execution Signature …2
Step 2: Compress string by identifying cycles
• Similar to longest substring matching problem
• Algorithm builds loop structure recursively from symbol strings
  e.g. ,,,,,,,,,,, , ,,, , ,,, is replaced by [,,]4,[,[]2,]2
• Typically signature is multiple orders of magnitude smaller than trace
Step 3: Adaptively increase degree of clustering
• until signature is compact enough
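The cycle identification in Step 2 can be sketched as a greedy recursive scan over a symbol string. This is an illustrative stand-in, not the algorithm from the paper: at each position it picks the adjacent repetition covering the most symbols, then compresses the repeated unit recursively. The input string uses hypothetical letter symbols.

```python
def compress(s):
    """Return a nested signature: items are symbols (str) or (body, count) loops."""
    out, i = [], 0
    while i < len(s):
        best_len, best_count = 0, 1
        # Try every unit length that could repeat at least twice from position i.
        for length in range(1, (len(s) - i) // 2 + 1):
            unit = s[i:i + length]
            count = 1
            while s.startswith(unit, i + count * length):
                count += 1
            # Prefer the repetition covering the most symbols; ties keep the shorter unit.
            if count > 1 and count * length > best_len * best_count:
                best_len, best_count = length, count
        if best_count > 1:
            out.append((compress(s[i:i + best_len]), best_count))
            i += best_len * best_count
        else:
            out.append(s[i])
            i += 1
    return out


def render(sig):
    """Print a signature in the bracket notation used on the slide, e.g. [abc]4."""
    return "".join(
        item if isinstance(item, str) else f"[{render(item[0])}]{item[1]}"
        for item in sig
    )


signature = compress("abcabcabcabcdeedee")
print(render(signature))  # [abc]4[d[e]2]2
```

For Step 3, one option under the same assumptions is to rerun clustering with a looser similarity threshold and compress again until the rendered signature is short enough.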
Generate Performance Skeleton Program
Goal: Execution time of performance skeleton should be a fixed factor K less than application execution time
Reduce iterations of each loop by a factor K
• Add remainder iterations to events outside of all loops
Process events outside loops as follows:
• Reduce execution time of compute operations by a factor K
• Reduce execution time of message exchanges by reducing bytes exchanged by a factor K
  • Communication operations are not scaled linearly due to latency
  • Considering latency would make the approach architecture-specific
Replace symbols by C language statements
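One reading of the scaling rules above, as a minimal sketch. It assumes a signature with non-nested loops only, events written as `("compute", seconds)` or `("send", bytes)`, and remainder iterations unrolled and then scaled like any other event outside a loop. The data layout and function names are hypothetical.

```python
def scale_event(event, k):
    """Shrink one event outside a loop: compute time or message bytes divided by k."""
    kind, amount = event
    return (kind, amount / k)


def build_skeleton(signature, k):
    """Scale a flat signature by k. Loops are ("loop", body, count) tuples."""
    skeleton = []
    for item in signature:
        if item[0] == "loop":
            _, body, count = item
            kept, remainder = divmod(count, k)
            if kept:
                skeleton.append(("loop", body, kept))  # body unchanged, fewer iterations
            for _ in range(remainder):
                # Remainder iterations become scaled events outside all loops.
                skeleton.extend(scale_event(ev, k) for ev in body)
        else:
            skeleton.append(scale_event(item, k))
    return skeleton


def total(signature, kind):
    """Sum compute seconds or message bytes over a signature, loops included."""
    t = 0.0
    for item in signature:
        if item[0] == "loop":
            t += item[2] * total(item[1], kind)
        elif item[0] == kind:
            t += item[1]
    return t


# Illustrative values only.
app = [
    ("compute", 10.0),
    ("loop", [("compute", 2.0), ("send", 1000)], 25),
    ("send", 500),
]
skeleton = build_skeleton(app, k=10)
print(total(app, "compute"), total(skeleton, "compute"))
print(total(app, "send"), total(skeleton, "send"))
```

In this sketch both compute time and bytes sent drop by the factor K; as the slide notes, actual message time will not shrink linearly because of latency.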
Experimental Validation
Skeletons constructed for Class B NAS MPI benchmarks are executed in the following sharing scenarios:
• Competing processes on one node
• Competing processes on all nodes
• Competing traffic on one link
• Competing traffic on all links
• Competing process and traffic on one node and link
Skeleton execution time is used to predict application execution time.
Setup: Intel Xeon dual CPU 1.7 GHz nodes running Linux 2.4.7. Gigabit crossbar switch. iproute to simulate link sharing
Prediction Accuracy
Graph shows error between predicted and measured application execution time
Skeleton execution is 1/10th of application execution
• Average error: 6%
• Max error: 18%
Error is higher for scenarios with competing traffic
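How a skeleton run turns into a prediction and an error figure can be sketched as follows. This assumes the application slows down by the same ratio as its skeleton, with the ratio taken against runs on the dedicated testbed. The function names and all timings are hypothetical, not measured results.

```python
def predict_app_time(skel_shared, skel_dedicated, app_dedicated):
    """Predict application time under sharing from the skeleton's observed slowdown."""
    slowdown = skel_shared / skel_dedicated
    return app_dedicated * slowdown


def relative_error(predicted, measured):
    """Prediction error as a fraction of the measured execution time."""
    return abs(predicted - measured) / measured


# Illustrative timings in seconds; skeleton runs in 1/10th of the application time.
predicted = predict_app_time(skel_shared=15.0, skel_dedicated=10.0, app_dedicated=100.0)
error = relative_error(predicted, measured=160.0)
print(predicted, error)  # 150.0 0.0625
```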
Comparison with Other Methods
• Average Prediction: Average slowdown of entire benchmark is used to predict execution time for each program.
• Class S Prediction: Class S benchmark (~1 sec) programs used as skeletons for Class B (30-900 s) benchmarks
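The Average Prediction baseline can be sketched as below, assuming "average slowdown" means the arithmetic mean of per-program slowdowns in a given sharing scenario. The program names are NAS benchmark names, but every timing here is made up for illustration.

```python
def average_prediction(dedicated, shared):
    """Predict each program's shared-network time from the benchmark-wide mean slowdown."""
    slowdowns = [shared[name] / dedicated[name] for name in dedicated]
    mean_slowdown = sum(slowdowns) / len(slowdowns)
    return {name: t * mean_slowdown for name, t in dedicated.items()}


# Illustrative timings in seconds.
dedicated = {"BT": 100.0, "CG": 50.0}
shared = {"BT": 150.0, "CG": 100.0}
baseline = average_prediction(dedicated, shared)
print(baseline)  # {'BT': 175.0, 'CG': 87.5}
```

Because one slowdown is applied to every program, this baseline cannot reflect how differently individual programs react to the same sharing scenario.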
Preliminary Conclusions
• Performance estimation with skeletons has high accuracy
• Need to incorporate memory access patterns and fine-grain CPU behavior for execution across architectures
• Implementation is limited to MPI applications
  • basic approach should work for other paradigms
• Skeletons may have other uses as a fast way of estimating application performance
  • e.g., on a slow simulated future system
Questions
Contact: jaspal@uh.edu, ssodhi@microsoft.com