PhD Final Talk Black Box Methods for Inferring Parallel Applications' Properties in Virtual Environments Ashish Gupta Committee: Prof. Peter Dinda Prof. Fabian Bustamante Prof. Yan Chen Prof. Dongyan Xu (Purdue University) March 2008
Background
• Virtual Machine Distributed Computing
• Virtuoso
  • Middleware for autonomic Virtual Machine distributed computing
  • Presents a simple abstraction for distributed computing, insulating from underlying computational, networking and middleware complexities
  • VMM abstracts computational resources
  • VNET abstracts different networking domains into one – also ideal point for monitoring
• Autonomic Resource Management
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Problem Introduction
• Adaptation
  • Resources can be heterogeneous (CPU, memory, etc.)
  • If shared, resource availability can also be highly dynamic
  • Application demands also change!
• Autonomic computing:
  • What is available?
  • What is required?
  • How can we effectively match the two?
• One of the major components: What does the distributed application want?
• Should work with existing unmodified applications and OS
Thesis Statement
My thesis is that it is feasible to infer various useful demands and behavior of a parallel application running inside a collection of VMs to a significant degree using a black box model. To evaluate this thesis, I enumerate and define various demands and types of behavior that can be inferred, and also design, implement and evaluate ideas and approaches towards inferring these. One of the demands I infer is the communication behavior and the runtime topology of a parallel application. I also show how to infer some very useful runtime properties of a parallel application, such as its runtime performance, its slowdown under external load and its global bottlenecks. Significantly, all of this is done using black-box assumptions and without specific assumptions about the application or the operating system. I also give evidence of how automatic black box inference can assist in adapting the application and its resource usage, resulting in improved performance.
Covered in: Chapter 2, Chapter 3, Chapter 4, Appendices A, B
Black box assumption and impact
Black Box – Can make no assumptions about the implementation, behavior or internal state of the Guest OS/module beyond its external interface
• Lowers barrier to adoption of the new inference techniques
  • Helps deploy my work to legacy applications
• Mainly accomplished by looking at external signals:
  • Traffic, host load etc.
• Not tied to Virtuoso
  • Other systems like softUDC [91], XenoServer [52], SODA [87], Violin [88], In-VIGO [13], VioCluster [136] can also benefit from black box application inference
  • Potentially any virtual distributed system
Components of my dissertation
Inference
1. Virtual Topology and Traffic Inference Framework
2. Black box metrics for Absolute Performance
3. Ball in The Court principles to compute application slowdown
4. Global Bottlenecks using time decomposition
Adaptation
5. Increasing Application Performance In Virtual Environments through Run-time Inference and Adaptation
6. Free Network Measurement For Adaptive Virtualized Distributed Computing
Topics I cover
Inference
• Chapter 2: Virtual Topology and Traffic Inference Framework
• Chapter 3: Black box metrics for Absolute Performance
• Chapter 4: Ball in The Court principles to compute application slowdown
• Chapter 5: Global Bottlenecks using time decomposition
Adaptation
• Appendix A: Increasing Application Performance In Virtual Environments through Run-time Inference and Adaptation
• Appendix B: Free Network Measurement For Adaptive Virtualized Distributed Computing
BSP application model
• Multiple processes executing a common kernel
• Execution alternates between one or more computing phases and one or more communication phases
• A combination of a schedule of computation and communication phases forms a super-step
• A very popular model for implementing a large variety of scientific applications and parallel algorithms
• Original paper: Valiant [167]
Patterns
• Synthetic workload generator developed before. Models a BSP application
• Can execute many different types of topologies common in BSP programs
Some parameters:
• Topology
• Number of processors
• Message size
• # of iterations
• Flops per element
• Memory Reads/Writes per element
Topologies:
• N-dimensional mesh
• N-dimensional torus
• N-dimensional hypercube
• Binary reduction tree
• All to All
Patterns application’s capabilities
[Figure panels: 3-D Toroid, 3-D Hypercube, 2-D Mesh, Reduction Tree, All-to-All]
NAS parallel benchmarks
• Developed by NASA [17, 172]
• Set of programs to evaluate performance of parallel supercomputers
• Representative of CFD applications
• Used here to generate realistic parallel application workloads
• 5 kernel benchmarks: EP, MG, FT, IS, CG
  • Five very frequently used numeric methods
Different contributions of my dissertation
Inference
• Chapter 2: Virtual Topology and Traffic Inference Framework
• Chapter 3: Black box metrics for Absolute Performance
• Chapter 4: Ball in The Court principles to compute application slowdown
• Chapter 5: Global Bottlenecks using time decomposition
Adaptation
• Appendix A: Increasing Application Performance In Virtual Environments through Run-time Inference and Adaptation
• Appendix B: Free Network Measurement For Adaptive Virtualized Distributed Computing
Goal of VTTIF?
[Diagram: Low Level Traffic Monitoring → Application Topology]
An online topology inference framework for a VM environment
VTTIF Architecture
[Diagram labels: Physical Host; VM; VNET daemon; VNET overlay network; Traffic Analyzer; Rate based Change detection; Traffic Matrix; Query Agent; To other VNET daemons; VM Network Scheduling Agent]
Inferred topology
[Figure: inferred topology for Parallel Integer Sort]
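The traffic-matrix step in the architecture can be illustrated with a minimal sketch. It assumes each captured packet has already been reduced to a `(src, dst, nbytes)` record; the function names and the pruning rule (keep links carrying at least a fraction of the busiest link's volume) are illustrative assumptions, not necessarily what VTTIF itself does.

```python
from collections import defaultdict

def traffic_matrix(packets):
    """Aggregate (src, dst, nbytes) records into a src -> dst -> bytes matrix."""
    matrix = defaultdict(lambda: defaultdict(int))
    for src, dst, nbytes in packets:
        matrix[src][dst] += nbytes
    return matrix

def infer_topology(matrix, prune_fraction=0.1):
    """Keep links whose volume is at least prune_fraction of the busiest link.

    prune_fraction is a hypothetical tuning knob for this sketch.
    """
    volumes = [v for row in matrix.values() for v in row.values()]
    if not volumes:
        return set()
    cutoff = prune_fraction * max(volumes)
    return {(s, d) for s, row in matrix.items()
            for d, v in row.items() if v >= cutoff}
```

For example, two hosts exchanging 4000-byte messages survive the pruning, while a stray 100-byte flow to a third host is dropped as noise.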
The problem
• Performance of BSP applications – an important goal
• Lot of work dedicated to improving performance of parallel applications, e.g.
  • Virtuoso
  • VioCluster
• How do we measure performance in a black box fashion?
  • The current way in Virtuoso is manual (e.g. Lin et al. [106])
• Impact
  • Would enable superior adaptation algorithms
  • Automated evaluation and adaptation cycle
  • Generate reports on effectiveness of different adaptation/scheduling methods
Cost model for BSP applications
• Popular strategy: break super-step into its components:
  • Computation cost
  • Communication cost of the global exchange of the data
  • Synchronization cost
[Equation; labeled terms: Computation cost; Communication cost; Number of super steps; Speed of computation in FLOPS; Sync latency cost]
• Static model of performance; requires detailed application profiling and access to source code
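The slide's formula did not survive extraction, so the following is only a sketch of the textbook BSP cost form (per-super-step computation, communication and synchronization costs summed over all super-steps). The parameter names `flops_per_sec`, `gap` and `sync_latency` are assumptions chosen to match the labeled terms, not the author's notation.

```python
def bsp_cost(supersteps, flops_per_sec, gap, sync_latency):
    """Estimate run time in seconds for a BSP program.

    supersteps: list of (work_flops, h_words) tuples, one per super-step,
                where h_words is the data volume exchanged in that step.
    flops_per_sec: speed of computation in FLOPS.
    gap: assumed communication cost per word, in seconds.
    sync_latency: assumed synchronization cost per super-step, in seconds.
    """
    total = 0.0
    for work_flops, h_words in supersteps:
        compute = work_flops / flops_per_sec
        communicate = h_words * gap
        total += compute + communicate + sync_latency
    return total
```

Filling in `work_flops` and `h_words` is exactly the part that needs application profiling or source access, which is why this kind of model does not fit a black box setting.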
Super-step approach
• Super-step structure is an invariant
• Number of steps depends on parameters and data
• Another possible measure of performance: number of super-steps executed per second, or the iteration rate (a dynamic metric)
• Multiple super-steps for dynamic applications: iteration rate is not constant
A new black box metric: Round Trip Iteration Rate (RIR)
• Based solely on communication behavior of the application
• Correlated to the iteration rate
• Indirectly measures number of process interactions; this indicates progress, as synchronization happens at the end of a super-step
• Approach: Examined various properties of the traffic trace
  • Inter send-packet delay exhibits interesting properties
Inter send-packet delay
Trace excerpt:
176 Receive from 175
176 Send to 175
176 Send to 175
176 Send to 175
176 Receive from 175
176 Send to 175
Inter send-packet delay
• Traffic trace for Patterns
  • Message size = 4000 bytes
  • Computation per iteration: 100 MFlops
• Clustering based on inter-send delays
• Count in cluster matches actual iteration rate output by application (325)
Patterns: clusters without load
• Reported: 569
• Actual: 568
Patterns: clusters with external CPU load
• Reported: 153
• Actual: 142
• Actual execution time ratio: 3.922
• Ratio from reported iteration rate: 3.72
• Within 5%
Why Does Circled Bin Correspond To Iteration Rate?
• Each iteration consists of send, receive and compute phases
• The items in the large inter send-packet delay cluster are the sends that represent an inter-process interaction
• Each inter-process interaction represents progress in the super-step
[Figure: send packets along a time axis]
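A minimal sketch of the clustering idea, assuming the trace has been reduced to a list of send timestamps in seconds. Splitting the sorted delays at their widest gap is a stand-in chosen for illustration; the slides do not say which clustering method was actually used.

```python
def inter_send_delays(send_times):
    """Delays between consecutive send packets (send_times must be sorted)."""
    return [b - a for a, b in zip(send_times, send_times[1:])]

def split_by_largest_gap(delays):
    """Split delays into (small, large) clusters at the widest gap in sorted order."""
    ordered = sorted(delays)
    if len(ordered) < 2:
        return ordered, []
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]
```

With three bursts of sends separated by long compute-and-wait periods, the large cluster holds one delay per boundary between bursts, which is the count that tracks iterations.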
Plotting inter send-packet delay for MG
• No clean clusters for a more complex application
• Delays shift towards the right under greater load
Computing the RIR metric
• Previous examples were for Patterns: static performance case, easy clustering
• Applications like MultiGrid from the NAS benchmarks have a changing iteration rate
For a given packet time series:
1. Count send pairs whose inter-packet delay exceeds c * RTT
2. The send pair must be interleaved by one receive
Based on BSP principles
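The two conditions can be sketched as a single pass over the trace. This assumes the trace is a time-ordered list of `(timestamp, kind)` tuples with `kind` equal to `"send"` or `"recv"`, and it reads "interleaved by one receive" as "at least one receive between the two sends"; both the input format and that reading are assumptions.

```python
def count_rir_events(events, rtt, c=1.1):
    """Return timestamps of sends that close a qualifying send pair.

    A pair of consecutive sends qualifies when (1) the delay between them
    exceeds c * rtt and (2) at least one receive occurred in between.
    """
    qualifying = []
    last_send = None
    recv_since_last_send = False
    for ts, kind in events:
        if kind == "recv":
            recv_since_last_send = True
        elif kind == "send":
            if (last_send is not None and recv_since_last_send
                    and ts - last_send > c * rtt):
                qualifying.append(ts)
            last_send = ts
            recv_since_last_send = False
    return qualifying
```

Back-to-back sends within one message burst fail both conditions, so only the send that follows a compute-and-receive phase is counted.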
RIR time series
• For dynamic applications, RIR changes with time
• Need a time series for RIR over the trace
Outputting RIR time series - Workflow
1. Sniff packets using tcpdump/libpcap for the VM traffic
2. Keep the send packets satisfying the two conditions
3. Slide a 1 second window over these send packets (sliding interval = t, sampling duration = T)
4. Get a new time series denoting RIR for each 1 second sliding window: i1, i2, i3, i4, i5, …, in. Each number represents the number of iterations for a particular 1 sec window instance
5. Derivative metrics from the above time series: Average RIR, CDF, Power Spectrum, Super-phase period
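The windowing step of the workflow can be sketched as follows. It assumes the qualifying send timestamps are measured in seconds from the start of the capture; `duration` plays the role of the sampling duration T and `slide` the sliding interval t. The function name is hypothetical.

```python
def rir_time_series(event_times, duration, window=1.0, slide=0.5):
    """Count qualifying sends in each window [start, start + window).

    Windows start at 0, slide, 2 * slide, ... and must fit within duration.
    """
    if duration < window:
        return []
    # Small epsilon guards against float rounding in the division.
    n_windows = int((duration - window) / slide + 1e-9) + 1
    series = []
    for k in range(n_windows):
        start = k * slide
        series.append(sum(1 for t in event_times
                          if start <= t < start + window))
    return series
```

A smaller `slide` gives a finer-grained series at the cost of more overlapping windows.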
Representing Dynamic Performance
• Define a spectrum of metrics for dynamic performance
  • RIRavg: Long term stationary average of RIR time series
  • RIR-CDF: CDF of RIR time series indicating spread of iteration rates
  • RIR-PS: Phase structure, periodicities, application fingerprinting, statistical scheduling
  • RIR-PSE: Summary of the periodic behavior for multiple supersteps
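The first two metrics are simple summaries of the RIR time series. A minimal sketch, assuming the series is a non-empty list of per-window counts; the function names are illustrative.

```python
def rir_average(series):
    """RIRavg: mean of the RIR time series."""
    return sum(series) / len(series)

def rir_cdf(series):
    """RIR-CDF: empirical CDF as sorted (value, P[RIR <= value]) pairs."""
    n = len(series)
    cdf = []
    for i, v in enumerate(sorted(series), start=1):
        if cdf and cdf[-1][0] == v:
            cdf[-1] = (v, i / n)   # repeated value: raise its cumulative share
        else:
            cdf.append((v, i / n))
    return cdf
```

The average is only meaningful as a long-term figure when the capture is long enough to cover the application's repeating phases.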
Computing the stationary Average – sampling issues
• Sliding window resolution (sampling rate)
  • Needs to be high enough to capture any important high frequency behavior
• Capture duration
  • Enough to capture the stationarity of the signal (repetition of all super-steps)
• Assumption:
  • Iteration dynamics of the application are indeed empirically stationary in the long run
  • For a dynamic application that consists of repetitive phases, it means capturing enough of its performance behavior to capture this repetitive element
• Capturing stationarity
  • Power Spectrum based techniques help us determine the right sampling rate and duration
  • Details in dissertation
Effectiveness of RIRavg
• For the MG application, running under different load conditions (100% and 60% CPU load), predicted execution time error rates were 13% and 7% respectively (completely black box)
• Value of c = 1.1 here (c*RTT factor)
RIR time series graph
[Figure: RIR time series, with one super-step marked]
Can we predict slowdown of an application if we put it under load?
For the IS and MG applications, which application may be hurt more if one of the processes from each application shares the physical host with an external computational load?
The impact: we can now determine in advance the impact of external load if we must choose one of these applications to be influenced by the load.
• Scheduling fact: Depending on the scheduling algorithm, the effect of load on different RIR regions can be different (Govindan et al. [60])
• Reason: Scheduling is handled differently for CPU bound processes vs I/O bound processes
• “Providing enough CPU is not enough, an equally important consideration is to provide the CPU at the right time”
Role of CDF
• RIR-CDF can be used to predict which application will be more affected by external load (for dynamic applications)
• Very useful when extra load needs to be introduced over existing applications due to demand
• Scheduling using the CDF: from the normal CDF, we predict a slowdown CDF based on a slowdown mapping
  • Slowdown CDF: What will the RIR-CDF of the application look like after load?
  • Slowdown mapping: What is the slowdown for a particular RIR under a particular load?
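One way to picture the slowdown mapping is as a function from an unloaded RIR value to a slowdown factor. The sketch below assumes equal-length windows, so each unloaded window is stretched by its own factor and the overall predicted slowdown is the mean of the per-window factors. The mapping used in the usage note is made up for illustration; a real one would have to be measured for a given scheduler and load.

```python
def predict_loaded_rir(series, slowdown_for):
    """Map each unloaded RIR sample to its predicted value under load."""
    return [r / slowdown_for(r) for r in series]

def predicted_slowdown(series, slowdown_for):
    """Overall predicted slowdown: mean of the per-window slowdown factors.

    Assumes every sample in series covers the same amount of unloaded time.
    """
    return sum(slowdown_for(r) for r in series) / len(series)
```

With a hypothetical mapping such as `lambda r: 2.0 if r >= 10 else 1.2`, an application whose windows mostly sit at high RIR is predicted to suffer more than one that spends most windows at low RIR, which is the kind of comparison the RIR-CDF is meant to support.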
Other metrics
• Power spectrum of RIR
  • Gives idea about the super-step structure of the application
  • Length of super-step
• A summary of the power spectrum and the significant frequencies serves as a fingerprint/super-step snapshot
A sample power spectrum for 4 processes
[Figure: power spectra for the 4 processes, showing consistency across processes]
Example for MG
• Significant frequency separation
• Exec time = 19.44 seconds
• Number of super-phases ~4
• Super-phase period = 4.1 seconds (1/0.244)
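The period on this slide is the reciprocal of the dominant frequency in the power spectrum (1/0.244 ≈ 4.1 seconds). A minimal sketch of that step, using a naive DFT from the standard library so it stays self-contained; the function names are illustrative, and a real implementation would more likely use an FFT library.

```python
import cmath
import math

def power_spectrum(series, sample_interval):
    """Return (frequency_hz, power) pairs for the mean-removed series.

    Uses a naive O(n^2) DFT over the positive frequencies k = 1 .. n // 2.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        spectrum.append((k / (n * sample_interval), abs(coeff) ** 2))
    return spectrum

def dominant_period(series, sample_interval):
    """Period in seconds of the strongest frequency component."""
    freq, _ = max(power_spectrum(series, sample_interval),
                  key=lambda fp: fp[1])
    return 1.0 / freq
```

Removing the mean first keeps the zero-frequency component from masking the periodic structure.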
Recap
• We can deduce performance for a BSP application using black box means
• We can predict performance for an application when load is imposed on it
• Entirely based on packet analysis
• New metric called RIR: Round trip Iteration Rate
• More complex metrics for dynamic applications
  • RIRavg
  • RIR-CDF
  • RIR Power Spectrum
The problem
• Last chapter: Predicting performance under load (Slowdown CDF)
• This chapter: Can we predict performance of the application if an existing external load were removed?
  • I.e., what is the slowdown of the application under the current load conditions?
Ball in the Court
• BSP super-step: computation, communication, sync
• Communication according to a certain schedule, with computation in between
• Each process acts (computes) on a message before sending out the next one
• This acting on a message is called “Ball in the Court” (BIC)
• The ball is the responsibility of the process to do some local processing and then interact with other processes; the court is the local host
[Figure label: BIC delay]
Developing a strategy
• Focus on one process
• If just one process is loaded, the entire application is slowed down because of that one process
• All processes operate in sync, and iterations can only proceed if the loaded process does its duties
[Figure: trace timeline, stretched under load]
Why BIC delay?
• The traffic trace captures the behavior of the process for the entire duration
• If the process slows down, the trace time length will increase
• There will be corresponding changes in the trace as well. What are those changes? (Seed of the idea)
Approach to computing the BIC delay
Let’s compute the time differential for event pairs, for *.176
[Trace figure annotations: Some computation (6 ms); Some computation (6 ms)]
With Load, Some BIC Delays Get Larger
[Figure: traces for the unloaded process and the loaded process]
1. Sending an ack is the process’s responsibility
2. This responsibility was hugely inflated in the loaded case (64 us to 23822 us)
3. BIC delays for other receives are similar (receive BIC for other processes)
An Algorithm for BIC Delay?
• Can we estimate the total BIC delay from the traffic trace alone?
• Investigated this question with different approaches
• Each packet can be of the following types:
  1. Send Packet (SP)
  2. Send Ack (SA)
  3. Receive Packet (RP)
  4. Receive Ack (RA)
• We can pair up consecutive packets to form event pairs, e.g. SA followed by SP gives SA-SP
• For 4 event possibilities we have 16 event pairs
BIC events
• * - S* event pairs can be classified as BIC events
• Intuitively, the process has either:
  • Received a packet and is responsible for sending the next one, or
  • Just sent a packet and is responsible for sending the next one as well
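The event-pair classification can be sketched directly. This assumes the trace is a time-ordered list of `(timestamp, kind)` tuples using the four packet types SP, SA, RP and RA, and that the total BIC delay is the sum of gaps over consecutive pairs whose second event is a send (the * - S* rule). Whether the dissertation's final algorithm sums exactly these gaps is not stated on the slides, so treat this as one plausible reading.

```python
KINDS = ("SP", "SA", "RP", "RA")

def event_pairs():
    """All ordered pairs of the four packet types (4 x 4 = 16)."""
    return [(a, b) for a in KINDS for b in KINDS]

def is_bic_pair(first, second):
    """A pair is a BIC event when the second event is a send (SP or SA)."""
    return second.startswith("S")

def total_bic_delay(trace):
    """Sum the time gaps of consecutive event pairs classified as BIC."""
    total = 0.0
    for (t0, k0), (t1, k1) in zip(trace, trace[1:]):
        if is_bic_pair(k0, k1):
            total += t1 - t0
    return total
```

Of the 16 possible pairs, the 8 that end in a send are the ones where the local process held the ball, so only their gaps are charged to the host.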