Network Weather Service Sathish Vadhiyar • Sources / Credits: • NWS web site: http://nws.cs.ucsb.edu • NWS papers
Introduction • “NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources” • What will be the future load (not current load) when a program is executed? • Producing short-term performance forecasts based on historical performance measurements • The forecasts can be used by dynamic scheduling agents
Introduction • Resource allocation and scheduling decisions must be based on predictions of resource performance over a given timeframe • NWS takes periodic performance measurements and uses numerical models to forecast resource performance
NWS Goals • Components • Persistent state • Name server • Sensors • Passive (CPU availability) • Active (Network measurements) • Forecaster
Performance measurements • Using sensors • CPU sensors • Measures CPU availability • Uses • uptime • vmstat • Active probes • Network sensors • Measures latency and bandwidth • Each host maintains • Current data • One-step ahead predictions • Time series of data
Issues with Network Sensors • Appropriate transfer size for measuring throughput • Collision of network probes • Solutions • Tokens and hierarchical trees with cliques
Available CPU measurement • The formulae shown do not take job priorities into account • Hence an active probe is run periodically to adjust the estimates
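The formulae this slide refers to are not preserved in the transcript. As a rough sketch only, the snippet below assumes the load-average estimate described in the NWS papers, in which a newly started process expects a 1/(load + 1) share of the CPU. The function names and the bias-correction scheme are hypothetical, not taken from NWS.

```python
def availability_from_load(load_average: float) -> float:
    """Passive estimate: CPU fraction a new process would get.

    Assumed formula: 1 / (load average + 1), i.e. the new process shares
    the CPU equally with the processes already in the run queue.
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


def probe_bias(passive_estimate: float, probe_fraction: float) -> float:
    """Hypothetical correction: difference between what an active probe
    actually obtained and what the passive estimate promised. Adding this
    bias to later passive estimates accounts for effects such as job
    priorities that the passive formula cannot see.
    """
    return probe_fraction - passive_estimate
```

On a Unix host the input could come from `os.getloadavg()[0]`, the one-minute load average that `uptime` also reports.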
Predictions • To generate a forecast, the forecaster requests persistent state data • When a forecast is requested, the forecaster makes predictions for the existing measurements using different forecast models • Dynamic choice of forecast model based on the lowest Mean Absolute Error, Mean Square Prediction Error, or Mean Percentage Prediction Error • Forecasts requested by: • InitForecaster() • RequestForecasts() • Forecasting methods • Mean-based • Median-based • Autoregressive
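The dynamic choice of forecast model can be illustrated with a small sketch. It replays a measurement series through several models, accumulates each model's absolute error, and answers with the model whose Mean Absolute Error is lowest. The three models and all names here are illustrative assumptions, not the NWS implementation or its `InitForecaster()` / `RequestForecasts()` interface.

```python
from statistics import mean, median

# Hypothetical forecast models: each maps a history list to a one-step prediction.
MODELS = {
    "last_value": lambda history: history[-1],
    "running_mean": lambda history: mean(history),
    "window_median_5": lambda history: median(history[-5:]),
}


def forecast_with_best_model(series):
    """Return (best model name, its forecast for the next value, its MAE)."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    total_abs_err = {name: 0.0 for name in MODELS}
    # Replay: at each step, every model predicts series[t] from series[:t].
    for t in range(1, len(series)):
        history = series[:t]
        for name, model in MODELS.items():
            total_abs_err[name] += abs(series[t] - model(history))
    best = min(total_abs_err, key=total_abs_err.get)
    mae = total_abs_err[best] / (len(series) - 1)
    return best, MODELS[best](series), mae
```

Because the errors are recomputed from the stored history, the winning model can change from one request to the next as the series evolves.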
Forecasting Methods • Notations, prediction accuracy, and prediction method (formulas not preserved in this transcript) • Mean Absolute Error (MAE) is the average of the absolute prediction errors
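The formulas on this slide did not survive in the transcript. One consistent way to write the quantities it names is shown here. The symbols $v$, $\hat{v}_f$, and $e_f$ are assumptions chosen for illustration, not the slide's own notation.

```latex
\begin{align*}
v(t) &: \text{measured value at time } t \\
\hat{v}_f(t) &: \text{value that forecasting method } f \text{ predicted for time } t \\
e_f(t) &= v(t) - \hat{v}_f(t) \\
\mathrm{MAE}_f(t) &= \frac{1}{t} \sum_{i=1}^{t} \lvert e_f(i) \rvert
\end{align*}
```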
Forecasting Methods – Mean-based • Three numbered formulas (not preserved in this transcript)
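The three formulas on this slide are not preserved. As a sketch only, here are three mean-based predictors of the kind described in the NWS papers: a running mean, a fixed-length sliding-window mean, and exponential smoothing. Whether these match the slide's three items is an assumption, and the class names are hypothetical.

```python
from collections import deque


class RunningMean:
    """Predict the mean of every measurement seen so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value

    def predict(self):
        if self.count == 0:
            raise ValueError("no measurements yet")
        return self.total / self.count


class SlidingWindowMean:
    """Predict the mean of the most recent `window` measurements."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # old values fall off automatically

    def update(self, value):
        self.values.append(value)

    def predict(self):
        if not self.values:
            raise ValueError("no measurements yet")
        return sum(self.values) / len(self.values)


class ExponentialSmoothing:
    """Predict (1 - gain) * previous estimate + gain * latest measurement."""

    def __init__(self, gain):
        self.gain = gain
        self.estimate = None

    def update(self, value):
        if self.estimate is None:
            self.estimate = value
        else:
            self.estimate = (1 - self.gain) * self.estimate + self.gain * value

    def predict(self):
        if self.estimate is None:
            raise ValueError("no measurements yet")
        return self.estimate
```

A short window or a high gain reacts quickly to change, while the running mean is smoothest. This trade-off is one reason to keep several predictors running and choose among them by their observed error.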
Forecasting Methods – Median-based • Three numbered formulas (not preserved in this transcript)
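The slide's median-based formulas are also missing. The sketch below shows two predictors of this family, a sliding-window median and a trimmed mean over the sorted window. Treating these as the methods meant by the slide is an assumption, and the function names are hypothetical.

```python
from statistics import median


def sliding_median(history, window):
    """Median of the most recent `window` measurements."""
    if window < 1 or not history:
        raise ValueError("need a positive window and at least one measurement")
    return median(history[-window:])


def trimmed_mean(history, window, trim):
    """Sort the most recent `window` measurements, drop `trim` values from
    each end, and average what remains."""
    if window < 1 or not history:
        raise ValueError("need a positive window and at least one measurement")
    recent = sorted(history[-window:])
    if trim < 0 or 2 * trim >= len(recent):
        raise ValueError("trim removes every value in the window")
    kept = recent[trim:len(recent) - trim]
    return sum(kept) / len(kept)
```

Both predictors ignore isolated spikes: with the history `[1, 100, 3, 4, 5]`, the single outlier `100` does not move either forecast.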
Autoregression • The coefficients a_i are chosen to minimize the overall error • r_{i,j} is the autocorrelation function for the series of N measurements
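The autoregressive formula itself is missing from the transcript. A standard least-squares formulation consistent with the slide's description is sketched here. The model order $p$, the symbol $v$ for measurements, and the unnormalized definition of $r_{i,j}$ are assumptions.

```latex
\begin{align*}
\hat{v}(t) &= \sum_{i=1}^{p} a_i \, v(t-i) \\
\text{minimize over } a_1,\dots,a_p:\quad
E &= \sum_{t=p+1}^{N} \Bigl( v(t) - \sum_{j=1}^{p} a_j \, v(t-j) \Bigr)^{2} \\
\frac{\partial E}{\partial a_i} = 0
\;\Longrightarrow\;
\sum_{j=1}^{p} a_j \, r_{i,j} &= r_{i,0}, \qquad i = 1,\dots,p \\
r_{i,j} &= \sum_{t=p+1}^{N} v(t-i)\, v(t-j)
\end{align*}
```

Solving this $p \times p$ linear system gives the coefficients $a_i$.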
Forecasting Complexity vs Accuracy • Semi Non-parametric Time Series Analysis (SNP) – an accurate but complicated model • Model fit using iterative search • Calculation of conditional expected value using conditional probability density
Sensor Control • Each sensor connects to every other sensor to perform measurements: O(N^2) probes • To reduce this complexity, sensors are organized in a hierarchy of groups called cliques • To avoid collisions, tokens are used • Adaptive control using adaptive token timeouts • Adaptive time-out discovery and distributed leader election protocol
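To see why a clique hierarchy helps, the sketch below counts directed probe pairs for a flat all-to-all arrangement and for a simple two-level hierarchy. The two-level layout, with one representative per clique joining a top-level clique, is an assumed simplification for illustration.

```python
import math


def flat_probe_pairs(n_hosts):
    """Ordered host pairs when every host probes every other host."""
    return n_hosts * (n_hosts - 1)


def two_level_probe_pairs(n_hosts, clique_size):
    """Ordered pairs when hosts are split into cliques of `clique_size`
    and one representative per clique joins a top-level clique."""
    if clique_size < 1:
        raise ValueError("clique_size must be at least 1")
    full, rest = divmod(n_hosts, clique_size)
    # Probes inside each full clique, plus the leftover (smaller) clique.
    intra = full * clique_size * (clique_size - 1) + rest * (rest - 1)
    n_cliques = math.ceil(n_hosts / clique_size)
    # Probes among the clique representatives.
    inter = n_cliques * (n_cliques - 1)
    return intra + inter
```

For 100 hosts, the flat scheme needs 9900 directed probes per round, while ten cliques of ten hosts need 990.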
Synchronizing network probes • Consistent periodicity and mutual exclusion • Token • List of hosts to probe • Periodicity of probe • Parameters to the probe • Sequence number • Leader initiates the token • A host, after receiving the token: • Conducts probes with the other hosts in the token • Passes the token to the next host • Token passed back to the leader
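The token contents and one circuit of the token can be sketched as follows. The field names, the convention that the first host in the list is the leader, and the choice to increment the sequence number once per circuit are all assumptions made for illustration. `run_probe` is a caller-supplied stand-in for a real network measurement.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    hosts: list                 # hosts to probe, in circulation order
    periodicity: float          # desired seconds between probe rounds
    probe_params: dict = field(default_factory=dict)
    sequence_number: int = 0


def circulate(token, run_probe):
    """Pass the token from the leader (assumed to be hosts[0]) through every
    host and back. Only the current holder probes, so probes never overlap.
    Returns a list of (holder, peer, measurement) tuples."""
    log = []
    for holder in token.hosts:
        for peer in token.hosts:
            if peer != holder:
                result = run_probe(holder, peer, token.probe_params)
                log.append((holder, peer, result))
    # The token is back at the leader: one circuit is complete.
    token.sequence_number += 1
    return log
```

Mutual exclusion follows from the loop structure: a host measures only while it holds the token, and exactly one host holds it at a time.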
Synchronizing network probes (contd.) • Leader notes the token circuit time and calculates the next token initiation time as (desired periodicity – token circuit time) • To avoid long delays in token circulation and to provide fault tolerance: • Each host maintains a timer • When the timer times out, the host declares itself the leader and initiates a new token • When a host encounters two tokens, the old token is destroyed • Calculation of time-outs • Each host records the token circuit time and its variance • Uses NWS forecasting models to predict the next token arrival time
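The timing rules can be sketched in a few lines. The leader's delay is the slide's formula; clamping it at zero is an added safeguard. The time-out rule of mean plus a multiple of the standard deviation is a simple stand-in for the NWS forecasting models the slide mentions, and using the sequence number to decide which token is older is an assumption.

```python
from statistics import mean, pstdev


def next_initiation_delay(periodicity, circuit_time):
    """Seconds the leader waits before starting the next token:
    desired periodicity minus the last token circuit time.
    Clamped at zero (an added safeguard) if the circuit overran."""
    return max(0.0, periodicity - circuit_time)


def arrival_timeout(inter_arrival_times, safety_factor=2.0):
    """Hypothetical time-out: mean token inter-arrival time plus
    `safety_factor` standard deviations of the recorded times."""
    return mean(inter_arrival_times) + safety_factor * pstdev(inter_arrival_times)


def surviving_sequence_number(seq_a, seq_b):
    """When a host holds two tokens, keep the newer one. Here the higher
    sequence number is assumed to mark the newer token."""
    return max(seq_a, seq_b)
```

With a 240 second periodicity and a 12.5 second circuit, the leader waits 227.5 seconds before initiating the next token.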
New Protocol • Compromise between periodicity and mutual exclusion • NWS administrator specifies the periodicity and an upper range of desired periodicity • If network conditions are stable and tokens are received within the upper range, then mutual exclusion is guaranteed • If not, hosts time out and start conducting probes, with possible collisions • Thus the protocol switches between good and bad phases
Comparison of 2 protocols – Experimental setup • 4 machines – 2 in Lyon, France and 2 in Tennessee, USA • 240 second periodicity • 5 second range
Use of NWS: Scheduling a Jacobi application The problem: Appropriate partitioning strategy to balance processor efficiencies and communication overheads, i.e. deriving partitions to obtain resource performance
Deriving Partitions for Jacobi • Notations • Per-processor execution time • The goal
Deriving Partitions for Jacobi • Communication time • Solution: system of linear equations solved by Gaussian elimination
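The partitioning formulas are not preserved in the transcript. The sketch below assumes a common time-balancing model: processor i takes `A_i * P_i + C_i` seconds, where `A_i` is the number of grid points assigned to it, `P_i` is its forecast time per point, and `C_i` is its forecast communication time. The goal is equal finish times with all points assigned. These symbols and function names are assumptions, not the slide's notation.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting; returns the solution list."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def balanced_partition(time_per_point, comm_time, total_points):
    """Solve A_i * P_i + C_i = T for all i, with sum(A_i) = total_points.
    Unknowns are the areas A_1..A_p and the common finish time T."""
    p = len(time_per_point)
    matrix, rhs = [], []
    for i in range(p):
        row = [0.0] * (p + 1)
        row[i] = time_per_point[i]   # coefficient of A_i
        row[p] = -1.0                # coefficient of T
        matrix.append(row)
        rhs.append(-comm_time[i])    # A_i * P_i - T = -C_i
    matrix.append([1.0] * p + [0.0])  # sum of areas
    rhs.append(total_points)
    *areas, finish_time = solve_linear_system(matrix, rhs)
    return areas, finish_time
```

A processor that is twice as slow receives half the area: `balanced_partition([1, 2], [0, 0], 3)` assigns 2 points and 1 point, with a common finish time of 2. If communication times differ widely, an area can come out negative, which would mean that processor should be dropped from the schedule.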
References • Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service. Rich Wolski, Neil Spring, and Chris Peterson, in Proceedings of SC97, November, 1997. • Dynamically Forecasting Network Performance Using the Network Weather Service. Rich Wolski, in Journal of Cluster Computing, Volume 1, pp. 119-132, January, 1998. • The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Rich Wolski, Neil Spring, and Jim Hayes, in Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, pp. 757-768, October, 1999.
References • Synchronizing Network Probes to avoid Measurement Intrusiveness with the Network Weather Service. B. Gaidioz, R. Wolski, and B. Tourancheau, in Proceedings of 9th IEEE High-performance Distributed Computing Conference, pp. 147-154, August, 2000. • Experiences with Predicting Resource Performance On-line in Computational Grid Settings. Rich Wolski, in ACM SIGMETRICS Performance Evaluation Review, Volume 30, Number 4, pp. 41-49, March, 2003.