The Network Weather Service A Distributed Resource Performance Forecasting Service for Metacomputing Rich Wolski, Neil T. Spring and Jim Hayes Presented By: Mohammad Al-Saeed
Organization • Introduction • Motivation: why the NWS? • The NWS: what is the NWS? • Related work • NWS system architecture • Design goals • System components • NWS components • NWS interface • Conclusion and future work
Motivation • Searching for the execution environment that delivers the best performance • Dynamic nature of metacomputing environments • Adaptive applications • Adapt to changing environments • Knowledge needed for adaptation • Resource discovery and allocation
The Network Weather Service • A distributed system for producing short-term deliverable performance forecasts • Goal: dynamically measure and forecast the performance deliverable at the application level from a set of network resources • Measurements currently supported: • Available fraction of CPU time • End-to-end TCP connection time • End-to-end TCP network latency • End-to-end TCP network bandwidth
Related Work • TReno: performance at transport layer using TCP • Pathchar: bandwidth over a path • bprobe/cprobe: bottleneck link speed and competing traffic • Topology-d: uses ping and netperf to find bandwidth between hosts in a group then analyzes this data to find minimum-cost logical topology • ReMoS: network resource monitoring
NWS System Architecture • Design objectives • Scalability: scales to any metacomputing infrastructure • Predictive accuracy: provides accurate measurements and forecasts • Non-intrusiveness: must not significantly load the resources it monitors • Execution longevity: available at all times • Ubiquity: accessible from everywhere, monitors all resources
System Components • Four different component processes • Persistent State process: handles storage of measurements • Name Server process: directory server for the system • Sensor processes: measure current performance of different resources • Forecaster process: predicts deliverable performance of a resource during a given time
NWS Components • Persistent State Management • Naming Server • Performance Monitoring: NWS Sensors • CPU Sensor • Network Sensor • Sensor Control • Cliques: hierarchy and contention • Adaptive time-out discovery • Forecasting • Forecaster and forecasting models • Sample forecaster results
Persistent State Management • All other NWS processes are stateless • The system state (the measurements) is managed by the PS process: • Storage & retrieval of measurements • Measurements are time-stamped plain-text strings • Measurements are written to disk immediately and then acknowledged • Measurements are stored in a circular queue of tunable size
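The circular queue of time-stamped text records can be illustrated with a small sketch. This is a toy, in-memory model only: the class name `MeasurementStore`, its methods, and the record layout are assumptions made for illustration, and the write to disk described on the slide is not modeled.

```python
from collections import deque
import time


class MeasurementStore:
    """Toy persistent-state store: a bounded circular queue of text records."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest record when full,
        # which gives the circular-queue behavior with a tunable size.
        self._records = deque(maxlen=capacity)

    def store(self, value, timestamp=None):
        """Time-stamp a measurement, keep it as plain text, and acknowledge."""
        ts = time.time() if timestamp is None else timestamp
        self._records.append(f"{ts:.3f} {value}")
        return True  # stands in for the acknowledgement sent to the caller

    def retrieve(self, count):
        """Return up to `count` of the most recent records, oldest first."""
        if count <= 0:
            return []
        return list(self._records)[-count:]
```

With a capacity of 3, storing five measurements leaves only the last three available to `retrieve`.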
Naming Server • Primitive text-string directory service for the NWS system • The only component known system-wide • Information stored includes • Name-to-IP binding information • Group configuration • Parameters for various processes • Each process must refresh its registration with the name server periodically • Centralized
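Periodic registration refresh can be sketched as a directory whose entries expire unless renewed. The class below is a hypothetical illustration, not the NWS name server: the `NameServer` class, the `ttl_seconds` parameter, and the explicit `now` argument (used instead of a real clock so the behavior is easy to follow) are all assumptions.

```python
class NameServer:
    """Toy directory: name -> address bindings that expire unless refreshed."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # name -> (address, expiry time)

    def register(self, name, address, now):
        """Create or refresh a binding; it stays valid for `ttl` seconds."""
        self._entries[name] = (address, now + self.ttl)

    def lookup(self, name, now):
        """Return the address, or None if unknown or expired."""
        entry = self._entries.get(name)
        if entry is None or entry[1] <= now:
            self._entries.pop(name, None)  # drop stale registrations
            return None
        return entry[0]
```

A process that stops refreshing simply disappears from the directory once its time-to-live elapses, so no explicit deregistration step is needed in this sketch.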
Performance Monitoring • Actual monitoring is performed by a set of sensors • Accuracy vs. intrusiveness • A sensor’s life: { Register with the NS; Query the NS for parameters; Generate conditional test; Forever { if conditions are met then { perform test; time-stamp results and send them to the PS; refresh registration with the NS } } }
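The sensor life cycle above can be sketched as a loop. Everything here is illustrative: the function `run_sensor`, its callable parameters, the `"period"` parameter name, and the use of a fixed test period as the "condition" are assumptions, and the loop is bounded by `iterations` rather than running forever.

```python
def run_sensor(register, query_parameters, perform_test, send_to_ps,
               clock, iterations):
    """Bounded sketch of a sensor's life; `iterations` stands in for 'forever'."""
    register()                       # register with the name server
    params = query_parameters()      # query the name server for parameters
    period = params["period"]        # hypothetical parameter: seconds between tests
    next_due = clock()               # the conditional test: is a measurement due?
    for _ in range(iterations):
        now = clock()
        if now >= next_due:
            result = perform_test()
            send_to_ps(f"{now:.3f} {result}")  # time-stamped plain-text record
            register()               # refresh registration with the name server
            next_due = now + period
```

Passing the clock and the network operations in as callables keeps the loop itself free of I/O, which makes the control flow easy to check in isolation.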
CPU Sensor • Measures available CPU fraction • Testing tools: • Unix uptime: reports load averages over the past 1, 5 and 15 minutes • Unix vmstat: reports idle, user and system time • Active probes • Accuracy: • Results assume a full-priority job • Doesn’t know the priority of jobs in the queue
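One simple way to turn a load average into an available CPU fraction is to assume that a new full-priority job shares the processor equally with the jobs already in the run queue. The slide does not give the formula, so the rule below is an assumption used for illustration, and `available_cpu_fraction` is a hypothetical name.

```python
def available_cpu_fraction(load_average):
    """Estimate the CPU share a new full-priority job would receive.

    Assumes equal time slicing: with `load_average` jobs already runnable,
    a new job gets one share out of (load_average + 1).
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)
```

On Unix systems, `os.getloadavg()[0]` returns the 1-minute load average and could be passed in directly. Because the estimate ignores job priorities, it shares the accuracy limits listed on the slide.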
Active Probing Improvements • [Figure: measurements produced using vmstat] • [Figure: measurements produced using uptime]
Network Sensor • Carries out network-related measurements • Testing: using active network probes • Establish and release TCP connections • Move large amounts of data to measure bandwidth and small amounts to measure delay • Measures connections with all peer sensors • Problems • Accuracy: depends on socket interface • Complexity: N² - N tests, collisions (contention)
Network Sensor Control • Sensors are organized into sensor sets called cliques • Each clique is configurable and has one leader • Clique sets are logical, but can be based on physical topology • Leaders are elected using a distributed election protocol • A sensor can participate in many cliques • Advantages • Scalability by organizing cliques in a hierarchy • Reduces the N² - N test count • Accuracy through more frequent tests
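The saving from a clique hierarchy can be checked with a little arithmetic. The sketch below counts directed end-to-end tests for a full mesh of N sensors versus a two-level hierarchy; the function names and the 12-host example are illustrative assumptions, not figures from the slides.

```python
def pairwise_tests(n):
    """Directed tests needed when every sensor probes every other: N^2 - N."""
    return n * n - n


def hierarchical_tests(clique_sizes, top_level_size):
    """Tests needed when sensors probe only within their own clique,
    plus one top-level clique linking a representative of each group."""
    within = sum(pairwise_tests(size) for size in clique_sizes)
    return within + pairwise_tests(top_level_size)


full_mesh = pairwise_tests(12)                  # 12 hosts, all pairs
hierarchy = hierarchical_tests([4, 4, 4], 3)    # 3 cliques of 4, 3 representatives
```

For 12 hosts the full mesh needs 132 tests, while three cliques of four hosts joined by a three-member top-level clique need 36 + 6 = 42.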
Clique Hierarchy • [Figure: clique hierarchy with nodes labeled National, UCSD, UTenn, PCL, SDSC]
Contention • Each leader maintains a clique token (and the time between tokens) • The sensor that holds the token performs all its tests, then passes the token to the next sensor in the list • Adaptive time-out discovery • Tokens carry a time-out field • Tokens carry sequence numbers • The leader adaptively controls the time-out
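Token passing and an adaptive time-out can be sketched as follows. The slides do not say how the leader computes the time-out, so the rule used here (mean of observed circulation times plus a multiple of their spread) is purely an assumption, as are the function names and the `slack` and `floor` parameters.

```python
import statistics


def token_schedule(members, rounds):
    """Return the order in which sensors hold the token over several rounds."""
    order = []
    for _ in range(rounds):
        for member in members:  # the token moves to the next sensor in the list
            order.append(member)
    return order


def adaptive_timeout(circulation_times, slack=2.0, floor=1.0):
    """Hypothetical rule: mean circulation time plus `slack` standard deviations.

    `floor` is a minimum time-out used when no history is available or the
    estimate would otherwise be very small.
    """
    if not circulation_times:
        return floor
    mean = statistics.fmean(circulation_times)
    spread = statistics.pstdev(circulation_times)
    return max(floor, mean + slack * spread)
```

Only the token holder runs tests at any moment, so probes within a clique do not collide. If the token does not return within the time-out, the leader can treat it as lost and issue a new one with a higher sequence number.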
Forecaster Process • A forecasting driver and a set of compile-time prediction modules • Forecasting process: • Fetching required measurements from the PS • Passing the time series to each prediction module • Choosing the best returned prediction • Incorporate sophisticated prediction techniques?
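The "run every module, keep the best prediction" step can be sketched with a few simple predictors. The three predictors, the window size, and the selection criterion (lowest cumulative absolute one-step-ahead error over the history) are assumptions chosen for illustration; the slides do not list the actual modules or the error measure.

```python
import statistics


def last_value(history):
    return history[-1]


def running_mean(history):
    return statistics.fmean(history)


def sliding_median(history, window=5):
    return statistics.median(history[-window:])


PREDICTORS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "sliding_median": sliding_median,
}


def forecast(series):
    """Return (name of best predictor, its forecast for the next value)."""
    if len(series) < 2:
        raise ValueError("need at least two measurements")
    errors = {name: 0.0 for name in PREDICTORS}
    # Replay the series: each predictor sees only the past and is scored
    # against the measurement that actually followed.
    for i in range(1, len(series)):
        past = series[:i]
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(past) - series[i])
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](series)
```

On a series with a single spike, such as `[10, 10, 10, 50, 10, 10, 10]`, the median predictor accumulates the least error because it ignores the outlier. On a steadily rising series, the last-value predictor wins. Choosing per series in this way lets cheap predictors compete without fixing one model in advance.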
Sample Forecaster Results • [Figure: UC Santa Barbara – Kansas State U. recorded bandwidth] • [Figure: UC Santa Barbara – Kansas State U. forecasted bandwidth]
NWS Interface • C API • Quick short-term forecasts for applications • InitForecaster() • RequestForecasts() • CGI interface • Continuous access to NWS forecasts through the web • Interactively produces graphs for performance and forecasts • http://nws.cs.utk.edu
Conclusion and Future Work • NWS is scalable, stable and always available • NWS relies on adaptivity to achieve its design goals • NWS is open (sensors and forecasting models can be added) • Current forecasting compares well with more sophisticated forecasting techniques • Enhancements • Basing the NS on LDAP • Automatic clique configuration • Forecasting methodologies