
The Network Weather Service




  1. The Network Weather Service A Distributed Resource Performance Forecasting Service for Metacomputing Rich Wolski, Neil T. Spring and Jim Hayes Presented By: Mohammad Al-Saeed

  2. Organization • Introduction • Motivation: why the NWS? • The NWS: what is the NWS? • Related work • NWS system architecture • Design goals • System components • NWS components • NWS interface • Conclusion and future work

  3. Motivation • Searching for the environment that delivers the most • Dynamic nature of metacomputing environments • Adaptive applications • Adapt to changing environments • Knowledge needed for adaptation • Resource discovery and allocation

  4. The Network Weather Service • A distributed system for producing short-term deliverable performance forecasts • Goal: dynamically measure and forecast the performance deliverable at the application level from a set of network resources • Measurements currently supported: • Available fraction of CPU time • End-to-end TCP connection time • End-to-end TCP network latency • End-to-end TCP network bandwidth

  5. Related Work • TReno: performance at transport layer using TCP • Pathchar: bandwidth over a path • bprobe/cprobe: bottleneck link speed and competing traffic • Topology-d: uses ping and netperf to find bandwidth between hosts in a group then analyzes this data to find minimum-cost logical topology • ReMoS: network resource monitoring

  6. NWS System Architecture • Design objectives • Scalability: scales to any metacomputing infrastructure • Predictive accuracy: provides accurate measurements and forecasts • Non-intrusiveness: shouldn’t load the resources it monitors • Execution longevity: available at all times • Ubiquity: accessible from everywhere, monitors all resources

  7. System Components • Four different component processes • Persistent State process: handles storage of measurements • Name Server process: directory server for the system • Sensor processes: measure current performance of different resources • Forecaster process: predicts deliverable performance of a resource during a given time

  8. NWS Processes

  9. NWS Components • Persistent State Management • Naming Server • Performance Monitoring: NWS Sensors • CPU Sensor • Network Sensor • Sensor Control • Cliques: hierarchy and contention • Adaptive time-out discovery • Forecasting • Forecaster and forecasting models • Sample forecaster results

  10. Persistent State Management • All NWS processes are stateless • The system state (the measurements) is managed by the PS process: • Storage & retrieval of measurements • Measurements are time-stamped plain-text strings • Measurements are written to disk immediately and acknowledged • Measurements are stored in a circular queue of tunable size
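
The circular queue described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MeasurementStore`, its methods, and the record layout are hypothetical names, and the queue lives in memory, whereas the PS process writes each measurement to disk before acknowledging it.

```python
from collections import deque
import time


class MeasurementStore:
    """Hypothetical in-memory stand-in for the PS circular queue."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest record when full.
        self._queue = deque(maxlen=capacity)

    def store(self, value, timestamp=None):
        """Append a time-stamped plain-text record and acknowledge it."""
        if timestamp is None:
            timestamp = time.time()
        self._queue.append(f"{timestamp:.3f} {value}")
        return True  # acknowledgement; a real PS would write to disk first

    def fetch(self, count):
        """Return up to `count` of the most recent records, oldest first."""
        records = list(self._queue)
        return records[max(0, len(records) - count):]


store = MeasurementStore(capacity=3)
for i, bandwidth in enumerate([1.5, 2.0, 2.5, 3.0]):
    store.store(bandwidth, timestamp=float(i))
print(store.fetch(3))  # the first record (1.5) has been overwritten
```

The capacity argument plays the role of the "tunable size": a larger queue keeps a longer history for the forecaster at the cost of more storage.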

  11. Naming Server • Primitive text string directory service for the NWS system • The only component known system-wide • Stored information includes • Name to IP binding information • Group configuration • Parameters for various processes • Each process must refresh its registration with the name server periodically • Centralized
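
A minimal sketch of a directory with periodic re-registration follows. The class name, the `ttl` parameter, and the rule that an entry expires when it is not refreshed in time are assumptions made for illustration; the slide states only that processes must refresh their registrations periodically.

```python
class NameRegistry:
    """Hypothetical registry: entries expire unless refreshed within `ttl` seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # name -> (address, time of last registration)

    def register(self, name, address, now):
        """Create a registration or refresh an existing one."""
        self._entries[name] = (address, now)

    def lookup(self, name, now):
        """Return the address bound to `name`, or None if absent or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, registered_at = entry
        if now - registered_at > self.ttl:
            del self._entries[name]  # assumed expiry rule
            return None
        return address


registry = NameRegistry(ttl=60)
registry.register("sensor.example", "192.0.2.10", now=0)
early = registry.lookup("sensor.example", now=30)   # still registered
late = registry.lookup("sensor.example", now=120)   # expired, never refreshed
print(early, late)
```

Passing `now` explicitly keeps the sketch deterministic; a running service would read the clock instead.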

  12. Performance Monitoring • Actual monitoring is performed by a set of sensors • Accuracy vs. Intrusiveness • A sensor’s life: { Register with the NS; Query the NS for parameters; Generate conditional test; Forever { if conditions are met then { perform test; time-stamp results and send them to the PS; refresh registration with the NS } } }

  13. CPU Sensor • Measures available CPU fraction • Testing tools: • Unix uptime: reports load average in the past x minutes • Unix vmstat: reports idle-, user- and system-time • Active probes • Accuracy: • Results assume a full priority job • Doesn’t know the priority of jobs in the queue
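
One simple way to turn a load average, as reported by uptime, into an available CPU fraction is sketched below. The formula is an assumption for illustration: it supposes that a new full-priority job shares the processor equally with the jobs already counted in the load average. It is not presented here as the exact computation the CPU sensor performs.

```python
import os


def available_cpu_fraction(load_average):
    """Estimate the CPU share a new full-priority job would receive.

    Assumption: the new job shares the CPU equally with `load_average`
    runnable jobs, so it gets 1 / (load_average + 1).
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    return 1.0 / (load_average + 1.0)


# os.getloadavg() exists only on Unix-like systems and may raise OSError.
if hasattr(os, "getloadavg"):
    try:
        one_minute_load = os.getloadavg()[0]
        print(f"estimated fraction: {available_cpu_fraction(one_minute_load):.2f}")
    except OSError:
        print("load average unavailable")
```

The estimate inherits the limitation noted on the slide: it knows nothing about the priorities of the jobs in the run queue.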

  14. Active Probing Improvements [Figure panels: measurements produced using vmstat; measurements produced using uptime]

  15. Network Sensor • Carries out network-related measurements • Testing: using active network probes • Establish and release TCP connections • Moving large amounts of data to measure bandwidth and small amounts to measure delay • Measures connections with all peer sensors • Problems • Accuracy: depends on socket interface • Complexity: N² − N tests, collisions (contention)

  16. Network Sensor Control • Sensors are organized into sensor sets called cliques • Each clique is configurable and has one leader • Clique sets are logical, but can be based on physical topology • Leaders are elected using a distributed election protocol • A sensor can participate in many cliques • Advantages • Scalability by organizing cliques in a hierarchy • Reduce the N² − N test count • Accuracy by more frequent tests
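
The saving from a clique hierarchy can be made concrete with a little arithmetic. The sketch below compares the N² − N pairwise test count of a flat set of sensors with a two-level arrangement. The layout (equal-sized cliques, each contributing one representative to a single top-level clique) and the numbers are illustrative assumptions, not a configuration taken from the slides.

```python
def full_mesh_tests(n):
    """Directed pairwise tests among n sensors: N^2 - N."""
    return n * n - n


def clique_hierarchy_tests(clique_sizes):
    """Tests needed when sensors are split into cliques under one top-level clique.

    Assumption: each clique sends one representative to the top-level
    clique, so that level holds len(clique_sizes) members.
    """
    inside = sum(full_mesh_tests(size) for size in clique_sizes)
    between = full_mesh_tests(len(clique_sizes))
    return inside + between


flat = full_mesh_tests(20)                        # 20 sensors, everyone tests everyone
tiered = clique_hierarchy_tests([5, 5, 5, 5])     # same 20 sensors in 4 cliques
print(flat, tiered)  # 380 92
```

With fewer tests per round, each remaining test can run more often, which is the accuracy benefit the slide lists.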

  17. Clique Hierarchy [Figure: cliques labeled National, UCSD, UTenn, PCL, and SDSC]

  18. Contention • Each leader maintains a clique token (and time between tokens) • The sensor that has the token performs all its tests then passes the token to the next sensor in the list • Adaptive time-out discovery • Tokens have time-out field • Tokens have sequence numbers • The leader adaptively controls the time-out
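
The token scheme can be sketched as follows. Several details are assumptions introduced for illustration: the `Token` and `Sensor` names, the use of sequence numbers to reject stale or duplicate tokens, and the rule in `adapt_timeout` (a fixed margin times the mean observed circulation time). The slide says only that tokens carry a time-out field and a sequence number and that the leader controls the time-out adaptively.

```python
from dataclasses import dataclass


@dataclass
class Token:
    sequence: int   # assumed use: lets sensors discard stale or duplicate tokens
    timeout: float  # seconds the leader waits before issuing a fresh token


class Sensor:
    def __init__(self, name):
        self.name = name
        self.last_sequence = -1
        self.tests_run = 0

    def receive(self, token):
        """Run this sensor's tests only for a token newer than any seen so far."""
        if token.sequence <= self.last_sequence:
            return False  # stale or duplicate token, ignore it
        self.last_sequence = token.sequence
        self.tests_run += 1  # placeholder for the real network probes
        return True


def adapt_timeout(round_times, margin=2.0):
    """Hypothetical rule: time-out = margin x mean observed circulation time."""
    return margin * sum(round_times) / len(round_times)


members = [Sensor("a"), Sensor("b"), Sensor("c")]
token = Token(sequence=0, timeout=10.0)
for sensor in members:  # the token visits the sensors in list order
    sensor.receive(token)
duplicate_accepted = members[0].receive(token)
new_timeout = adapt_timeout([4.0, 6.0])
next_token = Token(sequence=token.sequence + 1, timeout=new_timeout)
```

Because only the token holder runs its tests, probes from different sensors in the same clique do not collide with one another.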

  19. Forecaster Process • A forecasting driver and a set of compile-time prediction modules • Forecasting process: • Fetching required measurements from the PS • Passing the time series to each prediction module • Choosing the best returned prediction • Incorporate sophisticated prediction techniques?
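
The driver-and-modules arrangement can be sketched as below. The three predictors and the selection rule (lowest cumulative absolute error when replayed over the history) are assumptions chosen for illustration; the slide states only that the best returned prediction is chosen.

```python
from statistics import mean, median


def last_value(history):
    return history[-1]


def running_mean(history):
    return mean(history)


def sliding_median(history, window=3):
    return median(history[-window:])


PREDICTORS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "sliding_median": sliding_median,
}


def forecast(series):
    """Replay the series through every predictor and use the most accurate one.

    Each predictor forecasts series[i] from series[:i]; the predictor with
    the lowest cumulative absolute error produces the next forecast.
    """
    if not series:
        raise ValueError("need at least one measurement")
    errors = {name: 0.0 for name in PREDICTORS}
    for i in range(1, len(series)):
        for name, predict in PREDICTORS.items():
            errors[name] += abs(predict(series[:i]) - series[i])
    best = min(errors, key=errors.get)
    return best, PREDICTORS[best](series)


bandwidth = [10.0, 10.0, 50.0, 10.0, 10.0, 10.0]  # one outlier at index 2
print(forecast(bandwidth))  # ('sliding_median', 10.0)
```

In this example the single outlier penalizes the last-value and running-mean predictors on later steps, so the median-based predictor is selected. Adding a forecasting model amounts to adding one more entry to `PREDICTORS`.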

  20. Sample Forecaster Results [Figure panels: UC Santa Barbara – Kansas State U. recorded bandwidth; UC Santa Barbara – Kansas State U. forecasted bandwidth]

  21. NWS Interface • C API • Quick short-term forecasts for applications • InitForecaster() • RequestForecasts() • CGI interface • Continuous access to NWS forecasts through the web • Interactively produces graphs for performance and forecasts • http://nws.cs.utk.edu

  22. Sample CGI-Generated Graph

  23. Conclusion and Future Work • NWS is scalable, stable and always available • NWS relies on adaptivity to achieve its design goals • NWS is open (sensors and forecasting models can be added) • Current forecasting compares well with more sophisticated forecasting techniques • Enhancements • Basing the NS on LDAP • Automatic clique configuration • Forecasting methodologies
