CompSci 296.2 Self-Managing Systems


  1. CompSci 296.2 Self-Managing Systems Shivnath Babu

  2. Today • Project schedule (reminder) • Finish QueS presentation • System, challenges • Sample projects • If we have time, start ROC discussion

  3. Project • Group size <= 2 • Identify “general topic” by end of January, meet Shivnath • Feb 7: Scope the problem, give 15-minute talk • Feb 21: 3-minute talk • March 7: 15-minute talk • March 28: 3-minute talk • April 4/6: 15-minute talk • April 20/24: 15-minute final in-class presentation (+ “demo”)

  4. Querying Systems as Data • What are the probable causes of Service-Level Agreement (SLA) violations rising to 12%? • Root-cause query

  5. Queries: What if … • Given today’s workload, how will average response time change if my database fails? • If I double the memory on my application servers, how will SLA violation rate change?
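
A what-if query like the first one on this slide can be answered from an analytic performance model. The sketch below is only an illustration, not the QueS approach: it assumes the database tier is a set of identical replicas that split the load evenly, that each replica behaves like an M/M/1 queue, and that "my database fails" means one replica is lost. The function names and the example rates are hypothetical.

```python
import math


def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda).

    Returns infinity when the queue is saturated (lambda >= mu).
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def what_if_replica_fails(total_arrival_rate, service_rate, replicas):
    """Return (before, after) mean response times when one replica fails.

    Assumption: load is split evenly across the surviving replicas.
    """
    before = mm1_response_time(total_arrival_rate / replicas, service_rate)
    if replicas <= 1:
        after = math.inf  # no replica left to serve requests
    else:
        after = mm1_response_time(
            total_arrival_rate / (replicas - 1), service_rate
        )
    return before, after


if __name__ == "__main__":
    # Hypothetical workload: 80 req/s over 2 replicas, 100 req/s each.
    before, after = what_if_replica_fails(80.0, 100.0, 2)
    print(f"before: {before:.4f} s, after: {after:.4f} s")
```

A model this simple ignores caching, burstiness, and failover cost, so its answer is best treated as a rough estimate.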

  6. Queries: Let me know … • Let me know if, with 75% probability, average response time will exceed 5 seconds in next 30 minutes • Prediction • Continuous query
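
The "let me know" query above combines a prediction with a continuous query. A minimal sketch, under strong assumptions: the forecast for the next interval is taken to be normally distributed with the mean and standard deviation of a sliding window of recent measurements. That naive forecaster, the class name, and the parameter names are all hypothetical; a real system would plug in a proper prediction model.

```python
import math
import statistics
from collections import deque


def prob_exceeds(mean, std, threshold):
    """P(X > threshold) for X ~ Normal(mean, std)."""
    if std == 0:
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / std
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


class ContinuousQuery:
    """Re-evaluates the alert condition each time a measurement arrives."""

    def __init__(self, threshold_s=5.0, min_prob=0.75, window=30):
        self.threshold_s = threshold_s
        self.min_prob = min_prob
        self.samples = deque(maxlen=window)  # sliding window of samples
        self.last_prob = None

    def observe(self, avg_response_s):
        """Record one measurement; return True if the alert should fire."""
        self.samples.append(avg_response_s)
        if len(self.samples) < 2:
            return False  # not enough data to estimate a spread
        mean = statistics.mean(self.samples)
        std = statistics.stdev(self.samples)
        self.last_prob = prob_exceeds(mean, std, self.threshold_s)
        return self.last_prob >= self.min_prob


if __name__ == "__main__":
    query = ContinuousQuery()
    for sample in [4.2, 4.8, 5.5, 6.1, 6.4]:  # hypothetical measurements
        if query.observe(sample):
            print(f"alert: P(exceed) = {query.last_prob:.2f}")
```

The query stays registered and is re-evaluated on every new measurement, which is what distinguishes it from a one-time snapshot query.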

  7. Queries: What should I do? • What should I do to reduce SLA violations of requests A to <1%, without increasing violations of other requests? • Root-cause + What-if

  8. Querying Systems as Data: the data • Instrumented traces, logs • System activity data • Data from active probing • Workload • System configuration data (e.g., buffer size, indexes) • Source code • Models • Analytic performance models • Machine learning models • Rules from system experts • Simulators

  9. Querying Systems with QueS (30,000 ft) • [Architecture diagram; labels: System mgmt. services, Queries, Answers, Query Processor, Model-driven DB Engine, Data Maintenance, Data Acquisition, Data]

  10. Challenges: Query Complexity • Support for complex queries • Rank probable causes of SLA violations rising to 12% • “What should I do” queries • Queries may be acquisitional

  11. Challenges: Query Specification • Declarative query language • Expressibility of language • Composition • Snapshot queries and continuous queries

  12. Challenges: Query Processing • Model-based query processing • Many types of data sources • Structured, semi-structured, and unstructured • Uncertainty in input data • E.g., legacy systems may have partial/no instrumentation • Imprecise answers • Answers may include quantification of accuracy • Ranking

  13. Challenges: Run-time Overhead • Real-time service for 24x7 systems • Tunable data acquisition • Active probing

  14. Sample Projects • NIMO • Fa • What-if querying for database systems • Combining structured & unstructured data • Projects using Nagios • Projects using IBM software

  15. Sample Project (in progress) • NIMO (Piyush Shivam) • Answering queries about: • Expected performance given a resource assignment • Feasible resource assignments to meet SLA • What-if queries for applications in network utilities

  16. Sample Project (in progress) • Fa (Songyun Duan) • Can we automate problem-prediction and diagnosis? • Use of Bayesian Networks for: • Predicting performance problems (continuous query) • Root-cause queries
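
To make the root-cause use of Bayesian networks concrete, here is a minimal sketch of the simplest possible case: a two-layer network with one hidden "cause" node and one observed "SLA violation" node, where the causes are assumed mutually exclusive. This is not Fa's actual method; the cause names and all probabilities are hypothetical.

```python
def rank_causes(priors, likelihoods):
    """Rank candidate causes by posterior P(cause | symptom observed).

    priors:      P(cause) for each candidate cause
    likelihoods: P(symptom | cause) for each candidate cause
    Uses Bayes' rule: posterior is proportional to prior * likelihood.
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("symptom has zero probability under every cause")
    posterior = {cause: p / total for cause, p in joint.items()}
    # Most probable cause first.
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    priors = {"db_overload": 0.2, "network_congestion": 0.1, "no_fault": 0.7}
    likelihoods = {"db_overload": 0.8, "network_congestion": 0.6, "no_fault": 0.05}
    for cause, prob in rank_causes(priors, likelihoods):
        print(f"{cause}: {prob:.3f}")
```

A full Bayesian network would model many interacting variables and learn the probabilities from system data; the ranking step, however, has the same shape as above.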

  17. Sample Project • What-if queries on database configuration-parameter settings • Ex: What happens to transaction response times if I change the value of parameter X from v to v’?
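
One way to approach such a what-if query is to fit a model to past observations of the parameter and the response time, then evaluate the model at both settings. The sketch below assumes a linear relationship, which is a strong simplification chosen only for illustration; the function names and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def what_if(xs, ys, v, v_new):
    """Predicted response time at setting v and at setting v_new."""
    slope, intercept = fit_line(xs, ys)
    return slope * v + intercept, slope * v_new + intercept


if __name__ == "__main__":
    # Hypothetical observations: parameter X vs. response time (ms).
    settings = [64, 128, 256]
    response_ms = [93.6, 87.2, 74.4]
    now, then = what_if(settings, response_ms, 128, 512)
    print(f"predicted change: {then - now:+.1f} ms")
```

Predictions for a setting far outside the observed range, such as v’ = 512 here, are extrapolations and should be reported with that caveat.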

  18. Sample Project • Combined querying of structured and unstructured system data • Structured data: MySQL performance counters, processor utilization, number of I/O accesses • Unstructured data: Application and system logs • Interested: Hao He

  19. Sample Project • Add problem-prediction capability to Nagios • Add root-cause querying to Nagios • Similar projects using the IBM Autonomic Computing Toolkit + ABLE framework • Ex: Wrap them inside a query interface

  20. Projects at HP Research • Project 1: Predicting performance problems, finding root causes of problems • Project 2: Debugging complex systems • Project 3: Designing adaptive systems (using control theory)
