CompSci 296.2 Self-Managing Systems Shivnath Babu
Today • Project schedule (reminder) • Finish QueS presentation • System, challenges • Sample projects • If we have time, start ROC discussion
Project • Group size <= 2 • Identify “general topic” by end of January, meet Shivnath • Feb 7: Scope the problem, give 15-minute talk • Feb 21: 3-minute talk • March 7: 15-minute talk • March 28: 3-minute talk • April 4/6: 15-minute talk • April 20/24: 15-minute final in-class presentation (+ “demo”)
Querying Systems as Data • What are probable causes of the Service-Level-Agreement (SLA) violations rising to 12%? Root-cause query
Queries: What if … • Given today’s workload, how will average response time change if my database fails? • If I double the memory on my application servers, how will SLA violation rate change?
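A what-if query like these could be answered from an analytic performance model. The sketch below assumes, purely for illustration, that a server behaves as an M/M/1 queue, whose mean response time is 1 / (service rate - arrival rate). The function names and the request rates are hypothetical and are not part of QueS.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        # The queue is unstable: response time grows without bound.
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def what_if(arrival_rate, current_service_rate, new_service_rate):
    """Compare predicted response time before and after a hypothetical change."""
    before = mm1_response_time(arrival_rate, current_service_rate)
    after = mm1_response_time(arrival_rate, new_service_rate)
    return before, after


# Hypothetical numbers: 80 requests/s arriving, capacity raised from 100 to 200 requests/s.
before, after = what_if(arrival_rate=80.0,
                        current_service_rate=100.0,
                        new_service_rate=200.0)
print(f"before: {before:.4f} s, after: {after:.4f} s")
```

The same workload is evaluated against two configurations, so the answer is a prediction from the model rather than a measurement of the running system.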
Queries: Let me know … • Let me know if, with 75% probability, average response time will exceed 5 seconds in next 30 minutes • Prediction • Continuous query
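One way to read this query: a predictor produces a set of possible average response times for the next 30 minutes, and an alert fires when at least 75% of them exceed 5 seconds. The sketch below assumes such predicted samples are already available. The function names and the sample values are hypothetical.

```python
def exceed_probability(predicted_samples, limit_seconds):
    """Fraction of predicted response times that exceed the limit."""
    exceeding = sum(1 for s in predicted_samples if s > limit_seconds)
    return exceeding / len(predicted_samples)


def should_alert(predicted_samples, limit_seconds=5.0, min_probability=0.75):
    """True when the estimated probability of exceeding the limit reaches the threshold."""
    return exceed_probability(predicted_samples, limit_seconds) >= min_probability


# Continuous query: re-evaluate each time a new batch of predictions arrives.
# Each batch holds hypothetical predicted average response times in seconds.
batches = [
    [3.1, 4.0, 4.8, 5.2],  # 1 of 4 above 5 s
    [5.5, 6.1, 4.9, 7.0],  # 3 of 4 above 5 s
]
alerts = [should_alert(batch) for batch in batches]
print(alerts)
```

Unlike a snapshot query, the condition is checked repeatedly, and the user hears back only when it becomes true.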
Queries: What should I do? • What should I do to reduce SLA violations of requests A to <1%, without increasing violations of other requests? • Root-cause + What-if
Querying Systems as Data • Instrumented traces, logs • System activity data • Data from active probing • Workload • System configuration data (e.g., buffer size, indexes) • Source code • Models • Analytic performance models • Machine learning models • Rules from system experts • Simulators
Querying Systems with QueS (30,000 ft) • [Architecture diagram. Labels: System mgmt. services, Queries, Answers, Model-driven DB Engine, Query Processor, Data Acquisition, Data Maintenance, Data]
Challenges: Query Complexity • Support for complex queries • Rank probable causes of SLA violations rising to 12% • “What should I do” queries • Queries may be acquisitional
Challenges: Query Specification • Declarative query language • Expressibility of language • Composition • Snapshot queries and continuous queries
Challenges: Query Processing • Model-based query processing • Many types of data sources • Structured, semi-structured, and unstructured • Uncertainty in input data • E.g., legacy systems may have partial/no instrumentation • Imprecise answers • Answers may include quantification of accuracy • Ranking
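An answer that quantifies its own accuracy could take the form of an estimate together with a confidence interval. The sketch below uses a normal-approximation 95% interval over sampled measurements. This is one simple choice for illustration, not the method QueS uses, and the function name and the data are hypothetical.

```python
import math
import statistics


def estimate_with_interval(samples, z=1.96):
    """Return (mean, low, high) using a normal-approximation confidence interval."""
    mean = statistics.mean(samples)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical response-time measurements in seconds from a partially instrumented system.
mean, low, high = estimate_with_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"average response time: {mean:.2f} s (95% CI {low:.2f} to {high:.2f})")
```

Fewer or noisier samples widen the interval, which tells the user how much to trust the answer.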
Challenges: Run-time Overhead • Real-time service for 24x7 systems • Tunable data acquisition • Active probing
Sample Projects • NIMO • Fa • What-if querying for database systems • Combining structured & unstructured data • Projects using Nagios • Projects using IBM software
Sample Project (in progress) • NIMO (Piyush Shivam) • Answering queries about: • Expected performance given a resource assignment • Feasible resource assignments to meet SLA • What-if queries for applications in network utilities
Sample Project (in progress) • Fa (Songyun Duan) • Can we automate problem-prediction and diagnosis? • Use of Bayesian Networks for: • Predicting performance problems (continuous query) • Root-cause queries
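A root-cause query can be viewed as ranking candidate causes by their posterior probability given an observed symptom. The sketch below applies Bayes' rule to a single symptom under the simplifying assumption that exactly one cause is active. It is far simpler than a full Bayesian network and is not Fa's implementation. The cause names and probabilities are made up.

```python
def rank_causes(priors, likelihoods):
    """Rank causes by P(cause | symptom), assuming exactly one cause is active.

    priors:      P(cause) for each candidate cause
    likelihoods: P(symptom | cause) for each candidate cause
    """
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    total = sum(joint.values())  # P(symptom), the normalizing constant
    posterior = {cause: p / total for cause, p in joint.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers for the symptom "SLA violations rose to 12%".
priors = {"db_overload": 0.2, "network_delay": 0.3, "memory_leak": 0.5}
likelihoods = {"db_overload": 0.9, "network_delay": 0.4, "memory_leak": 0.1}

ranking = rank_causes(priors, likelihoods)
for cause, probability in ranking:
    print(f"{cause}: {probability:.3f}")
```

The least likely cause a priori (db_overload) ends up ranked first because it explains the symptom best, which is why the answer is a ranked list with probabilities rather than a single guess.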
Sample Project • What-if queries on database configuration-parameter settings • Ex: What happens to transaction response times if I change the value of parameter X from v to v’?
Sample Project • Combined querying of structured and unstructured system data • Structured data: MySQL performance counters, processor utilization, number of I/O accesses • Unstructured data: Application and system logs • Interested: Hao He
Sample Project • Add problem-prediction capability to Nagios • Add root-cause querying to Nagios • Similar projects using the IBM Autonomic Computing Toolkit + ABLE framework • Ex: Wrap them inside a query interface
Projects at HP Research • Project 1: Predicting performance problems, finding root causes of problems • Project 2: Debugging complex systems • Project 3: Designing adaptive systems (using control theory)