Automatic Misconfiguration Troubleshooting with PeerPressure Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang Microsoft Research Presenter: Sara Salahi Northwestern University
Agenda • Importance of this work • Key ideas • PeerPressure: Architecture & Algorithm • Prototype • Performance • Future Work
Importance • Tech support accounts for 17% of the total cost of ownership of today’s desktop PCs • A large share of tech support is spent on troubleshooting • Many troubleshooting cases are due to misconfiguration • Misconfiguration is often caused by data in shared persistent stores (e.g. the Windows registry); the authors focus on this case
Key Ideas: Misconfigurations • Can have many different “root causes” • Seemingly innocuous changes to shared system configurations • System bugs • Security patches may introduce incompatible registry settings • Failed uninstallation of applications • Manual intervention using Registry editor
Key Ideas: The Golden State • “Golden State”: a perfect configuration • Assume that the golden state is in the mass, i.e. the value held by most machines is taken to be the healthy one • Combine the statistical golden state with Bayesian statistics to identify anomalous misconfigurations on “sick” machines
Key Ideas: Goals of Troubleshooting • Effectiveness • System should identify a small set of sick configuration candidates in a short amount of time • Automation • Minimize number of manual steps and number of users involved
PeerPressure: Architecture • 1) Sick computer • 2) I found you • 3) Turns user- or machine-specific entries into canonicalized form • 4) Database containing a number of machine configuration snapshots • 5) Bayesian estimation used to calculate the probability of a suspect being sick
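As a rough illustration of step 3, canonicalization could replace user- and machine-specific substrings with fixed placeholders so that entries from different machines become comparable. The sketch below is an assumption, not the PeerPressure implementation: the function name `canonicalize`, the placeholder tokens, and the SID pattern are all hypothetical.

```python
import re

# Hypothetical pattern for a Windows domain/local account SID such as
# S-1-5-21-1111-2222-3333-1001 (an assumption made for this sketch).
SID_PATTERN = re.compile(r"S-1-5-21(?:-\d+){4}")


def canonicalize(text: str, user_name: str, machine_name: str) -> str:
    """Replace user- and machine-specific substrings with placeholders.

    The placeholder tokens are illustrative only.
    """
    text = SID_PATTERN.sub("<USER_SID>", text)
    # Case-insensitive, plain substring replacement; a real tool would
    # need to be more careful about partial matches.
    text = re.sub(re.escape(user_name), "<USER_NAME>", text, flags=re.IGNORECASE)
    text = re.sub(re.escape(machine_name), "<MACHINE_NAME>", text, flags=re.IGNORECASE)
    return text


key = r"HKEY_USERS\S-1-5-21-1111-2222-3333-1001\Software\App"
path = r"C:\Documents and Settings\Alice\App.ini"
print(canonicalize(key, "alice", "PC01"))
print(canonicalize(path, "alice", "PC01"))
```

Substring replacement is deliberately naive here; it only shows why canonical forms make entries from different users and machines directly comparable.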
PeerPressure: Architecture • Manual Steps • User runs the faulty application to record suspects • User determines whether the sickness is cured • Manual steps involve only the troubleshooting user and no second party
PeerPressure: Algorithm • Intuition and Objectives • e1: Probably healthy • e2: Most probably sick • e3: “Natural biological diversity” • Type I: application configuration states (e1 and e2) • Type II: operational states such as timestamps and caches (e3) • Want to weed out Type II entries; they are most likely false positives
PeerPressure: Algorithm Formulation • From (3) and (1): when m = 0, P(S|V) = 1, so a value never seen in the samples would be judged sick with certainty • Bayesian estimation is used to overcome this • Vector pj: probability of the event happening and its outcome being Vj; pj follows a Dirichlet distribution • mj: number of sample values that match the suspect value
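The slide's equations did not survive extraction, so the following is only a sketch of how the m = 0 problem and its Bayesian fix might look. It assumes a uniform prior P(S) = 1/t over t suspects, a uniform P(V|S) = 1/c over c possible values, and a flat Dirichlet prior with one pseudo-count per value; the symbols N, c and t and both function names are assumptions introduced here.

```python
def sick_probability_ml(n_samples: int, cardinality: int,
                        n_suspects: int, n_matches: int) -> float:
    """P(S|V) with a maximum-likelihood estimate P(V|H) = m / N."""
    prior_sick = 1.0 / n_suspects
    numerator = (1.0 / cardinality) * prior_sick
    p_value_given_healthy = n_matches / n_samples
    return numerator / (numerator + p_value_given_healthy * (1.0 - prior_sick))


def sick_probability(n_samples: int, cardinality: int,
                     n_suspects: int, n_matches: int) -> float:
    """P(S|V) with a Dirichlet-smoothed estimate P(V|H) = (m + 1) / (N + c)."""
    prior_sick = 1.0 / n_suspects
    numerator = (1.0 / cardinality) * prior_sick
    p_value_given_healthy = (n_matches + 1) / (n_samples + cardinality)
    return numerator / (numerator + p_value_given_healthy * (1.0 - prior_sick))


# With no matching samples the unsmoothed estimate is certain of sickness,
# while the smoothed estimate stays below 1.
print(sick_probability_ml(n_samples=87, cardinality=3, n_suspects=50, n_matches=0))
print(sick_probability(n_samples=87, cardinality=3, n_suspects=50, n_matches=0))
```

Under these assumptions the smoothed posterior simplifies algebraically to (N + c) / (N + ct + cm(t - 1)), which decreases as more samples match the suspect value.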
PeerPressure: Algorithm Asymptotic Analysis:
Prototype • GeneBank Database: Microsoft SQL Server 2000 containing snapshots from 87 Windows XP PCs • PeerPressure troubleshooter implemented in C# • “Data Sanitization” • Unification of different representations of the same value • Dual Intel Xeon 2.4 GHz CPU workstation with 1 GB of RAM hosts SQL Server
Performance: Response Time vs. Number of Suspects • 20 real-world troubleshooting cases used • Database queries dominate troubleshooting response time (one query per suspect entry)
Prototype: GeneBank • Registry characteristics in GeneBank • Unseen: a suspect value that is unknown to the GeneBank increments the observed cardinality by 1 • Any entry from GeneBank has cardinality of at least 2 • Entries that do not exist on some sample machines have the value “no entry” • When cardinality is low, conformity among samples is strong
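The cardinality rules above can be sketched as a small helper. This is an illustrative assumption about how the counts might be derived from sample values, not the prototype's C# and SQL code; the names `NO_ENTRY` and `cardinality_and_matches` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

NO_ENTRY = "no entry"  # stand-in value for machines that lack the entry


def cardinality_and_matches(sample_values: Sequence[Optional[str]],
                            suspect_value: Optional[str]) -> Tuple[int, int]:
    """Return (observed cardinality, number of samples matching the suspect).

    `sample_values` holds one value per sample machine, with None meaning
    the entry is absent on that machine.
    """
    normalized = [NO_ENTRY if v is None else v for v in sample_values]
    suspect = NO_ENTRY if suspect_value is None else suspect_value
    distinct = set(normalized)
    cardinality = len(distinct)
    if suspect not in distinct:
        cardinality += 1  # an unseen suspect value adds one to the cardinality
    matches = normalized.count(suspect)
    return cardinality, matches


print(cardinality_and_matches(["1", "1", None], "2"))
print(cardinality_and_matches(["1", "1", None], "1"))
```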
Performance: Root-Cause Ranking Results • 87% have cardinality of 2, 94% no more than 3, 97% no more than 4
Performance: False Positives • Large cardinality of root-cause entry • Relation between root-cause entry and other entries in the suspect set • GeneBank is not pristine
Performance: Sick Machine Sensitivity • Format: RootCauseRanking (NumberOfTies) / NumberOfSuspects
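To make the result format concrete, here is a minimal sketch that produces a string of that shape from per-entry sick probabilities. How the original evaluation counted ties is not stated on the slide; this sketch assumes ties are the other suspects sharing the root cause's probability, and `format_ranking` is a hypothetical name.

```python
from typing import Mapping


def format_ranking(probabilities: Mapping[str, float], root_cause: str) -> str:
    """Format as 'RootCauseRanking (NumberOfTies) / NumberOfSuspects'."""
    target = probabilities[root_cause]
    # Rank 1 is the most probably sick entry.
    rank = 1 + sum(1 for p in probabilities.values() if p > target)
    # Assumption: ties are the other entries with exactly the same probability.
    ties = sum(1 for name, p in probabilities.items()
               if name != root_cause and p == target)
    return f"{rank} ({ties}) / {len(probabilities)}"


suspects = {"entry_a": 0.9, "entry_b": 0.5, "entry_c": 0.5, "entry_d": 0.1}
print(format_ranking(suspects, "entry_b"))
```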
Future Work • Multi-gene troubleshooting • Multiple sick entries among suspects • Cross-application misconfiguration • Heavy customization of apps can break assumption of strong conformance in most configuration entries • GeneBank maintenance – privacy issue