
Automatic Misconfiguration Troubleshooting with PeerPressure

Automatic Misconfiguration Troubleshooting with PeerPressure. Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang Microsoft Research. Presenter: Sara Salahi Northwestern University. Agenda. Importance of this work Key ideas PeerPressure: Architecture & Algorithm Prototype


Presentation Transcript


  1. Automatic Misconfiguration Troubleshooting with PeerPressure Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang Microsoft Research Presenter: Sara Salahi Northwestern University

  2. Agenda • Importance of this work • Key ideas • PeerPressure: Architecture & Algorithm • Prototype • Performance • Future Work

  3. Importance • Tech support accounts for 17% of the total cost of ownership of today’s desktop PCs • A large share of tech support effort is spent on troubleshooting • Many troubleshooting cases are due to misconfiguration • Misconfiguration is often caused by data in shared persistent stores (e.g. the Windows registry); the authors focus on this case

  4. Key Ideas: Misconfigurations • Can have many different “root causes” • Seemingly innocuous changes to shared system configurations • System bugs • Security patches may introduce incompatible registry settings • Failed uninstallation of applications • Manual intervention using Registry editor

  5. Key Ideas: The Golden State • “Golden State” – a perfect configuration • Assume that the golden state is in the mass • Combine statistical golden state with Bayesian statistics to identify anomalous misconfigurations on “sick” machines

  6. Key Ideas: Goals of Troubleshooting • Effectiveness • System should identify a small set of sick configuration candidates in a short amount of time • Automation • Minimize number of manual steps and number of users involved

  7. PeerPressure: Architecture • 1) Sick computer • 2) I found you • 3) Turns user- or machine-specific entries into canonicalized form • 4) Database containing a number of machine configuration snapshots • 5) Bayesian estimation used to calculate probability of a suspect being sick
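
The canonicalization step (3) can be illustrated with a minimal sketch. The transcript does not say how PeerPressure performs it, so everything here is an assumption: the function name `canonicalize`, the placeholder tokens, and the rule of replacing a Windows user SID, a user name, and a machine name by plain text substitution.

```python
import re

# Assumption: user-specific registry paths embed an account SID of this shape.
SID_PATTERN = re.compile(r"S-1-5-21(?:-\d+)+")


def canonicalize(text, user_name, machine_name):
    """Replace user- and machine-specific fragments with fixed placeholders.

    Hypothetical helper: the placeholders and matching rules are illustrative,
    not taken from the PeerPressure prototype.
    """
    text = SID_PATTERN.sub("<USER_SID>", text)
    # Case-insensitive literal replacement of the user and machine names.
    text = re.sub(re.escape(user_name), "<USER>", text, flags=re.IGNORECASE)
    text = re.sub(re.escape(machine_name), "<MACHINE>", text, flags=re.IGNORECASE)
    return text


key = canonicalize(r"HKEY_USERS\S-1-5-21-111-222-333-1001\Software\App", "alice", "PC42")
value = canonicalize(r"C:\Documents and Settings\Alice\app.ini", "alice", "PC42")
```

Literal substitution like this would also rewrite a short user name wherever it happens to appear inside an unrelated word, so a real implementation would need stricter matching.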

  8. PeerPressure: Architecture • Manual Steps • User runs faulty application to record suspects • User determines if sickness is cured • Manual steps involve only the troubleshooting user and no second party

  9. PeerPressure: Algorithm • Intuition and Objectives • e1: Probably healthy • e2: Most probably sick • e3: “Natural biological diversity” • Type I: application configuration states (e1 and e2) • Type II: operational states such as timestamps and caches (e3); these should be weeded out because they are the most likely false positives

  10. PeerPressure: Algorithm Formulation: • Combining equations (3) and (1): when m = 0, P(S|V) = 1 • Bayesian estimation is used to overcome this • Vector pj: probability of the event happening and its outcome being Vj; pj follows a Dirichlet distribution • mj: number of sample values matching the suspect’s value
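
The m = 0 problem and the role of a Dirichlet prior can be sketched numerically. This is not the paper's P(S|V) formula, which the transcript does not reproduce; it is the textbook posterior mean of a value's probability under a symmetric Dirichlet prior. The function names and the prior strength `alpha` are assumptions for illustration.

```python
from collections import Counter


def max_likelihood_probability(peer_values, suspect_value):
    """Raw frequency of the suspect's value among peers (0 when never seen)."""
    return Counter(peer_values)[suspect_value] / len(peer_values)


def smoothed_value_probability(peer_values, suspect_value, cardinality, alpha=1.0):
    """Posterior mean of P(value) under a symmetric Dirichlet(alpha) prior.

    cardinality is the number of possible values for the entry; alpha is a
    hypothetical prior strength, not a parameter taken from the paper.
    """
    n = len(peer_values)
    m = Counter(peer_values)[suspect_value]
    return (m + alpha) / (n + alpha * cardinality)


# Ten peer snapshots of one entry; the sick machine holds a value nobody else has.
peers = ["1"] * 9 + ["0"]
raw = max_likelihood_probability(peers, "7")                     # 0.0
smooth = smoothed_value_probability(peers, "7", cardinality=3)   # 1/13
```

With the raw frequency, an unmatched value gets probability 0, which is what drives P(S|V) to exactly 1 for every such suspect. The smoothed estimate stays small but non-zero, so suspects with m = 0 can still be ranked against each other using the number of samples and the entry's cardinality.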

  11. PeerPressure: Algorithm Asymptotic Analysis:

  12. Prototype • GeneBank Database: Microsoft SQL Server 2000 containing snapshots from 87 Windows XP PCs • PeerPressure troubleshooter implemented in C# • “Data Sanitization” • Unification of different representations of the same value • Dual Intel Xeon 2.4 GHz CPU workstation with 1 GB RAM hosts SQL Server

  13. Performance: Response Time vs. Number of Suspects • 20 real-world troubleshooting cases used • Database queries dominate troubleshooting response time (one query per suspect entry)

  14. Prototype: GeneBank • Registry characteristics in GeneBank • Unseen – a value that is unknown to the GeneBank; it increments the observed cardinality by 1 • Any entry from GeneBank therefore has a cardinality of at least 2 • Entries that do not exist on some sample machines have the value “no entry” • When cardinality is low, conformity among samples is strong
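
One reading of the cardinality rules on this slide can be sketched as follows. It assumes that cardinality is the number of distinct values observed across the sample machines, with a missing entry counted as the value "no entry", plus one slot for the unseen value. The names `NO_ENTRY` and `entry_cardinality` are hypothetical.

```python
NO_ENTRY = "<no entry>"  # hypothetical marker for a machine that lacks the entry


def entry_cardinality(sample_values):
    """Distinct observed values (None means the entry is absent) plus one for 'unseen'.

    Assumes at least one sample, which gives the minimum cardinality of 2.
    """
    distinct = {NO_ENTRY if v is None else v for v in sample_values}
    return len(distinct) + 1


uniform = entry_cardinality(["1", "1", "1"])   # one observed value + unseen
mixed = entry_cardinality(["1", None, "0"])    # "1", "0", "no entry" + unseen
```

Under this reading, an entry on which every sample agrees has cardinality 2, which fits the slide's remark that low cardinality goes with strong conformity.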

  15. Performance: Root-Cause Ranking Results • 87% have cardinality of 2, 94% no more than 3, 97% no more than 4

  16. Performance: False Positives • Large cardinality of root-cause entry • Relation between root-cause entry and other entries in the suspect set • GeneBank is not pristine

  17. Performance: Impact of Sample Set Size

  18. Performance: Sick Machine Sensitivity • Format: RootCauseRanking (NumberOfTies) / NumberOfSuspects
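
The result format above can be produced from per-suspect sickness scores as in the sketch below. The transcript does not define how rank and ties are counted, so this assumes the rank is one plus the number of suspects scoring strictly higher than the root cause, and that ties count the other suspects with an identical score. The function name and sample scores are hypothetical.

```python
def ranking_summary(sick_scores, root_cause):
    """Format 'RootCauseRanking (NumberOfTies) / NumberOfSuspects' for one case.

    sick_scores maps each suspect entry to its estimated probability of being sick.
    """
    target = sick_scores[root_cause]
    # Rank 1 means no suspect scored strictly higher than the root cause.
    rank = 1 + sum(1 for score in sick_scores.values() if score > target)
    # Ties are the other suspects sharing the root cause's exact score.
    ties = sum(
        1 for name, score in sick_scores.items()
        if name != root_cause and score == target
    )
    return f"{rank} ({ties}) / {len(sick_scores)}"


scores = {"a": 0.9, "b": 0.9, "c": 0.5, "d": 0.1}
top = ranking_summary(scores, "a")
middle = ranking_summary(scores, "c")
```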

  19. Future Work • Multi-gene troubleshooting • Multiple sick entries among suspects • Cross-application misconfiguration • Heavy customization of apps can break assumption of strong conformance in most configuration entries • GeneBank maintenance – privacy issue
