CS598YYZ Paper Presentation. Automatic Misconfiguration Troubleshooting with PeerPressure. Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang. OSDI 2004, San Francisco, CA. Presenter: Xiao Ma (xiaoma2@cs.uiuc.edu).
CS598YYZ Paper Presentation
Automatic Misconfiguration Troubleshooting with PeerPressure
Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang
OSDI 2004, San Francisco, CA
Presenter: Xiao Ma (xiaoma2@cs.uiuc.edu)
*Some tables and figures are borrowed from the authors’ OSDI 2004 slides.
CS598YYZ (Fall 2005)
Who am I?
Hey all! My name is Xiao Ma, and I am a first-year Ph.D. student in the Department of Computer Science at UIUC. I’m a member of the Opera group, and I’m currently looking for my favorite topics in system architecture and operating systems.
• Questions? Raise your hand!
• Something I said is wrong? Point it out!
CS598YYZ(Fall 05) Paper Presentation – PeerPressure (10/18/2005) Xiao Ma (xiaoma2@cs.uiuc.edu)
Outline
1. Authors and Background
2. Motivation, Goals and Assumptions
3. PeerPressure Troubleshooting
4. Evaluation
5. Related Work
6. Concluding Remarks
7. Shortcomings and Future Work
Authors
• Helen J. Wang: Researcher, Systems & Networking Research Group, Microsoft Research. Ph.D. CS, U.C. Berkeley (2001)
• John C. Platt: Senior Researcher, Knowledge Tools Group, Microsoft Research. Ph.D. CS, Caltech (1989)
• Yi-Min Wang: Senior Researcher, Systems and Networking Research Area, Microsoft Research. Ph.D. ECE, UIUC (1993)
Background
• This paper comes from a Microsoft Research project, PeerPressure & Friends Troubleshooting Network (FTN), organized mainly by Helen Wang. Publications from the project:
  • Automatic Misconfiguration Troubleshooting with PeerPressure, USENIX OSDI 2004
  • Friends Troubleshooting Network: Towards Privacy-Preserving, Automatic Troubleshooting, IPTPS 2004
  • Privacy-Preserving Friends Troubleshooting Network, ISOC NDSS 2005
  • Applications of Secure Electronic Voting to Automated Privacy-Preserving Troubleshooting, ACM CCS, November 2005
Motivation
• Technical support contributes 17% of the total cost of ownership (TCO) of today’s desktop PCs [Tolly2000]
• Many application malfunctions come from misconfigurations
  • Conflicting changes to shared configuration data
  • Incomplete uninstallation of applications
  • ……
• Configuration data can be very large and complex
  • Windows: typically 200,000 entries in the Registry (Win XP)
  • Unix-like systems: a large number of assorted configuration files
• Automatic diagnosis can save significant cost and spare users some pain
Goals
• Effectiveness
  • Accurately identify the root cause in a reasonable time
  • Otherwise, narrow the candidates down to a small number of suspects in a reasonable time
• Automation
  • Minimize the human workload
  • No need to manually identify the correct state (as STRIDER requires)
Assumptions
• The golden state is in the mass
  • An application functions well on most machines
  • Malfunctioning is an anomaly
• Single-entry problem
  • The malfunction is caused by one single entry
  • Multi-entry problems are left for future work
• PeerPressure needs users’ help
  • Users know how to reproduce the failed execution
  • Users can understand the ranking that PeerPressure returns and can modify the guilty entries
Design Principles
• Narrow down, narrow down, and narrow down again
  • choose only the entries the application references as suspects (App Tracer)
  • favor suspects whose cardinality (the number of possible sample values for a suspect entry) is low
  • analyze and rank the suspects
• Rank EOIs using statistics from a sample set of machines
  • draw samples from other machines’ configuration state
  • use Bayesian estimation to calculate the guilt probability
  • rank EOIs by guilt probability
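The sampling step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each machine snapshot is assumed to be a plain dict from entry name to value, a missing entry is assumed to count as one more value, and cardinality is assumed to be the number of distinct values observed (including the sick machine's). The names `suspect_stats`, `MISSING`, `sick`, and `samples` are illustrative.

```python
from typing import Dict, List, Tuple

MISSING = "<no entry>"  # assumption: an absent entry counts as one more value


def suspect_stats(
    sick: Dict[str, str],
    samples: List[Dict[str, str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {entry: (cardinality, matches)} for every suspect entry."""
    stats = {}
    for entry, sick_value in sick.items():
        sample_values = [s.get(entry, MISSING) for s in samples]
        # Cardinality: distinct values observed, including the sick one.
        cardinality = len(set(sample_values) | {sick_value})
        # Matches: samples whose value equals the sick machine's value.
        matches = sum(1 for v in sample_values if v == sick_value)
        stats[entry] = (cardinality, matches)
    return stats


# Hypothetical data: three suspect entries, three sample machines.
sick = {"e1": "off", "e2": "on", "e3": "42"}
samples = [
    {"e1": "on", "e2": "on", "e3": "7"},
    {"e1": "on", "e2": "on", "e3": "13"},
    {"e1": "on", "e2": "on", "e3": "99"},
]
stats = suspect_stats(sick, samples)
```

In this toy data, `e1` disagrees with every sample while taking few values, `e2` agrees with every sample, and `e3` differs on every machine, which is the pattern an operational-state entry would show.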
PeerPressure Structure
[Figure: PeerPressure structure diagram; one labeled component is the P-2-P Troubleshooting Community]
Statistical Analysis Goals
• Is e1 a misconfigured entry? -- Very likely
• Is e2 a misconfigured entry? -- Very unlikely
• Is e3 a misconfigured entry? -- Hard to say
• The ranking is: e1 > e3 > e2
Goal 1: Capture and rank the anomalous EOIs.
Statistical Analysis Goals (cont’d)
• Is e1 an application/OS configuration state? -- Yes
• Is e2 an application/OS configuration state? -- Yes
• Is e3 an application/OS configuration state? -- No, it’s an operational state
• e3 is less likely to be the root cause
Goal 2: Weed out operational state entries.
Statistical Analyzer Model
$$P(S \mid V) = \frac{N + c}{N + ct + cm(t - 1)}$$
• N: the number of examples
• t: the number of EOIs
• c: cardinality (the number of possible sample values for an EOI)
• m: the number of samples that match the EOI’s value
• Achieving the goals:
  • Capture the anomalous EOIs: m↓ → P↑
  • Weed out the operational states: c↑ → P↓
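The formula on this slide can be tried out directly. The following is a minimal sketch, not the authors' code: the function names `sick_probability` and `rank_suspects` are illustrative, and the (c, m) numbers are invented to mimic the e1/e2/e3 discussion (e1 rarely matched, e2 almost always matched, e3 an operational state with a different value on every machine).

```python
from typing import Dict, List, Tuple


def sick_probability(n: int, t: int, c: int, m: int) -> float:
    """P(S|V) = (N + c) / (N + c*t + c*m*(t - 1))."""
    return (n + c) / (n + c * t + c * m * (t - 1))


def rank_suspects(
    n: int, stats: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float]]:
    """Rank entries by sick probability, most suspicious first.

    stats maps entry name -> (cardinality c, match count m).
    """
    t = len(stats)  # number of suspect entries
    scored = [(e, sick_probability(n, t, c, m)) for e, (c, m) in stats.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers with N = 87 samples.
stats = {
    "e1": (2, 1),    # low cardinality, almost nobody shares the value
    "e2": (2, 86),   # low cardinality, nearly everyone shares the value
    "e3": (88, 0),   # operational state: every machine has its own value
}
ranking = rank_suspects(87, stats)
for entry, p in ranking:
    print(f"{entry}: {p:.3f}")
```

With these numbers the order is e1, e3, e2: a small m pushes e1 to the top, while the large c keeps e3 below it even though nothing matches its value.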
Evaluation Methodology
• A PeerPressure prototype
  • different representations of the same value are unified by data sanitization
• Take 20 real-world misconfiguration problems
  • from MS PSS e-mail logs, MS Helpdesk and web support forums
  • selection criterion: the problem is easy to reproduce
• Reproduce these failures on real-usage machines
  • “clean” machines do NOT work for this
  • use “Control Panel” to inject the errors and re-run the corresponding applications
• Draw 87 real-usage Windows XP Registry snapshots
  • do not take the snapshots right after installing Windows
Evaluation Result – Effectiveness
• Initially, the number of EOIs ranges from 8 to 26,308, with a median of 1,171.
• After ranking, the number of EOIs is narrowed down to 1 to 16, a reduction of up to three orders of magnitude.
• In most cases (12 of 19), the root cause is found.
Evaluation Result – Efficiency
• The test machine is a workstation with a 2.4GHz CPU and 1GB of RAM.
• The majority of the time is consumed by sequential database queries.
Evaluation Analysis (1) -- Source of False Positives
Why are there many noisy entries?
• The nature of the bad entry
  • the bad entry has a high cardinality
  • e.g. Case 20 (Mediaplayer in IE): rank = 9, c = 65
• How unique the other EOIs are
  • a highly customized machine may produce more noise
  • e.g. Case 11: rank = 2, Case 12: rank = 3, Case 16: rank = 2
• The database is not pristine
  • some example machines in the database have the same problem
  • e.g. Case 2: rank = 16, Case 6: rank = 12, Case 10: rank = 2
Evaluation Analysis (2) -- Impact of # of Examples
The more examples, the better the accuracy?
• Experiment setup
  • For each example set size N = 5, 10, 20, 30, 50, 87, choose N examples from the database 5 times and average the results.
• A larger example set doesn’t necessarily result in better accuracy.
  • strong conformity doesn’t depend on the example set size
  • operational state doesn’t depend on the example set size
  • a large number of examples helps only when the database is non-pristine
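The experiment setup above amounts to repeated random subsampling. Below is a minimal sketch of such a loop, assuming a caller-supplied function that runs the troubleshooter on a subset of the database and returns the rank of the true root cause. The names `average_rank` and `rank_of_root_cause` are hypothetical and do not come from the paper.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def average_rank(
    database: Sequence[T],
    sizes: Sequence[int],
    rank_of_root_cause: Callable[[List[T]], float],
    trials: int = 5,
    seed: int = 0,
) -> Dict[int, float]:
    """For each sample-set size, average the root-cause rank over `trials`
    random subsets drawn without replacement from the database."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    pool = list(database)
    results = {}
    for n in sizes:
        ranks = [rank_of_root_cause(rng.sample(pool, n)) for _ in range(trials)]
        results[n] = mean(ranks)
    return results


# Stand-in callback for illustration only: it reports the subset size
# instead of running a real troubleshooter.
database = list(range(87))
sizes = [5, 10, 20, 30, 50, 87]
results = average_rank(database, sizes, rank_of_root_cause=len)
```

Averaging over several random draws keeps a single lucky or unlucky subset from dominating the result for a given N.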
Evaluation Analysis (3) -- Machine Sensitivity
Does PeerPressure depend on the sick machine?
• Experiment setup
  • Pick 3 real-usage machines from 3 different users.
  • Test 5 cases.
• The results are mostly consistent.
Related Work
• White-box methods
  • idea: developers or administrators write rules and tools to restrict or monitor the system’s behavior
  • pros: with a completely implemented white box, the world is perfect!!
  • cons: accuracy and completeness
• Black-box methods
  • idea: use only inputs and outputs to find the problems. The inputs may be injected errors, re-runs of the applications and so on. The outputs may be malfunction symptoms, configuration states and so forth.
  • pros: no detailed knowledge of every single application is needed; flexible and general; the test specification can be designed as soon as the problem happens
  • cons: the set of suspects is huge
Related Work (cont’d)
• STRIDER
• Other black-box methods
  • Probe creation and VM testing for identifying the healthy state [Whitaker ’04]
  • PinPoint: component-based system debugging that correlates the execution paths of components to pinpoint the problematic component [Chen ’02]
  • Hardware and software component dependencies [Brown ’01]
  • Network diagnosis with pings and traceroutes [Brodie ’02]
Concluding Remarks
• Automatic diagnosis of misconfiguration is possible!
  • use statistical methods to extract the healthy state from the masses, eliminating the manual work of identifying it
Shortcomings and Future Work
• Multi-entry problem
  • shortcoming: the current PeerPressure can only handle a single sick entry
  • to do: investigate multi-entry problems, and also the dependencies among entries
• Cross-application problem
  • shortcoming: for cross-application problems it is sometimes hard to re-execute all the related applications
  • to do: find the superset of related executions automatically
• Handle rare applications
  • shortcoming: as the paper mentions, one of the 20 selected problems, the Yahoo Toolbar problem, cannot be solved because only two machines in the GeneBank have this application and they happen to be sick as well
  • to do: effectively leverage the p-2-p troubleshooting community
Shortcomings and Future Work (cont’d)
• Handle highly specific machines
  • shortcoming: for some technologies the machines have to be heavily customized, so a large number of false positives can be expected
  • to do: modify the model? give up?
• GeneBank maintenance
  • Exploit the p-2-p troubleshooting community
• Privacy issues
  • shortcoming: some of the configuration data is related to personal privacy or computer security
  • to do: Friends Troubleshooting Network: Towards Privacy-Preserving, Automatic Troubleshooting [Wang ’04]
THE END
Thank you!