1 / 23

Configuring Debugging as Search: Finding the Needle in the Haystack

Configuring Debugging as Search: Finding the Needle in the Haystack. Andrew Whitaker, Richard S. Cox and Steven D. Gribble. University of Washington Presented by Aditya Y.S.V. What does the paper talk about?.

Download Presentation

Configuring Debugging as Search: Finding the Needle in the Haystack

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Configuring Debugging as Search: Finding the Needle in the Haystack Andrew Whitaker, Richard S. Cox and Steven D. Gribble. University of Washington Presented by Aditya Y.S.V.

  2. What does the paper talk about? • This work addresses the problem of diagnosing configuration errors that cause a system to function incorrectly. • The basic idea is to search for the time when the system transitioned to a failed state. • The paper presents a tool CHRONUS which automates this.

  3. Motivation 1970’s Total ownership cost breakdown Hardware costs 2000’s People costs

  4. Existing Approaches • Prevention: never known to work for anything • Recovery: Windows XP restore. The problem with this is that it is a transition in itself and so it isn’t always safe. • Expert Systems: “Static Database” of known error configurations. Correction from this can be automated. Better example than the one given in paper is an Intrusion Detection System.

  5. Chronus External analysis tools When? The Basic Approach System failure Why?

  6. failure transition System Overview • Chronus reveals when a system failed • Chronus pro-actively logs system states Time system was working system was NOT working

  7. transition system was working system was NOT working Problem Formulation • Requirements: time travel, testing, search Time

  8. System Overview Design components Design choices

  9. Time Travel • Persistent vs. Transient state captures • Chronus :- Only persistent storage. • Application layer restarts are not useful where configurations outside the application(like in the OS) also play a role in its working.

  10. Storage layer Trade off RDMS CVS Semantics File System Disk Completeness

  11. Time-travel Disk Overhead

  12. Virtual Machines • The various states are checked by doing a virtual reboot of the system. • Virtual reboot is faster than physical reboot • Good way for terminating failed tests. Potentially be able to check more than one state at a time. (they don’t do this in the paper)

  13. Disadvantages of VM • Performance Overhead • May not be able to expose the latest devices and device drivers • Cannot diagnose errors within the virtualization layer itself such as updates to physical device driver.

  14. Testing • Automated diagnosis uses a user supplied “software probe”. • It has a manual method of software probe if all you remember is a series of GUI actions • There exist “non-deterministic” errors, and they cannot be reproduced.

  15. Search • Binary search • Spurious Errors • Strategy to overcome spurious errors.

  16. Phase #1: Normal operation • Child VM runs normal user programs • Parent VM records disk writes to a time-travel disk • Each block write represents an instant in time Time-travel disk disk requests Parent Virtual Machine Child Virtual Machine Denali Virtual Machine Monitor

  17. Was the system correct? Disk Time-travel Disk (Tbegin) probe disk requests Child Virtual Machine Phase #2: Debug Mode User command: search Tbegin Tend Parent Virtual Machine Denali Virtual Machine Monitor

  18. “Why?” • Chronus only tells you WHEN and not why a system failed. • For answering why, we need to have other tools. • Unix “diff” is mentioned as one of them.

  19. Case Study: Mozilla Web Browser • Mozilla Web Browser on the NetBSD OS • Methodology: install several extensions • Symptom: Mozilla freezes on startup • Fails to respond to user input

  20. #!/bin/sh mozilla & sleep 5 mozilla -remote ping() echo ‘SUCCESS’ > /TTOUTPUT blocks if Mozilla hangs Debugging the Mozilla Hang • Step 1: write a probe that tests the behavior:

  21. % search -begin 169354 -end 180025 169354: SUCCESS 180025: FAILURE 174689: FAILURE 172021: SUCCESS 173355: SUCCESS 174022: FAILURE 173688: FAILURE 173521: SUCCESS 173604: FAILURE 173562: FAILURE 173541: SUCCESS 173551: SUCCESS 173556: FAILURE 173553: FAILURE 173552: SUCCESS Mozilla Hang …….. Step 2: invoke search over a time range:

  22. Mozilla Hang ……. • Step 3: compute the change: % attach time-travel-disk 173552 173553 % diff -r /before /after file /.mozilla/default/zc1irw5u.slt/chrome/chrome.rdf differs: <RDF:Description about="urn:mozilla:package:stockticker” ... c:author="Jeremy Gillick" c:authorURL="http://jgillick.nettripper.com/" c:description="Shows your favorite stocks in a customized ticker." c:displayName="StockTicker 0.4.2”

  23. THANK YOU

More Related