This update provides an overview of Recovery-Oriented Computing (ROC) philosophy, principles, major areas, recent publications, measurements, tools, and evangelism. It also includes information about ROC retreats, the schedule, and breakout sessions.
Recovery-Oriented Computing: Update
Armando Fox (in loco Patterson)
Summer ROC Retreat, June 2002
Welcome and ROC Philosophy
• ROC philosophy (“Peres’s Law”): “If a problem has no solution, it may not be a problem, but a fact; not to be solved, but to be coped with over time.” (Shimon Peres, Israeli foreign minister)
• Failures (hardware, software, operator-induced) are a fact; recovery is how we cope with them over time
• Availability = MTTF / MTBF = MTTF / (MTTF + MTTR), where MTTF is mean time to failure, MTTR is mean time to repair, and MTBF = MTTF + MTTR is mean time between failures
  • Rather than just making MTTF very large, make MTTR << MTTF
• ROC principles
  • Isolation and partitionability => redundancy
  • Enable fault injection, output checking => online monitoring & verification
  • Undo support
  • Diagnostic support
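The availability formula can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the baseline numbers (a failure every 1,000 hours, 1 hour to repair) are hypothetical and do not come from the retreat material. It compares making failures 10x rarer (larger MTTF) with making recovery 10x faster (smaller MTTR).

```python
"""Availability arithmetic for A = MTTF / (MTTF + MTTR).

Illustrative sketch: function names and example numbers are
hypothetical, not taken from the ROC retreat material.
"""

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes of unavailability per 365-day year."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    scenarios = {
        # Hypothetical baseline: fails every 1,000 h, takes 1 h to repair.
        "baseline": (1000.0, 1.0),
        # Make failures 10x rarer (larger MTTF).
        "10x MTTF": (10000.0, 1.0),
        # Make recovery 10x faster (smaller MTTR), the ROC approach.
        "MTTR / 10": (1000.0, 0.1),
    }
    for name, (mttf, mttr) in scenarios.items():
        a = availability(mttf, mttr)
        print(f"{name:>10}: A = {a:.6f}, "
              f"downtime = {downtime_minutes_per_year(a):.1f} min/yr")
```

Since A = 1 / (1 + MTTR/MTTF), availability depends only on the ratio MTTR/MTTF, so the two improvements give the same result: roughly 52.6 minutes of downtime per year, versus roughly 525 for the hypothetical baseline. This is the arithmetic behind the slide's advice to drive MTTR well below MTTF instead of relying only on a very large MTTF.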
Major ROC Areas
• Failure detection and diagnosis
  • Pinpoint
  • FIG
  • Internet service failure causes
• Recovery techniques and Design-for-Recovery
  • Recursive Restartability
  • Making state-management tradeoffs explicit (QAPSL)
  • Firm state from infirm components (RAINS)
  • Designing for Undo: theory and practice
• Benchmarking and measurement
  • Dependability benchmarks for various applications
  • End-user availability measurements on the Web
  • Why Internet services fail
  • Estimating the cost of downtime
  • Availability in the PSTN
Recent Publications
ROC techniques and tools:
• A Utility-Centered Approach to Internet Services Design. George Candea, Armando Fox. In SIGOPS European Workshop.
• FIG: A Prototype Tool for Online Verification of Recovery Mechanisms. P. Broadwell, N. Sastry, J. Traupman, D. Patterson. In SHAMAN workshop at ICS 2002.
• Rewind, Repair, Replay: 3 R’s to Dependability. A. Brown, D. Patterson. In SIGOPS European Workshop.
• Including the Human Factor in Dependability Benchmarks. A. Brown, L. Chung, D. Patterson. In DSN 2002 Workshop on Dependability Benchmarking.
ROC measurements:
• Architecture, Operation, and Dependability of Large-Scale Internet Services: Three Case Studies. D. Oppenheimer, D. A. Patterson. Submission to IEEE Internet Computing special issue on Global Deployment of Data Centers, February 2002. (Shorter version in SIGOPS European Workshop.)
• Measuring End-User Availability on the Web: Practical Experience. Matthew Merzbacher, Dan Patterson.
• Lessons from the PSTN for Dependable Computing. P. Enriquez, A. Brown, D. Patterson. In SHAMAN workshop at ICS 2002.
Fault monitoring/diagnosis:
• An Online Evolutionary Approach to Internet Services. E. Kiciman, M. Chen, E. Brewer. In SIGOPS European Workshop.
Recent Evangelism
• Evangelism publications
  • “Case for ROC” technical report
  • Introduction to Dependability (;login:)
  • A Simple Way to Measure Cost of Downtime (LISA 02)
• Evangelism talks
  • Microsoft Research
  • HPCA 02 keynote (Patterson)
  • FAST keynote (File and Storage Technologies)
  • IBM Autonomic Computing workshops (Almaden & TJ Watson)
About ROC Retreats
• Purpose of semi-annual retreats
  • Progress reports/talks from academia and industry
  • Exposure/feedback on new ideas or work in progress
  • Brainstorming in an immersive atmosphere
  • Industry/visitor feedback, opportunities for collaboration
  • Water fights during rafting trip
• Logistics
  • Web server with retreat talks/papers (thanks to Mike Howard and Bob Miller): http://172.16.10.43/roc, WaveLAN “ANY”
Retreat Schedule (a work in progress)
• Rest of today
  • OceanStore update from Kubi
  • Intros
  • ROC talks
  • All day: posters (especially right before & after dinner)
• Tomorrow
  • Morning: OceanStore talks
  • Afternoon: lunch and rafting
  • Post-rafting: breakout sessions followed by dinner
  • Breakout reporting/joint panel session with SAHARA
• Wednesday
  • Industry talk(s)
  • “Open mike”/outrageous ideas session?
  • Visitor feedback
Breakout Sessions
• Target: 3-4 breakouts
  • Using virtual machine technology for ROC
  • Ideas for the second ROC showcase application
  • Applying ROC to OceanStore
  • Management and self-healing of large-scale systems
  • Is >100-year storage a pipe dream?
  • Other topics solicited
• Final breakout topics will be chosen based on interest in each topic and on keeping each group to a manageable size