Welcome to the Winter 2004 ROC Retreat
Armando Fox and David Patterson
About ROC Retreats
• Purpose of semi-annual retreats
  • Progress reports/talks from academia and industry
  • Exposure/feedback on new ideas or work in progress
  • Brainstorming in immersive atmosphere
  • Industry/visitor feedback, opportunities for collaboration
  • Skiing
• Logistics
  • Web server with retreat talks/papers - thanks to Mike Howard and Bob Miller
  • Skiing
ROC Events
• Aaron Brown, UC Berkeley => Dr. Aaron Brown, IBM Research
• Pete Broadwell, UC Berkeley => Pete Broadwell, M.S., ???
• Soon: Mike Chen, UC Berkeley => Dr. Mike Chen, ???
• ROC work recognized in the 2003 Scientific American 50
Recent Publications (since June 2003)
Published or to appear:
• Ben Ling, Emre Kiciman, Armando Fox: Session State: Beyond Soft State, in NSDI 2004
• Mike Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Eric Brewer, Armando Fox: Path-Based Failure and Evolution Management, in NSDI 2004
• George Candea, Steve Zhang, Emre Kiciman, Armando Fox: Application-Generic Recovery for Internet Middleware, Cluster Computing Journal (special issue on Autonomic Computing), summer 2004
• George Candea, James Cutler, Armando Fox: Improving Availability with Recursive Microreboots: A Soft-State System Case Study, Performance Evaluation Journal, 56(1-3), March 2004
In submission:
• George Candea, Armando Fox: Microreboots: An Application-Generic Recovery Technique for Internet Services, submitted to USENIX 2004
• Andy Huang, Armando Fox: Free Recovery: A Step Towards Self-Managing State, submitted to USENIX 2004
• Emre Kiciman, Armando Fox: Detecting and Localizing Anomalous Behavior to Discover Failures in Component-Based Internet Services, submitted to USENIX 2004
• Yee-Jiun Song, Jeff Raymakers, Wendy Tobagus, Armando Fox: Is MTTR More Important Than MTTF For User-Perceived Availability?, submitted to DSN-IPDS 2004
Preview of some upcoming talks
• Benchmarking
  • Evaluating undo: human-aware recovery benchmarks
  • Benchmarking distributed services
  • Including latency & data quality in performability evaluation of a web-based service
• Making recovery nearly free
  • Evaluating the effect of micro-reboots on end users
  • How cheap recovery simplifies persistent state management
• Embracing statistical analysis
  • Using statistical learning to detect and localize faults in componentized Internet services
  • A statistical learning approach to failure diagnosis for eBay
  • Toward generalized APIs for statistical monitoring
ROC => RADS
• Generalize ROC approaches that focus on statistical anomaly detection as a way of detecting conditions that require response
• Generalize “recovery” to “adaptation”
• System is “always recovering”/“always adapting”
• Some early examples of this will be featured in talks
• Insight: statistical pattern recognition provides a degree of application-generic failure detection; nearly-free recovery means we can tolerate some false positives
• Kickoff panel this evening
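The insight on this slide, that cheap recovery lets a detector get away with false positives, can be illustrated with a back-of-the-envelope cost model. The sketch below is not from the talk. It assumes that recovering on any alarm costs a fixed number of seconds, that ignoring a real failure costs a larger fixed number of seconds, and that a false alarm does no harm beyond the recovery time itself. Under those assumptions, acting on every alarm pays off whenever the recovery cost is less than the fraction of real alarms times the cost of a missed failure. The function names and all numbers are hypothetical.

```python
def min_precision_to_act(recovery_cost_s: float, missed_failure_cost_s: float) -> float:
    """Smallest fraction of alarms that must be real for 'recover on every alarm' to pay off.

    Assumed model (illustrative, not from the slides):
      expected cost of acting on an alarm   = recovery_cost_s
      expected cost of ignoring an alarm    = p * missed_failure_cost_s
    so acting wins once p exceeds recovery_cost_s / missed_failure_cost_s.
    """
    if recovery_cost_s < 0 or missed_failure_cost_s <= 0:
        raise ValueError("recovery cost must be >= 0 and missed-failure cost must be > 0")
    return min(1.0, recovery_cost_s / missed_failure_cost_s)


def should_recover(p_alarm_is_real: float, recovery_cost_s: float,
                   missed_failure_cost_s: float) -> bool:
    """True if acting on the alarm has lower expected cost than ignoring it."""
    return recovery_cost_s < p_alarm_is_real * missed_failure_cost_s


if __name__ == "__main__":
    MISSED_FAILURE_COST_S = 300.0  # hypothetical: downtime if a real failure is ignored
    # Hypothetical recovery costs: an expensive restart vs. a nearly-free recovery.
    for label, cost_s in [("full restart", 30.0), ("cheap recovery", 0.5)]:
        threshold = min_precision_to_act(cost_s, MISSED_FAILURE_COST_S)
        print(f"{label}: worth acting if at least {threshold:.2%} of alarms are real")
```

With these made-up numbers, the cheaper the recovery action, the lower the precision the detector needs before "always recover" becomes the better policy, which is the trade-off the slide describes.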
Other Highlights
• Poster advertisements before poster session
• Three talks from industrial visitors
  • Moises Goldszmidt: statistical pattern recognition applied to systems management
  • Chris Overton: modeling large-scale IT systems
  • Paul Brett: Real-world failures, a systemic view