
Berkeley RAD Lab Technical Overview

Berkeley RAD Lab Technical Overview. Armando Fox, Randy Katz, Michael Jordan, Dave Patterson, Scott Shenker, Ion Stoica. March 2006. RAD Lab. The 5-year Vision: Single person can go from vision to a next-generation IT service (“the Fortune 1 million”)


Presentation Transcript


  1. Berkeley RAD Lab Technical Overview. Armando Fox, Randy Katz, Michael Jordan, Dave Patterson, Scott Shenker, Ion Stoica. March 2006

  2. RAD Lab The 5-year Vision: Single person can go from vision to a next-generation IT service (“the Fortune 1 million”) • E.g., over a long holiday weekend in 1995, Pierre Omidyar created eBay v1.0 The Challenges: • Develop the new Service: today, easy prototyping ≠ easy operations • Assess: Measuring, Testing, and Debugging the new Service in a realistic distributed environment: how will it scale? • Deploy: Scaling up a new, geographically distributed Service • Operate a service that could quickly scale to millions of users with <1 operator The Vehicle: Interdisciplinary Center creates core technical competency to demo 10X to 100X • Researchers are leaders in machine learning, networking, and systems • Industrial Participants: leading companies in HW, systems SW, and online services • “RAD Lab” = Reliable, Adaptable, Distributed systems

  3. Founding the RAD Lab • Looked for 3 to 4 founding companies to fund 5 years @ cost of $0.5M / year • Google, Microsoft, Sun Microsystems signed up • Affiliate Companies ($0.1M/yr): HP, IBM, others • Founding Company Model • Prefer founding partner technology in prototypes • Designate employees to act as consultants • Putting IP in Public Domain • 3-year project review by founding partners • $2.5-$3M/yr: ~65% industry, ~25% state, ~10% fed • 30 grad students + 10 undergrads + 6 faculty + 2 staff

  4. Steps vs. Process • Steps: Traditional, Static Handoff Model, N groups • Process: Support DADO (Develop, Assess, Deploy, Operate) Evolution, 1 group [Diagram: Develop, Assess, Deploy, Operate shown for each model]

  5. Key Ingredients: Visualization & Statistical Machine Learning (SML) • Too much data for a human to troubleshoot manually • E.g., Amazon: tens of metrics, 100’s-1000’s of machines • Visualization exploits human visual processing • SML finds patterns in large quantities of data

  6. Operations example: combining visualization & machine learning • Idea: end-user behavior as “failure detector” • Approach: combine visualization with SML analysis so operators see anomalies too • Experiment: does distribution of hits to various pages match the “historical” distribution? • Each minute, compare hit counts of top N pages to hit counts over last 6 hours using Bayesian networks and the χ² test, on real Ebates data To learn more, see “Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization,” In Proc. 2nd IEEE Int’l Conf. on Autonomic Computing, June 2005, by Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, David Patterson.
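The per-minute comparison described on slide 6 can be sketched roughly as follows. This is a minimal illustration of a Pearson χ² goodness-of-fit check only, not the authors' implementation: it omits the Bayesian-network analysis the slide also mentions, and the function names, the sample hit counts, and the choice of critical value are all assumptions made for the example.

```python
def chi_square_statistic(current_hits, historical_hits):
    """Pearson chi-square statistic of this minute's per-page hit counts
    against the distribution implied by the historical per-page counts."""
    if len(current_hits) != len(historical_hits):
        raise ValueError("both lists must cover the same pages")
    total_now = sum(current_hits)
    total_hist = sum(historical_hits)
    if total_now == 0 or total_hist == 0:
        raise ValueError("need at least one hit in each window")
    statistic = 0.0
    for observed, hist in zip(current_hits, historical_hits):
        # Expected hits this minute if traffic followed the historical mix.
        expected = total_now * hist / total_hist
        if expected == 0:
            raise ValueError("page with no historical hits; drop it first")
        statistic += (observed - expected) ** 2 / expected
    return statistic


def is_anomalous(current_hits, historical_hits, critical_value):
    """Flag the minute when the statistic exceeds the chosen critical value."""
    return chi_square_statistic(current_hits, historical_hits) > critical_value


# Hypothetical data: 4 pages, so 3 degrees of freedom.
# 11.345 is the chi-square critical value for 3 d.o.f. at the 1% level.
CRITICAL_3DOF_1PCT = 11.345
last_six_hours = [3600, 1800, 1200, 600]   # assumed historical hit counts
typical_minute = [50, 25, 17, 8]           # roughly the historical mix
shifted_minute = [10, 20, 30, 40]          # traffic mix has changed

print(is_anomalous(typical_minute, last_six_hours, CRITICAL_3DOF_1PCT))
print(is_anomalous(shifted_minute, last_six_hours, CRITICAL_3DOF_1PCT))
```

In practice the critical value would depend on N (the number of top pages tracked) and on how many false alarms an operator is willing to tolerate; both are left to the caller here.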

  7. [Figure: Top 40 Pages vs. Time (5 minute intervals)] • Visualization as user behavior completely different; usually animate architecture • Win trust in SLT by leveraging operator expertise and human visual pattern recognition

  8. Build Academic MPP from FPGAs • As ≈25 CPUs will fit in Field Programmable Gate Array (FPGA), 1000-CPU system from ≈40 FPGAs? • 16 32-bit simple “soft core” RISC at 150MHz in 2004 (Virtex-II) • FPGA generations every 1.5 yrs; ≈2X CPUs, ≈1.2X clock rate • HW research community does logic design (“gate shareware”) to create out-of-the-box MPP • E.g., 1000 processor, standard ISA binary-compatible, 64-bit, cache-coherent supercomputer @ ≈100 MHz/CPU in 2007 • RAMPants: Arvind (MIT), Krste Asanović (MIT), Derek Chiou (Texas), James Hoe (CMU), Christos Kozyrakis (Stanford), Shih-Lien Lu (Intel), Mark Oskin (Washington), David Patterson (Berkeley, Co-PI), Jan Rabaey (Berkeley), and John Wawrzynek (Berkeley, PI) • “Research Accelerator for Multiple Processors”
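The arithmetic behind slide 8 can be checked with a short sketch. It assumes the slide's rough figures (about 25 CPUs per FPGA, 16 soft cores at 150 MHz in 2004, about 2X cores and 1.2X clock per 1.5-year generation) and simple compounding; the function names are illustrative. The projection applies to the simple 32-bit soft core only, so it is not a prediction of the slide's separate 2007 target for a 64-bit, cache-coherent design.

```python
def fpgas_needed(total_cpus, cpus_per_fpga):
    """Smallest number of FPGAs that holds total_cpus (integer ceiling)."""
    return -(-total_cpus // cpus_per_fpga)


def project_soft_cores(cores, clock_mhz, generations,
                       core_growth=2.0, clock_growth=1.2):
    """Compound the per-generation growth factors quoted on the slide."""
    return (cores * core_growth ** generations,
            clock_mhz * clock_growth ** generations)


# 1000 CPUs at roughly 25 CPUs per FPGA.
print(fpgas_needed(1000, 25))

# 2004 -> 2007 is two 1.5-year generations from the Virtex-II data point.
cores_2007, clock_2007 = project_soft_cores(16, 150.0, 2)
print(round(cores_2007), round(clock_2007))
```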

  9. Why RAMP Good for Research MPP?

  10. RAMP 1 Hardware • Completed Dec. 2004 (14x17 inch 22-layer PCB) • Board: 5 Virtex II FPGAs, 18 banks DDR2-400 memory, 20 10GigE conn. • Box: 8 compute modules in 8U rack mount chassis • 1000 CPUs: 1.5 kW, ¼ rack, ≈$100,000 • 1.5 W / computer, 5 cu. in. / computer, $100 / computer • BEE2: Berkeley Emulation Engine 2, by John Wawrzynek and Bob Brodersen with students Chen Chang and Pierre Droz

  11. RAMP in RADS: Internet in a Box • Building blocks also => Distributed Computing • RAMP vs. Clusters (Emulab, PlanetLab) • Scale: RAMP O(1000) vs. Clusters O(100) • Private use: $100k => Every group has one • Develop/Debug: Reproducibility, Observability • Flexibility: Modify modules (Router, SMP, OS) • Explore via repeatable experiments as vary parameters, configurations vs. observations on single (aging) cluster that is often idiosyncratic

  12. Planned Apps & Courses • ResearchIndex: reputation & ranking system for CS research papers and digests • Seeking suggestions/collaboration on this & other possible apps, to get experience with Develop & Deploy • Seeking datasets corresponding to larger (real) apps as well, to increase experience with Assess & Operate • Courses • CS 294, Fall 06: MS/PhD level projects contributing to RAD Lab infrastructure in all areas (DADO) • CS 294, Fall 07: Prototype services to run in “production mode” on RAD Lab platform, improve platform/environment based on lessons from deployment • CS 294, Fall 08: “Web 2.0” style services on RAD Lab platform (e.g. joint with Haas Business School) • Undergrad courses, >2008: software eng. assignments are network services running on RADS platform

  13. RAD Lab: Interdisciplinary Center for Reliable, Adaptive, Distributed Systems • Capability (Desired): 1 person can invent & run the next-gen IT service • Develop using primitives to enable functions (MapReduce), services (Craigslist) • Assess using deterministic replay and statistical debugging • Deploy via “Internet-in-a-Box” FPGAs • Operate SLT-friendly, Control Theory-friendly architectures and operator-centric visualization and analysis tools • Base Technology: Server Hardware, System Software, Middleware, Networking

  14. Industrial collaboration • Historically a UCB strength • Industrial research labs are ideal partners • High quality research staff => symmetric collaboration • Ties to product groups => work on relevant problems • Access to real data sets => realistic evaluation of prototypes • Goal: ongoing transfer of software, technology & people • “BSD License” for RAD Lab technology intended to ease adoption by industrial partners • RAD Lab targets: SML & control theory, visualization, development of service-oriented archs. & apps.

  15. RAD Lab Timeline • 2005 Launch RAD Lab • 2006 Collect workloads, Internet in a Box • 2007 SLT/CT distributed architectures, Iboxes, annotation layer, class testing • 2008 Development toolkit 1.0, tuple space, class testing; Mid Project Review • 2009 RAD Lab software suite 1.0, class testing • 2010 End of Project Party
