
Computers for the Post-PC Era


Presentation Transcript


  1. Computers for the Post-PC Era Aaron Brown, Jim Beck, Rich Martin, David Oppenheimer, Kathy Yelick, and David Patterson http://iram.cs.berkeley.edu/istore 2000 Grad Visit Day

2. Berkeley Approach to Systems • Find an important problem crossing the HW/SW interface, with a HW/SW prototype at the end, typically starting in graduate courses • Assemble a band of 3-6 faculty, 12-20 grad students, 1-3 staff to tackle it over 4 years • Meet twice a year for 3-day retreats with invited outsiders • Builds team spirit • Get advice on direction, and change course • Offers milestones for project stages • Grad students give 6 to 8 talks => great speakers • Write papers, go to conferences, get PhDs, jobs • End-of-project party, reshuffle faculty, go to step 1

3. For Example, Projects I Have Worked On • RISC I, II (with Sequin and Ousterhout (CAD)) • SOAR (Smalltalk On A RISC) (with Ousterhout (CAD)) • SPUR (Symbolic Processing Using RISCs) (with Fateman, Hilfinger, Hodges, Katz, Ousterhout) • RAID I, II (Redundant Arrays of Inexpensive Disks) (with Katz, Ousterhout, Stonebraker) • NOW I, II (Network of Workstations) and Tertiary Disk (TD) (with Culler, Anderson) • IRAM I (Intelligent RAM) (with Yelick, Kubiatowicz, Wawrzynek) • ISTORE I, II (Intelligent Storage) (with Yelick, Kubiatowicz)

4. Symbolic Processing Using RISCs (SPUR): '85-'89 • Before commercial RISC chips • Built a workstation multiprocessor and operating system from scratch(!) • Sprite Operating System • 3 chips: Processor, Cache Controller, FPU • Coined the term "snooping cache protocol" • 3 C's model of cache misses: compulsory, capacity, conflict

5. Group Photo (in souvenir jackets) • Pictured: Jim Larus (Wisconsin, Microsoft), George Taylor (founder, ?), David Wood (Wisconsin), Dave Lee (founder, Silicon Image), John Ousterhout (founder, Scriptics), Ben Zorn (Colorado, Microsoft), Mark Hill (Wisconsin), Mendel Rosenblum (Stanford, founder of VMware), Susan Eggers (Washington), Brent Welch (founder, Scriptics), Shing Kong (Transmeta), Garth Gibson (CMU, founder, ?) • See www.cs.berkeley.edu/Projects/ARC to learn more about Berkeley Systems

6. SPUR 10-Year Reunion, January '99 • Everyone from North America came! • 19 PhDs: 9 to academia • 8 of 9 got tenure, 2 already full professors • 2 Romnes Fellows (3rd and 4th at Wisconsin) • 3 NSF Presidential Young Investigator winners • 2 ACM Dissertation Awards • They in turn have produced 30 PhDs (as of 1/99) • 10 to industry • Founders of 5 startups (1 failed) • 2 department heads (AT&T Bell Labs, Microsoft) • Very successful group; the SPUR project "gave them a taste of success, lifelong friends"

7. Network of Workstations (NOW) '94-'98 • Leveraged commodity workstations and OSes to harness the power of clustered machines connected via high-speed switched networks • Built HW/SW prototypes: NOW-1 with 32 SuperSPARCs, and NOW-2 with 100 UltraSPARC 1s • NOW-2 cluster held the world record for the fastest disk-to-disk sort for 2 years, 1997-1999 • NOW-2 cluster was 1st to crack a 40-bit key as part of a key-cracking challenge offered by RSA, 1997 • NOW-2 made the list of Top 200 supercomputers, 1997 • NOW was a foundation of the Virtual Interface (VI) Architecture, a standard from Compaq, Intel, and Microsoft that allows protected, direct user-level access to the network • NOW technology led directly to one Internet startup (Inktomi), and many other Internet companies use cluster technology

8. Network of Workstations (NOW) '94-'98 • 12 PhDs. Note that 2/3 of them went into academia, and that 1/3 are female: • Andrea Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison • Remzi Arpaci-Dusseau, Asst. Professor, Wisconsin, Madison • Mike Dahlin, Asst. Professor, University of Texas, Austin • Jeanna Neefe Matthews, Asst. Professor, Clarkson Univ. • Douglas Ghormley, Researcher, Los Alamos National Labs • Kim Keeton, Researcher, Hewlett-Packard Labs • Steve Lumetta, Asst. Professor, Illinois • Alan Mainwaring, Researcher, Sun Microsystems Labs • Rich Martin, Asst. Professor, Rutgers University • Nisha Talagala, Researcher, Network Storage, Sun Microsystems • Amin Vahdat, Asst. Professor, Duke University • Randy Wang, Asst. Professor, Princeton University

9. Research in Berkeley Courses • RISC, SPUR, RAID, NOW, IRAM, ISTORE all started in advanced graduate courses • Students make the transition from undergraduate to researcher in first-year graduate courses • First-year architecture and operating systems courses: select a topic, do research, write a paper, give a talk • Prof meets each team 1-on-1 ~3 times, plus TA help • Some papers get submitted and published • Requires class size < 40 (e.g., Berkeley) • If the 1st-year course size is ~100 students => cannot do research in 1st-year grad courses • If a school offers a combined BS/MS (e.g., MIT) or a professional MS via TV broadcast (e.g., Stanford), then effective class size is ~150-250

  10. Outline • Background: Berkeley Approach to Systems • PostPC Motivation • PostPC Microprocessor: IRAM • PostPC Infrastructure Motivation • PostPC Infrastructure: ISTORE • Hardware Architecture • Software Architecture • Conclusions and Feedback

11. Perspective on Post-PC Era • The PostPC era will be driven by 2 technologies: • 1) "Gadgets": tiny embedded or mobile devices • ubiquitous: in everything • e.g., successors to the PDA, cell phone, wearable computers • 2) Infrastructure to support such devices • e.g., successors to big fat web servers, database servers

12. Intelligent RAM: IRAM • [Block diagrams: a conventional system (logic-fab die with Proc, $, L2$, bus, and I/O, plus separate DRAM-fab chips on a bus) vs. IRAM (Proc and DRAM on one die)] • Microprocessor & DRAM on a single chip: • 10X capacity vs. SRAM • on-chip memory latency 5-10X, bandwidth 50-100X • improve energy efficiency 2X-4X (no off-chip bus) • serial I/O 5-10X vs. buses • smaller board area/volume • IRAM advantages extend to: • a single-chip system • a building block for larger systems

13. Revive Vector Architecture
• Cost: $1M each? => Single-chip CMOS MPU/IRAM
• Low latency, high-BW memory system? => IRAM
• Code density? => Much smaller than VLIW
• Compilers? => For sale, mature (>20 years); we retarget Cray compilers
• Performance? => Easy to scale speed with technology
• Power/Energy? => Parallelism to save energy, yet keep performance
• Limited to scientific applications? => Multimedia apps vectorizable too: N*64b, 2N*32b, 4N*16b
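
The "multimedia apps vectorizable too" point is easiest to see on a concrete kernel. Below is a minimal sketch (mine, not from the slides; the buffer size and the NumPy formulation are illustrative assumptions): mixing two 16-bit audio buffers with saturation is a purely element-wise loop that a vectorizing compiler could map directly onto the 4N*16b subword vector operations listed above.

```python
import numpy as np

def mix_audio(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Mix two 16-bit PCM buffers with saturation.

    Every element is independent, so the loop body is pure data
    parallelism -- exactly the kind of kernel that maps onto
    4N x 16-bit vector operations.
    """
    # Widen to 32 bits so the intermediate sum cannot overflow.
    s = a.astype(np.int32) + b.astype(np.int32)
    # Saturate back into the 16-bit range (a single vector clip).
    return np.clip(s, -32768, 32767).astype(np.int16)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(-32768, 32767, size=1 << 16, dtype=np.int16)
    b = rng.integers(-32768, 32767, size=1 << 16, dtype=np.int16)
    print(mix_audio(a, b)[:8])
```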

14. VIRAM-1: System on a Chip • Prototype scheduled for end of Summer 2000 • 0.18 um EDL process • 16 MB DRAM, 8 banks • MIPS scalar core and caches @ 200 MHz • 4 64-bit vector unit pipelines (lanes) @ 200 MHz • 4 parallel I/O lines @ 100 MB/s • 17 x 17 mm, 2 Watts • 25.6 GB/s memory bandwidth (6.4 GB/s per direction and per Xbar) • 1.6 GFLOPS (64-bit), 6.4 GOPS (16-bit) • 140 M transistors (> Intel?) • [Floorplan: CPU + $, 4 vector pipes/lanes, crossbar (Xbar), I/O, and two 64-Mbit (8-MByte) DRAM halves]
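
As a sanity check, the quoted peaks are mutually consistent (my arithmetic, assuming one multiply-add per lane per cycle, four 16-bit subwords per 64-bit lane, and two directions on each of two crossbars):

$$ 4\ \text{lanes} \times 200\ \text{MHz} \times 2\ \tfrac{\text{flops}}{\text{MAC}} = 1.6\ \text{GFLOPS (64-bit)} $$
$$ 4\ \text{lanes} \times 4\ \tfrac{\text{16-bit ops}}{\text{lane}} \times 200\ \text{MHz} \times 2 = 6.4\ \text{GOPS (16-bit)} $$
$$ 2\ \text{Xbars} \times 2\ \text{directions} \times 6.4\ \tfrac{\text{GB}}{\text{s}} = 25.6\ \tfrac{\text{GB}}{\text{s}} $$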

  15. Outline • PostPC Infrastructure Motivation and Background: Berkeley’s Past • PostPC Motivation • PostPC Device Microprocessor: IRAM • PostPC Infrastructure Motivation • ISTORE Goals • Hardware Architecture • Software Architecture • Conclusions and Feedback

16. Background: Tertiary Disk (part of NOW) • Tertiary Disk (1997) • cluster of 20 PCs hosting 364 3.5" IBM disks (8.4 GB each) in 7 racks (19" x 33" x 84"), about 3 TB total. The 200 MHz, 96 MB P6 PCs run FreeBSD, and a switched 100 Mb/s Ethernet connects the hosts. Also 4 UPS units. • Hosts the world's largest art database: 80,000 images, in cooperation with the San Francisco Fine Arts Museum; try www.thinker.org

17. Tertiary Disk HW Failure Experience • Reliability of hardware components (20 months):
• 7 IBM SCSI disk failures (out of 364, or 2%)
• 6 IDE (internal) disk failures (out of 20, or 30%)
• 1 SCSI controller failure (out of 44, or 2%)
• 1 SCSI cable failure (out of 39, or 3%)
• 1 Ethernet card failure (out of 20, or 5%)
• 1 Ethernet switch failure (out of 2, or 50%)
• 3 enclosure power supply failures (out of 92, or 3%)
• 1 short power outage (covered by UPS)
• Did not match expectations: SCSI disks more reliable than SCSI cables! The difference between simulation and prototypes.
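
For scale, those counts can be annualized by simple linear scaling (my arithmetic, not from the slide): 7 of 364 SCSI disks in 20 months is roughly a 1.2% annualized failure rate, versus roughly 18% per year for the internal IDE disks. A minimal sketch:

```python
# Annualized failure rates from the Tertiary Disk counts above.
# (Naive linear scaling of observed failures to a 12-month window;
# component names and counts come straight from the slide.)
OBSERVATION_MONTHS = 20

components = {
    "IBM SCSI disk":          (7, 364),
    "IDE (internal) disk":    (6, 20),
    "SCSI controller":        (1, 44),
    "SCSI cable":             (1, 39),
    "Ethernet card":          (1, 20),
    "Ethernet switch":        (1, 2),
    "Enclosure power supply": (3, 92),
}

for name, (failed, total) in components.items():
    observed = failed / total                        # fraction failed in 20 months
    annualized = observed * 12 / OBSERVATION_MONTHS  # scaled to one year
    print(f"{name:24s} {observed:6.1%} observed, ~{annualized:5.1%}/year")
```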

18. [Chart: SCSI time-outs and hardware failures over time on machine m11, SCSI Bus 0]

19. Can we predict a disk failure? • Yes: look for Hardware Error messages • These messages lasted for 8 days, between 8-17-98 and 8-25-98 • On disk 9 there were: • 1763 Hardware Error messages, and • 297 SCSI Timed Out messages • On 8-28-98, disk 9 on SCSI Bus 0 of m11 was "fired": it appeared to be about to fail, so it was swapped
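
A predictor built on this observation can be very simple: count error messages per disk in the system log and flag any disk whose counts cross a threshold. The sketch below is an illustration only; the log-line format, the thresholds, and the disks_to_replace() helper are hypothetical, not Tertiary Disk's actual tooling.

```python
import re
from collections import Counter

# Hypothetical syslog-style lines, e.g.:
#   "Aug 17 03:12:44 m11 sd9: Hardware Error"
#   "Aug 17 03:13:01 m11 sd9: SCSI Timed Out"
LINE_RE = re.compile(r"(?P<host>\S+) sd(?P<disk>\d+): (?P<msg>Hardware Error|SCSI Timed Out)")

HW_ERROR_THRESHOLD = 100   # assumed cut-offs; tune per site
TIMEOUT_THRESHOLD = 50

def disks_to_replace(log_lines):
    """Return (host, disk) pairs whose recent error counts suggest imminent failure."""
    hw_errors, timeouts = Counter(), Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        key = (m["host"], int(m["disk"]))
        if m["msg"] == "Hardware Error":
            hw_errors[key] += 1
        else:
            timeouts[key] += 1
    return sorted(
        key for key in set(hw_errors) | set(timeouts)
        if hw_errors[key] >= HW_ERROR_THRESHOLD or timeouts[key] >= TIMEOUT_THRESHOLD
    )

if __name__ == "__main__":
    demo = (["Aug 17 03:12:44 m11 sd9: Hardware Error"] * 1763
            + ["Aug 17 03:13:01 m11 sd9: SCSI Timed Out"] * 297)
    print(disks_to_replace(demo))   # -> [('m11', 9)]
```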

20. Lessons from Tertiary Disk Project • Maintenance is hard on current systems • Hard to know what is going on, who is to blame • Everything can break • It's not what you expect in advance • Follow the rule of no single point of failure • Nothing fails fast • Eventually a component behaves badly enough that the operator "fires" the poor performer, but it doesn't "quit" on its own • Most failures may be predicted

  21. Outline • Background: Berkeley Approach to Systems • PostPC Motivation • PostPC Microprocessor: IRAM • PostPC Infrastructure Motivation • PostPC Infrastructure: ISTORE • Hardware Architecture • Software Architecture • Conclusions and Feedback

  22. The problem space: big data • Big demand for enormous amounts of data • today: high-end enterprise and Internet applications • enterprise decision-support, data mining databases • online applications: e-commerce, mail, web, archives • future: infrastructure services, richer data • computational & storage back-ends for mobile devices • more multimedia content • more use of historical data to provide better services • Today’s SMP server designs can’t easily scale • Bigger scaling problems than performance!

  23. The real scalability problems: AME • Availability • systems should continue to meet quality of service goals despite hardware and software failures • Maintainability • systems should require only minimal ongoing human administration, regardless of scale or complexity • Evolutionary Growth • systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded • These are problems at today’s scales, and will only get worse as systems grow
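
One standard way to make the availability goal quantitative (a textbook formula, not stated on the slide) is

$$ \text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} $$

so shrinking repair and recovery time (MTTR) improves availability just as surely as stretching time to failure (MTTF); the reconstruction-time results on slide 30 are an example of measuring the MTTR side.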

  24. Principles for achieving AME (1) • No single points of failure • Redundancy everywhere • Performance robustness is more important than peak performance • “performance robustness” implies that real-world performance is comparable to best-case performance • Performance can be sacrificed for improvements in AME • resources should be dedicated to AME • compare: biological systems spend > 50% of resources on maintenance • can make up performance by scaling system

  25. Principles for achieving AME (2) • Introspection • reactive techniques to detect and adapt to failures, workload variations, and system evolution • proactive (preventative) techniques to anticipate and avert problems before they happen

26. Hardware techniques (2) • No central processing unit: distribute processing with storage • Serial lines and switches are also growing with Moore's Law; less need today to centralize vs. bus-oriented systems • Most storage servers are limited by the speed of their CPUs; why does this make sense? • Why not amortize the sheet metal, power, and cooling infrastructure for the disk to add a processor, memory, and network? • If AME is important, we must provide resources to help AME: local processors responsible for the health and maintenance of their storage
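
To make "local processors responsible for the health and maintenance of their storage" concrete, here is a minimal sketch of the kind of per-brick introspection loop this implies. Everything in it (the sensor helpers, thresholds, and report format) is a hypothetical illustration, not ISTORE's actual software.

```python
import random
import time

# Hypothetical sensor helpers; a real brick would read its diagnostic
# processor, disk error counters, fan tachometers, and so on.
def read_temperature_c() -> float:
    return 35.0 + random.random() * 20.0

def read_disk_error_count() -> int:
    return random.randrange(0, 5)

TEMP_LIMIT_C = 50.0   # assumed thresholds
ERROR_LIMIT = 3

def health_check() -> dict:
    """One introspection pass: gather local sensors and decide status."""
    temp = read_temperature_c()
    errors = read_disk_error_count()
    healthy = temp < TEMP_LIMIT_C and errors < ERROR_LIMIT
    return {"temp_c": round(temp, 1), "disk_errors": errors, "healthy": healthy}

if __name__ == "__main__":
    for _ in range(3):            # a real brick would loop forever...
        print(health_check())     # ...and report to peer bricks instead of printing
        time.sleep(1)
```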

27. ISTORE-1 hardware platform
• 80-node x86-based cluster, 1.4 TB storage
• cluster nodes are plug-and-play, intelligent, network-attached storage "bricks"
• a single field-replaceable unit to simplify maintenance
• each node is a full x86 PC w/ 256 MB DRAM, 18 GB disk
• more CPU than NAS; fewer disks/node than a cluster
• Intelligent disk "brick": half-height disk canister, portable-PC CPU (Pentium II/266) + DRAM, redundant NICs (4 100 Mb/s links), diagnostic processor
• ISTORE chassis: 80 nodes, 8 per tray; 2 levels of switches (20 at 100 Mbit/s, 2 at 1 Gbit/s); environment monitoring: UPS, redundant power supplies, fans, heat and vibration sensors...

  28. A glimpse into the future? • System-on-a-chip enables computer, memory, redundant network interfaces without significantly increasing size of disk • ISTORE HW in 5-7 years: • building block: 2006 MicroDrive integrated with IRAM • 9GB disk, 50 MB/sec from disk • connected via crossbar switch • 10,000 nodes fit into one rack! • O(10,000) scale is our ultimate design point

29. Development techniques • Benchmarking • One reason for the 1000X improvement in processor performance was the ability to measure (vs. debate) which design is better • e.g., which is most important to improve: clock rate, clocks per instruction, or instructions executed? • Need AME benchmarks: "what gets measured gets done", "benchmarks shape a field", "quantification brings rigor"
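
One way to build such a benchmark is to measure delivered quality of service over time, inject a fault partway through, and report how far and for how long performance degrades. The sketch below is a schematic of that methodology with simulated stand-ins for the workload and the fault injector (inject_disk_fault(), run_workload_for()); it is not the actual harness behind the next slide's results.

```python
import random

_fault_active = False

def inject_disk_fault() -> None:
    """Simulate failing one disk (a real harness would use a fault-injection driver)."""
    global _fault_active
    _fault_active = True

def run_workload_for(seconds: float) -> float:
    """Drive the workload for one interval and return delivered throughput (simulated)."""
    base = 100.0                        # hits/sec when healthy
    penalty = 40.0 if _fault_active else 0.0
    return base - penalty + random.uniform(-5, 5)

def availability_benchmark(warmup_steps: int = 5, fault_steps: int = 10):
    """QoS-over-time methodology: measure a baseline, inject a fault, watch degradation."""
    samples = [run_workload_for(10) for _ in range(warmup_steps)]
    inject_disk_fault()
    samples += [run_workload_for(10) for _ in range(fault_steps)]
    baseline = sum(samples[:warmup_steps]) / warmup_steps
    worst = min(samples[warmup_steps:])
    return baseline, worst              # a real run would also plot the full time series

if __name__ == "__main__":
    baseline, worst = availability_benchmark()
    print(f"baseline {baseline:.1f} hits/s, worst under fault {worst:.1f} hits/s")
```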

30. Example results: multiple faults [charts: Windows 2000/IIS vs. Linux/Apache] • Windows reconstructs ~3x faster than Linux • Windows reconstruction noticeably affects application performance, while Linux reconstruction does not

  31. Software techniques (1) • Proactive introspection • Continuous online self-testing of HW and SW • in deployed systems! • goal is to shake out “Heisenbugs” before they’re encountered in normal operation • needs data redundancy, node isolation, fault injection • Techniques: • fault injection: triggering hardware and software error handling paths to verify their integrity/existence • stress testing: push HW/SW to their limits • scrubbing: periodic restoration of potentially “decaying” hardware or software state • self-scrubbing data structures (like MVS) • ECC scrubbing for disks and memory
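
As one concrete instance of the scrubbing idea, the sketch below walks stored blocks in the background, verifies a checksum for each, and restores any mismatch from a replica. The block layout, the SHA-1 checksum choice, and the read_replica() helper are assumptions for illustration, not ISTORE code.

```python
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()

def read_replica(index: int, replica: list[bytes]) -> bytes:
    """Fetch a known-good copy of block `index` (assumes data redundancy exists)."""
    return replica[index]

def scrub(blocks: list[tuple[bytes, str]], replica: list[bytes]) -> int:
    """Periodic scrub pass: detect 'decayed' blocks and restore them in place."""
    repaired = 0
    for i, (data, stored_sum) in enumerate(blocks):
        if checksum(data) != stored_sum:          # latent error found
            good = read_replica(i, replica)       # fetch the good copy
            blocks[i] = (good, checksum(good))    # rewrite block + checksum
            repaired += 1
    return repaired

if __name__ == "__main__":
    replica = [b"block-0", b"block-1"]
    blocks = [(b"block-0", checksum(b"block-0")),
              (b"bl0ck-1", checksum(b"block-1"))]  # simulated bit rot in block 1
    print("repaired", scrub(blocks, replica), "block(s)")
```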

  32. Conclusions (1): ISTORE • Availability, Maintainability, and Evolutionary growth are key challenges for server systems • more important even than performance • ISTORE is investigating ways to bring AME to large-scale, storage-intensive servers • via clusters of network-attached, computationally-enhanced storage nodes running distributed code • via hardware and software introspection • we are currently performing application studies to investigate and compare techniques • Availability benchmarks a powerful tool? • revealed undocumented design decisions affecting SW RAID availability on Linux and Windows 2000

  33. Conclusions (2) • IRAM attractive for two Post-PC applications because of low power, small size, high memory bandwidth • Gadgets: Embedded/Mobile devices • Infrastructure: Intelligent Storage and Networks • PostPC infrastructure requires • New Goals: Availability, Maintainability, Evolution • New Principles: Introspection, Performance Robustness • New Techniques: Isolation/fault insertion, Software scrubbing • New Benchmarks: measure, compare AME metrics

  34. Berkeley Future work • IRAM: fab and test chip • ISTORE • implement AME-enhancing techniques in a variety of Internet, enterprise, and info retrieval applications • select the best techniques and integrate into a generic runtime system with “AME API” • add maintainability benchmarks • can we quantify administrative work needed to maintain a certain level of availability? • Perhaps look at data security via encryption? • Even consider denial of service?
