Scaling for the Future

Katherine Yelick, U.C. Berkeley, EECS. http://iram.cs.berkeley.edu/{istore} http://www.cs.berkeley.edu/projects/titanium


Presentation Transcript


  1. Scaling for the Future
     Katherine Yelick, U.C. Berkeley, EECS
     http://iram.cs.berkeley.edu/{istore}
     http://www.cs.berkeley.edu/projects/titanium

  2. Two Independent Problems
     • Building a reliable, scalable infrastructure
       • Scalable processor, cluster, and wide-area systems
       • IRAM, ISTORE, and OceanStore
     • One example application for the infrastructure
       • Microscale simulation of biological systems
       • Model signals from cell membrane to nucleus
       • For understanding disease and for pharmacological and BioMEMS-mediated therapy

  3. IRAM: Scaling within a Chip
     [Figure: block diagrams; labels include Logic fab, DRAM fab, Proc, $, L2$, Bus, I/O, and DRAM]
     Microprocessor & DRAM on a single chip:
     • Avoids memory bus bottleneck
     • Addresses power limits by spreading logic over chip
     VIRAM chip:
     • Vector architecture
       • exploits bandwidth
       • preserves power & area advantages
     • Support for multimedia
     • IBM will fabricate Spring ’01
     • 200 MHz, 3.2 Gflops, 2 W
     • 0.18 µm mixed logic/DRAM
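
The VIRAM figures quoted on this slide (200 MHz, 3.2 Gflops, 2 W) imply a per-cycle throughput and an energy efficiency that the slide does not spell out. The sketch below only does that arithmetic; the function name `peak_rate_summary` is made up for illustration, and the inputs are the slide's stated peak numbers, not measurements.

```python
def peak_rate_summary(clock_hz, peak_flops, power_w):
    """Derive flops per cycle and Gflops per watt from peak figures."""
    flops_per_cycle = peak_flops / clock_hz
    gflops_per_watt = (peak_flops / 1e9) / power_w
    return flops_per_cycle, gflops_per_watt


# Slide figures: 200 MHz clock, 3.2 Gflops peak, 2 W.
per_cycle, per_watt = peak_rate_summary(200e6, 3.2e9, 2.0)
print(f"{per_cycle:.0f} flops/cycle, {per_watt:.1f} Gflops/W")
```

Sustaining 16 floating-point operations per cycle at a 200 MHz clock is what one would expect from a wide vector unit rather than from a high clock rate, which is consistent with the slide's point about exploiting bandwidth while preserving power advantages.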

  4. ISTORE: Scaling Clusters
     • Design points
       • 2001: 80 nodes in 3 racks
       • 2002: 1000 nodes in 10 racks (?)
       • 2005: 10K nodes in 1 rack (?)
       • Add IRAM to 1” disk
     • Key problems are availability, maintainability, and evolutionary growth (AME) of thousand-node servers
     • Approach
       • Hardware built for availability: monitor, diagnostics
       • New class of benchmarks for AME
       • Reliable systems from unreliable hw/sw components
       • Introspection: the system watches itself

  5. OceanStore: Scaling to Utilities
     • Transparent data service provided by federation of companies:
       • Monthly fee paid to one service provider
       • Companies buy and sell capacity from each other
     • Assumptions:
       • Untrusted Infrastructure: only ciphertext in the infrastructure
       • Promiscuous Caching: cache anywhere, anytime
       • Optimistic Concurrency Control: avoid locking
     [Figure: federation of service providers; labels include Canadian OceanStore, Sprint, AT&T, IBM, and Pac Bell]
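
As a rough illustration of the "Optimistic Concurrency Control: avoid locking" assumption, here is a minimal version-check sketch. It is not OceanStore's actual protocol: `VersionedStore` and `update` are hypothetical names, and the store is a single in-memory dictionary rather than a distributed, encrypted service.

```python
class VersionedStore:
    """Toy in-memory store; each key maps to (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        # Accept the write only if nobody committed since we read.
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            return False
        self._data[key] = (current_version + 1, new_value)
        return True


def update(store, key, fn, max_retries=5):
    """Read, compute, and try to commit; retry on conflict instead of locking."""
    for _ in range(max_retries):
        version, value = store.read(key)
        if store.commit(key, version, fn(value)):
            return True
    return False


store = VersionedStore()
update(store, "counter", lambda v: (v or 0) + 1)
stale_ok = store.commit("counter", 0, 99)  # stale version, so this is rejected
print(store.read("counter"), stale_ok)
```

Writers never block each other; a writer that loses a race simply re-reads and retries, which is the usual trade made when conflicts are expected to be rare.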

  6. The Real Scalability Problems: AME
     • Availability
       • systems should continue to meet quality of service goals despite failures and extreme load
     • Maintainability
       • minimize human administration
     • Evolutionary Growth
       • graceful evolution; dynamic scalability
     • These are problems for computation and storage services

  7. Research Principles
     • Redundancy everywhere
       • Hardware: processors, networks, disks, …
       • Software: language, libraries, runtime, …
     • Introspection
       • reactive techniques to detect and adapt to failures, workload variations, and system evolution
       • proactive techniques to anticipate and avert problems before they happen
     • Benchmarking
       • Define quantitative AME measures
       • Benchmarks drive the field
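
The reactive/proactive distinction under Introspection can be made concrete with a small monitoring sketch. Everything here is an assumption for illustration: the metric (say, disk usage in percent), the function names, and the very simple linear extrapolation, which is a stand-in for whatever prediction technique a real system would use.

```python
import math


def reactive_alarm(samples, limit):
    """Reactive: report a problem once the latest sample has reached the limit."""
    return samples[-1] >= limit


def steps_until_limit(samples, limit):
    """Proactive: extrapolate the average trend to estimate when the limit is hit.

    Returns the number of future sampling intervals, 0 if the limit is
    already reached, or None if there is no upward trend to extrapolate.
    """
    if samples[-1] >= limit:
        return 0
    if len(samples) < 2:
        return None
    slope = (samples[-1] - samples[0]) / (len(samples) - 1)
    if slope <= 0:
        return None
    return math.ceil((limit - samples[-1]) / slope)


usage = [50, 60, 70]  # hypothetical disk usage samples, in percent
print(reactive_alarm(usage, 100), steps_until_limit(usage, 100))
```

The reactive check stays silent until the limit is crossed, while the proactive estimate gives the system (or an administrator) a few intervals of warning in which to act.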

  8. Benchmarks
     • Availability benchmarks
       • Measure QoS as fault events occur
       • Support for fault injection is key
       • Example of software RAID system
     • Maintainability benchmarks
       • Human factor is a challenge
     • Evolutionary growth benchmarks
       • Performance with heterogeneous hardware
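
One way to read "Measure QoS as fault events occur" is to record a QoS metric over time, inject a fault at a known point, and summarize how far and for how long the metric drops. The sketch below assumes a list of throughput samples and a 10% tolerance band; the function name, the tolerance, and the sample data are all illustrative assumptions, not the benchmark methodology from the talk.

```python
from statistics import mean


def availability_summary(throughput, fault_index, tolerance=0.1):
    """Summarize QoS after a fault injected at position fault_index.

    The baseline is the mean of the pre-fault samples; a post-fault sample
    counts as degraded if it falls more than `tolerance` below the baseline.
    """
    baseline = mean(throughput[:fault_index])
    floor = baseline * (1 - tolerance)
    after = throughput[fault_index:]
    degraded = sum(1 for x in after if x < floor)
    return {"baseline": baseline, "degraded_samples": degraded, "worst": min(after)}


# Hypothetical requests/second, with a fault injected before the fifth sample.
samples = [100, 100, 100, 100, 60, 70, 95, 100]
print(availability_summary(samples, fault_index=4))
```

Reporting both the depth of the dip and its duration matters because two systems can have the same average throughput while behaving very differently during a failure.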

  9. Example: Faults in Software RAID
     • Compares Linux and Solaris reconstruction
       • Linux: minimal performance impact but longer window of vulnerability to second fault
       • Solaris: large performance impact but restores redundancy fast
     [Figure: two panels, Linux and Solaris]
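
The trade-off on this slide can be illustrated with a toy model in which reconstruction and foreground work share a fixed disk bandwidth. This is a sketch under strong assumptions, not data from the benchmark: the function, the linear sharing model, and the disk size and bandwidth numbers are all hypothetical.

```python
def rebuild_tradeoff(disk_gb, disk_mb_per_s, rebuild_share):
    """Toy model: split disk bandwidth between rebuild and foreground work.

    rebuild_share is the fraction of bandwidth given to reconstruction.
    Returns (rebuild time in seconds, fraction left for foreground work).
    The rebuild time is the window of vulnerability to a second fault.
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    rebuild_seconds = disk_gb * 1024 / (disk_mb_per_s * rebuild_share)
    return rebuild_seconds, 1 - rebuild_share


# Hypothetical 10 GB disk at 20 MB/s.
print(rebuild_tradeoff(10, 20, 0.125))  # gentle rebuild: long window, little slowdown
print(rebuild_tradeoff(10, 20, 0.5))    # aggressive rebuild: short window, big slowdown
```

In this model a small rebuild share loosely corresponds to the behaviour the slide attributes to Linux, and a large share to the behaviour it attributes to Solaris; neither setting is better in general, since one protects performance and the other protects data.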

  10. Simulating Microscale Biological Systems
      • Large-scale simulation useful for
        • Fundamental biological questions: cell behavior
        • Design of treatments, including BioMEMS
      • Simulations limited in part by
        • Machine complexity, e.g., memory hierarchies
        • Algorithmic complexity, e.g., adaptation
      • Old software model:
        • Hide the machine from the users
        • Implicit parallelism, hardware-controlled caching
        • Results were unusable
        • Witness success of MPI

  11. New Model for Scalable High Confidence Computing
      • Domain-specific language that judiciously exposes machine structure
        • Explicit parallelism, load balancing and locality control
        • Allows for construction of complex, distributed data structures
      • Current
        • Demonstration on higher level models
        • Heart simulation
      • Future plans
        • Algorithms and software that adapt to faults
        • Microscale systems
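
To give a feel for "explicit parallelism, load balancing and locality control", the sketch below has the programmer, not the runtime, decide which contiguous block of data each worker owns. It is written in Python rather than in the domain-specific language the talk refers to, and `block_ranges` and `parallel_sum` are hypothetical helper names. In CPython, threads will not speed up this computation because of the global interpreter lock; the point is only the explicit decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def block_ranges(n, workers):
    """Split indices 0..n-1 into contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_sum(data, workers=4):
    """Each worker reduces only its own block; partial results are combined at the end."""
    ranges = block_ranges(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: sum(data[r[0]:r[1]]), ranges)
        return sum(partials)


print(block_ranges(10, 4))
print(parallel_sum(list(range(10)), workers=4))
```

Contiguous blocks keep each worker's accesses local, and the near-equal block sizes are the simplest form of static load balancing; an adaptive computation would need to rebalance as the work changes.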

  12. Conclusions
      • Scaling at all levels
        • Processors, clusters, wide area
      • Application challenges
        • Both storage and compute intensive
      • Key challenges to future infrastructure are:
        • Availability and reliability
        • Complexity of the machine
