Developing a Cluster Strategy for NPACI All Hands Meeting Panel Feb 11, 2000


Presentation Transcript


  1. Developing a Cluster Strategy for NPACI • All Hands Meeting Panel • Feb 11, 2000 • David E. Culler, Computer Science Division, University of California, Berkeley, http://www.cs.berkeley.edu/~culler

  2. UCB Millennium Cluster of Clusters • x86+Myrinet platforms w/ GbE inter-networking • [Diagram: clusters interconnected by Gigabit Ethernet (GbE), with wide-area links labeled NTON, Internet-2, and SuperNet. Labeled clusters: PIII-X 64x4, Ninja PIII 32x2, ½ TB, PII and PIII DLIB, NOW, five PII 8x2 clusters (Astro, Math, Physics, Bio, CE), and Mobile Svcs / Kiosks.] • Distributed ownership, allocation, and management NPACI Clusters

  3. Vineyard Cluster Architecture • Distributed resource utilization and management in a “Vineyard” of Clusters • [Layer diagram: Applications / Services (ISPACE/Kiosks); Mgmt / Monitoring, PBS, I/O, MPI, VEXEC, Tools; REXEC; base layer of VIA / GM, GbE, Multicast, NT / Linux (2.2.x), and Stride Scheduler; Rootstock Distribution]
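The architecture slide lists a “Stride Scheduler” in its base layer without further detail. As a hedged illustration, the sketch below implements the standard stride-scheduling algorithm (Waldspurger and Weihl): each client holds tickets, its stride is a large constant divided by its tickets, and each quantum goes to the client with the smallest pass value, which then advances by its stride. This is a generic sketch, not the Millennium scheduler; STRIDE1 and the client names are illustrative choices.

```python
import heapq
from dataclasses import dataclass, field

STRIDE1 = 1 << 20  # large constant so integer strides stay accurate


@dataclass(order=True)
class Client:
    # Only pass_value takes part in comparisons, so the heap orders by pass.
    pass_value: int
    name: str = field(compare=False)
    tickets: int = field(compare=False)
    stride: int = field(compare=False)


def make_client(name: str, tickets: int) -> Client:
    """Create a client whose stride is inversely proportional to its tickets."""
    stride = STRIDE1 // tickets
    return Client(pass_value=stride, name=name, tickets=tickets, stride=stride)


def schedule(clients: list[Client], quanta: int) -> list[str]:
    """Run `quanta` time slices; each goes to the client with the smallest pass."""
    heap = list(clients)
    heapq.heapify(heap)
    order = []
    for _ in range(quanta):
        client = heapq.heappop(heap)  # minimum pass value wins this quantum
        order.append(client.name)
        client.pass_value += client.stride  # advance the client's virtual time
        heapq.heappush(heap, client)
    return order
```

Running `schedule([make_client("A", 3), make_client("B", 2), make_client("C", 1)], 600)` gives A, B, and C roughly 300, 200, and 100 quanta, i.e. shares proportional to tickets. Unlike lottery scheduling, the allocation is deterministic, so the error over any interval stays bounded.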

  4. Clusters “own” HPC

  5. Fundamental Advantages of Clusters • Cost • Performance • Performance / Cost • Track leading edge of market technology • Incremental scalability • Availability • Tremendous I/O performance • Wide-Area Network performance • competitive internal network performance too • Allow specialization of networked services

  6. Fundamental Challenges • Management • complete system on every node • need scalable administration • Incremental scalability & availability => • heterogeneity • some parts inoperable at any time • The cluster projects are making great progress in this area • e.g., Millennium Rootstock • Cluster tools are what you want for managing the desktops across your department
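A back-of-envelope calculation shows why “some parts inoperable at any time” is the normal state of a large cluster. Assume, for illustration only, that each of N nodes is independently up with probability p (the values N = 100 and p = 0.99 are hypothetical, not Millennium measurements):

```latex
\begin{aligned}
P(\text{all } N \text{ nodes up}) &= p^{N}, \qquad \mathbb{E}[\text{nodes up}] = Np \\
N = 100,\ p = 0.99:\quad P &= 0.99^{100} = e^{100 \ln 0.99} \approx e^{-1.005} \approx 0.366
\end{aligned}
```

So even with 99% per-node availability, the whole 100-node cluster is fully healthy only about 37% of the time, although about 99 nodes are up on average. Administration and applications that tolerate partial failure exploit those 99 nodes; anything that needs every node at once cannot run most of the time.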

  7. CS&E HPC hampered by “self-centered” usage model • Have my own application for my studies • Want the entire machine to myself • Want it now • Think “services” • Think “software” • The value is in your application. • Make it a service and make it available to the scientific community. • Put it on a cluster to deliver results 24x7x52

  8. Example: TCAD Simulation Service • star formation simulation • earthquake simulations • phylogeny, BLAST, ... http://cuervo.eecs.berkeley.edu/Volcano/

  9. Extreme Example • UCB Millennium / NOW has delivered 70 CPU years! • Simple special case, but ... • Engineered for portability, adaptability, availability

  10. What should NPACI do? To be relevant: • become a “Center of Expertise” for clusters • draw expertise toward the center for ease of dissemination • facilitate and encourage building clusters among the partners • invest in an interesting cluster “close to home” • cheap! Graft Millennium • invest in people to understand the implications To Lead: • Pioneer widespread computational science and engineering services • InfiniBand

  11. From e-commerce to e-Science

  12. Technical Backup Slides

  13. Rootstock Mechanics • [Diagram: the Cluster System Distribution Center holds the cluster stock (build, OS, drivers, mill SW, OS mods) and connects over the IP network (CAN ...) to each cluster's CS, which serves cluster leased builds to its nodes.] • 1. Cluster Stock: Rootstock build pages - full current Linux - all fixes and packages - SSL, SSH - cluster drivers - cluster system layers (rexec, mpe, pbs) - optional SW ($) - cluster kernel mods • 2. Make the CS “graft”: specify IP address - package removes (dhcp, dns, nis, ...) - sanity check and build (resolv.conf, /etc/hosts, ...) - construct cluster build (lease) - download CS build floppy • 3. CS power-on build: transfer and localize DT - add local admin scripts - node build floppy • 4. Node power-on build: local stock from CS • 5. Cluster Update button (future): 2nd dialtone, CF engine, rolling update
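Step 2's “sanity check and build” of per-cluster files such as /etc/hosts can be sketched as follows. This is a hypothetical illustration, not Rootstock code: the host-naming scheme (cluster-cs, cluster-nNN), the cluster.local domain, and the layout that gives nodes the addresses immediately after the CS are all assumptions.

```python
import ipaddress


def build_hosts(cluster: str, cs_ip: str, num_nodes: int, network: str,
                domain: str = "cluster.local") -> str:
    """Return /etc/hosts text for one CS plus `num_nodes` nodes.

    Hypothetical layout: the CS gets cs_ip and node i gets cs_ip + i.
    Raises ValueError if any address falls outside the usable subnet.
    """
    net = ipaddress.IPv4Network(network)
    cs = ipaddress.IPv4Address(cs_ip)
    names = [f"{cluster}-cs"] + [f"{cluster}-n{i:02d}" for i in range(1, num_nodes + 1)]
    lines = ["127.0.0.1\tlocalhost"]
    for offset, name in enumerate(names):
        addr = cs + offset
        # Sanity check: every address must be a usable host address in the subnet.
        if addr not in net or addr in (net.network_address, net.broadcast_address):
            raise ValueError(f"{addr} is not a usable host address in {net}")
        lines.append(f"{addr}\t{name}.{domain}\t{name}")
    return "\n".join(lines) + "\n"
```

For example, `build_hosts("astro", "10.0.0.1", 2, "10.0.0.0/24")` produces entries for astro-cs at 10.0.0.1 and astro-n01 and astro-n02 at 10.0.0.2 and 10.0.0.3. Starting at 10.0.0.254 instead fails the check, because the second node would land on the broadcast address. Checking before writing any file keeps a bad graft from producing a half-configured cluster.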

  14. REXEC / VEXEC • Components: rexecd, rexec & vexecd • [Diagram: rexecd runs on each of Nodes A, B, C, and D; vexecd (Policy A) and vexecd (Policy B) share the Cluster IP Multicast Channel. Example request: % rexec -n 2 -r 3 indexer, answered with “Nodes AB”; labels read “run indexer on Nodes AB at 3 credits/min” and “minimum $”.]
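The slide's example request can be read as “run indexer on 2 nodes (-n 2), offering 3 credits/min (-r 3)”, with a vexecd policy labeled “minimum $” choosing the nodes. The sketch below encodes that reading under explicit assumptions: the per-node prices, the Node type, and the rule “pick the n cheapest nodes priced at or below the offered rate” are hypothetical, and the argument parser mirrors only the two flags shown on the slide, not the real rexec option set.

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    price: float  # hypothetical current price of the node, in credits/min


def parse_request(argv: list[str]) -> argparse.Namespace:
    """Parse a request shaped like the slide's 'rexec -n 2 -r 3 indexer'."""
    parser = argparse.ArgumentParser(prog="rexec")
    parser.add_argument("-n", type=int, required=True, help="number of nodes")
    parser.add_argument("-r", type=float, required=True, help="rate offered, credits/min")
    parser.add_argument("command")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    return parser.parse_args(argv)


def select_min_cost(nodes: list[Node], n: int, rate: float) -> Optional[list[str]]:
    """Hypothetical 'minimum $' policy: the n cheapest nodes within the offered rate."""
    affordable = sorted((nd for nd in nodes if nd.price <= rate),
                        key=lambda nd: (nd.price, nd.name))
    if len(affordable) < n:
        return None  # not enough nodes at this rate; the request is not placed
    return [nd.name for nd in affordable[:n]]
```

With hypothetical prices A = 1.0, B = 2.0, C = 3.5, and D = 2.5 credits/min, the request `-n 2 -r 3` selects ["A", "B"], matching the slide's “Nodes AB”. Because several vexecd daemons with different policies can listen on the same multicast channel, a different policy (for example, one favoring locality) could return a different pair for the same request.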

  15. Computational Economy • Market-based approach to resource allocation • Optimizes for user value • [Diagram labels: Apps (Value); Economic F.E.; Access Modules; API; TimeShare and BatchQueue Resource Managers; Resources]
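The slide names a market-based approach but not a specific mechanism. One simple, well-known choice is proportional-share bidding: each user's share of a resource is its bid divided by the sum of all bids, so users who value the resource more (bid more) receive more of it. The sketch below assumes that mechanism; the user names, bid units, and capacity figure are hypothetical.

```python
def proportional_share(bids: dict[str, float],
                       capacity: float) -> tuple[dict[str, float], float]:
    """Split `capacity` among users in proportion to their bids.

    Returns each user's allocation and the resulting price per unit of capacity.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if any(bid < 0 for bid in bids.values()):
        raise ValueError("bids must be non-negative")
    total = sum(bids.values())
    if total == 0:
        return {user: 0.0 for user in bids}, 0.0  # no demand, nothing allocated
    allocation = {user: capacity * bid / total for user, bid in bids.items()}
    return allocation, total / capacity
```

For instance, bids of 3 and 1 credits for 8 nodes yield allocations of 6 and 2 nodes at a price of 0.5 credits per node. If demand rises while capacity stays fixed, the price rises, which is the signal a market-based allocator relies on.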
