Developing a Cluster Strategy for NPACI
All Hands Meeting Panel, Feb 11, 2000
David E. Culler
Computer Science Division, University of California, Berkeley
http://www.cs.berkeley.edu/~culler
UCB Millennium Cluster of Clusters
• x86+Myrinet platforms w/ GbE inter-networking
• Distributed ownership, allocation, and management
[Diagram labels: NTON, Internet-2, SuperNet; PIII-X 64x4; Ninja PIII 32x2; ½ TB; PII; PIII; DLIB; Gigabit Ethernet (GbE); PII 8x2 (five clusters); Astro; Math; Physics; Bio; CE; Mobile Svcs; Kiosks; NOW]
NPACI Clusters
Vineyard Cluster Architecture
• Distributed resource utilization and management in a “Vineyard” of Clusters.
[Diagram labels: Applications / Services (ISPACE / Kiosks); Mgmt / Monitoring; PBS; I/O; MPI; VEXEC; TOOLS; REXEC]
 - VIA / GM, GbE
 - Multicast
 - NT / Linux (2.2.x)
 - Stride Scheduler
Rootstock Distribution
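The architecture slide lists a stride scheduler among its layers. As a rough illustration of stride scheduling in general, and not of the Millennium implementation, the sketch below hands out time slices in proportion to ticket counts. The function name, the ticket values, and the `stride1` constant are all assumptions made for the example.

```python
import heapq

def stride_schedule(tickets, quanta, stride1=1 << 20):
    """Return the order in which clients receive `quanta` time slices.

    tickets: dict mapping a client name to a positive integer ticket count
             (hypothetical inputs, chosen for illustration).
    """
    heap = []
    for order, (name, count) in enumerate(tickets.items()):
        # A client's stride is inversely proportional to its tickets.
        stride = stride1 // count
        # Heap entries: (pass value, tie-breaker, name, stride).
        heapq.heappush(heap, (stride, order, name, stride))

    schedule = []
    for _ in range(quanta):
        # Always run the client with the smallest pass value ...
        pass_value, order, name, stride = heapq.heappop(heap)
        schedule.append(name)
        # ... then advance its pass by its stride.
        heapq.heappush(heap, (pass_value + stride, order, name, stride))
    return schedule
```

With tickets of 3, 2, and 1, six time slices are split three, two, and one, so over time each client's share of the CPU tracks its share of the tickets.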
Clusters “own” HPC
Fundamental Advantages of Clusters
• Cost
• Performance
• Performance / Cost
• Track leading edge of market technology
• Incremental scalability
• Availability
• Tremendous I/O performance
• Wide-Area Network performance
  • competitive internal network performance too
• Allow specialization of networked services
Fundamental Challenges
• Management
  • Complete system on every node
  • need scalable administration
• Incremental scalability & availability =>
  • heterogeneity
  • some parts inoperable at any time
• The Cluster projects are making great progress in this area
  • e.g., Millennium rootstock
• Cluster tools are what you want for managing the desktops across your department
CS&E HPC hampered by “self-centered” usage model
• Have my own application for my studies
• Want the entire machine to myself
• Want it now
• Think “services”
• Think “software”
• The value is in your application.
• Make it a service and make it available to the scientific community.
• Put it on a cluster to deliver results 24x7x52
Example: TCAD Simulation Service
• star formation simulation
• earthquake simulations
• phylogeny, BLAST, ...
http://cuervo.eecs.berkeley.edu/Volcano/
Extreme Example
• UCB Millennium / NOW has delivered 70 CPU years!
• Simple special case, but ...
• Engineered for portability, adaptability, availability
What should NPACI do?
To be relevant:
• become a “Center of Expertise” for clusters
• draw expertise toward the center for ease of dissemination
• facilitate and encourage building clusters among the partners
• invest in an interesting cluster “close to home”
  • cheap! Graft Millennium
• invest in people to understand the implications
To Lead:
• Pioneer widespread computational science and engineering services
• InfiniBand
from e-commerce to e-Science
Technical Backup Slides
Rootstock Mechanics
1. Cluster Stock
 - Rootstock build pages
 - Full Current Linux
 - all fixes and pckgs
 - SSL, SSH
 - Cluster Drivers
 - Cluster System Layers
 - rexec, mpe, pbs
 - Optional SW ($)
 - Cluster Kernel Mods
2. Make the CS “graft”
 - specify IP address
 - pckg removes
 - dhcp, dns, nis, ... sanity check and build
 - resolv.conf, /etc/hosts, ...
 constructs cluster build (lease); download CS build floppy
3. CS power-on build
 - xfer and localize DT
 - add local admin scripts
 - node build floppy
4. Node power-on build
 - local stock from CS
5. Cluster Update button (future)
 - 2nd dialtone, CF engine, rolling update
[Diagram labels: Cluster System Distribution Center; cluster stock (build, os, drvrs, mill SW, os mods); Cluster leased builds; cs; IP network; CAN ...; K]
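Step 2 of the build mentions specifying an IP address and producing `resolv.conf` and `/etc/hosts`. The sketch below shows one way such files could be rendered for a small cluster. It is a hypothetical illustration, not the Rootstock tooling: the function names, the `node` hostname prefix, the `cluster.example` domain, and the sequential addressing scheme are all assumptions.

```python
import ipaddress

def render_hosts(base_ip, node_count, prefix="node", domain="cluster.example"):
    """Return /etc/hosts text for `node_count` nodes numbered up from base_ip.

    The naming and addressing scheme here is assumed for illustration only.
    """
    first = ipaddress.IPv4Address(base_ip)
    lines = ["127.0.0.1\tlocalhost"]
    for i in range(node_count):
        name = f"{prefix}{i}"
        # One line per node: address, fully qualified name, short alias.
        lines.append(f"{first + i}\t{name}.{domain}\t{name}")
    return "\n".join(lines) + "\n"

def render_resolv_conf(nameservers, domain="cluster.example"):
    """Return resolv.conf text using the standard search/nameserver directives."""
    lines = [f"search {domain}"]
    lines += [f"nameserver {ns}" for ns in nameservers]
    return "\n".join(lines) + "\n"
```

Generating these files from a single cluster description, rather than editing them per node, is one way to keep every node's configuration consistent as the cluster grows.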
REXEC / VEXEC
• Components
  • rexecd, rexec & vexecd
[Diagram labels: Node A, Node B, Node C, Node D, each with rexecd; Cluster IP Multicast Channel; vexecd (Policy A); vexecd (Policy B); “Nodes AB”; minimum $; rexec]
Example: run indexer on Nodes AB at 3 credits/min
%rexec -n 2 -r 3 indexer
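The example on this slide asks for two nodes at 3 credits/min, and the diagram labels a "minimum $" choice that yields "Nodes AB". The slide does not say how vexecd makes that choice, so the sketch below is only one plausible reading: pick the cheapest nodes whose advertised price fits the offered rate. The function name, the price table, and the selection rule are assumptions.

```python
def pick_nodes(prices, n, max_rate):
    """Pick the n cheapest nodes priced at or below max_rate (credits/min).

    prices: dict mapping a node name to its advertised price. The price
            table and this "cheapest first" policy are hypothetical.
    """
    # Sort affordable nodes by price, breaking ties by name.
    affordable = sorted(
        (price, name) for name, price in prices.items() if price <= max_rate
    )
    if len(affordable) < n:
        raise LookupError(f"only {len(affordable)} node(s) at or below {max_rate}")
    return [name for _, name in affordable[:n]]
```

Calling `pick_nodes({"A": 1.0, "B": 2.0, "C": 5.0, "D": 3.0}, n=2, max_rate=3)` selects nodes A and B under this assumed policy.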
Computational Economy
• Market-based approach to resource allocation
• Optimizes for user value
[Diagram labels: TimeShare; API; API; BatchQueue; Economic F.E.; Access Modules; Resources; Apps (Value); Resource Managers]
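The slide does not spell out an allocation rule. One common market-based rule is proportional share, where each job receives a fraction of a resource equal to its bid divided by the sum of all bids. The sketch below illustrates that rule only; the function name, job names, and bid values are assumptions, and the rule may differ from what Millennium used.

```python
def proportional_share(bids, capacity=1.0):
    """Split `capacity` among jobs in proportion to their bids.

    bids: dict mapping a job name to its bid rate, e.g. credits/min
          (hypothetical values for illustration).
    """
    total = sum(bids.values())
    if total <= 0:
        raise ValueError("total bid must be positive")
    # Each job's share is its bid as a fraction of all bids.
    return {job: capacity * rate / total for job, rate in bids.items()}
```

A job bidding 3 credits/min against a competitor bidding 1 credit/min would receive three quarters of the resource, so users who value a result more can pay for a larger share.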