Cluster Architectures and the NPACI Berkeley NOW David E. Culler Computer Science Division U.C. Berkeley http://now.cs.berkeley.edu
Architectural Drivers • Node architecture dominates performance • processor, cache, bus, and memory • design and engineering $ => performance • Greatest demand for performance is on large systems • must track the leading edge of technology without lag • MPP network technology => mainstream • system area networks • System on every node is a powerful enabler • very high speed I/O, virtual memory, scheduling, … • Incremental scalability (up, down, and across) • Complete software tools • Wide class of applications
Berkeley NOW • 100 Sun UltraSparcs • 200 disks • Myrinet SAN • 160 MB/s • Fast comm. • AM, MPI, ... • Ether/ATM switched external net • Global OS • Self Config
Basic Components [block diagram: Sun Ultra 170 nodes, each with processor (P), cache ($), memory (M), and a Myricom NIC on the I/O bus, connected by the Myrinet SAN at 160 MB/s]
Massive Cheap Storage Cluster • Basic unit: 2 PCs double-ending four SCSI chains of 8 disks each • Currently serving Fine Art at http://www.thinker.org/imagebase/
Cluster of SMPs (CLUMPS) • Four Sun E5000s • 8 processors • 4 Myricom NICs each • Multiprocessor, Multi-NIC, Multi-Protocol • NPACI => Sun 450s
Millennium PC Clumps • Inexpensive, easy-to-manage cluster • Replicated in many departments • Prototype for very large PC cluster
So What’s So Different? • Commodity parts? • Communications Packaging? • Incremental Scalability? • Independent Failure? • Intelligent Network Interfaces? • Complete System on every node • virtual memory • scheduler • files • ...
Communication Performance: Direct Network Access • LogP: Latency, Overhead, and Bandwidth • Active Messages: lean layer supporting programming models [plot: message cost broken into Latency and 1/BW components]
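The LogP parameters named above lend themselves to a back-of-the-envelope cost estimate. A minimal sketch of that model (the formula is the standard LogP small-message cost; the parameter values in the usage note are illustrative, not measurements from the NOW):

```python
def logp_time(L, o, g, n=1):
    """Estimated time to deliver n small messages under LogP:
    L = network latency, o = per-message send/receive overhead,
    g = gap between successive injections (reciprocal of bandwidth).
    Messages are injected every max(o, g) time units; the last one
    then pays the network latency plus the receive overhead."""
    return (n - 1) * max(o, g) + o + L + o
```

For example, with L=5, o=1, g=2 (arbitrary units), a single message costs 7 units while a pipelined burst of four costs 13: for bursts, the gap term (1/BW), not latency, dominates.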
World-Record Disk-to-Disk Sort • Sustain 500 MB/s disk bandwidth and 1,000 MB/s network bandwidth
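The aggregate figures above can be sanity-checked per node. A rough calculation, assuming the sort ran across all 100 NOW UltraSparcs (the even per-node split is an assumption, not stated on the slide):

```python
nodes = 100            # NOW UltraSparc count (assumed all participate)
disk_bw_agg = 500.0    # MB/s aggregate disk bandwidth sustained
net_bw_agg = 1000.0    # MB/s aggregate network bandwidth sustained

disk_bw_per_node = disk_bw_agg / nodes  # MB/s of disk traffic per node
net_bw_per_node = net_bw_agg / nodes    # MB/s of network traffic per node
```

Under that assumption each node moves about 5 MB/s to disk and 10 MB/s over the network, comfortably below the 160 MB/s Myrinet links, so the disks rather than the SAN are the limiting resource.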
General purpose Parallel System • Many timeshared processes • each with direct, protected access • partition it any way you like • User and system • Client/Server, Parallel clients, parallel servers • they grow, shrink, handle node failures • Multiple packages in a process • each may have own internal communication layer • Use communication as easily as memory
Virtual Networks • Endpoint abstracts the notion of “attached to the network” • Virtual network is a collection of endpoints that can name each other. • Many processes on a node can each have many endpoints, each with own protection domain.
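The endpoint/virtual-network relationship above can be sketched as a toy data structure. This is a hypothetical illustration of the abstraction; the class and method names are invented for this sketch, not the actual AM-II API:

```python
class VirtualNetwork:
    """A collection of endpoints that can name each other."""
    def __init__(self):
        self.endpoints = []

    def can_name(self, src, dst):
        # Endpoints may address each other only within this virtual network.
        return src in self.endpoints and dst in self.endpoints


class Endpoint:
    """Abstracts 'attached to the network': owned by one process's
    protection domain, member of exactly one virtual network."""
    def __init__(self, vnet, owner):
        self.vnet = vnet
        self.owner = owner   # owning process / protection domain
        vnet.endpoints.append(self)
```

A process can create many endpoints, each joined to a different virtual network, which is how one node supports many communicating applications with protection.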
How are they managed? • How do you get direct hardware access for performance with a large space of logical resources? • Just like virtual memory • active portion of large logical space is bound to physical resources [diagram: processes 1…n in host memory, with active endpoints bound into memory on the network interface (NIC)]
Network Interface Support • NIC has endpoint frames • Services active endpoints • Signals misses to driver • using a system endpoint [diagram: endpoint frames 0 through 7, each with transmit and receive areas; a miss escapes to the driver]
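The frame/miss mechanism above parallels a TLB or page cache: a small set of physical frames backs a large space of logical endpoints. A hypothetical sketch, assuming a fixed frame count and a deliberately naive eviction policy (real systems would use something like LRU):

```python
class NIC:
    """Toy model of a NIC that caches active endpoints in frames.
    Sending on an unbound endpoint 'misses'; the host driver then
    binds it to a frame, evicting another endpoint if necessary,
    analogous to a page fault in virtual memory."""
    def __init__(self, num_frames=8):
        self.frames = [None] * num_frames  # endpoint id or None
        self.misses = 0

    def send(self, endpoint):
        if endpoint not in self.frames:
            self.misses += 1
            self._load(endpoint)
        # ... transmit via the frame now holding `endpoint` ...
        return self.frames.index(endpoint)

    def _load(self, endpoint):
        # Prefer a free frame; otherwise evict frame 0's occupant
        # (illustrative policy only).
        slot = self.frames.index(None) if None in self.frames else 0
        self.frames[slot] = endpoint
```

Sends on already-bound endpoints proceed at full hardware speed; only the miss path involves the driver, which is what makes direct, protected access affordable for many processes.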
Communication under Load [plot: message bursts and work phases among client and server processes]
Beyond the Personal Supercomputer • Able to timeshare parallel programs • with fast, protected communication • Mix with sequential and interactive jobs • Use fast communication in OS subsystems • parallel file system, network virtual memory, … • Nodes have powerful, local OS scheduler • Simple implicit scheduling techniques provide coordinated scheduling => ride workstation/PC node and Internet-server technology => focus CS partners on RAS (reliability, availability, serviceability) for long-running apps