Parallel Computing on Wide-Area Clusters: the Albatross Project

Henri Bal • Vrije Universiteit Amsterdam, Faculty of Sciences • Aske Plaat • Thilo Kielmann • Jason Maassen • Rob van Nieuwpoort • Ronald Veldema

Introduction • Cluster computing becomes popular • Excellent price/performance ratio • Fast commodity networks • Next step: wide-area cluster computing • Use multiple clusters for single application • Form of metacomputing • Challenges • Software infrastructure (e.g., Legion, Globus) • Parallel applications that can tolerate WAN-latencies

Albatross project • Study applications and programming environments for wide-area parallel systems • Basic assumption: wide-area system is hierarchical • Connect clusters, not individual workstations • General approach • Optimize applications to exploit hierarchical structure → most communication is local

Outline • Experimental system and programming environments • Application-level optimizations • Performance analysis • Wide-area optimized programming environments

Distributed ASCI Supercomputer (DAS) • Four clusters: VU (128), UvA (24), Leiden (24), Delft (24) • Wide-area links: 6 Mb/s ATM • Node configuration • 200 MHz Pentium Pro • 64-128 MB memory • 2.5 GB local disks • Myrinet LAN • Fast Ethernet LAN • Redhat Linux 2.0.36

Programming environments • Existing library/language + expose hierarchical structure • Number of clusters • Mapping of CPUs to clusters • Panda library • Point-to-point communication • Group communication • Multithreading • Layer diagram (top to bottom): Java, Orca, MPI / Panda / LFC, TCP/IP / Myrinet, ATM
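The two pieces of information exposed to the application (number of clusters, mapping of CPUs to clusters) can be pictured with a small sketch. This is illustrative Python, not the Panda interface; the names `Topology`, `uniform_topology`, `same_cluster` and `members` are hypothetical, and the 4 x 10 layout is the wide-area configuration mentioned later in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical description of a hierarchical wide-area system."""
    cluster_of: tuple  # cluster_of[cpu] = index of the cluster that CPU belongs to

    @property
    def num_cpus(self):
        return len(self.cluster_of)

    @property
    def num_clusters(self):
        return len(set(self.cluster_of))

    def same_cluster(self, a, b):
        """True if a message from CPU a to CPU b stays on the local network."""
        return self.cluster_of[a] == self.cluster_of[b]

    def members(self, cluster):
        """All CPUs that belong to the given cluster, in rank order."""
        return [cpu for cpu, c in enumerate(self.cluster_of) if c == cluster]


def uniform_topology(num_clusters, cpus_per_cluster):
    """Build a topology with equally sized clusters, numbered contiguously."""
    return Topology(tuple(c for c in range(num_clusters)
                          for _ in range(cpus_per_cluster)))


topo = uniform_topology(4, 10)  # 4 clusters of 10 CPUs
```

An application can then ask `topo.same_cluster(a, b)` before deciding whether a message is cheap (local) or expensive (wide-area).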

Example: Java • Remote Method Invocation (RMI) • Simple, transparent, object-oriented, RPC-like communication primitive • Problem: RMI performance • JDK RMI on Myrinet is a factor of 40 slower than C-RPC (1228 vs. 30 µsec) • Manta: high-performance Java system [PPoPP’99] • Native (static) compilation: source → executable • Fast RMI protocol between Manta nodes • JDK-style protocol to interoperate with JVMs

JDK versus Manta • Benchmark setup: 200 MHz Pentium Pro, Myrinet, JDK 1.1.4 interpreter, 1 object as parameter

Manta on wide-area DAS • 2 orders of magnitude between intra-cluster (LAN) and inter-cluster (WAN) communication performance • Application-level optimizations [JavaGrande’99] • Minimize WAN-overhead

Example: SOR • Red/black Successive Overrelaxation • Neighbor communication, using RMI • Problem: nodes at cluster-boundaries • Overlap wide-area communication with computation • RMI is synchronous → use multithreading • Figure: CPU 1 to CPU 6 divided over Cluster 1 and Cluster 2, with latency labels 5600 µsec and 50 µs
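The overlap idea can be sketched in a few lines. This is a simplified Python illustration, not the Manta/RMI code from the talk: `fetch_remote_row` is a hypothetical stand-in for a synchronous remote call, the relaxation factor `OMEGA` is an assumed value, and the default latency of 0.0056 s assumes that the 5600 µsec label in the figure is the wide-area latency. A worker thread issues the blocking request while the main thread relaxes the rows that do not depend on the remote data.

```python
import time
from concurrent.futures import ThreadPoolExecutor

OMEGA = 1.5  # relaxation factor (assumed value, not from the slides)


def relax_row(grid, i, color):
    """Update the cells of row i whose (i + j) parity equals color."""
    row = grid[i]
    for j in range(1, len(row) - 1):
        if (i + j) % 2 == color:
            avg = (grid[i - 1][j] + grid[i + 1][j] + row[j - 1] + row[j + 1]) / 4.0
            row[j] += OMEGA * (avg - row[j])


def fetch_remote_row(remote_row, latency_s):
    """Hypothetical stand-in for a synchronous call to a CPU in another cluster."""
    time.sleep(latency_s)
    return list(remote_row)


def phase_overlapped(grid, remote_row, color, pool, latency_s=0.0056):
    """One red or black phase; grid[-1] is the ghost row owned by a remote CPU."""
    last = len(grid) - 2  # last locally owned row, the only one that reads the ghost row
    pending = pool.submit(fetch_remote_row, remote_row, latency_s)  # start wide-area request
    for i in range(1, last):  # these rows do not need the ghost row
        relax_row(grid, i, color)
    grid[-1] = pending.result()  # block only when the remote data is needed
    relax_row(grid, last, color)
```

Within one color phase every updated cell reads only cells of the other color, so relaxing the boundary row last gives the same result as a purely sequential sweep.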

Wide-area optimizations

Performance of Java applications • Wide-area DAS system: 4 clusters of 10 CPUs • Sensitivity to wide-area latency and bandwidth: see HPCA’99

Discussion • Optimized applications obtain good speedups • Reduce wide-area communication, or hide its latency • Java RMI is easy to use, but some optimizations are awkward to express • Lack of asynchronous communication and broadcast • RMI model does not help exploit hierarchical structure of wide-area systems • Need wide-area optimized programming environment

MagPIe: wide-area collective communication • Collective communication among many processors • e.g., multicast, all-to-all, scatter, gather, reduction • MagPIe: MPI’s collective operations optimized for hierarchical wide-area systems [PPoPP’99] • Transparent to application programmer

Spanning-tree broadcast • MPICH (WAN-unaware) • Wide-area latency is chained • Data is sent multiple times over same WAN-link • MagPIe (WAN-optimized) • Each sender-receiver path contains at most 1 WAN-link • No data item travels multiple times to same cluster • Figure: broadcast trees over Cluster 1 to Cluster 4
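The difference between the two trees can be made concrete by counting wide-area messages. The sketch below is an illustration under stated assumptions, not the MPICH or MagPIe source: a binomial tree over rank order stands in for a WAN-unaware broadcast, and the cluster-aware variant sends one message to a coordinator in each remote cluster, which then broadcasts locally. All function names are hypothetical.

```python
def binomial_edges(ranks):
    """Edges (parent, child) of a binomial broadcast tree rooted at ranks[0]."""
    edges = []
    step = 1
    while step < len(ranks):
        for i in range(step):
            if i + step < len(ranks):
                edges.append((ranks[i], ranks[i + step]))
        step *= 2
    return edges


def cluster_aware_edges(cluster_of, root=0):
    """One WAN message per remote cluster, then a local tree inside each cluster."""
    clusters = {}
    for cpu, c in enumerate(cluster_of):
        clusters.setdefault(c, []).append(cpu)
    edges = []
    for c, members in clusters.items():
        if c == cluster_of[root]:
            local = [root] + [m for m in members if m != root]
        else:
            local = members  # members[0] acts as the cluster's coordinator
            edges.append((root, local[0]))  # the only WAN message to this cluster
        edges.extend(binomial_edges(local))
    return edges


def wan_stats(edges, cluster_of, root=0):
    """Return (number of WAN messages, max WAN links on any root-to-CPU path)."""
    parent = {child: par for par, child in edges}

    def is_wan(a, b):
        return cluster_of[a] != cluster_of[b]

    total = sum(1 for a, b in edges if is_wan(a, b))
    worst = 0
    for cpu in range(len(cluster_of)):
        hops = 0
        while cpu != root:
            if is_wan(parent[cpu], cpu):
                hops += 1
            cpu = parent[cpu]
        worst = max(worst, hops)
    return total, worst


cluster_of = [cpu // 4 for cpu in range(16)]  # 4 clusters of 4 CPUs (example layout)
flat = wan_stats(binomial_edges(list(range(16))), cluster_of)
aware = wan_stats(cluster_aware_edges(cluster_of), cluster_of)
print(flat, aware)  # prints (12, 2) (3, 1)
```

For this layout the rank-order tree sends 12 messages over wide-area links and chains two WAN hops on some paths, while the cluster-aware tree sends 3 and never chains more than one.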

MagPIe results • MagPIe collective operations are wide-area optimal, except non-associative reduction • Operations up to 10 times faster than MPICH • Factor 2-3 speedup improvement over MPICH for some (unmodified) MPI applications
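One plausible reading of the exception for non-associative reduction: a hierarchical reduction combines values inside each cluster first and only then across clusters, which regroups the operands. That regrouping preserves the result only when the operator is associative. The sketch below is a generic two-level reduction in Python, not MagPIe's algorithm; `flat_reduce` and `hierarchical_reduce` are hypothetical names.

```python
from functools import reduce
import operator


def flat_reduce(op, values):
    """Left-to-right reduction in rank order."""
    return reduce(op, values)


def hierarchical_reduce(op, values, cluster_of):
    """Reduce inside each cluster first, then combine one partial result per cluster."""
    partial = {}
    for v, c in zip(values, cluster_of):
        partial[c] = v if c not in partial else op(partial[c], v)
    return reduce(op, (partial[c] for c in sorted(partial)))


values = [1, 2, 3, 4, 5, 6, 7, 8]
cluster_of = [0, 0, 0, 0, 1, 1, 1, 1]  # two clusters of four CPUs (example layout)

# Addition is associative: both orders agree.
same = flat_reduce(operator.add, values) == hierarchical_reduce(operator.add, values, cluster_of)

# Subtraction is not associative: regrouping changes the answer.
flat_sub = flat_reduce(operator.sub, values)
hier_sub = hierarchical_reduce(operator.sub, values, cluster_of)
```

With subtraction, the flat order yields -34 while the two-level order yields 8, so a cluster-first schedule cannot be used transparently for such operators.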

Conclusions • Wide-area parallel programming is feasible for many applications • Exploit hierarchical structure of wide-area systems to minimize WAN overhead • Programming systems should take hierarchical structure of wide-area systems into account
