The Grid as a Parallel Computer. Francis C.M. Lau Department of Computer Science The University of Hong Kong www.cs.hku.hk/~fcmlau. Greetings from Hong Kong!. Systems research @ HKU. www.srg.cs.hku.hk. hkgrid.org. HKGrid – the initial setup (2004). www.cngrid.org.
The Grid as a Parallel Computer • Francis C.M. Lau, Department of Computer Science, The University of Hong Kong • www.cs.hku.hk/~fcmlau
Greetings from Hong Kong! Systems research @ HKU
As of 11/2005, the 500th machine has a peak of 2.9 Tflops • The 1st (DOE/BlueGene) has 0.37 Pflops
Agenda • Parallel computing: state of affairs • Parallel computing: many faces • Grid as a parallel computer • Our first attempt – G-JavaMPI • Some thoughts for the future
“Oxen vs. chickens” • “If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?” - Seymour Cray (’25–’96) • Your choice?
Will time tell? • “At best, clusters are a loose collection of unmanaged, individual, microprocessor-based computers … Most cluster [experts] know now that users are fortunate to get more than 8% of the peak performance in sustained performance.” - Dr. Paul Terry, CTO, Cray Canada, 2004
Never to predict the future? • “No one will need more than 640 kb of memory for a personal computer.” (Bill Gates, 1981, wrongly attributed?)
You need cpu, cpu, … Software complexity Subramanian, 1999
Many faces of “parallel” computing • Distributed computing (DC) • Multiple computers remote from each other, each having a role in a computation problem • Loose parallelism • Cluster computing (CC) • DC on a LAN, with homogeneous processing nodes (typically PCs), to form what appears to be a single, highly-available system • Grid computing (GC) • A potentially very large DC operating as an anarchy • As large as the Internet/WWW • Parallelism at stake?
Cluster: chicken farm • Grid: animal zoo • Enterprise grid: a private zoo in the backyard • Distributed system: a “static” zoo where the animals are tame
From cluster to grid • One of the main ideas of cluster computing is that, to the outside world, the cluster appears to be a single system, which is also the reason for clustering’s extreme successes • A cluster can be programmed like a single computer, almost • Can a grid? Should a grid?
Grid vs. service-oriented computing (SOC) • To many, the two are almost synonymous • Just as the Web and the Internet are almost synonymous • SOC refers to binding to Web services at runtime • Grid is about the provisioning of resources • The current grid’s use of Web services was out of convenience (my opinion) • But the service paradigm should not be the only possible form of computing with the grid
You want a hamburger – you can either go to McDonald's or do it yourself • SOC applied to the Web (as a grid) is probably best for commercial applications (McDonald's) • For scientific or grand challenge problems, we need to program the grid (DIY)
Cluster: more nodes than microprocessors in each node (MPI) • Constellation: more microprocessors in each node than nodes (OpenMP) • Tightly integrated MPP • Grid?
Grid vs. clustering • Grid: heterogeneous resources (computation, storage, networking, OS, etc.) • Grid: dynamic (resources come and go) • Grid: distributed over a local or wide area • Grid: increased scalability (no latency/proximity limits) • Grid: multiple ownerships • Grid and cluster are complementary
Issues • Heterogeneity • Availability • Latencies • Security and trustworthiness • Load balancing! • Towards single system image (SSI)
Parallel applications • Multiple processes, multiple threads • Application types • SIMD (Single Instruction, Multiple Data) • SPMD (Single Program, Multiple Data) • MIMD (Multiple Instruction, Multiple Data)
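To make the SPMD category concrete, here is a minimal Java sketch in which every worker runs the same routine and only its rank, and therefore its slice of the data, differs. The class and method names (`SpmdSum`, `partialSum`, `parallelSum`) are illustrative assumptions and do not come from the slides; threads in one JVM stand in for the processes a real SPMD program would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SpmdSum {
    // The "single program": every worker runs this; only the rank differs.
    public static long partialSum(int[] data, int rank, int workers) {
        int chunk = (data.length + workers - 1) / workers; // ceiling division
        int from = Math.min(rank * chunk, data.length);
        int to = Math.min(from + chunk, data.length);
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Launch one worker per rank and combine the partial results.
    public static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int r = 0; r < workers; r++) {
                final int rank = r;
                Callable<Long> task = () -> partialSum(data, rank, workers);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // sum of 1..1000
    }
}
```

In an MIMD application the workers would instead run different routines, which is why the next slide treats the two cases differently.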
Need for process/thread migration • SIMD: Remapping (re-partitioning) of data works • For MIMD, “processes” might grow or shrink, or come and go • Remapping of processes = process migration • Processes with large footprints (i.e., many threads) might benefit from spreading their threads across machines
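The data remapping mentioned for data-parallel applications can be sketched as follows: when a node slows down, the data items are simply re-divided in proportion to the nodes' current relative speeds. This is an illustrative assumption about one possible policy, not a method taken from the slides; `Repartition`, `shares`, and the speed figures are hypothetical.

```java
import java.util.Arrays;

public class Repartition {
    // Divide `items` data items among nodes in proportion to their relative speeds.
    public static int[] shares(int items, double[] speed) {
        double total = 0;
        for (double s : speed) {
            total += s;
        }
        int[] share = new int[speed.length];
        int assigned = 0;
        for (int i = 0; i < speed.length; i++) {
            share[i] = (int) Math.floor(items * speed[i] / total);
            assigned += share[i];
        }
        // Rounding down leaves a few items over; hand them out one per node.
        for (int i = 0; assigned < items; i = (i + 1) % speed.length) {
            share[i]++;
            assigned++;
        }
        return share;
    }

    public static void main(String[] args) {
        // Four equally fast nodes (hypothetical numbers).
        System.out.println(Arrays.toString(shares(1200, new double[] {1, 1, 1, 1})));
        // The last node drops to half speed: remap the data, no process moves.
        System.out.println(Arrays.toString(shares(1200, new double[] {1, 1, 1, 0.5})));
    }
}
```

This only works because every node runs the same code on interchangeable data. When the processes themselves differ, as in MIMD, the unit that has to move is the process, which leads to process migration.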
Process migration • Initially (load distribution) • Dynamic • State capture and resume • Thread migration • Threads are often tightly coupled and share much data • Beneficial? • A big challenge
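The "state capture and resume" step can be illustrated with a small Java sketch. It uses application-level checkpointing through standard Java serialization, which is a deliberately simpler technique than the transparent migration the slides discuss: the task itself decides what its state is. `MigratableCounter` and its methods are hypothetical names, and the byte array stands in for whatever would be shipped to the destination machine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class MigratableCounter implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int limit;
    private int next = 1;  // next number to add; part of the captured state
    private long sum = 0;  // running result; part of the captured state

    public MigratableCounter(int limit) {
        this.limit = limit;
    }

    // Run at most `steps` iterations; returns true once the task has finished.
    public boolean run(int steps) {
        for (int i = 0; i < steps && next <= limit; i++) {
            sum += next;
            next++;
        }
        return next > limit;
    }

    public long sum() {
        return sum;
    }

    // State capture: serialize the task so it can be sent to another node.
    public byte[] capture() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resume: rebuild the task from captured state on the destination.
    public static MigratableCounter resume(byte[] state) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(state))) {
            return (MigratableCounter) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MigratableCounter onSource = new MigratableCounter(100);
        onSource.run(40);                         // partial progress on the source
        byte[] state = onSource.capture();        // capture
        MigratableCounter onDestination = resume(state);
        onDestination.run(Integer.MAX_VALUE);     // finish on the destination
        System.out.println(onDestination.sum());  // 5050
    }
}
```

Thread migration is harder than this because threads share data with their siblings, so moving one thread's state is not enough on its own.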
Thread migration works! • Probably not suitable for grid, fine for cluster where latencies are upper-bounded • Our experience: the JESSICA2 system (Java-Enabled Single-System-Image Computing Architecture) • A distributed JVM • Dynamic Java thread migration • JIT compilation • Global object space • I/O redirection
JESSICA2 Architecture • [Figure: a multithreaded Java program runs on one master and five worker JESSICA2 JVMs, which share a Global Object Space; threads migrate between JVMs as portable Java frames, in JIT compiler mode]
G-JavaMPI Towards “grid as a parallel computer”
M-JavaMPI • “M” stands for migration • For cluster • G-JavaMPI • An outgrowth of M-JavaMPI • “G” for grid
G-JavaMPI: A Grid Middleware for Transparent MPI Task Migration and Runtime Scheduling • [Figure labels: policy space; task migration (grid traveler); identity mapping; organization; warranted organization]
Grid-enabled implementation of the Java language bindings of the MPI v1.1 standard • On top of Globus Toolkit (e.g., job startup, security) and MPICH-G2 (MPI communication) • Combines the high-level message passing interface with the Java language to support portable message-passing programming in a grid • Lets you run MPI applications written in Java across multiple machines with different architectures belonging to multiple organizations • Classes of problems implemented in C-MPI (for example, MPICH) can be easily ported to G-JavaMPI, with the additional support of process migration • A better choice for those who prefer an object-oriented programming style
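For readers unfamiliar with the message-passing style that such bindings expose, the sketch below imitates it inside a single JVM: each "rank" is a thread with its own mailbox, and ranks cooperate only by sending and receiving. This is not the G-JavaMPI API, which the slides do not show; `ChainSketch`, `chainSum`, and the mailbox queues are hypothetical stand-ins chosen so the example runs without Globus or MPICH-G2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChainSketch {
    // Pass a token from rank 0 to rank size-1; each rank adds its own rank number.
    public static int chainSum(int size) {
        List<BlockingQueue<Integer>> mailbox = new ArrayList<>();
        for (int r = 0; r < size; r++) {
            mailbox.add(new LinkedBlockingQueue<>());
        }
        BlockingQueue<Integer> result = new LinkedBlockingQueue<>();
        List<Thread> ranks = new ArrayList<>();
        for (int r = 0; r < size; r++) {
            final int rank = r;
            Thread t = new Thread(() -> {
                try {
                    // "Receive": rank 0 starts the token, others block on their mailbox.
                    int token = (rank == 0) ? 0 : mailbox.get(rank).take();
                    token += rank;
                    // "Send": forward to the next rank, or report the final value.
                    if (rank == size - 1) {
                        result.put(token);
                    } else {
                        mailbox.get(rank + 1).put(token);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ranks.add(t);
            t.start();
        }
        try {
            int total = result.take();
            for (Thread t : ranks) {
                t.join();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(chainSum(4)); // 0 + 1 + 2 + 3 = 6
    }
}
```

Because the ranks share nothing except messages, the same program structure carries over when the ranks are processes on different machines, which is what makes this style a natural fit for migration between grid sites.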