Javier Delgado Grid Enablement of Scientific Applications Professor S. Masoud Sadjadi Parallel Processing
Parallel Processing - GCB Outline • Why parallel processing • Overview • The Message Passing Interface (MPI) • Introduction • Basics • Examples • OpenMP • Alternatives to MPI
Parallel Processing - GCB Why parallel processing? • Computationally-intensive scientific applications • Hurricane modelling • Bioinformatics • High-Energy Physics • Physical limits of one processor
Parallel Processing - GCB Types of Parallel Processing • Shared Memory • e.g. Multiprocessor computer • Distributed Memory • e.g. Compute Cluster
Parallel Processing - GCB Shared Memory • Advantages • No explicit message passing • Fast • Disadvantages • Scalability • Synchronization Source: http://kelvinscale.net
Parallel Processing - GCB Distributed Memory • Advantages • Each processor has its own memory • Usually more cost-effective • Disadvantages • More programmer involvement • Slower
Parallel Processing - GCB Combination of Both • Emerging trend • Best and worst of both worlds
Parallel Processing - GCB Outline • Why parallel processing • Overview • The Message Passing Interface (MPI) • Introduction • Basics • Examples • OpenMP • Alternatives to MPI
Parallel Processing - GCB Message Passing • Standard for Distributed Memory systems • Networked workstations can communicate • De Facto specification: • The Message Passing Interface (MPI) • Free MPI Implementations: • MPICH • OpenMPI • LAM-MPI
Parallel Processing - GCB MPI Basics • Design Virtues • Defines communication, but not its hardware • Expressive • Performance • Concepts • No adding/removing of processors during computation • Same program runs on all processors • Single-Program, Multiple Data (SPMD) • Multiple Instruction, Multiple Data (MIMD) • Processes identified by “rank”
Parallel Processing - GCB Communication Types • Standard • Synchronous (blocking send) • Ready • Buffered (asynchronous) • For non-blocking communication: • MPI_Wait – block until the operation completes • MPI_Test – returns a true/false flag without blocking (see the sketch below)
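The slides only name MPI_Wait and MPI_Test; the sketch below (not from the original deck, a minimal hypothetical example) shows how they pair with the non-blocking calls MPI_Isend and MPI_Irecv. It assumes the program is started with at least two processes.

/* Hypothetical sketch: overlapping work with communication using
   MPI_Isend/MPI_Irecv, then checking completion with MPI_Test and MPI_Wait. */
#include "mpi.h"
#include <stdio.h>

int main( int argc, char * argv[] )
{
    int rank, value = 0, incoming = 0, flag = 0;
    MPI_Request sendReq, recvReq;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)            /* rank 0 exchanges one int with rank 1 */
    {
        value = 42;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &sendReq);
        MPI_Irecv(&incoming, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &recvReq);
        /* ... useful computation could go here while messages are in flight ... */
        MPI_Test(&recvReq, &flag, &status);   /* poll: has the receive finished? */
        if (!flag)
            MPI_Wait(&recvReq, &status);      /* no: block until it has */
        MPI_Wait(&sendReq, &status);
        printf("rank 0 received %d\n", incoming);
    }
    else if (rank == 1)
    {
        value = 7;
        MPI_Isend(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &sendReq);
        MPI_Irecv(&incoming, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &recvReq);
        MPI_Wait(&recvReq, &status);
        MPI_Wait(&sendReq, &status);
    }

    MPI_Finalize();
    return 0;
}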
Parallel Processing - GCB Message Structure • Both the send and the receive side describe the data with a variable name (buffer), a data length, and a data type • The send side adds an envelope consisting of a destination, a tag, and a communication context • The receive side adds an envelope consisting of a status (source), a tag, and a communication context
Parallel Processing - GCB Data Types and Functions • Uses its own types for consistency • MPI_INT, MPI_CHAR, etc. • All Functions prefixed with “MPI_” • MPI_Init, MPI_Send, MPI_Recv, etc.
Parallel Processing - GCB Our First Program: Numerical Integration • Objective: Calculate area under f(x) = x2 • Outline: • Define variables • Initialize MPI • Determine subset of program to calculate • Perform Calculation • Collect Information (at Master) • Send Information (Slaves) • Finalize
Parallel Processing - GCB Our First Program • Download Link: • http://www.fiu.edu/~jdelga06/integration.c
Parallel Processing - GCB Variable Declarations

#include "mpi.h"
#include <stdio.h>

/* problem parameters */
#define f(x) ((x) * (x))
#define numberRects 50
#define lowerLimit 2.0
#define upperLimit 5.0

int main( int argc, char * argv[] )
{
    /* MPI variables */
    int dest, noProcesses, processId, src, tag;
    MPI_Status status;

    /* problem variables */
    int i;
    double area, x, height, lower, width, total, range;
    ...
Parallel Processing - GCB MPI Initialization

int main( int argc, char * argv[] )
{
    ...
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    ...
Parallel Processing - GCB Calculation

int main( int argc, char * argv[] )
{
    ...
    /* adjust problem size for subproblem */
    range = (upperLimit - lowerLimit) / noProcesses;
    width = range / numberRects;
    lower = lowerLimit + range * processId;

    /* calculate area for subproblem */
    area = 0.0;
    for (i = 0; i < numberRects; i++)
    {
        x = lower + i * width + width / 2.0;
        height = f(x);
        area = area + width * height;
    }
    ...
Parallel Processing - GCB Sending and Receiving

int main( int argc, char * argv[] )
{
    ...
    tag = 0;
    if (processId == 0)   /* MASTER */
    {
        total = area;
        for (src = 1; src < noProcesses; src++)
        {
            MPI_Recv(&area, 1, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
            total = total + area;
        }
        fprintf(stderr, "The area from %f to %f is: %f\n",
                lowerLimit, upperLimit, total);
    }
    else                  /* WORKER (i.e. compute node) */
    {
        dest = 0;
        MPI_Send(&area, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
    }
    ...
Parallel Processing - GCB Finalizing

int main( int argc, char * argv[] )
{
    ...
    MPI_Finalize();
    return 0;
}
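A note not on the original slides: with a typical MPI installation (e.g. MPICH or Open MPI), the program is usually compiled with the mpicc wrapper and launched with mpirun (or mpiexec), for example mpicc integration.c -o integration followed by mpirun -np 4 ./integration. The master (rank 0) then prints the accumulated area; exact commands may differ between implementations and clusters.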
Parallel Processing - GCB Communicators • MPI_COMM_WORLD – All processes involved • What if different workers have different tasks?
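One common answer, not shown on the original slide, is to split MPI_COMM_WORLD into smaller communicators with MPI_Comm_split. The following is a hypothetical sketch: it places even-ranked and odd-ranked processes into separate groups, each of which could then be given a different task.

/* Hypothetical sketch: dividing MPI_COMM_WORLD so that different groups
   of workers can perform different tasks. */
#include "mpi.h"
#include <stdio.h>

int main( int argc, char * argv[] )
{
    int worldRank, groupRank;
    MPI_Comm taskComm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);

    /* color 0 = even ranks, color 1 = odd ranks; key orders ranks in the new group */
    MPI_Comm_split(MPI_COMM_WORLD, worldRank % 2, worldRank, &taskComm);
    MPI_Comm_rank(taskComm, &groupRank);

    printf("world rank %d has rank %d in its task communicator\n",
           worldRank, groupRank);

    MPI_Comm_free(&taskComm);
    MPI_Finalize();
    return 0;
}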
Parallel Processing - GCB Additional Functions • Data Management • MPI_Bcast (broadcast) • Collective Computation • Min, Max, Sum, AND, etc. • Benefits: • Abstraction • Optimized Source: http://www.pdc.kth.se
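As a concrete (hypothetical) illustration of these collectives, the sketch below broadcasts a parameter from the master with MPI_Bcast and then sums per-process results back onto the master with MPI_Reduce; the reduction is the collective equivalent of the send/receive loop in the integration example.

/* Hypothetical sketch: MPI_Bcast to distribute data, MPI_Reduce to combine results. */
#include "mpi.h"
#include <stdio.h>

int main( int argc, char * argv[] )
{
    int processId, noProcesses;
    double parameter = 0.0, partial, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &processId);
    MPI_Comm_size(MPI_COMM_WORLD, &noProcesses);

    if (processId == 0)
        parameter = 5.0;                 /* only the master knows it initially */

    /* every process receives the master's value of 'parameter' */
    MPI_Bcast(&parameter, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    partial = parameter * processId;     /* stand-in for real per-process work */

    /* sum all partial results onto rank 0 */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (processId == 0)
        printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}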
Parallel Processing - GCB Typical Problems • Designing • Debugging • Scalability
Parallel Processing - GCB Scalability Analysis • Definition: Estimation of the resource (computation and communication) requirements of a program as the problem size and/or the number of processors increases • Requires knowledge of communication time • Assumes nodes are otherwise idle • Ignores the data (memory) requirements of each node
Parallel Processing - GCB Simple Scalability Example • Tcomm = Time to send a message • Tcomm = s + rn • s = start-up time (latency) • r = time to send a single byte (i.e. 1/bandwidth) • n = number of bytes sent (e.g. the size of the data type being transmitted: int, double, etc.)
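Worked example with assumed, illustrative numbers (not figures from the slides): with a start-up time of s = 50 µs and a 1 Gbit/s link (r ≈ 0.008 µs per byte), sending a single 8-byte double costs Tcomm ≈ 50 + 8 × 0.008 ≈ 50.06 µs, so for small messages the start-up time dominates and bundling many values into one message pays off.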
Parallel Processing - GCB Simple Scalability Example • Matrix multiplication of two square matrices of size n x n • The first matrix is broadcast to all nodes • Cost for the rest: • Computation • n multiplications and (n – 1) additions per cell of the result • n² cells x (2n – 1) = 2n³ – n² floating point operations • Communication • Send n elements (one column of the second matrix) to a worker node, and return the resulting n elements to the master node: 2n • After doing this for each of the n columns of the result matrix: n x 2n = 2n² elements
Parallel Processing - GCB Simple Scalability Example • Therefore, the ratio of communication to computation is 2n² / (2n³ – n²) = 2 / (2n – 1) • As n becomes very large, the ratio approaches 1/n, so this problem is not severely affected by communication overhead
Parallel Processing - GCB References • http://nf.apac.edu.au/training/MPIProg/mpi-slides/allslides.html • Joseph D. Sloan. High Performance Linux Clusters. O'Reilly. • Gropp, Lusk, and Skjellum. Using MPI, second edition. MIT Press.