Cross-site running on TeraGrid using MPICH-G2 • Presented by Krishna Muriki (SDSC) on behalf of Dr. Nick Karonis (NIU)
What is MPICH-G2? • A full implementation of the MPI-1.1 standard • Very little MPI-2 functionality (the client/server routines, discussed later) • Developed by Karonis and Toonen (Northern Illinois University and Argonne National Laboratory) • Makes extensive use of Globus services, and is therefore a 'grid-enabled' MPI
How Does MPICH-G2 Work? • [Architecture diagram] On each machine (Computer A and Computer B), MPICH-G2 is layered over the local MPI; messages between the two machines travel over TCP, with data conversion if necessary.
Should I use MPICH-G2? • Applications that are distributed by design --- scientific applications that need more compute power, more memory, or both • Applications that are distributed by nature --- remote visualization applications, client/server applications, etc.
How to use MPICH-G2 on TG? • Five steps (assuming good SoftEnv and Globus environments): • 1. Add the proper softkeys to your ~/.soft file (e.g., +mpich-g2-intel or +mpich-g2-gcc) • 2. On each TG site, compile and link your application using one of MPICH-G2's compilers - mpicc, mpiCC, mpif77, or mpif90 • 3. At a single TG site, do: grid-proxy-init • 4. Write your own Globus RSL file (example on the next slide) • 5. Launch your application using MPICH-G2's mpirun command and your RSL file: mpirun -globusrsl myfile.rsl • Note: processes will not return from MPI_Init() until all processes have started executing on TG compute nodes
An Example RSL file

+
(&(resourceManagerContact="tg-grid1.uc.teragrid.org/jobmanager-pbs_gcc")
  (count=10)
  (hostcount="10:ia64-compute")
  (project="TG-STA044017N")
  (jobtype=mpi)
  (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
  (executable=/homes/users/smith/myapp)
)
(&(resourceManagerContact="tg-login1.ncsa.teragrid.org/jobmanager-pbs_gcc")
  (count=5)
  (hostcount="5:compute")
  (jobtype=mpi)
  (environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
  (executable=/homes/users/smith/myapp)
)
(&(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs_gcc")
  (count=10)
  (jobtype=mpi)
  (environment=(GLOBUS_DUROC_SUBJOB_INDEX 2))
  (executable=/users/smith/myapp)
)
Additional “grid” features • gridFTP --- “turn on” Globus’ gridFTP for inter-cluster messaging --- Increases bandwidth for large messages • Grid topology discovery --- Create MPI communicators based on where processes are running --- For example, place all processes running on SDSC-TG into one communicator
gridFTP

#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_id;
    struct gridftp_params gfp;   /* MPICH-G2 structure in mpi.h */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if (my_id == 0 || my_id == 1) {
        /* must set these three fields */
        gfp.partner_rank  = (my_id ? 0 : 1);
        gfp.nsocket_pairs = 64;
        gfp.tcp_buffsize  = 256*1024;

        MPI_Attr_put(MPI_COMM_WORLD, MPICHX_PARALLELSOCKETS_PARAMETERS, &gfp);
    } /* endif */

    /*
     * From this point on, all messages exchanged between MPI_COMM_WORLD
     * ranks 0 and 1 will be automatically partitioned and transported
     * over parallel sockets (i.e., use gridFTP).
     */

    MPI_Finalize();
} /* end main() */
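As an illustrative follow-on (not part of the original slides), the sketch below extends the example above with the kind of large point-to-point transfer that benefits from the parallel sockets; the message size and tag are arbitrary choices for illustration.

#include <stdlib.h>
#include <mpi.h>

#define NELEMS (8*1024*1024)   /* illustrative message size: 8 M doubles */

int main(int argc, char *argv[])
{
    int my_id, i;
    double *buf;
    struct gridftp_params gfp;   /* MPICH-G2 structure in mpi.h */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if (my_id == 0 || my_id == 1) {
        /* same attribute setup as on the previous slide */
        gfp.partner_rank  = (my_id ? 0 : 1);
        gfp.nsocket_pairs = 64;
        gfp.tcp_buffsize  = 256*1024;
        MPI_Attr_put(MPI_COMM_WORLD, MPICHX_PARALLELSOCKETS_PARAMETERS, &gfp);

        buf = (double *) malloc(NELEMS * sizeof(double));

        if (my_id == 0) {
            for (i = 0; i < NELEMS; i++)
                buf[i] = (double) i;
            /* this large message is partitioned across the parallel
               (gridFTP) sockets rather than sent on a single TCP stream */
            MPI_Send(buf, NELEMS, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(buf, NELEMS, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        }

        free(buf);
    } /* endif */

    MPI_Finalize();
} /* end main() */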
Topology discovery

#include <mpi.h>

int main(int argc, char *argv[])
{
    int me, flag;
    MPI_Comm TG_site_comm;
    int *depths, **colors;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    MPI_Attr_get(MPI_COMM_WORLD, MPICHX_TOPOLOGY_DEPTHS, &depths, &flag);
    MPI_Attr_get(MPI_COMM_WORLD, MPICHX_TOPOLOGY_COLORS, &colors, &flag);

    /* creates new communicator TG_site_comm that groups procs based on TG site */
    MPI_Comm_split(MPI_COMM_WORLD,
                   (depths[me] == 4 ? colors[me][3] : MPI_UNDEFINED),
                   0, &TG_site_comm);

    MPI_Finalize();
} /* end main() */
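As an illustrative follow-on (not part of the original slides), the sketch below shows one way TG_site_comm might be used once it exists: each process asks for its rank and size within its site, so that, for example, rank 0 of each site can act as a per-site coordinator. The printf and the MPI_COMM_NULL guard are assumptions of this sketch.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int me, flag, site_rank, site_size;
    int *depths, **colors;
    MPI_Comm TG_site_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    /* same topology attributes as on the previous slide */
    MPI_Attr_get(MPI_COMM_WORLD, MPICHX_TOPOLOGY_DEPTHS, &depths, &flag);
    MPI_Attr_get(MPI_COMM_WORLD, MPICHX_TOPOLOGY_COLORS, &colors, &flag);

    MPI_Comm_split(MPI_COMM_WORLD,
                   (depths[me] == 4 ? colors[me][3] : MPI_UNDEFINED),
                   0, &TG_site_comm);

    if (TG_site_comm != MPI_COMM_NULL) {
        /* rank and size within this TeraGrid site only */
        MPI_Comm_rank(TG_site_comm, &site_rank);
        MPI_Comm_size(TG_site_comm, &site_size);

        if (site_rank == 0)
            printf("world rank %d is rank 0 of a site with %d processes\n",
                   me, site_size);

        MPI_Comm_free(&TG_site_comm);
    } /* endif */

    MPI_Finalize();
} /* end main() */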
Some MPI-2 Functionality • MPICH-G2 supports the following functions from the MPI 2.0 standard • Server routines: MPI_Open_port, MPI_Close_port, MPI_Comm_accept • Client routine: MPI_Comm_connect
Example server

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int my_id;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    if (my_id == 0) {
        /* "fills" port_name; in practice rank 0 would print or otherwise
           publish this string so the client can be started with it */
        MPI_Open_port(MPI_INFO_NULL, port_name);
    } /* endif */

    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0) {
        MPI_Send(&my_id, 1, MPI_INT, 0, 0, newcomm);
        printf("after sending %d\n", my_id);
        MPI_Close_port(port_name);
    } /* endif */

    MPI_Finalize();
} /* end main() */
Example client

#include <mpi.h>

int main(int argc, char *argv[])
{
    int passed_num;
    int my_id;
    char *port_name;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);

    port_name = argv[1];   /* port name produced by the server, passed on the command line */

    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &newcomm);

    if (my_id == 0)
    {
        MPI_Status status;

        MPI_Recv(&passed_num, 1, MPI_INT, 0, 0, newcomm, &status);
    } /* endif */

    MPI_Finalize();
} /* end main() */
Coming Soon! • A complete re-write of MPICH-G2 • To be included in next MPICH release scheduled for June 2006 and then on the TG soon thereafter • Integrated with Globus Web Services • Reduced intra-cluster message latency • Multi-threaded (i.e., inter-cluster MPI_Isend/Irecv’s will truly overlap computation and communication) • More inter-cluster messaging options (e.g., reliable UDP-based, compression, encryption, etc.) • Later, full MPI-2 functionality
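To make the overlap point concrete, here is the standard non-blocking idiom the rewrite is meant to serve; this sketch is not from the original slides, and do_local_work() is a placeholder for application computation. With today's single-threaded MPICH-G2, an inter-cluster transfer makes progress mainly inside MPI calls; the planned multi-threaded rewrite is what would let it proceed during the computation.

#include <mpi.h>

#define N 1000000

static double sendbuf[N];

static void do_local_work(void) { /* application computation */ }

int main(int argc, char *argv[])
{
    int my_id, numprocs;
    MPI_Request req;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    if (numprocs > 1 && my_id == 0) {
        MPI_Isend(sendbuf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        do_local_work();              /* ideally overlaps the transfer */
        MPI_Wait(&req, &status);
    } else if (my_id == 1) {
        MPI_Recv(sendbuf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
} /* end main() */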
For More Information • Visit the MPICH-G2 web page: • www.niu.edu/mpi • or • www.globus.org/mpi