Introduction to the GridMPI project focusing on high-performance communication facilities, protocols, and benchmark evaluations for MPI applications in metropolitan and wide-area network environments.
An Introduction of GridMPI — http://www.gridmpi.org/
Yutaka Ishikawa (1,2) and Motohiko Matsuda (2)
(1) University of Tokyo
(2) Grid Technology Research Center, AIST (National Institute of Advanced Industrial Science and Technology)
This work is partially supported by the NAREGI project.
Yutaka Ishikawa, The University of Tokyo
Motivation • MPI, the Message Passing Interface, has been widely used to program parallel applications. • Users want to run such applications over the Grid environment without modifying their programs. • However, the performance of existing MPI implementations does not scale in the Grid environment. [Figure: a single (monolithic) MPI application spanning computing resource sites A and B over a wide-area network]
Motivation • Focus on a metropolitan-area, high-bandwidth environment: 10 Gbps, 500 miles (less than 10 ms one-way latency). • We have already demonstrated, using an emulated WAN environment, that the performance of the NAS Parallel Benchmark programs scales if the one-way latency is smaller than 10 ms. Motohiko Matsuda, Yutaka Ishikawa, and Tomohiro Kudoh, ``Evaluation of MPI Implementations on Grid-connected Clusters using an Emulated WAN Environment,'' CCGRID2003, 2003. [Figure: a single (monolithic) MPI application spanning computing resource sites A and B over a wide-area network]
High-Performance Communication Facilities for MPI on Long and Fat Networks. Issues: • TCP vs. MPI communication patterns • Network topology • Latency and bandwidth • Interoperability: there are many MPI library implementations, and most use their own network protocol, so applications built with Vendor A's, B's, C's, and D's MPI libraries cannot interoperate over the Internet. • Fault tolerance and migration: to survive a site failure • Security
GridMPI Features • MPI-2 implementation • IMPI (Interoperable MPI) protocol, with extensions for the Grid • New collective protocols • Checkpointing • Integration of vendor MPIs: IBM, Solaris, Fujitsu, and MPICH2 • High-performance TCP/IP implementation on long and fat networks: pacing the transmission rate so that burst transmission is controlled according to the MPI communication pattern. [Figure: cluster X (YAMPII) and cluster Y (vendor MPI) connected via IMPI]
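The pacing idea above can be sketched in a few lines. This is a minimal illustration, not GridMPI's actual implementation: given a target rate, packets are spaced evenly in time so the sender never emits a back-to-back burst that would overflow router queues on a long fat pipe.

```python
def pacing_schedule(num_packets, packet_bytes, target_rate_bps, start_s=0.0):
    """Send times that space packets evenly so the instantaneous rate
    never exceeds target_rate_bps (no back-to-back bursts)."""
    gap_s = packet_bytes * 8 / target_rate_bps  # seconds between sends
    return [start_s + i * gap_s for i in range(num_packets)]

# 1500-byte frames paced to 500 Mbps: one frame every 24 microseconds.
schedule = pacing_schedule(3, 1500, 500e6)
```

In GridMPI the target rate is chosen from the MPI communication pattern; here it is simply a parameter.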
Evaluation • It is almost impossible to reproduce the communication behavior of a real wide-area network. • A WAN emulator, GtrcNET-1, is used to systematically examine implementations, protocols, communication algorithms, etc. GtrcNET-1 (http://www.gtrc.aist.go.jp/gnet/) • Developed at AIST • Injection of delay, jitter, errors, … • Traffic monitoring and frame capture • Four 1000Base-SX ports • One USB port for the host PC • FPGA (XC2V6000)
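A software analogue of GtrcNET-1's delay injection can be modeled deterministically (this is an illustrative sketch, not how the FPGA hardware works): each frame is delayed by a base latency plus optional jitter, while FIFO ordering on the link is preserved.

```python
import random

def schedule_deliveries(send_times_ms, base_delay_ms, jitter_ms=0.0, seed=0):
    """Compute delivery times for frames passing through an emulated WAN
    link: each frame is delayed by base_delay_ms plus a uniform jitter in
    [0, jitter_ms], and a frame is never delivered before its predecessor."""
    rng = random.Random(seed)
    deliveries = []
    last = 0.0
    for t in send_times_ms:
        d = t + base_delay_ms + rng.uniform(0.0, jitter_ms)
        d = max(d, last)  # keep FIFO order on the link
        deliveries.append(d)
        last = d
    return deliveries
```

With zero jitter this reduces to a pure delay line, which is the 0–10 ms configuration used in the experiments below.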
Experimental Environment [Figure: two 8-PC clusters (nodes 0–7 and 8–15), each behind a Catalyst 3750 switch, connected through the GtrcNET-1 WAN emulator] • Bandwidth: 1 Gbps • Delay: 0 ms – 10 ms • CPU: Pentium 4/2.4 GHz, Memory: DDR400 512 MB • NIC: Intel PRO/1000 (82547EI) • OS: Linux 2.6.9-1.6 (Fedora Core 2) • Socket buffer size: 20 MB
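The large 20 MB socket buffer follows from the bandwidth-delay product: TCP must keep at least bandwidth × RTT bytes in flight to fill the pipe. A quick check for the worst case in this setup (1 Gbps, 10 ms one-way delay):

```python
def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Bytes in flight needed to keep the link full: bandwidth x RTT."""
    return bandwidth_bps * rtt_s / 8

# 1 Gbps with a 10 ms one-way delay (20 ms RTT) -> 2.5 MB per connection.
bdp_bytes = bandwidth_delay_product(1e9, 0.020)
```

The configured 20 MB buffer comfortably exceeds the 2.5 MB per-connection minimum, so the socket buffer never limits throughput in these experiments.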
GridMPI vs. MPICH-G2 (1/4): FT (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes [Figure: relative performance vs. one-way delay (msec)]
GridMPI vs. MPICH-G2 (2/4): IS (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes [Figure: relative performance vs. one-way delay (msec)]
GridMPI vs. MPICH-G2 (3/4): LU (Class B) of NAS Parallel Benchmarks 3.2 on 8 x 8 processes [Figure: relative performance vs. one-way delay (msec)]
GridMPI vs. MPICH-G2 (4/4): NAS Parallel Benchmarks 3.2 Class B on 8 x 8 processes; no parameters tuned in GridMPI [Figure: relative performance vs. one-way delay (msec)]
GridMPI on an Actual Network [Figure: an 8-node Pentium 4/2.4 GHz cluster at Tsukuba and an 8-node Pentium 4/2.8 GHz cluster at Akihabara, each on 1G Ethernet, connected by the JGN2 network: 10 Gbps bandwidth, 1.5 msec RTT, 60 km (40 mi.)] • NAS Parallel Benchmarks run on 16 nodes: the 8-node (2.4 GHz) cluster at Tsukuba plus the 8-node (2.8 GHz) cluster at Akihabara • The relative performance is compared with: • the result on a single 16-node (2.4 GHz) cluster • the result on a single 16-node (2.8 GHz) cluster
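The deck does not spell out how "relative performance" is computed; one plausible definition (an assumption here, not taken from the slides) is the ratio of single-cluster execution time to the cross-site execution time, so that 1.0 means the Grid run matches a single cluster:

```python
def relative_performance(single_cluster_time_s, grid_time_s):
    """Ratio of single-cluster execution time to Grid execution time.

    1.0 means the cross-site run is as fast as the single-cluster run;
    values below 1.0 quantify the slowdown introduced by the WAN."""
    return single_cluster_time_s / grid_time_s

# e.g. a benchmark taking 100 s on one cluster and 125 s across two
# sites has a relative performance of 0.8.
```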
Demonstration [Same testbed as the previous slide: the Tsukuba and Akihabara clusters connected by the JGN2 network, 10 Gbps bandwidth, 1.5 msec RTT, 60 km (40 mi.)] • Easy installation • Download the source • Build it and set up the configuration files • Easy use • Compile your MPI application • Run it!
NAREGI Software Stack (Beta Ver. 2006) • Grid-Enabled Nano-Applications • Grid PSE, Grid Visualization, Grid Workflow • Grid Programming: Grid RPC, GridMPI • Super Scheduler, Distributed Information Service • Grid VM • High-Performance & Secure Grid Networking • Data (Globus, Condor, UNICORE, OGSA/WSRF)
GridMPI Current Status http://www.gridmpi.org/ • GridMPI version 0.9 has been released • MPI-1.2 features are fully supported • MPI-2.0 features are supported except for MPI-IO and one-sided communication primitives • Conformance tests • MPICH Test Suite: 0/142 (fails/tests) • Intel Test Suite: 0/493 (fails/tests) • GridMPI version 1.0 will be released this spring • MPI-2.0 fully supported
Concluding Remarks • GridMPI is integrated into the NAREGI package. • GridMPI is not only for production but also serves as our research vehicle for the Grid environment: new Grid ideas are implemented and tested in it. • We are currently studying high-performance communication mechanisms for long and fat networks: • Modifications of TCP behavior: M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and Y. Ishikawa, ``TCP Adaptation for MPI on Long-and-Fat Networks,'' IEEE Cluster 2005, 2005. • Precise software pacing: R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H. Tezuka, and Y. Ishikawa, ``Design and Evaluation of Precise Software Pacing Mechanisms for Fast Long-Distance Networks,'' PFLDnet2005, 2005. • Collective communication algorithms with respect to network latency and bandwidth.
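The last item, choosing collective algorithms by latency and bandwidth, can be illustrated with the standard Hockney point-to-point cost model (time = alpha + beta * m). The algorithm formulas below are textbook estimates, not GridMPI's actual selection code:

```python
import math

def bcast_time(p, m, alpha, beta, algorithm):
    """Estimated broadcast time for p processes and an m-byte message
    under the Hockney model (alpha = latency in s, beta = s/byte).

    'binomial'  : ceil(log2(p)) rounds, each sending the whole message.
    'scatter_ag': scatter + allgather (van de Geijn), which pays extra
                  latency terms but moves less data per link."""
    steps = math.ceil(math.log2(p))
    if algorithm == "binomial":
        return steps * (alpha + beta * m)
    if algorithm == "scatter_ag":
        return (steps + p - 1) * alpha + 2 * (p - 1) / p * beta * m
    raise ValueError(algorithm)
```

On a long fat pipe (large alpha), the binomial tree wins for small messages while scatter+allgather wins for large ones, which is exactly why a Grid MPI must pick collectives by latency and bandwidth rather than use one fixed algorithm.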
BACKUP
GridMPI Version 1.0 • YAMPII, developed at the University of Tokyo, is used as the core implementation • Intra-site communication by YAMPII (TCP/IP, SCore) • Inter-site communication by IMPI (TCP/IP) [Figure: layer stack — MPI API; LACT layer (collectives); IMPI; Request layer with the Request Interface; RPIM Interface with ssh, rsh, SCore, and Globus launchers; P2P Interface over TCP/IP, PMv2, MX, O2G, and vendor MPI]
GridMPI vs. Others (1/2): NAS Parallel Benchmarks 3.2 Class B on 8 x 8 processes [Figure: relative performance vs. one-way delay (msec)]
GridMPI vs. Others (2/2): NAS Parallel Benchmarks 3.2 Class B on 8 x 8 processes [Figure: relative performance]
GridMPI vs. Others: NAS Parallel Benchmarks 3.2 on 16 x 16 processes [Figure: relative performance]