High Performance Communication using MPJ Express Presented by Jawad Manzoor National University of Sciences and Technology, Pakistan
Presentation Outline • Introduction • Parallel computing • HPC Platforms • Software programming models • MPJ Express • Design • Communication devices • Performance Evaluation
Serial vs Parallel Computing • Serial Computing • Parallel Computing
HPC Platforms • There are three kinds of High Performance Computing (HPC) platforms: • Distributed Memory Architecture • Massively Parallel Processors (MPP) • Shared Memory Architecture • Symmetric Multiprocessors (SMP), multicore computers • Hybrid Architecture • SMP clusters • Most modern HPC hardware is based on hybrid models [Diagram: Distributed Memory | Shared Memory | Hybrid]
Software Programming Models • Shared Memory Models • Each process has direct access to all memory • Pthreads, OpenMP • Distributed Memory Models • No direct access to the memory of other processes • Message Passing Interface (MPI)
Message Passing Interface (MPI) • Message Passing Interface is the de facto standard for writing applications for parallel hardware • Primarily designed for distributed memory machines, but also used on shared memory machines
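As a point of reference, a minimal MPJ Express program, written against the mpiJava 1.2-style API that MPJ Express implements, looks like the sketch below; the class name and printed message are illustrative only:

```java
import mpi.*;

public class HelloMPJ {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);                      // initialize the MPJ Express runtime
        int rank = MPI.COMM_WORLD.Rank();    // this process's identifier
        int size = MPI.COMM_WORLD.Size();    // total number of processes
        System.out.println("Hello from process " + rank + " of " + size);
        MPI.Finalize();                      // shut down the runtime
    }
}
```

Such a program is typically launched with the mpjrun script bundled with MPJ Express, giving the number of processes to start.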
MPI Implementations • Open MPI • An open source, production-quality implementation of MPI-2 in C • Existing high-performance drivers • TCP/IP, shared memory, Myrinet, Quadrics, InfiniBand • MPICH2 • A high-performance implementation of MPI for SMPs, clusters, and massively parallel processors • POSIX shared memory, SysV shared memory, Windows shared memory, Myrinet, Quadrics, InfiniBand, 10 Gigabit Ethernet • MPJ Express • Implements the high-level functionality of MPI in pure Java • Provides flexibility to update the layers or add new communication devices • TCP/IP, Myrinet, threads-based shared memory, SysV shared memory
Presentation Outline • Introduction • Parallel computing • HPC Platforms • Software programming models • MPJ Express • Design • Communication devices • Performance Evaluation
Java NIO Device • Uses non-blocking I/O functionality • Implements two communication protocols: • Eager-send • For small messages (< 128 Kbytes) • May incur additional copying • Rendezvous • Exchange of control messages before the actual data transmission • For long messages (≥ 128 Kbytes)
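A minimal sketch of how a device layer might choose between the two protocols; the class, the threshold constant, and the sendEager/sendRendezvous stubs are hypothetical illustrations, not the actual xdev API:

```java
// Illustrative only: protocol selection in a hypothetical device layer.
public class ProtocolChooser {
    // Mirrors the 128 KB eager/rendezvous threshold mentioned above.
    static final int EAGER_LIMIT = 128 * 1024;

    void send(byte[] message, int destination) {
        if (message.length < EAGER_LIMIT) {
            sendEager(message, destination);       // small message: push data immediately
        } else {
            sendRendezvous(message, destination);  // large message: handshake first
        }
    }

    // Stub: eager-send transmits right away; the receiver may have to buffer (copy)
    // the data if no matching receive has been posted yet.
    void sendEager(byte[] message, int destination) { /* ... */ }

    // Stub: rendezvous exchanges control messages so the receiver can post a buffer,
    // then transfers the payload without the extra copy.
    void sendRendezvous(byte[] message, int destination) { /* ... */ }
}
```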
Shared Memory Communication Device • Thread-based • Each MPJ process is represented by a Java thread and data is communicated using shared data structures • sendQueue and recvQueue • SysV-based • Each MPJ process is represented by a Unix process and data is communicated through shared memory segments • Java Module - The xdev API implementation for shared memory communication • C Module - Unix SysV Inter-Process Communication (IPC) methods • JNI Module - Bridge between C and Java
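A minimal sketch of the thread-based idea, with the shared recvQueue modeled as a Java BlockingQueue; this is an assumption for illustration, not the actual MPJ Express implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: two "processes" as Java threads exchanging a message
// through a shared data structure, analogous to the sendQueue/recvQueue idea above.
public class ThreadDeviceSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> recvQueue = new ArrayBlockingQueue<>(16);

        Thread sender = new Thread(() -> {
            byte[] message = "hello".getBytes();
            recvQueue.offer(message);              // "send": place data in the shared structure
        });

        Thread receiver = new Thread(() -> {
            try {
                byte[] message = recvQueue.take(); // "receive": read data from the shared structure
                System.out.println("received " + message.length + " bytes");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sender.start();
        receiver.start();
        sender.join();
        receiver.join();
    }
}
```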
[Diagrams: MPI communication using sockets vs. MPI communication using shared memory]
Key Implementation Aspects • Critical operations include: • Initialize • Point-to-point • Send • Receive • Finalize
Initialization [Diagram: shared memory segments created for Process 0, Process 1, Process 2, and Process 3]
Point-to-point communication • Communication between two processes • The source process sends a message to the destination process • Source and destination processes are identified by their ranks
Send Modes • Blocking Send • Only returns from the subroutine call when the operation has completed • Non-blocking Send • Returns straight away and allows the program to continue with other work • At some later time, check for completion of the operation
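A short example of both send modes using the mpiJava-style API that MPJ Express exposes (Send/Recv for blocking, Isend plus Wait for non-blocking); the tag value and buffer size are arbitrary illustrations:

```java
import mpi.*;

public class SendModes {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int tag = 99;                              // arbitrary message tag
        int[] buffer = new int[4];

        if (rank == 0) {
            // Blocking send: returns only when the buffer is safe to reuse.
            MPI.COMM_WORLD.Send(buffer, 0, buffer.length, MPI.INT, 1, tag);

            // Non-blocking send: returns immediately; complete it later with Wait.
            Request req = MPI.COMM_WORLD.Isend(buffer, 0, buffer.length, MPI.INT, 1, tag);
            // ... other useful work can happen here ...
            req.Wait();                            // check/wait for completion
        } else if (rank == 1) {
            MPI.COMM_WORLD.Recv(buffer, 0, buffer.length, MPI.INT, 0, tag);
            MPI.COMM_WORLD.Recv(buffer, 0, buffer.length, MPI.INT, 0, tag);
        }

        MPI.Finalize();
    }
}
```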
Sending a message • The shared memory segment of each process is divided into a number of sub-sections equal to the number of processes • Each sub-section is used for communication with one process
Receiving a message • The destination process attaches itself to the shared memory segment of the source process and reads messages from the sub-section allocated to it, using an offset
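A minimal sketch of the offset arithmetic implied by the two slides above; the segment size, ranks, and method names are hypothetical:

```java
// Illustrative only: locating the sub-section of a shared segment that a
// given pair of processes uses, following the layout described above.
public class SegmentLayout {
    static final int SEGMENT_SIZE = 8 * 1024 * 1024;  // hypothetical segment size (8 MB)

    // Offset of the sub-section reserved for communication with 'peerRank',
    // inside a segment that is split evenly among 'numProcs' processes.
    static int subSectionOffset(int peerRank, int numProcs) {
        int subSectionSize = SEGMENT_SIZE / numProcs;
        return peerRank * subSectionSize;
    }

    public static void main(String[] args) {
        int numProcs = 4;
        // Process 2 reading from process 0's segment looks at the sub-section
        // that process 0 set aside for rank 2.
        int offset = subSectionOffset(2, numProcs);
        System.out.println("read at offset " + offset); // 4 MB into the segment
    }
}
```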
Finalization • When communication is complete, the barrier method is called, which synchronizes all processes • Then the finalize method is called, which destroys the shared memory allocated to the processes
Presentation Outline • Introduction • Parallel computing • HPC Platforms • Software programming models • Design and Implementation • Design • Communication devices • Performance Evaluation
Performance Evaluation • A ping-pong program was written in which two processes repeatedly pass a message back and forth • Timing calls measure the time taken for one message • We used a warm-up loop of 10K iterations, and the average time was calculated over 20K iterations after warm-up • We present latency and throughput graphs • Latency is the delay between the initiation of a network transmission by a sender and the receipt of that transmission by a receiver • Throughput is the amount of data that passes through a network connection over time, measured in bits per second • We plotted the latency graph for message sizes from 1 byte up to 2 KB and the throughput graph from 2 KB to 16 MB
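A minimal sketch of such a ping-pong benchmark in the mpiJava-style MPJ Express API; the iteration counts match the slide, but the fixed 1 KB buffer and the timing details are simplified illustrations, not the benchmark actually used:

```java
import mpi.*;

public class PingPong {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        final int WARMUP = 10000, ITERS = 20000, TAG = 1;
        byte[] buffer = new byte[1024];          // example message size: 1 KB

        long start = 0;
        for (int i = 0; i < WARMUP + ITERS; i++) {
            if (i == WARMUP) {
                start = System.nanoTime();       // start timing only after the warm-up loop
            }
            if (rank == 0) {                     // rank 0 sends first, then waits for the echo
                MPI.COMM_WORLD.Send(buffer, 0, buffer.length, MPI.BYTE, 1, TAG);
                MPI.COMM_WORLD.Recv(buffer, 0, buffer.length, MPI.BYTE, 1, TAG);
            } else if (rank == 1) {              // rank 1 echoes the message back
                MPI.COMM_WORLD.Recv(buffer, 0, buffer.length, MPI.BYTE, 0, TAG);
                MPI.COMM_WORLD.Send(buffer, 0, buffer.length, MPI.BYTE, 0, TAG);
            }
        }

        if (rank == 0) {
            // Half the average round-trip time approximates the one-way latency.
            double oneWayMicros = (System.nanoTime() - start) / 1e3 / ITERS / 2.0;
            System.out.println("avg one-way latency: " + oneWayMicros + " us");
        }
        MPI.Finalize();
    }
}
```

Repeating the run over a range of message sizes gives the data points for the latency (small messages) and throughput (large messages) graphs described above.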
Further Reading • Parallel Computing • https://computing.llnl.gov/tutorials/parallel_comp/ • MPI • www.mcs.anl.gov/mpi • MPJ Express • http://mpj-express.org/ • MPICH2 • http://www.mcs.anl.gov/research/projects/mpich2/ • Open MPI • http://www.open-mpi.org/