A Performance Comparison of DSM, PVM, and MPI Paul Werstein Mark Pethick Zhiyi Huang
Introduction Relatively little is known about the performance of Distributed Shared Memory systems compared to Message Passing systems. We compare the performance of the TreadMarks DSM system with two popular message passing systems: MPICH (an MPI implementation) and PVM.
Introduction Three applications are compared: Mergesort, Mandelbrot Set Generation, and a Backpropagation Neural Network. Each application represents a different class of problem.
TreadMarks DSM • Provides locks and barriers as primitives. • Uses Lazy Release Consistency. • Granularity of sharing is a page. • Creates page differentials to avoid the false sharing effect. • Version 1.0.3.3
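A minimal sketch of the programming model TreadMarks exposes, assuming its usual Tmk_* primitives (Tmk_startup, Tmk_malloc, Tmk_distribute, Tmk_lock_acquire/release, Tmk_barrier, Tmk_proc_id); the shared counter is illustrative only, not the paper's benchmark code, and exact signatures may differ by release.

```c
/* Shared counter updated under a TreadMarks lock; Tmk_* signatures are
 * assumed from the TreadMarks documentation and may vary by version. */
#include <stdio.h>
#include "Tmk.h"

int *counter;   /* points into DSM shared memory once distributed */

int main(int argc, char **argv)
{
    Tmk_startup(argc, argv);

    if (Tmk_proc_id == 0) {
        counter = (int *) Tmk_malloc(sizeof(int));
        *counter = 0;
        /* make the shared pointer visible to the other processes */
        Tmk_distribute((char *) &counter, sizeof(counter));
    }
    Tmk_barrier(0);

    Tmk_lock_acquire(0);          /* updates propagate lazily at release */
    *counter += Tmk_proc_id;
    Tmk_lock_release(0);

    Tmk_barrier(1);
    if (Tmk_proc_id == 0)
        printf("sum of process ids = %d\n", *counter);

    Tmk_exit(0);
    return 0;
}
```

Because TreadMarks uses lazy release consistency, the modifications made inside the critical section are only propagated when another process later acquires the same lock or crosses the barrier.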
Parallel Virtual Machine • Provides the concept of a virtual parallel machine. • Exists as a daemon on each node. • Inter-process communication is mediated by the daemons. • Designed for flexibility. • Version 3.4.3.
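For comparison, a minimal PVM master sketch using the standard pvm3 calls (pvm_mytid, pvm_spawn, pvm_initsend, pvm_pkint, pvm_send, pvm_recv); the spawned executable name "worker" and the message tags are hypothetical, not taken from the paper.

```c
/* PVM master: spawn one worker task in the virtual machine, send it an
 * integer, and wait for its reply. The daemons route the messages. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    int mytid = pvm_mytid();          /* enrol this process in the VM */
    int wtid, value = 42, reply;

    /* Ask the daemons to start one copy of "worker" somewhere in the VM. */
    if (pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &wtid) != 1) {
        pvm_exit();
        return 1;
    }

    pvm_initsend(PvmDataDefault);     /* pack and send a message (tag 1) */
    pvm_pkint(&value, 1, 1);
    pvm_send(wtid, 1);

    pvm_recv(wtid, 2);                /* block for the worker's reply (tag 2) */
    pvm_upkint(&reply, 1, 1);
    printf("master %x: worker returned %d\n", mytid, reply);

    pvm_exit();
    return 0;
}
```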
MPICH - MPI • Standard interface for developing Message Passing Applications. • Primary design goal is performance. • Primarily defines communications primitives. • MPICH is a reference platform for the MPI standard. • Version 1.2.4
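A minimal MPI counterpart using only MPI-1 calls, so it should compile against the MPICH 1.2.x version listed above as well as current MPI libraries; it is illustrative only, not the benchmark code.

```c
/* Minimal MPI example: rank 0 sends an integer to rank 1. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```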
System • 32-node Linux cluster • 800 MHz Pentium with 256 MB of memory per node • Red Hat 7.2 • 100 Mbit/s Ethernet • Results determined for 1, 2, 4, 8, 16, 24, and 32 processes.
Mergesort • Parallelisation strategy used is Divide and Conquer. • Synchronisation between pairs of nodes. • Loosely Synchronous class problem. • Coarse-grained synchronisation. • Irregular synchronisation points. • Alternating phases of computation and communication.
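A hedged sketch of this pairwise merge phase, written with MPI since the paper's own code is not shown; it assumes a power-of-two process count, a placeholder chunk size, and invented helper names (tree_merge, merge) for illustration.

```c
/* Each process sorts a local chunk; processes then merge in pairs so
 * that after log2(P) steps rank 0 holds the fully sorted data. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define CHUNK 1024   /* items per process; illustrative only */

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

/* Merge two sorted int arrays a (na items) and b (nb items) into out. */
static void merge(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

/* Pairwise merge tree; assumes a power-of-two number of processes. */
static int *tree_merge(int *chunk, int n, int rank, int nprocs, int *out_n)
{
    for (int step = 1; step < nprocs; step *= 2) {
        if (rank % (2 * step) == 0) {                 /* receive and merge */
            int partner = rank + step, rn;
            MPI_Status st;
            MPI_Recv(&rn, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &st);
            int *rbuf = malloc(rn * sizeof(int));
            MPI_Recv(rbuf, rn, MPI_INT, partner, 1, MPI_COMM_WORLD, &st);
            int *merged = malloc((n + rn) * sizeof(int));
            merge(chunk, n, rbuf, rn, merged);
            free(chunk); free(rbuf);
            chunk = merged; n += rn;
        } else {                                      /* send to partner, then drop out */
            int partner = rank - step;
            MPI_Send(&n, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            MPI_Send(chunk, n, MPI_INT, partner, 1, MPI_COMM_WORLD);
            break;
        }
    }
    *out_n = n;
    return chunk;
}

int main(int argc, char **argv)
{
    int rank, nprocs, n = CHUNK;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int *chunk = malloc(n * sizeof(int));
    srand(rank + 1);
    for (int i = 0; i < n; i++) chunk[i] = rand();
    qsort(chunk, n, sizeof(int), cmp_int);            /* local sort phase */

    chunk = tree_merge(chunk, n, rank, nprocs, &n);   /* pairwise merge phase */
    if (rank == 0)
        printf("sorted %d items\n", n);

    free(chunk);
    MPI_Finalize();
    return 0;
}
```

The synchronisation pattern matches the slide: each process communicates only with its current partner, half the processes drop out at every step, and the synchronisation points are irregular because merged chunks grow as the tree is climbed.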
Mandelbrot Set • Strategy used is Data Partitioning. • A work pool is used because the computation time of sections differs. • Work pool size >= 2 * number of processes. • Embarrassingly Parallel class problem. • May involve complex computation, but there is very little communication. • Gives an indication of performance under ideal conditions.
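A hedged sketch of the work-pool pattern expressed in MPI; the strip count, tags, and function names are illustrative, and the per-strip Mandelbrot pixel computation itself is elided.

```c
/* Master/worker work pool: the master hands out image strips on demand,
 * which balances the uneven per-strip cost of the Mandelbrot iteration.
 * Assumes there are more strips than workers, as in a real image. */
#include <mpi.h>

#define TOTAL_STRIPS 256
#define TAG_WORK 1
#define TAG_DONE 2

static void master(int nworkers)
{
    int next = 0, active = 0, result;
    MPI_Status st;

    /* Prime every worker with one strip index. */
    for (int w = 1; w <= nworkers && next < TOTAL_STRIPS; w++, next++, active++)
        MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);

    /* Replace each finished strip with the next one from the pool. */
    while (active > 0) {
        MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &st);
        if (next < TOTAL_STRIPS) {
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
            next++;
        } else {
            MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE, MPI_COMM_WORLD);
            active--;
        }
    }
}

static void worker(void)
{
    int strip;
    MPI_Status st;
    for (;;) {
        MPI_Recv(&strip, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_DONE) break;
        /* compute_strip(strip) would iterate z = z*z + c per pixel here */
        MPI_Send(&strip, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) master(size - 1);
    else           worker();
    MPI_Finalize();
    return 0;
}
```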
Neural Network (1) • Strategy is Data Partitioning. • Each processor trains the network on a subsection of the data set. • Changes are summed and applied at the end of each epoch. • Requires large data sets to be effective.
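A hedged sketch of the end-of-epoch step using MPI_Allreduce to sum the per-process weight changes; NWEIGHTS, end_of_epoch, and the dummy deltas are placeholders rather than the paper's network or code.

```c
/* Each process trains on its slice of the data set, then the weight
 * changes are summed across processes and applied identically so the
 * replicated networks stay in step. */
#include <mpi.h>

#define NWEIGHTS 1024

void end_of_epoch(double *weights, double *local_delta)
{
    double global_delta[NWEIGHTS];

    /* Sum every process's accumulated weight changes. */
    MPI_Allreduce(local_delta, global_delta, NWEIGHTS,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    for (int i = 0; i < NWEIGHTS; i++) {
        weights[i] += global_delta[i];
        local_delta[i] = 0.0;         /* reset for the next epoch */
    }
}

int main(int argc, char **argv)
{
    int rank;
    double weights[NWEIGHTS] = {0}, delta[NWEIGHTS];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* A real run would backpropagate over this process's subsection of
     * the data set and accumulate changes into delta; dummy values here. */
    for (int i = 0; i < NWEIGHTS; i++)
        delta[i] = 0.001 * rank;

    end_of_epoch(weights, delta);
    MPI_Finalize();
    return 0;
}
```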
Neural Network (2) • Synchronous class problem. • Characterised by an algorithm that carries out the same operation on all points in the data set. • Synchronisation occurs at regular points. • Often applies to problems that use data partitioning. • A large number of problems appear to belong to the synchronous class.
Conclusion • In general, the performance of DSM is poorer than that of MPICH or PVM. • Main reasons identified are: • The increased use of memory associated with the creation of page differentials. • The false sharing effect due to the granularity of sharing. • Differential accumulation in the gather operation.