VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
Dr Zhiyi Huang
Dept of Computer Science, University of Otago, New Zealand
Zhiyi’s RSL
Motivation
• Distributed Shared Memory (DSM) applications are not as efficient as MPI applications on cluster computers
VOPP
• VODCA is a system supporting View-Oriented Parallel Programming (VOPP)
• Why a new programming style?
  • Improve the performance of DSM applications on cluster computers
  • Provide a programming style better than MPI
  • Message passing is notoriously difficult to program
What is a view?
• Suppose M is the set of data objects in shared memory
• A view is a group of data objects from the shared memory
  • $\forall V,\ V \subseteq M$
• Views must not overlap each other
  • $\forall V_i, V_j,\ i \neq j:\ V_i \cap V_j = \emptyset$
• Suppose there are n views in shared memory
  • $\bigcup_{i=1}^{n} V_i = M$
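The three conditions on this slide say that the views form a partition of M. A minimal sketch of a checker follows, assuming a hypothetical representation in which the m data objects are numbered 0..m-1 and each view is a list of object indices. The `view` struct and `views_partition_memory` are illustrative names, not part of VODCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical representation: a view is a list of object indices in 0..m-1. */
typedef struct {
    const int *objects;  /* indices of the data objects in this view */
    size_t count;        /* number of objects in this view */
} view;

/* Returns true iff every object in 0..m-1 belongs to exactly one view,
 * i.e. each view is a subset of M, views are disjoint, and their union is M. */
bool views_partition_memory(const view *views, size_t n_views, size_t m)
{
    bool ok = true;
    /* owners[k] counts how many times object k is claimed by a view. */
    unsigned *owners = calloc(m ? m : 1, sizeof *owners);
    if (owners == NULL)
        return false;

    for (size_t i = 0; i < n_views && ok; i++) {
        for (size_t j = 0; j < views[i].count; j++) {
            int k = views[i].objects[j];
            if (k < 0 || (size_t)k >= m) {  /* view is not a subset of M */
                ok = false;
                break;
            }
            if (++owners[k] > 1) {          /* overlap (or object listed twice) */
                ok = false;
                break;
            }
        }
    }
    for (size_t k = 0; k < m && ok; k++) {
        if (owners[k] == 0)                 /* object k is in no view */
            ok = false;
    }

    free(owners);
    return ok;
}
```

With views {0, 1} and {2, 3} over four objects the check succeeds. It fails if a second view also claims object 1, or if a fifth object exists that no view contains.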
VOPP Requirements
• The programmer should divide the shared data into a number of views according to the data flow of the parallel algorithm.
• A view should consist of data objects that are always processed as an atomic set in a program.
• Views can be created and destroyed at any time.
• Each view has a unique view identifier.
VOPP Requirements (cont.)
• View primitives such as acquire_view and release_view must be used when a view is accessed.
    acquire_view(View_A);
    A = A + 1;
    release_view(View_A);
• acquire_Rview and release_Rview can be used when a view is only read by a processor.
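Example
• A VOPP program for a producer/consumer problem
    if (prod_id == 0) {
        acquire_view(1);     /* exclusive access: only the producer writes x */
        produce(x);
        release_view(1);
    }
    barrier(0);              /* wait until x has been produced */
    acquire_Rview(1);        /* read-only access for every consumer */
    consume(x);
    release_Rview(1);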
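The access rules behind the four primitives can be modelled in a single process. The sketch below is an assumption-laden stand-in, not VODCA code: the real acquire calls block and run across cluster nodes, while `try_acquire_view` and `try_acquire_Rview` here are hypothetical non-blocking variants that return false when access would have to wait. `producer_consumer_demo` replays the slide's program sequentially, so the barrier becomes a comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VIEWS 8

/* Hypothetical per-view bookkeeping; not VODCA's real data structure. */
typedef struct {
    bool write_held;  /* true while one process holds the view exclusively */
    int  owner;       /* id of that process, valid only if write_held */
    int  readers;     /* number of processes holding the view read-only */
} view_state;

static view_state views[MAX_VIEWS];  /* zero-initialised: all views free */

/* Non-blocking stand-in for acquire_view: grants exclusive access only
 * when no other process holds the view in any mode. */
bool try_acquire_view(int v, int proc)
{
    if (views[v].write_held || views[v].readers > 0)
        return false;
    views[v].write_held = true;
    views[v].owner = proc;
    return true;
}

void release_view(int v, int proc)
{
    assert(views[v].write_held && views[v].owner == proc);
    views[v].write_held = false;
}

/* Stand-in for acquire_Rview: several readers may hold the view at once,
 * but not while a writer holds it. */
bool try_acquire_Rview(int v)
{
    if (views[v].write_held)
        return false;
    views[v].readers++;
    return true;
}

void release_Rview(int v)
{
    assert(views[v].readers > 0);
    views[v].readers--;
}

/* Replays the producer/consumer program for n_procs simulated processes
 * and returns the sum of the values they consumed. */
int producer_consumer_demo(int n_procs)
{
    int x = 0;    /* the data object placed in view 1 */
    int sum = 0;

    /* Phase 1: only process 0 produces. */
    if (try_acquire_view(1, 0)) {
        x = 42;
        release_view(1, 0);
    }
    /* barrier(0) would go here; the sequential simulation needs none. */

    /* Phase 2: every process consumes x under a read-only view. */
    for (int p = 0; p < n_procs; p++) {
        if (try_acquire_Rview(1)) {
            sum += x;
            release_Rview(1);
        }
    }
    return sum;
}
```

Calling `producer_consumer_demo(3)` has three simulated consumers each read the produced value. Trying to acquire a view for writing while another process holds it returns false, which is the mutual exclusion the primitives provide.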
Advantages of VOPP
• Keep the convenience of shared memory programming
• Focus on data partitioning and data access instead of data races and mutual exclusion
• View primitives automatically achieve mutual exclusion
• View primitives are not an extra burden
• The programmer can fine-tune the parallel algorithm by careful view partitioning
Philosophy of VOPP
• Shared memory is a critical resource that needs to be used with care
• If there is no need to use shared memory, don’t use it
• The creation of a view should be justified before the view is created
VOPP vs. MPI
• Easier for programmers than MPI
  • For problems like a task queue, programming with MPI is horrific.
• Can mimic any finely-tuned MPI program
  • Shared message view
  • Send/recv acquire_view
• Essential differences
  • View is location transparent
  • More barriers in VOPP
Implementation
• VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
• VODCA version 1.0
  • Released as open source software
  • A library that runs in user space
  • Based on View-based Consistency
  • Uses an efficient consistency protocol, VOUPID
View-based Consistency
• Condition for View-based Consistency
  • Before a processor Pi is allowed to access a view by calling acquire_view or acquire_Rview, all previous write accesses to data objects of the view must be performed with respect to Pi according to their causal order.
• In VOPP, barriers are only used for synchronization and have nothing to do with consistency maintenance for DSM.
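One way to picture the condition: each view carries an ordered history of writes, and a processor replays every write it has not yet seen before its acquire returns. The sketch below is a single-process model under two assumptions. First, because a view is written only while held exclusively, the order in which writes are recorded is taken to be their causal order. Second, the history is a simple in-memory log. `view_history`, `local_copy`, `record_write` and `sync_on_acquire` are hypothetical names, not VODCA's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VIEW_OBJECTS 4   /* data objects in the view (illustrative size) */
#define MAX_WRITES   32  /* capacity of the write history */

/* One write access to a data object of the view. */
typedef struct {
    size_t index;  /* which object in the view was written */
    int    value;  /* the value that was stored */
} write_rec;

/* All writes to the view, in the order they were performed. */
typedef struct {
    write_rec log[MAX_WRITES];
    size_t    len;
} view_history;

/* A processor's local copy of the view. */
typedef struct {
    int    data[VIEW_OBJECTS];
    size_t applied;  /* number of history entries already reflected in data */
} local_copy;

/* Record a write made while the view is held exclusively.
 * Returns false if the log is full or the index is out of range. */
bool record_write(view_history *h, size_t index, int value)
{
    if (h->len == MAX_WRITES || index >= VIEW_OBJECTS)
        return false;
    h->log[h->len].index = index;
    h->log[h->len].value = value;
    h->len++;
    return true;
}

/* Called at acquire time: replay, in order, every write this processor
 * has not yet seen, so that all previous writes are performed locally. */
void sync_on_acquire(const view_history *h, local_copy *c)
{
    while (c->applied < h->len) {
        const write_rec *w = &h->log[c->applied];
        c->data[w->index] = w->value;
        c->applied++;
    }
}
```

If object 0 is written twice before a processor acquires the view, replaying in order leaves the later value in its local copy. Barriers play no part in this model, matching the second bullet above.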
Consistency protocols
• They are page based
• Update protocol
  • Modify immediately
• Invalidation protocol
  • Use a write notice to invalidate a page
  • When the page is accessed, a page fault causes the fetch of diffs which are applied on the page
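The diffs mentioned here are commonly produced by comparing a modified page with a pristine copy (a "twin") saved before the first write. A minimal sketch of creating and applying such a diff follows, assuming byte granularity and a tiny 16-byte page for readability. `PAGE_BYTES`, `page_diff`, `make_diff` and `apply_diff` are illustrative names, and real systems typically encode runs of changed words instead of single bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 16  /* tiny page for illustration only */

/* One changed byte: where it is and what it became. */
typedef struct {
    size_t        offset;
    unsigned char value;
} diff_entry;

/* A diff is the list of bytes that differ from the twin. */
typedef struct {
    diff_entry entries[PAGE_BYTES];
    size_t     count;
} page_diff;

/* Compare the modified page with its twin and record every changed byte. */
page_diff make_diff(const unsigned char *twin, const unsigned char *page)
{
    page_diff d = { .count = 0 };
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (page[i] != twin[i]) {
            d.entries[d.count].offset = i;
            d.entries[d.count].value = page[i];
            d.count++;
        }
    }
    return d;
}

/* Bring a stale copy of the page up to date by applying a fetched diff. */
void apply_diff(unsigned char *page, const page_diff *d)
{
    for (size_t i = 0; i < d->count; i++)
        page[d->entries[i].offset] = d->entries[i].value;
}
```

A writer that changes two bytes of a page produces a two-entry diff. A remote node holding the old contents applies that diff after its page fault and ends up with an identical page, without the whole page being transferred.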
Consistency protocols (cont.)
• Home-based protocol
  • Based on the invalidation protocol, but
  • For each page, one copy is used as its home
  • When a diff is created, it is applied to the home copy immediately
  • When the page is accessed, a page fault causes the fetch of the home copy (Pro: resolves the diff accumulation problem)
The VOUPID protocol
• View-Oriented Update Protocol with Integrated Diff
• Based on the update protocol
• Diffs of a page of a view are merged into a single diff
• The single diff is used to update the page when the view is acquired
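The merging step can be sketched as follows. This is an illustration of the idea of an integrated diff, not VODCA's implementation: it assumes a dense per-byte diff over a tiny 16-byte page, and that diffs are merged in the order they were created so that a later write to the same byte wins. `dense_diff`, `diff_set`, `merge_diff`, `apply_diff` and `diff_size` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BYTES 16  /* tiny page for illustration only */

/* Dense diff: for each byte of the page, whether it changed and to what. */
typedef struct {
    bool          dirty[PAGE_BYTES];
    unsigned char value[PAGE_BYTES];
} dense_diff;

/* Record that the byte at offset was changed to v. */
void diff_set(dense_diff *d, size_t offset, unsigned char v)
{
    d->dirty[offset] = true;
    d->value[offset] = v;
}

/* Fold a newer diff into the integrated diff; newer bytes overwrite older. */
void merge_diff(dense_diff *integrated, const dense_diff *newer)
{
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (newer->dirty[i]) {
            integrated->dirty[i] = true;
            integrated->value[i] = newer->value[i];
        }
    }
}

/* Update a page with a diff, touching only the bytes marked dirty. */
void apply_diff(unsigned char *page, const dense_diff *d)
{
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (d->dirty[i])
            page[i] = d->value[i];
    }
}

/* Number of changed bytes carried by a diff. */
size_t diff_size(const dense_diff *d)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_BYTES; i++)
        n += d->dirty[i] ? 1 : 0;
    return n;
}
```

Applying the single merged diff gives the same page as applying the individual diffs one after another, but a byte written several times is carried only once. That is the saving an integrated diff is meant to capture when a view is acquired.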
Experiment
• Use a cluster computer
  • The cluster computer, at Tsinghua Univ., consists of 128 Itanium 2 running Linux 2.4, connected by InfiniBand. Each node has two 1.3 GHz processors and 4 GB of RAM. We ran two processes on each node.
• We used four applications: Integer Sort (IS), Gauss, Successive Over-Relaxation (SOR), and Neural Network (NN).
Related systems
• TreadMarks (TMK) is a state-of-the-art Distributed Shared Memory system based on traditional parallel programming.
• Message Passing Interface (MPI) is a standard for message passing-based parallel programming. We used LAM/MPI.
Performance of NN (figure)
Performance of IS (figure)
Performance of SOR (figure)
Performance of Gauss (figure)
Future work on VOPP
• More benchmarks/applications
• Performance evaluation on larger clusters
• Optimized implementation of barriers for VOPP
• More auxiliary utilities for VOPP programmers
• A view-based debugger for VOPP
• A fault-tolerant system for VODCA
Questions?