CVM (Coherent Virtual Machine)
CVM • CVM is a user-level library • Enables programs to exploit shared-memory semantics over message-passing hardware • Page-based DSM (distributed shared memory) • Written in C++ • Built on top of UDP or MPI
CVM • CVM was created by Pete Keleher in 1995. • CVM was created specifically as a platform for protocol experimentation. • These slides are based on the material in the CVM manual, which can be found on the project website (http://www.cs.umd.edu/projects/cvm)
Initialization / Termination • Initialization • cvm_startup(int, char**) • Called after the program has processed its own arguments. • Command-line layout: program <opt> -- <CVM opt> -- <protocol opt> • Termination • cvm_finish() • Called by the master process; it waits until all processes have completed. • cvm_exit(char*, …) • A quick exit on error
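The command-line layout above splits the arguments into three groups separated by "--". The following is a minimal sketch of that split; `split_arg_groups` is a hypothetical helper written for illustration and is not claimed to be how `cvm_startup()` parses its arguments internally.

```c
#include <string.h>

/* Hypothetical helper (not part of CVM): locate the three argument groups in
 * "program <opt> -- <CVM opt> -- <protocol opt>".
 * On return, group g occupies argv[start[g]] .. argv[start[g] + count[g] - 1].
 * Group 0 = program options, 1 = CVM options, 2 = protocol options.
 * argv[0] (the program name) is not part of any group. */
void split_arg_groups(int argc, char *argv[], int start[3], int count[3])
{
    int g = 0;                      /* index of the group being filled */

    start[0] = 1;
    start[1] = start[2] = argc;     /* an empty group starts past the end */
    count[0] = count[1] = count[2] = 0;

    for (int i = 1; i < argc; i++) {
        if (g < 2 && strcmp(argv[i], "--") == 0) {
            g++;
            start[g] = i + 1;       /* the next group begins after the separator */
        } else {
            count[g]++;
        }
    }
}
```

For `./cvmprog -v -- -n4 -d -- -x`, group 0 is `-v`, group 1 is `-n4 -d`, and group 2 is `-x`.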
Example • Most programs have the following form:

```c
int main(int argc, char *argv[])
{
    …
    cvm_startup(argc, argv);
    …
    …
    cvm_finish();
}
```
Process Creation • cvm_create_procs(func_ptr worker) • Creates the execution entries on all slave machines. • The function must have the form • void (*worker)() • Several predefined macros and variables can be used: • cvm_num_procs, cvm_proc_id, PID, TID
Shared memory allocation • cvm_alloc(int sz) • In general, all shared data in CVM programs must be dynamically allocated. • All calls to cvm_alloc() must complete before cvm_create_procs() is called. • Usage is the same as malloc(): • int *buf = (int*)cvm_alloc( sizeof(int) * N );
Synchronization • cvm_lock(int id), cvm_unlock(int id) • Acquire and release the global lock specified by id. • The current maximum number of locks is 4110. • This limit can be changed in cvm.h. • cvm_barrier(int id) • Performs a global barrier. • The id parameter is currently ignored.
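Access shared data • Processes must lock the same 'id' when they access the same shared data. • As with any shared memory, mutual exclusion must be ensured. • Lazy Release Consistency: memory operations performed between lock() and unlock() become visible to another process only when that process later calls lock() on the same id. Without this lock(), its view of the memory is not refreshed.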
Cont. • Barriers can be used to exchange all updates among machines. • Example: P[0:9]=1; Barrier(); P[10:19]=2; Barrier(); P[20:29]=3; Barrier(); P[30:39]=4; Barrier(); • After each Barrier(), all shared data are synchronized.
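Synchronization • Wait & signal • cvm_signal_pause(), cvm_signal(int pid) • A signal can be buffered (only one). • The order doesn't matter: a signal() that arrives before the matching signal_pause() is buffered, and the later signal_pause() returns immediately. Works fine! • Only one signal is buffered: if two signal() calls arrive before two signal_pause() calls, the first pause consumes the buffered signal and the second pause blocks.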
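The one-signal buffering rule can be captured in a few lines. This is a minimal single-process model written for illustration, not CVM code: `signal_slot`, `model_signal`, and `model_pause_would_return` are hypothetical names, the slot stands for the pending-signal state of one target process, and "would block" is reported as a return value instead of actually blocking.

```c
/* Hypothetical model (not CVM code) of the buffering rule above:
 * each target process has one pending-signal slot, so extra signals are lost. */
typedef struct {
    int pending;   /* 0 or 1: at most one buffered signal */
} signal_slot;

/* Models cvm_signal(): buffer the signal; a second one is not queued. */
void model_signal(signal_slot *s)
{
    s->pending = 1;
}

/* Models cvm_signal_pause(): returns 1 if the pause would return
 * immediately (consuming the buffered signal), 0 if it would block. */
int model_pause_would_return(signal_slot *s)
{
    if (s->pending) {
        s->pending = 0;
        return 1;
    }
    return 0;
}
```

Calling `model_signal` twice and then `model_pause_would_return` twice yields 1 and then 0, matching the "blocks at the second pause" case. The model says nothing about how a blocked pause is later woken.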
CVM arguments • The command line • $ ./cvmprog <opt> -- <CVM args> -- <prot args> • -d : turn on debugging output • -n<num> : specify the number of processes • -P<num> : specify the page size (8192 by default) • -t<num> : use per-node multithreading • hides communication latency • -X<num> : specify the protocol
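The flags above attach their number directly to the letter (`-n4`, not `-n 4`). The sketch below shows how one argument from the CVM group could be interpreted; `struct cvm_opts` and `apply_cvm_flag` are hypothetical names for illustration only, not CVM's own types or parser.

```c
#include <stdlib.h>

/* Hypothetical container for the CVM options listed above (not CVM's own type). */
struct cvm_opts {
    int debug;      /* -d */
    int nprocs;     /* -n<num> */
    int page_size;  /* -P<num> */
    int threads;    /* -t<num> */
    int protocol;   /* -X<num> */
};

/* Applies one argument from the CVM group; returns 0 on success, -1 if the
 * argument is not one of the flags listed above. */
int apply_cvm_flag(struct cvm_opts *o, const char *arg)
{
    if (arg[0] != '-' || arg[1] == '\0')
        return -1;

    const char *num = arg + 2;   /* number attached to the flag, e.g. "4" in "-n4" */

    switch (arg[1]) {
    case 'd': o->debug = 1;            return 0;
    case 'n': o->nprocs = atoi(num);   return 0;
    case 'P': o->page_size = atoi(num); return 0;
    case 't': o->threads = atoi(num);  return 0;
    case 'X': o->protocol = atoi(num); return 0;
    default:  return -1;
    }
}
```

A caller would set `page_size` to 8192 before applying flags; the slides do not give defaults for the other fields.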
Consistency protocols • The default is lazy multi-writer (0) • Allows multiple writers to modify the same page simultaneously without communication • Uses diffs • Lazy single-writer (1) • Only a single writer can access a page at a time (suffers from false sharing) • Sequentially consistent single-writer (2) • Every write triggers invalidations (lots of communication)
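Multi-writer protocols of this kind are commonly built on "twins and diffs": before the first write, a copy (twin) of the page is saved, and at synchronization time the page is compared with its twin to produce a diff of the changed words. The sketch below assumes that technique and a tiny illustrative page size; it does not reproduce CVM's actual code or data layout.

```c
#include <stddef.h>

#define PAGE_WORDS 8   /* tiny page for illustration; real pages are much larger */

/* One modified word: its offset in the page and its new value. */
typedef struct {
    size_t offset;
    int value;
} diff_entry;

/* Compares a page with its twin (the copy saved before the first write)
 * and records every word that changed. Returns the number of entries. */
size_t make_diff(const int *twin, const int *page, diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Applies a diff to another copy of the same page. */
void apply_diff(int *page, const diff_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        page[d[i].offset] = d[i].value;
}
```

If two writers modify disjoint words of the same page, applying both diffs to a third copy reconstructs both updates, which is why no communication is needed while the writes happen.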
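Home-based RC • Home-based multi-writer (3) • With LRC, a process may still have to collect many diffs: if one process runs Lock() … unlock() and produces diffs, and a second process then runs Lock() … unlock() on the same data, the next process to call Lock() must fetch two sets of diffs.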
Cont. • Every page has its own home node, which is responsible for it. • All diffs are sent to the home node: each writer's Lock() … unlock() section sends its diffs to the page's home. • The next process to call Lock() obtains the diffs or the whole page from the home node.
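The home-based scheme can be sketched as follows: the home node keeps a master copy of the page, applies each incoming diff immediately, and a later acquirer fetches the already-merged page in one step. `home_node`, `home_apply_diff`, and `fetch_page_from_home` are hypothetical names, and diffs are represented simply as parallel offset/value arrays; this is an illustration, not CVM's implementation.

```c
#include <string.h>

#define PAGE_WORDS 8   /* tiny page for illustration */

/* Hypothetical home-node state: the master copy of one page. */
typedef struct {
    int master[PAGE_WORDS];
} home_node;

/* A writer's diff arrives at the home node and is applied to the
 * master copy right away. */
void home_apply_diff(home_node *h, const int *offsets, const int *values, int n)
{
    for (int i = 0; i < n; i++)
        h->master[offsets[i]] = values[i];
}

/* A later acquirer fetches the whole, already-merged page from the home,
 * instead of collecting one set of diffs per previous writer. */
void fetch_page_from_home(const home_node *h, int *local_copy)
{
    memcpy(local_copy, h->master, sizeof h->master);
}
```

With two writers, the home applies both diffs, and the third process receives a single page instead of two sets of diffs.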
Example code

```c
#include "cvm.h"
#include <stdio.h>

#define DATA_SZ 1000

int *data, *psum, *gidx;         // pointers to shared data

void worker()
{
    int lidx;

    psum[cvm_proc_id] = 0;
    do {
        cvm_lock(0);
        lidx = (*gidx)++;        // fetch-and-increment the shared index
        cvm_unlock(0);
        if (lidx >= DATA_SZ)     // valid indices are 0 .. DATA_SZ-1
            break;
        psum[cvm_proc_id] += data[lidx];
    } while (1);
    cvm_barrier(0);              // the psum needs to be synchronized
}

int main(int argc, char *argv[])
{
    int sum, i;

    cvm_startup(argc, argv);

    // allocation of shared data
    gidx = (int *)cvm_alloc(sizeof(int));
    data = (int *)cvm_alloc(sizeof(int) * DATA_SZ);
    psum = (int *)cvm_alloc(sizeof(int) * cvm_num_procs);

    // data initialization (cvm_alloc, like malloc, does not zero memory)
    *gidx = 0;
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;

    cvm_create_procs(worker);
    worker();                    // the master process also does its share

    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);

    cvm_finish();
    return 0;
}
```
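The self-scheduling pattern in the example program (a lock-protected shared index that each worker fetches and increments) can be tried on a single machine with POSIX threads. This is an analogue, not CVM code: a pthread mutex stands in for `cvm_lock(0)`/`cvm_unlock(0)`, `pthread_join()` stands in for the final barrier, and `NTHREADS` and `parallel_sum` are names chosen for illustration.

```c
#include <pthread.h>

#define DATA_SZ 1000
#define NTHREADS 4

static int data[DATA_SZ];
static int psum[NTHREADS];
static int gidx;                                   /* next index to hand out */
static pthread_mutex_t gidx_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int id = *(int *)arg;

    psum[id] = 0;
    for (;;) {
        pthread_mutex_lock(&gidx_lock);
        int lidx = gidx++;                         /* fetch-and-increment under the lock */
        pthread_mutex_unlock(&gidx_lock);
        if (lidx >= DATA_SZ)
            break;
        psum[id] += data[lidx];
    }
    return NULL;
}

/* Returns 1 + 2 + ... + DATA_SZ computed by NTHREADS threads. */
int parallel_sum(void)
{
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    int sum = 0;

    gidx = 0;
    for (int i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);                /* wait for all partial sums */
    for (int i = 0; i < NTHREADS; i++)
        sum += psum[i];
    return sum;
}
```

`parallel_sum()` returns 500500 (1000 · 1001 / 2) regardless of how the threads interleave, because every index is handed out exactly once under the lock.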
Without contention • Each process sums a fixed, cyclic subset of the indices, so no lock is needed to distribute the work.

```c
#include "cvm.h"
#include <stdio.h>

#define DATA_SZ 1000

int *psum, *data;

void worker()
{
    int i;

    psum[PID] = 0;               // PID is the same as cvm_proc_id
    for (i = PID; i < DATA_SZ; i += cvm_num_procs)
        psum[PID] += data[i];
    cvm_barrier(0);              // still needed for psum
}

int main(int argc, char *argv[])
{
    int sum, i;

    cvm_startup(argc, argv);

    // allocation of shared data
    psum = (int *)cvm_alloc(sizeof(int) * cvm_num_procs);
    data = (int *)cvm_alloc(sizeof(int) * DATA_SZ);

    // data initialization
    for (i = 0; i < DATA_SZ; i++)
        data[i] = i + 1;

    cvm_create_procs(worker);
    worker();

    for (sum = 0, i = 0; i < cvm_num_procs; i++)
        sum += psum[i];
    printf("The summation from 1 to %d is %d\n", DATA_SZ, sum);

    cvm_finish();
    return 0;
}
```
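The cyclic loop `for (i = PID; i < DATA_SZ; i += cvm_num_procs)` gives each process indices PID, PID + n, PID + 2n, and so on, so every index is covered exactly once and each process writes only its own `psum[PID]` slot. The sequential sketch below checks that property without CVM; `cyclic_partial_sum` is a hypothetical helper that assumes `data[i] = i + 1` as in the example.

```c
#define DATA_SZ 1000

/* Partial sum that process pid computes under the cyclic distribution:
 * it visits indices pid, pid + nprocs, pid + 2*nprocs, ...
 * Assumes data[i] = i + 1, so the value is computed from the index. */
int cyclic_partial_sum(int pid, int nprocs)
{
    int s = 0;
    for (int i = pid; i < DATA_SZ; i += nprocs)
        s += i + 1;
    return s;
}
```

Summing `cyclic_partial_sum(p, 3)` for p = 0, 1, 2 gives 500500, the same total as the lock-based version.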
cvm_reduce • cvm_reduce(void *global, void *local, int rtype, int dtype, int num) • Similar to MPI_Reduce • Four operations are provided: • min, max, sum, product • E.g. cvm_reduce(sum, psum, REDUCE_sum, REDUCE_int, 1); • Requires #include "reduce.h"
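The four reductions combine the local values of all processes element by element into a global result. The sketch below shows only those semantics for `int` data: the `reduce_op` enum and `reduce_ints` are hypothetical stand-ins (the real `REDUCE_*` constants come from reduce.h and are not reproduced here), and the local buffers of all processes are gathered into one array, whereas with `cvm_reduce` each process passes its own `local` buffer.

```c
/* Hypothetical operation codes mirroring the four reductions listed above. */
typedef enum { OP_MIN, OP_MAX, OP_SUM, OP_PRODUCT } reduce_op;

/* local holds nprocs consecutive arrays of num ints (one per process);
 * global receives num ints, each combining the matching element of
 * every process's array with the chosen operation. */
void reduce_ints(int *global, const int *local, int nprocs, int num, reduce_op op)
{
    for (int k = 0; k < num; k++) {
        int acc = local[k];                   /* start from process 0's value */
        for (int p = 1; p < nprocs; p++) {
            int v = local[p * num + k];
            switch (op) {
            case OP_MIN:     if (v < acc) acc = v; break;
            case OP_MAX:     if (v > acc) acc = v; break;
            case OP_SUM:     acc += v;             break;
            case OP_PRODUCT: acc *= v;             break;
            }
        }
        global[k] = acc;
    }
}
```

With four processes holding 3, 1, 4, and 2, the sum is 10, the minimum 1, the maximum 4, and the product 24.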