CRL (C Region Library)
Chao Huang, James Brodman, Hassan Jafri
CS498LVK
Introduction
• CRL is an all-software distributed shared memory (DSM) system
• Provides a shared address space
• Built on PVM
• "Region": an arbitrarily sized, contiguous area of memory
• Regions are cached at local nodes, and the cached copies are kept consistent
Functions
• Environment
  ◦ crl_init
  ◦ crl_num_nodes, crl_self_addr
• Basic region operations
  ◦ rid_t rgn_create(unsigned size)
  ◦ void rgn_destroy(rid_t rgn_id)
  ◦ rid_t rgn_rid(void *rgn)
  ◦ unsigned rgn_size(void *rgn)
  ◦ void rgn_flush(void *rgn)
Functions
• Region mapping
  ◦ void *rgn_map(rid_t rgn_id)
  ◦ void rgn_unmap(void *rgn)
• Region read and write
  ◦ void rgn_start_read(void *rgn)
  ◦ void rgn_end_read(void *rgn)
  ◦ void rgn_start_write(void *rgn)
  ◦ void rgn_end_write(void *rgn)
Functions
• Global synchronization
  ◦ void rgn_barrier(void)
  ◦ void rgn_bcast_send(int len, void *buf)
  ◦ void rgn_bcast_recv(int len, void *buf)
  ◦ double rgn_reduce_dadd(double arg)
  ◦ double rgn_reduce_dmin(double arg)
  ◦ double rgn_reduce_dmax(double arg)
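The region functions are meant to be called in a fixed order:
1. create a region;
2. map it;
3. bracket every access with a start/end pair;
4. unmap it.

Below is a minimal single-node sketch of that order, written so it compiles without CRL. Every `toy_*` name is a hypothetical stand-in, not CRL's implementation. The stand-ins:
• keep each region in local memory;
• perform no communication or coherence;
• allow only one mapping per region at a time;
• only assert that calls arrive in a legal order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-node stand-ins for the CRL calls; NOT CRL itself.
 * No communication or coherence; one mapping per region at a time. */
typedef int toy_rid_t;

enum toy_state { TOY_UNMAPPED, TOY_MAPPED, TOY_READING, TOY_WRITING };

struct toy_region {
    double         data[32]; /* fixed backing store keeps the sketch short */
    unsigned       size;     /* requested size in bytes */
    enum toy_state state;
};

#define TOY_MAX_REGIONS 4
static struct toy_region toy_regions[TOY_MAX_REGIONS];
static int toy_next_rid;

toy_rid_t toy_rgn_create(unsigned size)
{
    assert(toy_next_rid < TOY_MAX_REGIONS);
    assert(size <= sizeof toy_regions[0].data);
    struct toy_region *r = &toy_regions[toy_next_rid];
    memset(r->data, 0, sizeof r->data);
    r->size = size;
    r->state = TOY_UNMAPPED;
    return toy_next_rid++;
}

void *toy_rgn_map(toy_rid_t rid)
{
    assert(rid >= 0 && rid < toy_next_rid);
    assert(toy_regions[rid].state == TOY_UNMAPPED);
    toy_regions[rid].state = TOY_MAPPED;
    return toy_regions[rid].data;
}

/* Find the region whose data a mapped pointer refers to. */
static struct toy_region *toy_lookup(void *p)
{
    for (int i = 0; i < toy_next_rid; i++)
        if ((void *) toy_regions[i].data == p)
            return &toy_regions[i];
    assert(!"not a pointer returned by toy_rgn_map");
    return NULL;
}

/* Move a region between states, asserting the call order is legal. */
static void toy_step(void *p, enum toy_state from, enum toy_state to)
{
    struct toy_region *r = toy_lookup(p);
    assert(r->state == from);
    r->state = to;
}

void toy_rgn_start_read(void *p)  { toy_step(p, TOY_MAPPED, TOY_READING); }
void toy_rgn_end_read(void *p)    { toy_step(p, TOY_READING, TOY_MAPPED); }
void toy_rgn_start_write(void *p) { toy_step(p, TOY_MAPPED, TOY_WRITING); }
void toy_rgn_end_write(void *p)   { toy_step(p, TOY_WRITING, TOY_MAPPED); }
void toy_rgn_unmap(void *p)       { toy_step(p, TOY_MAPPED, TOY_UNMAPPED); }
unsigned toy_rgn_size(void *p)    { return toy_lookup(p)->size; }

/* Write 0..n-1 into a new region, then read it back and sum it. */
double toy_fill_and_sum(int n)
{
    toy_rid_t rid = toy_rgn_create((unsigned) (n * sizeof(double)));
    /* Assumed typical CRL usage: the creating node would share rid with
     * rgn_bcast_send and the other nodes would receive it with rgn_bcast_recv. */
    double *v = toy_rgn_map(rid);
    double sum = 0;

    toy_rgn_start_write(v);
    for (int i = 0; i < n; i++)
        v[i] = i;
    toy_rgn_end_write(v);

    toy_rgn_start_read(v);
    for (int i = 0; i < n; i++)
        sum += v[i];
    toy_rgn_end_read(v);

    toy_rgn_unmap(v);
    return sum;
}
```

The call sequence in `toy_fill_and_sum` mirrors the `rgn_*` sequence a CRL program would issue. Sharing the rid through `rgn_bcast_send`/`rgn_bcast_recv` is an assumption about typical usage; the slides do not state it.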
Example
```c
/* Compute the dot product of two n-element vectors, each
 * of which is represented by an appropriately sized region
 *   x: region identifier for 1st vector
 *   y: address at which 2nd vector is already mapped
 */
double dotprod(rid_t x, double *y, int n)
{
    int i;
    double *z;
    double rslt;

    /* map 1st vector and initiate read operation */
    z = (double *) rgn_map(x);
    rgn_start_read(z);

    /* initiate read operation on 2nd vector */
    rgn_start_read(y);

    /* compute dot product */
    rslt = 0;
    for (i = 0; i < n; i++)
        rslt += z[i] * y[i];

    /* terminate read operations and unmap 1st vector */
    rgn_end_read(y);
    rgn_end_read(z);
    rgn_unmap(z);

    return rslt;
}
```
Discussions
• All-software: communication latency may be higher than in hardware-based DSM systems
• Region size can be chosen to match user data structures (this is the programmer's responsibility)
• Fixed-home, directory-based invalidate protocol
• Ordered message delivery: each region is tagged with a 32-bit version number
• Unmapped region cache (URC): a region's mapping can stay cached after the region is unmapped
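A fixed-home, directory-based invalidate protocol can be pictured through the state kept at a region's home node. The sketch below is a much-simplified model under our own assumptions, not CRL's protocol:
• requests are direct function calls, so messages can never arrive out of order;
• a reader that finds another node holding the exclusive copy simply recalls it;
• the 32-bit version number is assumed to be bumped each time exclusive ownership is granted.

The names `home_entry`, `home_read`, `home_write`, `invalidate`, and `home_consistent` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32         /* sharer set fits in one 32-bit mask */
#define NO_OWNER  (-1)

/* Directory entry kept at a region's fixed home node (hypothetical layout). */
struct home_entry {
    uint32_t sharers;            /* bit i set: node i holds a read-only copy */
    int      owner;              /* node with the exclusive copy, or NO_OWNER */
    uint32_t version;            /* assumed: bumped on each exclusive grant */
    int      invalidations_sent; /* counts invalidation messages */
};

void home_init(struct home_entry *e)
{
    e->sharers = 0;
    e->owner = NO_OWNER;
    e->version = 0;
    e->invalidations_sent = 0;
}

/* Stand-in for sending an invalidation message to `node`. */
static void invalidate(struct home_entry *e, int node)
{
    (void) node;
    e->invalidations_sent++;
}

/* Read request from node n. */
void home_read(struct home_entry *e, int n)
{
    assert(n >= 0 && n < MAX_NODES);
    if (e->owner == n)
        return;                  /* exclusive copy already permits reads */
    if (e->owner != NO_OWNER) {  /* recall another node's exclusive copy */
        invalidate(e, e->owner);
        e->owner = NO_OWNER;
    }
    e->sharers |= UINT32_C(1) << n;
}

/* Write request from node n: invalidate every other copy, then grant ownership. */
void home_write(struct home_entry *e, int n)
{
    assert(n >= 0 && n < MAX_NODES);
    if (e->owner == n)
        return;                  /* already exclusive: no messages needed */
    if (e->owner != NO_OWNER)
        invalidate(e, e->owner);
    for (int i = 0; i < MAX_NODES; i++)
        if (i != n && (e->sharers & (UINT32_C(1) << i)))
            invalidate(e, i);
    e->sharers = 0;              /* n's own read-only copy, if any, is upgraded */
    e->owner = n;
    e->version++;
}

/* Invariant: an exclusive owner and read-only sharers never coexist. */
int home_consistent(const struct home_entry *e)
{
    return e->owner == NO_OWNER || e->sharers == 0;
}
```

The invariant checked by `home_consistent` is what makes this an invalidate protocol: a write is granted only after every other copy has been invalidated.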
URC
• Enables Lazy Release Consistency for CRL
• A later rgn_start_op (e.g., rgn_start_read) can be satisfied locally if the region has not been invalidated before it is next mapped
• Even if the region's data has been invalidated, later accesses are faster than mapping the region from scratch
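The benefit of the unmapped region cache can be modelled on a single node. In this sketch:
• an unmap keeps the region's local entry (metadata plus a flag saying whether its data is still valid) instead of discarding it;
• a later map of the same rid finds that entry without asking the home node;
• a read start is purely local unless the data was invalidated in between.

The table size, the `metadata_lookups` and `remote_fetches` counters (stand-ins for messages), and all function names are hypothetical. Eviction from a full cache is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define URC_SLOTS 8  /* hypothetical capacity; eviction is not modelled */

/* Per-node record for a region this node has mapped at some point. */
struct local_region {
    int  rid;
    bool in_use;     /* slot holds an entry for rid */
    bool mapped;     /* currently mapped by the application */
    bool data_valid; /* local copy not invalidated since it was fetched */
};

static struct local_region urc_table[URC_SLOTS];
static int metadata_lookups; /* stand-in for asking the home node about a rid */
static int remote_fetches;   /* stand-in for fetching the region's data */

static struct local_region *urc_find(int rid)
{
    for (int i = 0; i < URC_SLOTS; i++)
        if (urc_table[i].in_use && urc_table[i].rid == rid)
            return &urc_table[i];
    return NULL;
}

/* Map: reuse an entry left in the cache by an earlier unmap if possible. */
struct local_region *urc_map(int rid)
{
    struct local_region *r = urc_find(rid);
    if (r == NULL) {
        for (int i = 0; i < URC_SLOTS && r == NULL; i++)
            if (!urc_table[i].in_use)
                r = &urc_table[i];
        assert(r != NULL);       /* full table: eviction omitted */
        r->rid = rid;
        r->in_use = true;
        r->data_valid = false;
        metadata_lookups++;
    }
    r->mapped = true;
    return r;
}

/* Unmap: keep the entry (and any valid data) cached rather than freeing it. */
void urc_unmap(struct local_region *r)
{
    r->mapped = false;
}

/* Invalidation from the home node: the data goes, the cached entry stays. */
void urc_invalidate(int rid)
{
    struct local_region *r = urc_find(rid);
    if (r != NULL)
        r->data_valid = false;
}

/* Start a read: a local hit if the data is still valid, else fetch it. */
void urc_start_read(struct local_region *r)
{
    assert(r->mapped);
    if (!r->data_valid) {
        remote_fetches++;
        r->data_valid = true;
    }
}
```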
Software
• Prototype implementation available
• Platforms
  ◦ Thinking Machines CM-5 (message-passing multicomputer)
  ◦ Alewife (distributed-memory multiprocessor that provides native shared-memory support)
  ◦ TCP/UNIX implementation for SunOS
  ◦ A Linux port is expected soon
Applications
• On 32 Alewife processors, application completion times with CRL are comparable to those with Alewife's native shared memory
• How? LimitLESS, Alewife's cache-coherence scheme, supports only up to 5 remote pointers in hardware and handles additional sharers in software