High Productivity Languages for Parallel Programming Compared to MPI

Scott Spetka – SUNYIT and ITT Corp
Haris Hadzimujic – SUNY Institute of Technology
Stephen Peek – Binghamton University
Christopher Flynn – Air Force Research Laboratory, Information Directorate

HPC Users Group Conference, Seattle, WA, July 15–17, 2008
Outline

• Introduction to Chapel, X10, MPI
• Pub/Sub Case Study
• Language Examples
• Conclusion
DoD HPCS

• Goal: improved programmer productivity
• Latest Chapel release: version 0.775, March 2008 – adds remote processing support (http://chapel.cs.washington.edu/)
• New X10 language report: version 1.7, June 18, 2008
Introduction to Chapel, X10, MPI

• Data distribution – Partitioned Global Address Space (PGAS)
• Communication model – one-sided vs. two-sided (contrasted in the C/MPI sketch below)
• Synchronization – sync variables (Chapel), clocks (X10), atomic sections (both)
• Parallel threads – async and futures (X10), cobegin (Chapel)
• Performance – prototypes demonstrate the features; performance targeted for 2010
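To make the communication-model contrast concrete, here is a minimal C/MPI sketch (not from the original slides) that moves the same integer both ways: first with a matched MPI_Send/MPI_Recv pair, then with an MPI-2 one-sided MPI_Put into an exposed window, where the target issues no matching receive. It assumes exactly two ranks.

/* Minimal sketch contrasting two-sided and one-sided MPI communication.
   Illustrative only; assumes the job is launched with exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, val = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Two-sided: both ranks participate explicitly in the transfer. */
    if (rank == 0) {
        val = 42;
        MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("two-sided: rank 1 received %d\n", val);
    }

    /* One-sided (MPI-2 RMA): rank 0 writes directly into rank 1's
       exposed window; rank 1 posts no receive at all. */
    int win_buf = 0;
    MPI_Win win;
    MPI_Win_create(&win_buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0) {
        val = 99;
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);
    if (rank == 1)
        printf("one-sided: rank 1's window now holds %d\n", win_buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}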
PGAS vs. Fragmented

Global-view (PGAS) version:

var n: int = 1000;
var A, B: [1..n] float;
forall i in 2..n-1 do
  B(i) = (A(i-1) + A(i+1)) / 2;

Fragmented version:

var n: int = 1000;
var locN: int = n / numTasks;
var A, B: [0..locN+1] float;
var myItLo: int = 1;
var myItHi: int = locN;
if (iHaveLeftNeighbor) then
  send(left, A(1));
else
  myItLo = 2;
if (iHaveRightNeighbor) {
  send(right, A(locN));
  recv(right, A(locN+1));
} else
  myItHi = locN-1;
if (iHaveLeftNeighbor) then
  recv(left, A(0));
forall i in myItLo..myItHi do
  B(i) = (A(i-1) + A(i+1)) / 2;

Source: B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL / U. of Vienna, Austria), International Journal of High Performance Computing Applications, August 2007
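For comparison with the fragmented pseudocode, a hedged C/MPI rendering of the same three-point stencil is sketched below. It is not from the slides; variable names mirror the pseudocode, it assumes the problem size divides evenly among ranks, and error handling is omitted.

/* Each rank owns locN interior elements plus two halo cells (A[0] and
   A[locN+1]) and exchanges boundary values with its neighbors before
   applying the stencil. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size;
    const int n = 1000;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int locN = n / size;                 /* assume size divides n evenly */
    double *A = calloc(locN + 2, sizeof(double));
    double *B = calloc(locN + 2, sizeof(double));

    int left = rank - 1, right = rank + 1;
    int myItLo = (rank == 0)        ? 2        : 1;   /* no left neighbor  */
    int myItHi = (rank == size - 1) ? locN - 1 : locN; /* no right neighbor */

    /* Halo exchange: send my boundary cells, receive the neighbors'. */
    if (rank > 0)
        MPI_Sendrecv(&A[1], 1, MPI_DOUBLE, left, 0,
                     &A[0], 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < size - 1)
        MPI_Sendrecv(&A[locN], 1, MPI_DOUBLE, right, 0,
                     &A[locN + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    for (int i = myItLo; i <= myItHi; i++)
        B[i] = (A[i - 1] + A[i + 1]) / 2.0;

    free(A); free(B);
    MPI_Finalize();
    return 0;
}

Note how the bookkeeping that the global-view forall hides (local bounds, halo cells, neighbor exchange) all reappears explicitly, which is exactly the slide's point.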
Global vs. Local View

(figure from: B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL / U. of Vienna, Austria), International Journal of High Performance Computing Applications, August 2007)
Pub/Sub Introduction

• Publisher – publishes XML documents
• Pubcatcher – feeds publications into the brokers
• Subscriber – submits XPath subscriptions
• Broker – matches subscriptions against publications (see the matching-loop sketch below)
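To illustrate the broker's role, here is a minimal C sketch of the matching loop. It is not code from the original system: a plain substring test stands in for real XPath evaluation, and the types and helper names are hypothetical.

/* Minimal sketch of a broker match loop; strstr() is a stand-in
   for XPath matching. */
#include <stdio.h>
#include <string.h>

#define MAX_SUBS 16

typedef struct {
    int  subscriber_id;
    char xpath[128];   /* stored subscription expression */
} Subscription;

static Subscription subs[MAX_SUBS];
static int num_subs = 0;

/* Broker: compare one publication against every stored subscription. */
void broker_match(const char *pub_xml) {
    for (int s = 0; s < num_subs; s++) {
        if (strstr(pub_xml, subs[s].xpath) != NULL)   /* stand-in match */
            printf("deliver to subscriber %d: %s\n",
                   subs[s].subscriber_id, pub_xml);
    }
}

int main(void) {
    subs[num_subs++] = (Subscription){ 1, "<temp>" };
    subs[num_subs++] = (Subscription){ 2, "<wind>" };
    broker_match("<report><temp>21</temp></report>");  /* matches sub 1 */
    return 0;
}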
Pub/Sub Model – PGAS
Pub/Sub Model – Fragmented
Chapel

type elemType = int(32);
config const numPublishers = 2,
             numBrokers = 2,
             bufferSize = 12;

// Circular publication buffer, cyclically distributed across locales.
const ProblemSpace: domain(1) distributed(Cyclic) = [0..bufferSize-1];
var buff: [ProblemSpace] elemType;

// Full/empty sync variables guarding the next free and next full slots.
var nextFreeSlot$: sync int = 1;
var nextFullSlot$: sync int = 1;

def main() {
  cobegin {   // run publishers and brokers concurrently
    coforall i in 1..numPublishers { publisher(i); }
    coforall i in 1..numBrokers { broker(i); }
  }
}
Chapel

def publisher(id: int) {
  var pub = infile.read(int);
  for slot in getNextFreeSlot() {
    writeln("Publisher:", id, " published:", pub, " in slot:", slot);
    buff(slot) = pub;
    sleep(3);
    pub = infile.read(int);
  }
}
Chapel

def getNextFreeSlot() {   // access the next free message queue slot
  while (1) {
    const locFree = nextFreeSlot$;                // consume sync var
    const nextFree = (locFree + 1) % bufferSize;
    if (nextFree == nextFullSlot$.readXX()) {
      // we wrapped around, so don't yield anything, but allow others to
      // continue by refilling the sync var with the same value
      nextFreeSlot$ = locFree;
    } else {
      nextFreeSlot$ = nextFree;   // refill sync var with advanced value
      yield locFree;              // yield the free slot that we grabbed
    }
  }
}
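For readers more familiar with threads than sync variables, the following pthreads sketch (an assumption, not the authors' code) shows roughly what nextFreeSlot$ and nextFullSlot$ buy: a mutex-protected circular buffer in which claiming a free slot blocks while the buffer is full.

/* Hypothetical pthreads analogue of the Chapel sync-variable buffer. */
#include <pthread.h>
#include <stdio.h>

#define BUFFER_SIZE 12

static int buff[BUFFER_SIZE];
static int next_free = 1, next_full = 1;   /* mirrors nextFreeSlot$/nextFullSlot$ */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full = PTHREAD_COND_INITIALIZER;

/* Claim the next free slot, blocking while the buffer is full; this is
   the role played by consuming and refilling nextFreeSlot$ above. A
   broker-side consumer would advance next_full and signal not_full. */
int get_next_free_slot(void) {
    pthread_mutex_lock(&lock);
    while ((next_free + 1) % BUFFER_SIZE == next_full)
        pthread_cond_wait(&not_full, &lock);   /* wrapped around: wait */
    int slot = next_free;
    next_free = (next_free + 1) % BUFFER_SIZE;
    pthread_mutex_unlock(&lock);
    return slot;
}

void publish(int id, int pub) {
    int slot = get_next_free_slot();
    buff[slot] = pub;
    printf("Publisher:%d published:%d in slot:%d\n", id, pub, slot);
}

int main(void) {
    publish(1, 42);
    return 0;
}

The Chapel version needs no explicit mutex or condition variable; the full/empty semantics of the sync variables provide both.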
X10

// Declaration of a global one-dimensional array that will be distributed.
// Cyclic distribution defined over the region of A.
final static int [.] A = new int [[1:8]] (point[i]) { return i*10; };
final static dist d = dist.factory.cyclic(A.region);

public static void main(String args[]) {
  System.out.println("\n\nTotal places: " + place.MAX_PLACES + "\n");
  System.out.println("ID of the distribution: " + here + "\n");
  finish ateach (final point p : d) {
    System.out.println("Execution place: " + d[p] + " and value: " + A[p]);
  }
  subscription(1);
  subscription(2);
} // end main
X10

static void subscription(final int i) {
  foreach (point p : d) {
    async (d[p]) {   // run at the place that owns element p
      switch (i) {
        case 1:
          if (A[p] > 40) {
            A[p] = A[p] + 1;
            System.out.println("Location " + here + " value " + A[p]);
          }
          break;
        case 2:
          if (A[p] < 40) {
            A[p] = A[p] - 1;
            System.out.println("Location " + here + " value " + A[p]);
          }
          break;
        default:
          break;
      } // switch
    } // async
  } // foreach
} // subscription
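An MPI programmer expresses the same owner-computes pattern by hand. A hedged C/MPI analogue of the ateach/foreach code above (not from the slides; it assumes the same cyclic layout and initial values) might look like:

/* Each rank applies the "subscription" predicates only to the
   elements it owns under a cyclic distribution. */
#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int A[N];
    for (int i = 0; i < N; i++)
        A[i] = (i + 1) * 10;   /* same initial values as the X10 array A */

    /* Cyclic ownership: rank r owns elements i with i % size == r. */
    for (int i = 0; i < N; i++) {
        if (i % size != rank) continue;
        if (A[i] > 40) {                  /* subscription(1) */
            A[i] = A[i] + 1;
            printf("Location %d value %d\n", rank, A[i]);
        } else if (A[i] < 40) {           /* subscription(2) */
            A[i] = A[i] - 1;
            printf("Location %d value %d\n", rank, A[i]);
        }
    }

    MPI_Finalize();
    return 0;
}

In X10, the cyclic dist and the async at d[p] take care of ownership and placement; here both are recomputed manually on every rank.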
MPI

// Get attribute to determine whether the current process is to store data.
MPI_Attr_get(next_comm, NEXT, &next_store_ptr, &flag);
// Agree on the highest-ranked candidate, wrapping around the ring.
MPI_Allreduce(next_store_ptr, &next_rank, 1, MPI_INT, MPI_MAX, next_comm);
next_rank = next_rank % size;
MPI

if (my_rank == next_rank) {
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    // Record the next storage rank in the communicator attribute.
    if ((next_rank + 1) < size) {
        *next_ptr = next_rank + 1;
    } else {
        *next_ptr = next_rank + 2;
    }
    MPI_Attr_put(next_comm, NEXT, next_ptr);
    printf("stored on process %i\n", next_rank);
    MPI_Recv(&data_recv, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
             MPI_COMM_WORLD, &status);
    data_store[count][0] = data_recv;
    data_store[count][1] = status.MPI_TAG;
    count++;
}
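The sending side is not shown on the slides; a hedged sketch of the producer that this MPI_Probe/MPI_Recv pairs with (variable names and payload assumed, slotting into the same program) could be:

/* Hypothetical producer side: any non-storing rank publishes a value
   to the current storage rank, using the tag as a sequence number. */
if (my_rank != next_rank) {
    int data = my_rank * 100 + count;   /* assumed payload */
    MPI_Send(&data, 1, MPI_INT, next_rank, count, MPI_COMM_WORLD);
}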
Conclusion

• HPCS languages reduce time to solution
• Object-oriented features – user-defined distributions, reductions, scans
• Global synchronization
• One-sided communication
• Easy addition of new parallel tasks
Acknowledgements

Bradford Chamberlain, Cray
Igor Peshansky, IBM