
High Productivity Languages for Parallel Programming Compared to MPI



1. High Productivity Languages for Parallel Programming Compared to MPI. Scott Spetka, SUNYIT and ITT Corp; Haris Hadzimujic, SUNY Institute of Technology; Stephen Peek, Binghamton University; Christopher Flynn, Air Force Research Laboratory, Information Directorate. HPC Users Group Conference, Seattle, WA, July 15–17, 2008.

2. Outline • Introduction to Chapel, X10, MPI • Pub/Sub Case Study • Language Examples • Conclusion

3. DoD HPCS (High Productivity Computing Systems) • Improved programmer productivity • Latest Chapel release: version 0.775, March 2008, with remote processing support (http://chapel.cs.washington.edu/) • New X10 language report: version 1.7, June 18, 2008

4. Introduction to Chapel, X10, MPI • Data Distribution – Partitioned Global Address Space (PGAS) • Communication Model – one-sided / two-sided • Synchronization – sync variables (Chapel), clocks (X10), atomic sections (both) • Parallel Threads – async and futures (X10), cobegin (Chapel) • Performance – prototypes demonstrate features (2010). A short Chapel sketch of the synchronization and tasking constructs follows.
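
To make the synchronization and tasking constructs just listed concrete, the sketch below uses a Chapel sync variable and a cobegin in the same 2008-era style as the later slides; the names compute and result$ are illustrative, not taken from the presentation.

// Minimal sketch (illustrative names): a sync variable hands one value
// between two tasks created by cobegin.
var result$: sync int;            // sync var: empty until written, full until read

def compute(): int {
  return 42;                      // placeholder work
}

def main() {
  cobegin {                       // run both statements as parallel tasks
    result$ = compute();          // producer task: writing fills the sync var
    writeln("got ", result$);     // consumer task: read blocks until full, then empties it
  }
}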

5. PGAS vs. Fragmented

Global-view (PGAS) version:

var n: int = 1000;
var A, B: [1..n] float;
forall i in 2..n-1 do
  B(i) = (A(i-1) + A(i+1)) / 2;

Fragmented (per-task) version:

var n: int = 1000;
var locN: int = n / numTasks;
var A, B: [0..locN+1] float;
var myItLo: int = 1;
var myItHi: int = locN;
if (iHaveLeftNeighbor) then
  send(left, A(1));
else
  myItLo = 2;
if (iHaveRightNeighbor) {
  send(right, A(locN));
  recv(right, A(locN+1));
} else
  myItHi = locN-1;
if (iHaveLeftNeighbor) then
  recv(left, A(0));
forall i in myItLo..myItHi do
  B(i) = (A(i-1) + A(i+1)) / 2;

Source: International Journal of High Performance Computing Applications, August 2007; B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL / University of Vienna, Austria).

6. Global vs. Local View. Source: International Journal of High Performance Computing Applications, August 2007; B.L. Chamberlain (Cray), D. Callahan (Microsoft), H.P. Zima (JPL / University of Vienna, Austria).

7. Pub/Sub Model: Introduction • Publisher – publishes XML documents • Pubcatcher – feeds publication input to the brokers • Subscriber – submits XPath subscriptions • Broker – matches subscriptions against publications

8. Pub/Sub Model – PGAS

9. Pub/Sub Model – Fragmented

10. Chapel

type elemType = int(32);
config const numPublishers = 2, numBrokers = 2, bufferSize = 12;

// Cyclically distributed buffer shared by publishers and brokers
const ProblemSpace: domain(1) distributed(Cyclic) = [0..bufferSize-1];
var buff: [ProblemSpace] elemType;

// Sync variables coordinating the next free and next full buffer slots
var nextFreeSlot$: sync int = 1;
var nextFullSlot$: sync int = 1;

def main() {
  cobegin {
    coforall i in 1..numPublishers { publisher(i); }
    coforall i in 1..numBrokers { broker(i); }
  }
}

11. Chapel

def publisher(id: int) {
  var pub = infile.read(int);
  // Claim free buffer slots one at a time and publish into them
  for slot in getNextFreeSlot() {
    writeln("Publisher:", id, " published:", pub, " in slot:", slot);
    buff(slot) = pub;
    sleep(3);
    pub = infile.read(int);
  }
}

12. Chapel

def getNextFreeSlot() {
  // Access the next free message queue slot
  while (1) {
    const locFree = nextFreeSlot$;              // consume sync var
    const nextFree = (locFree + 1) % bufferSize;
    if (nextFree == nextFullSlot$.readXX()) {
      // we wrapped around so don't yield anything, but allow others to
      // continue by refilling the sync var with the same value
      nextFreeSlot$ = locFree;
    } else {
      nextFreeSlot$ = nextFree;                 // refill sync var with advanced value
      yield locFree;                            // yield the free slot that we grabbed
    }
  }
}
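
The slides show the publisher half of the Chapel pub/sub code but not the broker tasks that main() also spawns. Purely for orientation, here is a hedged sketch of what a broker in the same style could look like; getNextFullSlot and matches are hypothetical helpers invented for this sketch, not code from the presentation.

// Hedged sketch only: a hypothetical broker counterpart to publisher()
def broker(id: int) {
  // getNextFullSlot() is assumed to mirror getNextFreeSlot(), yielding slots
  // that a publisher has already filled
  for slot in getNextFullSlot() {
    const pub = buff(slot);          // read the publication from the shared buffer
    if (matches(id, pub)) {          // matches() stands in for the XPath subscription test
      writeln("Broker:", id, " matched publication:", pub, " from slot:", slot);
    }
  }
}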

13. X10

// Declaration of a global one-dimensional array that will be distributed;
// the cyclic distribution is defined over the region of A
final static int [.] A = new int [[1:8]] (point[i]) { return i*10; };
final static dist d = dist.factory.cyclic(A.region);

public static void main(String args[]) {
  System.out.println("\n\nTotal places: " + place.MAX_PLACES + "\n");
  System.out.println("ID of the distribution: " + here + "\n");
  // ateach runs one activity per point, at the place that owns it
  finish ateach (final point p : d) {
    System.out.println("Execution place: " + d[p] + " and value: " + A[p]);
  }
  subscription(1);
  subscription(2);
} // end main

14. X10

static void subscription(final int i) {
  foreach (point p : d) {
    async (d.distribution[p]) {
      switch (i) {
        case 1:
          if (A[p] > 40) {
            A[p] = A[p] + 1;
            System.out.println("Location " + here + " value " + A[p]);
          }
          break;
        case 2:
          if (A[p] < 40) {
            A[p] = A[p] - 1;
            System.out.println("Location " + here + " value " + A[p]);
          }
          break;
        default:
          break;
      } // switch
    } // async
  } // foreach
} // subscription

15. MPI

// get attribute to determine if the current process is to store data
MPI_Attr_get(next_comm, NEXT, &next_store_ptr, &flag);
MPI_Allreduce(next_store_ptr, &next_rank, 1, MPI_INT, MPI_MAX, next_comm);
next_rank = next_rank % size;

16. MPI

if (my_rank == next_rank) {
  // This process stores the incoming publication and designates the rank
  // that will store the next one
  MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  if ((next_rank + 1) < size) {
    *next_ptr = next_rank + 1;
  } else {
    *next_ptr = next_rank + 2;
  }
  MPI_Attr_put(next_comm, NEXT, next_ptr);
  printf("stored on process %i\n", next_rank);
  MPI_Recv(&data_recv, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
           MPI_COMM_WORLD, &status);
  data_store[count][0] = data_recv;
  data_store[count][1] = status.MPI_TAG;
  count++;
}

17. Conclusion • HPCS languages reduce time to solution • Object oriented – user-defined distributions, reductions, scans • Global synchronization • One-sided communication • Adding new tasks
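
The reductions and scans mentioned above are single expressions in Chapel. The snippet below is a minimal illustrative sketch using the built-in + operator, added to this transcript rather than taken from the slides.

// Minimal sketch: built-in reduce and scan over a Chapel array
var A: [1..8] int = [i in 1..8] i;   // A = 1, 2, ..., 8
var total = + reduce A;              // sum of all elements: 36
var sums  = + scan A;                // running prefix sums: 1, 3, 6, 10, ...
writeln(total);
writeln(sums);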

18. Acknowledgements • Bradford Chamberlain, Cray • Igor Peshansky, IBM
