
Pattern Parallel Programming



Presentation Transcript


  1. Pattern Parallel Programming B. Wilkinson PatternProgIntro.ppt Modification date: Feb 21, 2016

  2. Traditional programming approach • Explicitly specify message-passing (MPI) • Low-level threads APIs (Pthreads, Java threads, OpenMP, …) Both require programmers to use low-level routines. We need to make parallel programming easier, more structured, and more scalable, especially in an educational environment.

  3. Pattern Programming Concept The programmer begins by constructing the program from established computational or algorithmic “patterns” that provide a structure. Design patterns have been part of software engineering for many years: • Reusable solutions to commonly occurring problems * • Patterns provide a guide to “best practices”, not a final implementation • They provide a good, scalable design structure • They avoid common problems of ad-hoc designs • One can reason about programs more easily and debug them * http://en.wikipedia.org/wiki/Design_pattern_(computer_science)

  4. Parallel Patterns -- Advantages • Abstract/hide the underlying computing environment • Generally avoid deadlocks and race conditions • Reduce source code size (lines of code) • Lead to automated conversion into parallel programs without the need to write low-level MPI message-passing routines • Allow hierarchical designs, with patterns embedded into patterns and pattern operators to combine patterns Disadvantages • A new approach to learn • Takes away some freedom from the programmer • Performance is reduced (cf. using high-level languages instead of assembly language)

  5. What parallel design patterns are we talking about? Higher-level patterns for forming a complete computation: • master-slave • workpool • pipeline • divide and conquer • stencil • map-reduce, … Low-level patterns: • fork-join • point-to-point • broadcast • scatter • gather, reduce, …

  6. Low-level MPI message-passing patterns MPI point-to-point data transfer (send-receive). [Diagram: data sent from a source process to a destination process.]
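The send-receive effect can be sketched in plain Python, with a thread for each process and a queue standing in for the MPI channel; the names `channel`, `source`, and `destination` are illustrative, not MPI API:

```python
import queue
import threading

channel = queue.Queue()          # stands in for the MPI channel
result = []

def source():
    channel.put([1, 2, 3])       # like MPI_Send: source posts the data

def destination():
    data = channel.get()         # like MPI_Recv: blocks until data arrives
    result.append(data)

t1 = threading.Thread(target=source)
t2 = threading.Thread(target=destination)
t2.start(); t1.start()
t1.join(); t2.join()

print(result[0])                 # the data received at the destination
```

The blocking `get()` mirrors the usual synchronous receive: the destination waits until the source's data is available.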

  7. Collective patterns Broadcast pattern Sends the same data to each of a group of processes. A common pattern to get the same data to all processes, especially at the beginning of a computation. [Diagram: the same data sent from one source to all destinations.] Note: the patterns shown do not mean the implementation performs them as drawn; only the final result is the same in any parallel implementation. Patterns describe the result, not the implementation.
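A minimal sketch of the broadcast result in Python, with a thread pool standing in for the destination processes; this mirrors the effect of a broadcast (as with MPI_Bcast), not how it is implemented:

```python
from concurrent.futures import ThreadPoolExecutor

data = [10, 20, 30]              # data held at the source

def destination(rank):
    received = list(data)        # every destination gets a copy of the same data
    return (rank, received)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(destination, range(4)))

# all four destinations now hold identical copies
print(all(received == data for _, received in results))
```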

  8. Scatter pattern Distributes a collection of data items to a group of processes. A common pattern to get data to all processes; usually the data sent are parts of an array. [Diagram: different data sent from one source to each destination.]
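The scatter result can be sketched with plain list slicing; the even split of a source array across destinations is the essence of the pattern (the sizes here are illustrative):

```python
array = list(range(12))          # the array held at the source
nprocs = 4
part = len(array) // nprocs      # assume the array divides evenly

# destination i receives parts[i], a different slice of the source array
parts = [array[i * part:(i + 1) * part] for i in range(nprocs)]
print(parts)
```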

  9. Gather pattern Essentially the reverse of the scatter pattern: receives data items from a group of processes and collects them at the destination in an array. A common pattern, especially at the end of a computation, to collect results. [Diagram: data sent from each source to one destination.]
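A sketch of the gather result, with worker threads standing in for the source processes; `compute` is a hypothetical per-process task, and the pieces are collected at the destination in rank order (as with MPI_Gather):

```python
from concurrent.futures import ThreadPoolExecutor

def compute(rank):
    return rank * rank           # each source process's contribution

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves order, so the destination array is in rank order
    gathered = list(pool.map(compute, range(4)))

print(gathered)                  # [0, 1, 4, 9]
```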

  10. Reduce pattern A common pattern to get data back to the master from all processes and aggregate it, combining the collected data into one answer. [Diagram: data sent from each source to one destination, where a reduction operation combines it into one answer.] The reduction needs to be an associative operation (e.g. 3 + (4 + 5) = (3 + 4) + 5) to allow the implementation to do the operations in any order. Also being commutative (e.g. 3 + 4 = 4 + 3) allows more flexibility in the parallel implementation. Note subtraction is not associative, e.g. 3 – (4 – 5) != (3 – 4) – 5, but one can use addition with negative numbers instead.
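The associativity requirement can be checked directly; the values and operations below are illustrative:

```python
from functools import reduce
import operator

values = [3, 4, 5]

# Addition is associative, so a parallel reduce may group the operations
# in any order and still get the same answer:
assert (3 + 4) + 5 == 3 + (4 + 5)
total = reduce(operator.add, values)            # 12

# Subtraction is not associative, so it is not a valid reduction operation:
assert (3 - 4) - 5 != 3 - (4 - 5)
# ...but adding negated values gives the same effect with a valid operation:
alt = values[0] + sum(-v for v in values[1:])   # 3 + (-4) + (-5) = -6

print(total, alt)
```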

  11. Collective all-to-all broadcast Sources and destinations are the same processes. A common all-to-all pattern, often used within a computation, is to send data from all processes to all processes: every process sends data to every other process (one-way). Versions of this can be found in MPI.
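The net effect of an all-to-all exchange is a transpose of who-holds-what, which can be sketched without any real communication; `outbox` and `inbox` are illustrative names:

```python
nprocs = 3

# outbox[i][j] is the item process i sends to process j
outbox = [[f"p{i}->p{j}" for j in range(nprocs)] for i in range(nprocs)]

# After the exchange, inbox[j] holds the item from every process:
# effectively a transpose of the outbox matrix (cf. MPI_Alltoall)
inbox = [[outbox[i][j] for i in range(nprocs)] for j in range(nprocs)]

print(inbox[2])    # everything addressed to process 2
```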

  12. Some Higher-Level Message-Passing Patterns Master/slave The computation is divided into parts, which are then passed out to slaves to perform; the slaves return their results. The basis of most parallel computing. [Diagram: a master (source/sink) connected to slave compute nodes by two-way connections.]
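A minimal master-slave sketch in Python, with a thread pool standing in for the slaves; summing squares of chunks is a stand-in computation, and the function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def slave(part):
    return sum(x * x for x in part)              # each slave works on its part

def master(data, nslaves=4):
    # master divides the computation into parts...
    size = (len(data) + nslaves - 1) // nslaves
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=nslaves) as pool:
        # ...passes them out, and combines the returned results
        return sum(pool.map(slave, parts))

print(master(list(range(10))))   # 285 = 0^2 + 1^2 + ... + 9^2
```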

  13. Workpool A very widely applicable pattern. The master holds a task queue. Each slave/worker takes a task from the queue, returns its result, and is given another task if the task queue is not empty; the master aggregates the answers. This gives the pattern a load-balancing quality. Need to differentiate between the master-slave pattern, which does not imply a task queue, and the workpool, which has one. [Diagram: a master with a task queue connected to slaves/workers.]
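A workpool sketch with a shared queue and threads as workers; squaring each task number is a stand-in computation. Because each worker pulls its next task as soon as it finishes, fast workers automatically take more tasks, which is the load-balancing quality:

```python
import queue
import threading

tasks = queue.Queue()
for n in range(1, 9):
    tasks.put(n)                     # master fills the task queue

results = queue.Queue()

def worker():
    while True:
        try:
            n = tasks.get_nowait()   # take another task if queue not empty
        except queue.Empty:
            return                   # queue empty: this worker is done
        results.put(n * n)           # send result back to the master

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers: w.start()
for w in workers: w.join()

# master aggregates the answers (sorted, since arrival order varies)
collected = sorted(results.get() for _ in range(results.qsize()))
print(collected)
```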

  14. More Specialized High-level Patterns Pipeline [Diagram: a master (source/sink) feeds slave (worker) compute nodes Stage 1 → Stage 2 → Stage 3 → … → Stage n through one-way connections.]
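A pipeline sketch with a queue between each pair of stages; the `None` end-of-stream sentinel and the stage functions are assumptions of this sketch, not part of the pattern itself:

```python
import queue
import threading

def stage(fn, inq, outq):
    while True:
        item = inq.get()
        if item is None:             # propagate end-of-stream to next stage
            outq.put(None)
            return
        outq.put(fn(item))           # transform and pass downstream

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda x: x + 1, q0, q1)),  # stage 1
    threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2)),  # stage 2
]
for t in threads: t.start()

for x in [1, 2, 3]:                  # source feeds items into the pipeline
    q0.put(x)
q0.put(None)
for t in threads: t.join()

out = []                             # sink drains the final stage
while True:
    item = q2.get()
    if item is None:
        break
    out.append(item)
print(out)    # [4, 6, 8]
```

Once the pipeline is full, all stages work on different items at the same time, which is where the parallelism comes from.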

  15. Divide and Conquer [Diagram: the computation divides into parts across compute nodes, then the results merge back, with two-way connections; source/sink at the root.]
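Merge sort is a classic divide-and-conquer instance; a sequential sketch (in the parallel pattern, each half would be handed to a different compute node):

```python
def merge(a, b):
    # combine two sorted lists into one sorted list
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def mergesort(xs):
    if len(xs) <= 1:
        return xs                        # trivial piece: already solved
    mid = len(xs) // 2
    return merge(mergesort(xs[:mid]),    # divide...
                 mergesort(xs[mid:]))    # ...and merge on the way back up

print(mergesort([5, 2, 8, 1, 9, 3]))    # [1, 2, 3, 5, 8, 9]
```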

  16. All-to-All All compute nodes can communicate with all the other nodes. [Diagram: master (source/sink) and compute nodes fully connected by two-way connections.]

  17. Stencil All compute nodes can communicate only with neighboring nodes. Usually a synchronous computation: performs a number of iterations to converge on a solution, e.g. solving Laplace’s/heat equation. On each iteration, each node communicates with its neighbors to get their stored computed values. [Diagram: compute nodes connected to their neighbors by two-way connections; source/sink.]

  18. Iterative synchronous patterns • A pattern is repeated until some termination condition occurs. • Synchronization at each iteration to establish the termination condition, often a global condition. • Note this is two patterns merged together sequentially, if we call iteration a pattern. [Flowchart: apply pattern; check termination condition; repeat or stop.]

  19. Iterative synchronous stencil pattern Stencil: all compute nodes can communicate only with neighboring nodes. Applications: • Solving Laplace’s/heat equation: perform a number of iterations to converge on the solution. [Flowchart: apply stencil; check termination condition; repeat or stop.]
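The iterative synchronous stencil can be sketched as a 1-D Jacobi iteration for Laplace's equation: on each iteration every interior point is replaced by the average of its two neighbors, and iteration stops when the largest change falls below a tolerance (the termination condition). The grid and tolerance values are illustrative:

```python
def jacobi_1d(u, tol=1e-6, max_iters=100000):
    u = list(u)
    for _ in range(max_iters):
        # stencil step: each interior point averages its two neighbors;
        # boundary values u[0] and u[-1] are held fixed
        new = u[:1] + [(u[i - 1] + u[i + 1]) / 2
                       for i in range(1, len(u) - 1)] + u[-1:]
        # synchronous termination check across the whole grid
        if max(abs(a - b) for a, b in zip(new, u)) < tol:
            return new               # converged: termination condition met
        u = new
    return u

# fixed boundaries 0 and 1; the solution converges to a straight line
u = jacobi_1d([0.0, 0.0, 0.0, 0.0, 1.0])
print([round(x, 3) for x in u])      # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In the parallel pattern, each compute node would hold a section of the grid and exchange only its edge values with its neighbors each iteration.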

  20. Iterative synchronous all-to-all pattern Example: the N-body problem needs an “iterative synchronous all-to-all” pattern, where on each iteration all processes exchange data with each other. [Flowchart: exchange; check termination condition; repeat or stop.]

  21. Previous/Existing Work Patterns have been explored in several projects. • Industrial efforts: Intel Threading Building Blocks (TBB), Cilk Plus, Array Building Blocks (ArBB). Focus on very low-level patterns such as fork-join. • Universities: University of Illinois at Urbana-Champaign and University of California, Berkeley; University of Torino / Università di Pisa, Italy. “Structured Parallel Programming: Patterns for Efficient Computation,” Michael McCool, James Reinders, Arch Robison, Morgan Kaufmann, 2012 (Intel tools: TBB, Cilk, ArBB).

  22. Our approach We have developed several tools at different levels of abstraction that avoid using low-level MPI and enable students to create working patterns very quickly. • Suzaku framework – provides pre-written pattern-based routines and macros that hide the MPI code. Low-level patterns, workpool, … • Paraguin compiler – a compiler-directive approach that creates MPI code. Patterns implemented include scatter-gather for a master-slave pattern, stencil, … • Seeds framework – high-level Java-based software. Many patterns implemented, including workpool, pipeline, synchronous iterative all-to-all, and stencil. Self-deploys and executes on any platform, local computers or distributed computers. Historical note: Seeds was developed first, as part of a UNC-Charlotte PhD project by Jeremy Villalobos, 2007–2011.

  23. Acknowledgements The Seeds framework was developed by Jeremy Villalobos in his PhD thesis, “Running Parallel Applications on a Heterogeneous Environment with Accessible Development Practices and Automatic Scalability,” UNC-Charlotte, 2011. The extension of this work to a teaching environment was supported by the National Science Foundation under grant “Collaborative Research: Teaching Multicore and Many-Core Programming at a Higher Level of Abstraction,” #1141005/1141006 (2012–2015). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

  24. Questions
