1 / 25

CISC 372 5 October

CISC 372 5 October. Goals for today: Foster’s parallel algorithm design Partitioning Task dependency graph Granularity Concurrency Collective communication. Task. Channel. Task/Channel Model. Parallel Algorithm Design. Large problem May be complex May have large data set

cfulp
Download Presentation

CISC 372 5 October

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CISC 372 5 October Goals for today: • Foster’s parallel algorithm design • Partitioning • Task dependency graph • Granularity • Concurrency • Collective communication

  2. Task Channel Task/Channel Model

  3. Parallel Algorithm Design • Large problem • May be complex • May have large data set • May be “embarrassingly parallel” • How “solve” problem? (Foster’s Methodology) • Partition problem & data • Determine communication requirements • Agglomerate tasks into more efficient size • Map tasks to processors

  4. Foster’s Methodology

  5. Partitioning • Dividing computation and data into pieces • Domain decomposition • Divide data into pieces • Determine how to associate computations with the data • Functional decomposition • Divide computation into pieces • Determine how to associate data with the computations

  6. Partitioning Checklist • At least 10x more primitive tasks than processors in target computer • Minimize redundant computations and redundant data storage • Primitive tasks roughly the same size • Number of tasks an increasing function of problem size

  7. Foster’s Methodology

  8. Communication • Determine values passed among tasks • Local communication • Task needs values from a small number of other tasks • Create channels illustrating data flow • Global communication • Significant number of tasks contribute data to perform a computation • Don’t create channels for them early in design

  9. Communication Checklist • Communication operations balanced among tasks • Each task communicates with only small group of neighbors • Tasks can perform communications concurrently • Task can perform computations concurrently

  10. Foster’s Methodology

  11. Agglomeration • Grouping tasks into larger tasks • Goals • Improve performance • Maintain scalability of program • Simplify programming • In MPI programming, goal often to create one agglomerated task per processor

  12. Agglomeration to Improve Performance • Eliminate communication between primitive tasks agglomerated into consolidated task • Combine groups of sending and receiving tasks

  13. Agglomeration Checklist • Locality of parallel algorithm has increased • Replicated computations take less time than communications they replace • Data replication doesn’t affect scalability • Agglomerated tasks have similar computational and communications costs • Number of tasks increases with problem size • Number of tasks suitable for likely target systems • Tradeoff between agglomeration and code modifications costs is reasonable

  14. Foster’s Methodology

  15. Mapping • Process of assigning tasks to processors • Centralized multiprocessor: mapping done by operating system • Distributed memory system: mapping done by user • Conflicting goals of mapping • Maximize processor utilization • Minimize interprocessor communication

  16. Mapping Example

  17. Optimal Mapping • Finding optimal mapping is NP-hard • Must rely on heuristics

  18. Mapping Checklist • Considered designs based on one task per processor and multiple tasks per processor • Evaluated static and dynamic task allocation • If dynamic task allocation chosen, task allocator is not a bottleneck to performance • If static task allocation chosen, ratio of tasks to processors is at least 10:1

  19. Complex Mesh Decomposition Mesh Cube With Hollow Sphere inside Cross-section of Cube • Finite number of tetrahedra • Each tetrahedron varies in size

  20. Tetra-he-who? • Tetrahedron: A solid having four triangular faces Maybe not this. This is a tetrahedron.

  21. Static Task Number,Unstructured Comm, Mesh Cube Partitioned with METIS

  22. Edge Detection • Finite number of pixels • All pixels same size • All pixel values constrained (0-255) • Stencil computation

  23. Electromagnetic Fields • Finite number of points • Laplacian equation (Jacobi method - iterate until converge on solution) • Convergence time varies at each point • Rough surface sphere • Radiation source at center • Measure strength on surface

  24. Parallel Tree Search • Unbalanced tree • Nodes may change from active to inactive at any time • Searching for specific object in set of all active nodes

  25. Online Sort • Receive list in separate pieces at different times (constant update) • Keep list sorted at all times

More Related