
Operator Formulation and Parallelism in Algorithms

This presentation argues that parallel programming needs new foundations: the operator formulation and a data-centric abstraction of algorithms. It introduces amorphous data-parallelism (ADP), explains why static dependence graphs cannot represent the parallelism in irregular algorithms, and describes how TAO analysis exposes structure in algorithms that can be used to exploit ADP efficiently. Examples include FFT, Delaunay mesh refinement, and Andersen-style points-to analysis.

Presentation Transcript


  1. The Tao of Parallelism in Algorithms. Keshav Pingali, The University of Texas at Austin. Joint work with D. Nguyen, M. Kulkarni, M. Burtscher, A. Hassaan, R. Kaleem, T-H. Lee, A. Lenharth, R. Manevich, M. Mendez-Lojo, D. Prountzos, X. Sui

  2. Message of paper
     • Parallel programming needs new foundations
       • dependence graphs are inadequate
       • cannot represent parallelism in irregular algorithms
     • New foundation:
       • operator formulation
       • data-centric abstraction
       • regular algorithms become special case
     • Key insights
       • amorphous data-parallelism (ADP) is ubiquitous
       • TAO analysis: structure in algorithms
       • use TAO structure to exploit ADP efficiently
     Figure: dependence graph for FFT

  3. Inadequacy of static dependence graphs
     • Delaunay mesh refinement
     • Don't-care non-determinism
       • final mesh depends on order in which bad triangles are processed
     • Data structure: graph
       • nodes: triangles
       • edges: triangle adjacencies
     • Parallelism
       • triangles with disjoint cavities can be processed in parallel
       • parallelism depends on runtime values
       • static dependence graph cannot be generated
       • parallelization must be done at runtime
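
The "disjoint cavities" condition can be illustrated with a minimal Python sketch. It is not the Galois implementation: the triangle names and cavity sets below are made up, and in real mesh refinement each cavity is computed at runtime from the mesh geometry. The hypothetical helper `select_disjoint` only shows why the set of bad triangles that can be processed together depends on runtime values.

```python
def select_disjoint(cavities):
    """Greedily pick bad triangles whose cavities share no triangle.

    `cavities` maps a bad triangle to the set of triangles in its cavity
    (hypothetical input; a real mesh would compute these at runtime).
    """
    claimed = set()      # triangles already owned by a selected cavity
    selected = []
    for bad_triangle, cavity in cavities.items():
        if claimed.isdisjoint(cavity):
            selected.append(bad_triangle)
            claimed |= cavity
    return selected


cavities = {
    "t1": {"t1", "t2", "t3"},
    "t4": {"t3", "t4", "t5"},   # overlaps the cavity of t1 at t3
    "t7": {"t7", "t8"},
}
print(select_disjoint(cavities))   # ['t1', 't7']
```

Here `t4` has to wait for a later round because its cavity overlaps that of `t1`; a different input mesh would give a different answer, which is why no static dependence graph can capture it.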

  4. Operator formulation of algorithms
     • Algorithm formulated in data-centric terms
       • active element: node or edge where computation is needed
       • activity: application of operator to active element
       • neighborhood: set of nodes and edges read/written by activity
       • ordering: order of execution of active elements in a sequential implementation
         • any order
         • problem-dependent order
     • Amorphous data-parallelism (ADP)
       • process active nodes in parallel, subject to neighborhood and ordering constraints
       • how do we exploit ADP?
     Figure legend: active node, neighborhood
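
The vocabulary of this slide can be mapped onto a small sequential worklist loop. The sketch below uses chaotic-relaxation shortest paths as the example algorithm; that choice, the function name `run_unordered`, and the sample graph are assumptions made for illustration and do not come from the slides. Edge weights are assumed non-negative.

```python
from collections import deque


def run_unordered(graph, source):
    """Sequential reference execution of an unordered, data-driven algorithm.

    `graph` maps each node to a list of (neighbor, weight) pairs.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist = deque([source])            # the active elements
    while worklist:
        node = worklist.popleft()         # ordering: any order is allowed
        # Activity: apply the operator (edge relaxation) to the active node.
        # Neighborhood: `node` plus the out-neighbors it reads and writes.
        for neighbor, weight in graph[node]:
            candidate = dist[node] + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                worklist.append(neighbor)  # the neighbor becomes active
    return dist


graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}
print(run_unordered(graph, "a"))
```

A parallel implementation would process several worklist items at once, provided their neighborhoods do not overlap; this loop only fixes the sequential meaning that such an implementation has to preserve.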

  5. TAO analysis: structure in algorithms (figure legend: active node, neighborhood). Cf. parallel programming patterns (Snir, Intel), Berkeley motifs (Patterson)

  6. Parallelization: when can you produce a parallel schedule for a program?
     • Compile-time: static parallelization
     • After input is given but before execution: inspector-executor
     • During program execution: interference graph
     • After program is finished: optimistic parallelization
     Examples at the two ends of the spectrum:
     • structured topology, topology-driven algorithms (dense linear algebra, FFT, finite differences, ...)
     • data-driven, ordered algorithms (discrete-event simulation, Dijkstra SSSP, ...)
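
One way to read the interference-graph idea is sketched below in Python. It assumes the neighborhoods of all pending activities are already known and that activities create no new work, which is a simplification; the function names and the sample neighborhoods are hypothetical. Activities whose neighborhoods overlap are connected by an edge, and a greedy coloring groups mutually independent activities into rounds.

```python
from itertools import combinations


def interference_graph(neighborhoods):
    """Connect two activities whenever their neighborhoods overlap."""
    graph = {activity: set() for activity in neighborhoods}
    for a, b in combinations(neighborhoods, 2):
        if neighborhoods[a] & neighborhoods[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def schedule_rounds(graph):
    """Greedy coloring: every color is a round of independent activities."""
    round_of = {}
    for activity in graph:
        used = {round_of[n] for n in graph[activity] if n in round_of}
        r = 0
        while r in used:
            r += 1
        round_of[activity] = r
    rounds = [[] for _ in range(max(round_of.values(), default=-1) + 1)]
    for activity, r in round_of.items():
        rounds[r].append(activity)
    return rounds


neighborhoods = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {4},
    "D": {3, 4},
}
print(schedule_rounds(interference_graph(neighborhoods)))
# [['A', 'C'], ['B'], ['D']]
```

Activities within one round touch disjoint data and could run in parallel; the rounds themselves run one after another. Greedy coloring does not guarantee the smallest number of rounds.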

  7. Galois system
     • Programming model:
       • Algorithms
         • sequential, OO language (Joe)
         • Java/C++ with Galois set iterators
       • Concurrent data structure library
         • expert programmers (Stephanie)
     • Execution model:
       • optimistic and static parallelization
     • Galois system (Java):
       • http://iss.ices.utexas.edu/galois

  8. Performance. Figures: DMR (500K triangles) and Barnes-Hut. Machine: 4x6-core Intel Xeon X7540

  9. Andersen-style points-to analysis
     • Structural analysis
       • topology: general graph
       • operator: morph
       • ordering: unordered
     • Optimizations
       • cautious operator
       • lock optimization
     • Comparison
       • Hardekopf & Lin (PLDI 2007): red lines in graphs
       • Mendez-Lojo et al (OOPSLA 2010)
     Figures: performance vs. number of threads, Intel 8-core Xeon
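
For readers unfamiliar with the analysis, the following is a minimal sequential sketch of Andersen-style inclusion constraints solved with an unordered worklist. It is a textbook-style formulation written for illustration, not the parallel implementation compared on this slide; the function name `andersen`, the tuple encoding of constraints, and the sample program are assumptions. The operator adds edges to the constraint graph as it runs, which is why the slide classifies it as a morph.

```python
from collections import defaultdict, deque


def andersen(address_of, copies, loads, stores):
    """Solve inclusion constraints with an unordered worklist.

    address_of: (p, a) for p = &a      copies: (dst, src) for dst = src
    loads: (dst, src) for dst = *src   stores: (dst, src) for *dst = src
    """
    pts = defaultdict(set)    # points-to sets
    succ = defaultdict(set)   # edge n -> m means pts[n] must be a subset of pts[m]
    for p, a in address_of:
        pts[p].add(a)
    for dst, src in copies:
        succ[src].add(dst)
    worklist = deque(pts)
    while worklist:
        n = worklist.popleft()
        for v in list(pts[n]):
            for dst, src in loads:        # dst = *n adds edge v -> dst
                if src == n and dst not in succ[v]:
                    succ[v].add(dst)
                    worklist.append(v)
            for dst, src in stores:       # *n = src adds edge src -> v
                if dst == n and v not in succ[src]:
                    succ[src].add(v)
                    worklist.append(src)
        for m in list(succ[n]):           # propagate along subset edges
            if not pts[n] <= pts[m]:
                pts[m] |= pts[n]
                worklist.append(m)
    return {var: targets for var, targets in pts.items() if targets}


result = andersen(
    address_of=[("p", "a"), ("q", "b")],   # p = &a; q = &b
    copies=[("r", "p")],                   # r = p
    loads=[("s", "p")],                    # s = *p
    stores=[("r", "q")],                   # *r = q
)
print(result)
```

In this example `r` points to `a`, so the store `*r = q` makes `a` point to `b`, and the load `s = *p` then makes `s` point to `b` as well.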

  10. Summary
