Delaunay mesh refinement • Triangulate a given set of points. • Delaunay property: No point is contained within the circumcircle of a triangle. • Quality property: No bad triangles, i.e., triangles with an angle > 120°. • Mesh refinement: Fix bad triangles through an iterative algorithm.
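The two properties on this slide can be checked with a few lines of geometry. The following is a minimal sketch, not code from the talk: the class and method names (`TriangleChecks`, `isBad`, `inCircumcircle`) are hypothetical, points are assumed to be `{x, y}` arrays, and the 120° threshold is the one stated above.

```java
// Hypothetical helper; the slides do not define this API.
final class TriangleChecks {
    static double dist(double[] p, double[] q) {
        return Math.hypot(p[0] - q[0], p[1] - q[1]);
    }

    /** Largest interior angle of triangle (a, b, c), in degrees. */
    static double largestAngleDegrees(double[] a, double[] b, double[] c) {
        double ab = dist(a, b), bc = dist(b, c), ca = dist(c, a);
        // The largest angle lies opposite the longest side (law of cosines).
        double longest = Math.max(ab, Math.max(bc, ca));
        double s1, s2;
        if (longest == ab) { s1 = bc; s2 = ca; }
        else if (longest == bc) { s1 = ab; s2 = ca; }
        else { s1 = ab; s2 = bc; }
        double cos = (s1 * s1 + s2 * s2 - longest * longest) / (2 * s1 * s2);
        return Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, cos))));
    }

    /** Quality property from the slide: a triangle is bad if an angle exceeds 120 degrees. */
    static boolean isBad(double[] a, double[] b, double[] c) {
        return largestAngleDegrees(a, b, c) > 120.0;
    }

    /** True if p lies strictly inside the circumcircle of (a, b, c), for either orientation. */
    static boolean inCircumcircle(double[] a, double[] b, double[] c, double[] p) {
        double ax = a[0] - p[0], ay = a[1] - p[1];
        double bx = b[0] - p[0], by = b[1] - p[1];
        double cx = c[0] - p[0], cy = c[1] - p[1];
        // Standard in-circle determinant, expanded along the squared-norm column.
        double det = (ax * ax + ay * ay) * (bx * cy - by * cx)
                   - (bx * bx + by * by) * (ax * cy - ay * cx)
                   + (cx * cx + cy * cy) * (ax * by - ay * bx);
        // The sign of the determinant flips with the orientation of the triangle.
        double orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
        return orient > 0 ? det > 0 : det < 0;
    }
}
```

A triangulation violates the Delaunay property exactly when `inCircumcircle` returns true for some triangle and some other input point. Plain floating-point arithmetic is used here, so nearly degenerate inputs would need an exact predicate.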
[Figure: a cavity and its retriangulation]
Sequential mesh refinement
    Mesh m = /* read input mesh */;
    Worklist wl = new Worklist(m.getBad());    // all bad triangles
    foreach triangle t in wl {
        Cavity c = new Cavity(t);              // cavity around the bad triangle
        c.expand();
        c.retriangulate();
        m.updateMesh(c);
        wl.add(c.getBad());                    // retriangulation may create new bad triangles
    }
• Cavities are contiguous.
Parallelization • Typical shared-memory implementation: • Mesh = heap-allocated graph data structure. • Nodes = triangles; edges = adjacency. • Atomicity: overlapping cavities must not be processed at the same time. • Irregular data-parallelism: the extent of parallelism depends on the input. • Non-overlapping cavities are processed in parallel. • In the worst case, no parallelism. • In the typical case, cavities are mostly non-overlapping. • A lot of recent work, notably by Pingali et al.
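One common way to enforce the atomicity requirement in a shared-memory setting is to have each worker claim every triangle of its cavity before touching it, and back off on a conflict. The sketch below illustrates that idea only; it is not the scheme of any system named in the talk. `CavityClaims`, the integer triangle ids, and the worker ids are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical ownership table: triangle id -> id of the worker that owns it.
final class CavityClaims {
    private final ConcurrentHashMap<Integer, Integer> owner = new ConcurrentHashMap<>();

    /** Tries to claim every triangle of the cavity for the worker; all-or-nothing. */
    boolean tryClaim(int worker, List<Integer> cavity) {
        List<Integer> taken = new ArrayList<>();
        for (int t : cavity) {
            Integer prev = owner.putIfAbsent(t, worker);
            if (prev != null && prev != worker) {
                // Overlap with another worker's cavity: undo the claims made in this call.
                for (int u : taken) owner.remove(u, worker);
                return false;
            }
            if (prev == null) taken.add(t);
        }
        return true;
    }

    /** Releases the worker's claims once the cavity has been retriangulated. */
    void release(int worker, List<Integer> cavity) {
        for (int t : cavity) owner.remove(t, worker);
    }
}
```

A worker whose `tryClaim` fails would put its bad triangle back on the worklist and retry later, so overlapping cavities are serialized while disjoint ones proceed in parallel.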
Social networks in epidemics* • Agents = parties among whom an epidemic may be spreading. • Each node models an agent at a physical location (e.g., school). • At a given location, an agent interacts with the same set of other agents. • Edge = potential of interaction. • Assumption: degree of a node < 9. • Atomicity: overlapping social interactions are not processed simultaneously. [Figure: network of agents Adam, Beth, Chitra, David, Eve] (*) Burke et al. Individual-based computational modeling of smallpox epidemic control strategies. Academic Emergency Medicine, 13(11):1142-1149, 2006.
Irregular parallelism • Heap-allocated data structures like lists, trees, and graphs. • Almost impossible to parallelize statically. (Shape analysis does not work.) • Needed: Dynamic methods. • Needed: Programming abstractions to express and exploit whatever parallelism is permitted by the problem instance.
List of irregular applications (lifted from Pingali et al.) • Delaunay mesh refinement, Delaunay triangulation • Agglomerative clustering, ray tracing • Social network maintenance • Minimum spanning tree, Maximum flow • N-body simulation, epidemiological simulation • Sparse matrix-vector multiplication, sparse Cholesky factorization • Belief propagation, survey propagation in Bayesian inference • Iterative dataflow analysis, Petri net simulation • Finite-difference PDE solution
The Lonestar challenge • Lonestar benchmarks: joint project of UT Austin (Pingali’s group) and IBM. • Four widely used, large, irregularly data-parallel applications: • Delaunay mesh refinement • Delaunay triangulation • Focused community discovery in social networks [K. Hildrum and P. Yu. Focused Community Discovery. IEEE Conference on Data Mining, 2005.] • Barnes-Hut N-body simulation [J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(4):446-449, 1986.] • Parallelize this!
Roadmap • Locality of effects • The Sociable Objects model • The Sirius language • Case studies • Implementation and evaluation • Related and future work
Delaunay mesh refinement
    Mesh m = /* read input mesh */;
    Worklist wl = new Worklist(m.getBad());    // all bad triangles
    foreach triangle t in wl {
        Cavity c = new Cavity(t);              // own a contiguous region
        c.expand();
        c.retriangulate();                     // update it
        m.updateMesh(c);                       // release it
        wl.add(c.getBad());
    }
• Cavity = contiguous region in the mesh. • Pattern: “Own a contiguous region, update it, release the region.”
Effects of updates are local • On a mesh of ~100,000 triangles from the Lonestar benchmarks (about half of them bad): • Average cavity size = 3.75 triangles. • Maximum cavity size = 12 triangles. • Locality of effects is the essence of parallelism. [Figure: a cavity]
Social networks in epidemics • Node = agent at a physical location (e.g., school). • Edge = potential of interaction. • Interactions are local. • Effects of updates restricted to neighborhoods. [Figure: network of agents Adam, Beth, Chitra, David, Eve, Ganesh]
Locality and current approaches • Threads + explicit locking: • Heap abstraction is global. • Threads can follow pointers anywhere unless explicitly forbidden. • Low-level and error-prone. • Monitors: • Abstraction for atomic updates on individual objects. • Missing: Atomic updates on collectives of objects (region in the heap). • Heap abstraction is global.
Current approaches (contd.) • Transactions: • Heap abstraction is global. • Burden of reasoning passed to the transaction manager. • In most implementations, conflicts are detected by monitoring memory reads and writes, so detection is either conservative or expensive. [Pingali et al. 2007, 2008] • To come up later: Galois system, PGAS languages like X10. • Our goal: capture locality of effects.
Roadmap • Locality of effects • The Sociable Objects model • The Sirius language • Case studies • Implementation and evaluation • Related and future work
Design ideas • Treat “neighborhoods” in the heap as first-class citizens. • Neighborhood = contiguous region in heap + sequential thread. Objects outside are invisible. • Primitives to declaratively, dynamically, and locally reconfigure neighborhoods. • Neighborhoods are typically small, offering massive parallelism. No worst-case guarantees.
Heaps, regions, neighborhoods • Heap = connected directed graph. Nodes = objects. Labeled edges = pointers. • Region = weakly connected partition. • Neighborhood = thread restricted to a region. (Better seen as short-lived tasks.)
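"Weakly connected" means connected once pointer directions are ignored. A minimal sketch of that check follows, assuming heap objects are numbered with integers and pointers are given as `{from, to}` pairs; `RegionCheck` and its method are hypothetical names, not part of the model's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: checks whether a set of heap objects forms a valid region.
final class RegionCheck {
    /** edges[i] = {from, to}. Only pointers with both ends inside the region count. */
    static boolean isWeaklyConnected(Set<Integer> region, int[][] edges) {
        if (region.isEmpty()) return true;
        // Build an undirected adjacency map restricted to the region.
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            if (region.contains(e[0]) && region.contains(e[1])) {
                adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
                adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
            }
        }
        // Breadth-first search from an arbitrary object of the region.
        Deque<Integer> queue = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        int start = region.iterator().next();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            int n = queue.poll();
            for (int m : adj.getOrDefault(n, Collections.<Integer>emptyList())) {
                if (seen.add(m)) queue.add(m);
            }
        }
        return seen.size() == region.size();
    }
}
```

With pointers 0→1 and 2→1, the set {0, 1, 2} is a region even though neither 0 nor 2 can reach the other by following pointers, while {0, 2} alone is not.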
Neighborhood action: merging • A neighborhood merges with an adjacent neighborhood along an edge. • The merging neighborhood gets a bigger region. • The absorbed neighborhood dies. • To prevent races, the absorbed neighborhood must not be “busy” while the merge happens. • Synchronization construct. Local coarsening of parallelism.
Neighborhood action: splitting • A neighborhood splits into several smaller neighborhoods. • Other neighborhoods are not affected. • Not a synchronization construct. Local refinement of parallelism.
Neighborhood action: local updates • Attempts to access objects outside region lead to exceptions. (Similar to out-of-place accesses in X10.) x = u.f;
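The three neighborhood actions (merge, split, local access) can be mimicked by a toy single-threaded model. This is a rough sketch under strong assumptions: objects are plain strings, regions are sets, there is no concurrency, and `ToyNeighborhood` and this `NonLocalException` are illustrative stand-ins rather than the runtime's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for the exception raised when an object outside the region is accessed. */
final class NonLocalException extends RuntimeException {
    private final Object target;
    NonLocalException(Object target) {
        super("object outside region: " + target);
        this.target = target;
    }
    Object getObject() { return target; }
}

/** Toy neighborhood: a region of named objects plus a liveness flag. */
final class ToyNeighborhood {
    final Set<String> region = new HashSet<>();
    boolean alive = true;

    ToyNeighborhood(String... objects) { region.addAll(Arrays.asList(objects)); }

    /** Local access: objects outside the region are invisible. */
    String access(String obj) {
        if (!region.contains(obj)) throw new NonLocalException(obj);
        return obj;
    }

    /** Merge: this neighborhood gets the bigger region; the other one dies. */
    void merge(ToyNeighborhood other) {
        region.addAll(other.region);
        other.region.clear();
        other.alive = false;
    }

    /** Split at finest granularity: one child per object; the parent dies. */
    List<ToyNeighborhood> split() {
        List<ToyNeighborhood> children = new ArrayList<>();
        for (String obj : region) children.add(new ToyNeighborhood(obj));
        region.clear();
        alive = false;
        return children;
    }
}
```

In this model, a neighborhood that catches a `NonLocalException` learns which object it would have to merge towards before the access can succeed.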
Program • Collection of neighborhood class declarations. • Neighborhood class: set of local variables (including an initial variable) and an action. • Variables point to local objects. • Action: set of guarded updates. Semantics: 1. Top-level: nondeterministically choose a guarded update. 2. Atomically execute the guard. 3. Execute the update. 4. Back to top-level.
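The top-level loop in the semantics above can be sketched as a tiny interpreter. This simplifies the model considerably: guards here are plain boolean tests (no merging), a seeded random shuffle stands in for nondeterministic choice, and the loop stops when no guard holds. `ActionLoop` and `GuardedUpdate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.BooleanSupplier;

// Hypothetical interpreter for an action, i.e. a set of guarded updates.
final class ActionLoop {
    static final class GuardedUpdate {
        final BooleanSupplier guard;
        final Runnable update;
        GuardedUpdate(BooleanSupplier guard, Runnable update) {
            this.guard = guard;
            this.update = update;
        }
    }

    /** Runs until no guard holds (an assumption of this sketch); returns the number of updates run. */
    static int run(List<GuardedUpdate> action, Random rng) {
        int steps = 0;
        while (true) {
            // Top-level: a random order models the nondeterministic choice.
            List<GuardedUpdate> order = new ArrayList<>(action);
            Collections.shuffle(order, rng);
            GuardedUpdate chosen = null;
            for (GuardedUpdate gu : order) {
                if (gu.guard.getAsBoolean()) { chosen = gu; break; }
            }
            if (chosen == null) return steps;   // nothing enabled
            chosen.update.run();                // execute the update, then back to top-level
            steps++;
        }
    }
}
```

For instance, with updates "x < 3: x++" and "x == 3: x = 10" over a counter starting at 0, the loop runs four updates and leaves x at 10 whatever order the shuffle produces, because only one guard is enabled at a time.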
Actions • Guards do not modify the heap, but may merge neighborhoods. All synchronization happens in guards. • Update = imperative modification of regions. Also splitting. Due to isolation, no precaution is needed to enforce atomicity of updates.
Merging • Control at top-level. • Execute S. • Local-variable state stays the same.
Merging (variant) • Become a neighborhood of the class named in the merge. • The initial variable of that class gets the value of v. • S is ignored. [Figure: control at top-level, then execute S]
Splitting • The child neighborhoods are all of the class named in the split. • The initial variable of each child points to one object of the parent's region. • Local state of the parent is destroyed. • Refinement at finest granularity.
Split one • Initial variable of points to . • If region gets disconnected, “main” child neighborhood is the one containing • Main child inherits the local state of parent. (Variables pointing outside region are de-initialized). • Other child-neighborhoods split into individual objects.
Other updates • Attempts to access objects outside region lead to exceptions. • If region gets disconnected, main child-neighborhood contains . • Other children split into objects. x = u.f; u.f = x;
Computations are unordered • No global ordering between merges and splits. • Guarantee: a neighborhood isn’t merged in the middle of an update. • A neighborhood can be “killed” by a merge any time it is at the top-level.
Delaunay mesh refinement • Use two neighborhood classes: Triangle and Cavity. • Cavity = contiguous region in mesh. • Each triangle: • Determines if it is bad (local check). • If so, merges with neighbors to become cavity. • Each cavity: • Determines if it is complete (local check). • If no, merges with a neighbor. • If yes, retriangulates (locally) and splits.
Delaunay mesh refinement: sketch
    nhood Triangle::
        ...
        action::
            merge (v.f, Cavity, u) when bad?: skip

    nhood Cavity::
        ...
        action::
            merge (v.f) when (not complete?): skip
            complete?: retriangulate(); split(Triangle)
What happens on a conflict? • Cavity i is “absorbed” by cavity j. • Cavity j now has some “unnecessary” triangles. • j will later split.
Granularity of parallelism • In the worst case, the whole heap merges into one neighborhood. • In the typical case, we merge, but into small neighborhoods. • Only as much parallelism as the input permits.
Data races • Updates only modify locally isolated objects. • Neighborhood j can merge with neighborhood i only when i is not in the middle of an update. • Therefore, no races.
Local enabling • Neighborhood i has a locally enabled merge with neighborhood j (along the edge u.f) when: • merge(u.f): S is in i's action set, or • merge(u.f) when g: S is in i's action set and g is satisfied. • etc.
Deadlocks • Classic definition: Process P waits for a resource from Q and vice versa. • Deadlock in Sociable Objects: • Neighborhood i has a locally enabled merge with neighborhood j. • j has a locally enabled merge with i. • No other progress is possible. • But one of the merges can always be carried out. (A neighborhood can always be killed at top-level.) • Theorem: Assuming updates terminate, no deadlock.
Responsiveness • Definition: If a merge is locally enabled in neighborhood i, then eventually one of the following happens: • The merge goes through. • i is “killed” by a different neighborhood merging with it. • Enforcement requires a refinement of the semantics: • For each neighborhood i, track in a queue all neighborhoods that have locally enabled merges with it. • Let i merge only when this queue is empty. • Otherwise, merges in the queue get precedence. • In message-passing lingo, order receives before sends. (Done by the runtime, not the programmer.) • Theorem: The refined semantics enforces responsiveness.
Connection with X10 • In X10: • Places = Static memory partitions • Activities located in places. • Many activities at one place. • In Sociable Objects: • Neighborhood = “Contiguous” place + activity • One activity at one neighborhood. • Local reconfiguration. • Dynamic creation.
Roadmap • Locality of effects • The Sociable Objects model • The Sirius language • Case studies • Implementation and evaluation • Related and future work
Sirius: embedding of Sociable Objects into Java • Neighborhood classes: • Constructor: Cavity (v1,v2) { … } • Merges and splits pass parameters: merge (u.f, u1, u2): S; split (Triangle,u1,u2); • Read-only data: • Data structures can be set to readonly. • Writeable data can be converted to readonly at any time. • Interleaved sequential and parallel phases. (In a sense, fork-join parallelism.)
Roadmap • Locality of effects • The Sociable Objects model • The Sirius language • Case studies • Implementation and evaluation • Related and future work
Delaunay mesh refinement
    nhood Triangle {
        Triangle(TriangleObject t) {
            if (t.isBad())
                become(Cavity);          // become a Cavity
        }
    } /* end Triangle */

    nhood Cavity {
        action {                         // expand cavity
            merge(outgoingedges, TriangleObject t):
                { outgoingedges.remove(t);
                  frontier.add(t);
                  build(); }
        }
        Set members; Set border;
        Queue frontier;                  // current frontier
        List outgoingedges;              // outgoing edges on which to merge
        TriangleObject initial;
        ...

    nhood Loader {
        Loader(String filename) {
            ...
            ... new TriangleObject (p1, p2, p3); …
            split(SingleTriangle);
        }
    } /* end Loader */
Delaunay mesh refinement (contd.)
        Cavity(Triangle t) {
            ...                          // initialize data fields
            frontier.enqueue(t);
            build();
        }

        void build() {
            while (frontier.size() != 0) {
                TriangleObject curr = frontier.dequeue();
                try {
                    if (isMember(curr)) members.add(curr);
                    else border.add(curr);
                    // add triangles using BFS
                    for (TriangleObject n : curr.neighbors())
                        if (notSeen(n)) frontier.add(n);
                } catch (NonLocalException e) {
                    // triangle not in nhood, add to merge list
                    outgoingedges.add(e.getObject());
                }
            }
            if (outgoingedges.isEmpty()) {   // cavity complete
                retriangulate(); split(Triangle);
            }
        }
        ...
Boruvka’s algorithm for minimum spanning tree • Intuition: • A spanning tree is a neighborhood • Two trees can merge to form a bigger tree. • At the end, we have the full spanning tree. • Initially, • Each tree (neighborhood) has one node. • Private data = list of weights of outgoing edges. • As algorithm progresses, trees merge.
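For reference, here is a sequential sketch of Borůvka's algorithm in plain Java, with a union-find array playing the role of "trees merge". It is a textbook formulation, not the Sirius program: `BoruvkaSketch`, the `{u, v, weight}` edge encoding, and the index-based tie-break are choices made for this example.

```java
import java.util.Arrays;

// Textbook sequential Boruvka; each union-find root stands for one tree (neighborhood).
final class BoruvkaSketch {
    /** edges[i] = {u, v, weight}; returns the total MST weight of a connected graph on n nodes. */
    static long mstWeight(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // initially, each tree has one node
        long total = 0;
        int components = n;
        while (components > 1) {
            int[] cheapest = new int[n];             // cheapest[root] = index of min outgoing edge
            Arrays.fill(cheapest, -1);
            for (int i = 0; i < edges.length; i++) {
                int ru = find(parent, edges[i][0]), rv = find(parent, edges[i][1]);
                if (ru == rv) continue;              // edge is internal to a tree
                if (cheapest[ru] == -1 || less(edges, i, cheapest[ru])) cheapest[ru] = i;
                if (cheapest[rv] == -1 || less(edges, i, cheapest[rv])) cheapest[rv] = i;
            }
            boolean merged = false;
            for (int r = 0; r < n; r++) {
                int i = cheapest[r];
                if (i == -1) continue;
                int ru = find(parent, edges[i][0]), rv = find(parent, edges[i][1]);
                if (ru == rv) continue;              // these two trees already merged this round
                parent[ru] = rv;                     // two trees merge into a bigger tree
                total += edges[i][2];
                components--;
                merged = true;
            }
            if (!merged) throw new IllegalArgumentException("graph is not connected");
        }
        return total;
    }

    /** Orders edges by weight, then by index, so that ties are broken consistently. */
    static boolean less(int[][] e, int i, int j) {
        return e[i][2] < e[j][2] || (e[i][2] == e[j][2] && i < j);
    }

    static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }   // path halving
        return x;
    }
}
```

In each round every tree picks its cheapest outgoing edge independently of the others, which is what makes the algorithm a natural fit for neighborhoods that merge along their minimum edge.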
Minimum spanning tree
    shared Node {
        List edges;
        List weights;
    }

    nhood ComputeSpanningTree {
        Spanning tree;
        Node root;
        Edge minOutEdge;
        action {
            merge(minOutEdge) :
                computeNewMinEdge();
        }
        // computes the new minimal edge of the merged tree
        void computeNewMinEdge() {...}
    }
Barnes-Hut N-body simulation • Parallelization opportunities: At each step: • Summarizing each node in the octree (computing the centre of gravity for each rectangle). • Computing forces and advancing the bodies. • Simulation step computed sequentially.
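The summarization step is a post-order pass over the tree: a cell's mass is the sum of its children's masses, and its centre of gravity is their mass-weighted average position. A minimal sketch follows, assuming a generic tree of cells whose leaves are bodies; `OctreeSummary` and `Cell` are hypothetical names and no spatial subdivision is modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree of cells; a leaf holds one body, an inner cell summarizes its children.
final class OctreeSummary {
    static final class Cell {
        double mass;
        final double[] center = new double[3];
        final List<Cell> children = new ArrayList<>();

        Cell() {}                                        // inner cell, filled in by summarize
        Cell(double mass, double x, double y, double z) { // leaf: a single body
            this.mass = mass;
            center[0] = x; center[1] = y; center[2] = z;
        }
    }

    /** Post-order pass: children are summarized before their parent. */
    static void summarize(Cell c) {
        if (c.children.isEmpty()) return;                // a body is its own summary
        double m = 0;
        double[] weighted = new double[3];
        for (Cell child : c.children) {
            summarize(child);
            m += child.mass;
            for (int k = 0; k < 3; k++) weighted[k] += child.mass * child.center[k];
        }
        c.mass = m;
        for (int k = 0; k < 3; k++) c.center[k] = (m == 0) ? 0 : weighted[k] / m;
    }
}
```

Sibling subtrees do not share data during this pass, so they can be summarized independently, which is the parallelization opportunity the slide points to.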