Application Paradigms: Unstructured Grids
CS433, Spring 2001
Laxmikant Kale
Unstructured Grids
• Typically arise in the finite element method:
  • E.g., space is tiled with variable-size-and-shape triangles
  • In 3D: may be tetrahedra or hexahedra
  • Allows one to adjust the resolution in different regions
• The base data structure is a graph
  • Often represented as a bipartite graph: e.g., triangles (elements) and nodes
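As a minimal sketch of what this bipartite element/node representation might look like in code (the struct and field names below are illustrative assumptions, not taken from any actual course framework):

```cpp
#include <vector>

// One node of the mesh: coordinates plus per-node attributes.
struct Node {
    double x, y;          // coordinates (2D for simplicity)
    double temperature;   // example nodal attribute
};

// One triangular element: indices of its three nodes plus element attributes.
struct Element {
    int node[3];          // indices into Mesh::nodes
    double stress;        // example element attribute
};

// The mesh is just the two sets; element-to-node connectivity is stored in
// each element, and node-to-element adjacency can be derived when needed.
struct Mesh {
    std::vector<Node> nodes;
    std::vector<Element> elements;
};
```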
Unstructured grid computations
• Typically:
  • Attributes (stresses, strains, pressure, temperature, velocities) are attached to nodes and elements
  • Programs loop over elements and loop over nodes, separately
• Each time you "visit" an element:
  • Need to access, and possibly modify, all nodes connected to it
• Each time you visit a node:
  • Typically, access and modify only node attributes
  • Rarely: access/modify attributes of elements connected to it
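The access pattern just described might be sketched as follows, building on the hypothetical Mesh structure above (someConstitutiveLaw is a stand-in for whatever physics update the real code performs):

```cpp
// Placeholder for whatever physics the real code performs.
static double someConstitutiveLaw(double t) { return 2.0 * t; }

// Element loop: gather from connected nodes, compute per element,
// possibly scatter back to the nodes. A schematic, not a real FEM kernel.
void elementLoop(Mesh& mesh) {
    for (Element& e : mesh.elements) {
        double avgTemp = 0.0;
        for (int i = 0; i < 3; ++i)               // access all connected nodes
            avgTemp += mesh.nodes[e.node[i]].temperature;
        avgTemp /= 3.0;

        e.stress = someConstitutiveLaw(avgTemp);  // update element attribute

        for (int i = 0; i < 3; ++i)               // possibly modify the nodes too
            mesh.nodes[e.node[i]].temperature += 0.1 * e.stress;
    }
}

// Node loop: typically touches only nodal data.
void nodeLoop(Mesh& mesh) {
    for (Node& n : mesh.nodes)
        n.temperature *= 0.99;                    // e.g., relaxation/damping
}
```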
Unstructured grids: parallelization issues
• Two concerns:
  • The unstructured grid graph must be partitioned across processors
    • vproc (virtual processor, in general)
  • Boundary values must be shared
• What to partition and what to duplicate (at the boundaries)?
  • Partition elements (so each element belongs to exactly one vproc)
  • Share nodes at the boundary
    • Each node potentially has several ghost copies
  • Why is this better than partitioning nodes and sharing elements?
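Given an element-to-vproc assignment, the shared (ghost) nodes fall out as the nodes held by more than one partition. A sketch, again assuming the hypothetical Mesh layout from above:

```cpp
#include <set>
#include <vector>

// part[e] gives the partition (vproc) that owns element e.
// Returns, for each partition, the set of node indices it must hold;
// nodes that appear in more than one set are the shared/ghost nodes.
std::vector<std::set<int>> nodesPerPartition(const Mesh& mesh,
                                             const std::vector<int>& part,
                                             int numParts) {
    std::vector<std::set<int>> held(numParts);
    for (size_t e = 0; e < mesh.elements.size(); ++e)
        for (int i = 0; i < 3; ++i)
            held[part[e]].insert(mesh.elements[e].node[i]);
    return held;
}
```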
Partitioning unstructured grids
• Not as simple as structured grids
  • "By rows", "by columns", "rectangular", ... don't work
• Geometric?
  • Applicable only if each node has coordinates
  • Even when applicable, may not lead to good performance
• What performance metrics to use?
  • Load balance: the number of elements in each partition
  • Communication:
    • Number of shared nodes (total)
    • Maximum number of shared nodes for any one partition
    • Maximum number of "neighbor partitions" for any partition
      • Why? Per-message cost
  • Geometric methods: difficult to optimize both
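These metrics are straightforward to evaluate for a candidate partition; the following sketch (reusing nodesPerPartition from above, with assumed names) computes all four:

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <set>
#include <vector>

// Evaluate load balance and the three communication metrics listed above.
void evaluatePartition(const Mesh& mesh, const std::vector<int>& part, int numParts) {
    std::vector<int> load(numParts, 0);
    for (int p : part) load[p]++;                          // elements per partition

    std::vector<std::set<int>> held = nodesPerPartition(mesh, part, numParts);

    // For every node, collect the partitions that hold a copy of it.
    std::map<int, std::set<int>> holders;
    for (int p = 0; p < numParts; ++p)
        for (int n : held[p]) holders[n].insert(p);

    int totalShared = 0;
    std::vector<int> sharedPer(numParts, 0);
    std::vector<std::set<int>> nbrs(numParts);
    for (auto& [node, parts] : holders) {
        if (parts.size() > 1) {
            ++totalShared;                                 // total shared nodes
            for (int p : parts) {
                ++sharedPer[p];                            // shared nodes per partition
                for (int q : parts) if (q != p) nbrs[p].insert(q);  // neighbor partitions
            }
        }
    }
    int maxShared = *std::max_element(sharedPer.begin(), sharedPer.end());
    int maxNbrs = 0;
    for (auto& s : nbrs) maxNbrs = std::max(maxNbrs, (int)s.size());

    std::printf("max load %d, total shared %d, max shared %d, max neighbors %d\n",
                *std::max_element(load.begin(), load.end()),
                totalShared, maxShared, maxNbrs);
}
```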
MP issues
• Charm++ help:
  • Today (Wed, 2/21), 2 pm to 5:30 pm
  • 2504, 2506, 2508 DCL (Parallel Programming Laboratory)
• My office hours for this week:
  • Thursday, 10:00 A.M. to 12:00 noon
Grid partitioning
• When communication costs are relatively low
  • Either because the data set is large or the computation per element is large
• Geometric methods can be used:
  • Orthogonal Recursive Bisection (ORB)
  • Basic idea: recursively divide sets into two
    • Keep shapes squarish as long as possible
  • For each set:
    • Find the bounding box (Xmax, Xmin, Ymax, Ymin, ...)
    • Find the longer dimension (X or Y or ...)
    • Find a cut along the longer dimension that divides the set equally
      • Doesn't have to be at the midpoint of the section
    • Partition the elements into two sets based on the cut
    • Repeat for each set
  • Variation: non-power-of-two processors
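A compact ORB sketch under these assumptions (the Point struct with an id field and the median selection via std::nth_element are choices made here for illustration, not prescribed by the course):

```cpp
#include <algorithm>
#include <vector>

struct Point { double x, y; int id; };   // e.g., element centroids

// Orthogonal Recursive Bisection: recursively split the point set [first,last)
// along its longer bounding-box dimension, at the median (equal halves).
void orb(std::vector<Point>& pts, int first, int last, int depth,
         std::vector<int>& part, int partBase) {
    if (depth == 0) {                            // leaf: assign a partition id
        for (int i = first; i < last; ++i) part[pts[i].id] = partBase;
        return;
    }
    double xmin = 1e300, xmax = -1e300, ymin = 1e300, ymax = -1e300;
    for (int i = first; i < last; ++i) {         // bounding box of this set
        xmin = std::min(xmin, pts[i].x); xmax = std::max(xmax, pts[i].x);
        ymin = std::min(ymin, pts[i].y); ymax = std::max(ymax, pts[i].y);
    }
    bool cutX = (xmax - xmin) >= (ymax - ymin);  // pick the longer dimension
    int mid = (first + last) / 2;
    std::nth_element(pts.begin() + first, pts.begin() + mid, pts.begin() + last,
        [cutX](const Point& a, const Point& b) {
            return cutX ? a.x < b.x : a.y < b.y; // median cut, not box midpoint
        });
    orb(pts, first, mid, depth - 1, part, partBase * 2);
    orb(pts, mid, last, depth - 1, part, partBase * 2 + 1);
}
```

Calling orb(pts, 0, n, levels, part, 0) would yield 2^levels partitions; a non-power-of-two processor count needs the variation mentioned above (e.g., unequal splits at some levels).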
Grid partitioning: quad/oct trees
• Another geometric technique:
  • At each step, divide the set into 2^D subsets, where D is the number of physical dimensions
    • In 2D: 4 quadrants
  • The dividing lines go through the geometric midpoint of the box
  • The bounding box is NOT recalculated at each step of the recursion
• Comparison with ORB
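For contrast with ORB, a quadtree version might look like the sketch below (reusing the Point struct from the ORB sketch); note the fixed midpoint cut and the box that is never recomputed from the points:

```cpp
#include <vector>

// Recursively assign each point (given by index into pts) a quadtree leaf id.
// Unlike ORB, the cut always passes through the geometric midpoint of the
// current box, and the box is subdivided rather than recomputed.
void quadtree(const std::vector<Point>& pts, const std::vector<int>& ids,
              double x0, double y0, double x1, double y1,
              int depth, int leaf, std::vector<int>& part) {
    if (depth == 0) {
        for (int i : ids) part[i] = leaf;
        return;
    }
    double xm = 0.5 * (x0 + x1), ym = 0.5 * (y0 + y1);   // fixed midpoint
    std::vector<int> q[4];                               // 4 quadrants in 2D
    for (int i : ids) {
        int quad = (pts[i].x >= xm ? 1 : 0) + (pts[i].y >= ym ? 2 : 0);
        q[quad].push_back(i);
    }
    quadtree(pts, q[0], x0, y0, xm, ym, depth - 1, leaf * 4 + 0, part);
    quadtree(pts, q[1], xm, y0, x1, ym, depth - 1, leaf * 4 + 1, part);
    quadtree(pts, q[2], x0, ym, xm, y1, depth - 1, leaf * 4 + 2, part);
    quadtree(pts, q[3], xm, ym, x1, y1, depth - 1, leaf * 4 + 3, part);
}
```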
Grid partitioning: graph partitioners
• CHACO and METIS are well-known programs
  • Optimize both load imbalance and communication overhead
  • But often ignore per-message cost, or the maximum-per-partition costs
• Earlier algorithm: KL (Kernighan-Lin)
• METIS first coarsens the graph, applies KL to it, and then refines the graph
  • Doing this not just once, but as a k-level coarsening-refining scheme
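As a rough illustration of the refinement step, Kernighan-Lin style passes repeatedly pick boundary vertices whose move (or swap) across the cut has positive "gain", i.e., reduces the number of cut edges. A minimal gain computation over a CSR-style adjacency (the layout and names here are assumptions, not METIS's actual internals) might be:

```cpp
#include <vector>

// Compressed adjacency: the neighbors of vertex v are adj[xadj[v] .. xadj[v+1]).
// part[v] is v's current partition. The gain of moving v to partition `to`
// is (edges into `to`) minus (edges staying in v's own partition); a positive
// gain means the move reduces the edge cut.
int moveGain(int v, int to,
             const std::vector<int>& xadj, const std::vector<int>& adj,
             const std::vector<int>& part) {
    int external = 0, internal = 0;
    for (int i = xadj[v]; i < xadj[v + 1]; ++i) {
        int u = adj[i];
        if (part[u] == part[v]) ++internal;
        else if (part[u] == to) ++external;
    }
    return external - internal;
}
```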
Crack Propagation
• Explicit FEM code
• Zero-volume cohesive elements inserted near the crack
• As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
• Framework handles
  • Partitioning elements into chunks
  • Communication between chunks
  • Load balancing
Crack Propagation
[Figure] Decomposition into 16 chunks (left) and 128 chunks, 8 for each PE (right). The middle area contains cohesive elements. Both decompositions were obtained using METIS. Pictures: S. Breitenfeld and P. Geubelle.
Unstructured grid: managing communication
• Suppose triangles A, B, and C are on different processors
  • Node 1 is shared between all 3 processors
  • It must have a copy on all 3 processors
• When values need to be added up:
  • Option 1 (star): let A (say) be the "owner" of node 1
    • B and C send their copies of node 1 to A
    • A combines them (usually, just adding them up)
    • A sends the updated value back to B and C
  • Option 2 (symmetric): each sends its copy of node 1 to both the others
  • Which one is better?
[Figure: triangles A, B, and C sharing node 1]
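A sketch of option 1, the owner/star scheme, written with MPI point-to-point calls for concreteness (the course framework, Charm++, would express the same exchange with entry methods; the function and variable names are assumptions):

```cpp
#include <mpi.h>
#include <vector>

// The owner rank gathers the partial nodal sums for the shared nodes from the
// other sharers, adds them up, and sends the combined values back.
void combineSharedNodes(int myRank, int ownerRank,
                        const std::vector<int>& otherSharers,
                        std::vector<double>& vals /* shared-node values */) {
    int n = (int)vals.size();
    if (myRank != ownerRank) {
        // Non-owner: ship my contribution, then receive the combined result.
        MPI_Send(vals.data(), n, MPI_DOUBLE, ownerRank, 0, MPI_COMM_WORLD);
        MPI_Recv(vals.data(), n, MPI_DOUBLE, ownerRank, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        // Owner: accumulate contributions, then broadcast the updated values.
        std::vector<double> buf(n);
        for (int p : otherSharers) {
            MPI_Recv(buf.data(), n, MPI_DOUBLE, p, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < n; ++i) vals[i] += buf[i];   // just add them up
        }
        for (int p : otherSharers)
            MPI_Send(vals.data(), n, MPI_DOUBLE, p, 1, MPI_COMM_WORLD);
    }
}
```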
Unstructured grid: managing communication
• In either scheme:
  • Each vproc maintains a list of neighboring vprocs
  • For each neighbor, it maintains a list of shared nodes
    • Each node has a local index ("my 5th node")
  • The same list works in both directions
    • Send
    • Receive
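The per-neighbor index list might be used like this to pack outgoing values and to accumulate incoming ones (a sketch with assumed names):

```cpp
#include <vector>

// Per-neighbor communication list: the local indices of the nodes shared with
// that neighbor, stored in an order both sides agree on. The same list is used
// to pack outgoing values and to unpack (accumulate) incoming ones.
struct SharedNodeList {
    int neighborVproc;
    std::vector<int> localIndex;   // e.g., "my 5th node"
};

// Pack the values of my shared nodes, in the agreed order, into a message.
std::vector<double> packShared(const SharedNodeList& list,
                               const std::vector<double>& nodeValues) {
    std::vector<double> msg;
    msg.reserve(list.localIndex.size());
    for (int idx : list.localIndex) msg.push_back(nodeValues[idx]);
    return msg;
}

// Unpack a neighbor's message using the same list, accumulating contributions.
void unpackShared(const SharedNodeList& list, const std::vector<double>& msg,
                  std::vector<double>& nodeValues) {
    for (size_t i = 0; i < list.localIndex.size(); ++i)
        nodeValues[list.localIndex[i]] += msg[i];
}
```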
Adaptive variations: Structured grids
• Suppose you need a different level of refinement at different places in the grid:
  • Adaptive Mesh Refinement (AMR)
  • Quad and oct trees can be used
  • Neighboring regions may have resolutions that differ by one level
    • Requiring (possibly complex) interpolation algorithms
• The fact that you have to do the refinement in the middle of a parallel computation makes a difference
  • Refinement happens again and again, but often not every step
  • Adjust your communication lists
  • Alternatively, put a layer of software in the middle to do the interpolations
    • So each square chunk thinks it has exactly one neighbor on each side
Adaptive variations: unstructured grids
• Mesh may be refined in places, dynamically:
  • This is much harder to do (even sequentially) than for structured grids
  • Think about triangles:
    • Quality restriction: avoid long, skinny triangles
• From the parallel computing point of view:
  • Need to change the list of shared nodes
  • Load balance may shift
• Load balancing:
  • Abandon the partitioning and repartition
  • Or incrementally adjust (typically with virtualization)