This article discusses the efficient implementation of the Finite Element Method (FEM) on adaptive Cartesian grids. It covers topics such as element-wise traversal, cache efficiency, data structures, and minimal memory requirement. A case study on the Jacobi solver is presented. The use of elements instead of nodes and the traversal of adaptive grids are also explored.
Efficient Finite Element Method Implementation Realization on Adaptive Cartesian Grids Joint Advanced Student School (JASS2006) Numerical Simulation TU München Carla Guillen
Agenda • FEM: Quick Review • Adaptive Cartesian Grids and FEM • Element-wise Traversal • Case Study: Jacobi Solver • Data Structures • Cache Efficiency • Traversal of Adaptive Grid • Peano Tree and Quadtree • Minimal Memory Requirement • Parallelization
Finite Element Method Review …we recall the treatment of a PDE with the FEM: • Use the weak form of the PDE • Choose Test and Shape functions • Obtain linear system of equations and stencil
System of Equations The system of equations obtained from the FEM has the form Au = b • A: system matrix • u: vector containing the unknowns assigned to the vertices of a Cartesian grid • b: vector containing the right-hand-side values
System Matrix • The system matrix is typically sparse. • We can obtain the stencil from the structure of the rows. [Figure: sparsity pattern of the system matrix; legend: non-zero coefficients of u, zero coefficients of u]
General Stencil The structure of a row (an example): [ 0 0 -1 -1 -1 0 0 0 -1 8 -1 0 0 0 -1 -1 -1 0 0 ] All nodes depend on their neighbouring nodes.
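As a minimal sketch of how such a row arises, the Python snippet below writes a 3×3 stencil into the matrix row of one interior node. The lexicographic node numbering, the grid width n = 6, and the helper name `stencil_row` are assumptions made for illustration; they are not stated on the slide.

```python
import numpy as np

# 9-point stencil read off the example row (centre 8, all neighbours -1).
STENCIL = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])

def stencil_row(n, i, j, stencil=STENCIL):
    """Row of A for the interior node (i, j) on an n x n grid of vertices.

    Assumes lexicographic numbering k = j * n + i (hypothetical choice).
    """
    row = np.zeros(n * n, dtype=int)
    for dj in (-1, 0, 1):
        for di in (-1, 0, 1):
            row[(j + dj) * n + (i + di)] = stencil[dj + 1, di + 1]
    return row

row = stencil_row(n=6, i=3, j=1)
print(row[:19])
```

Under these assumed values the first 19 entries of `row` coincide with the example row; all remaining entries are zero, which is the sparsity the previous slide refers to.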
Adaptive grids • Fine grids are needed for: • Boundaries with complex geometry • Singularities (related to the discretization error) • Multi-scale phenomena • High resolution is sometimes required but not affordable: • High resolution requires more memory • The time complexity of the applied algorithms increases • Why not combine high and low resolution where needed?
Adaptive grids • Refine only where necessary combining high and low resolution: • Done in a recursive way. • Splitting of a cell into subgrids. • Done only where needed.
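The recursive splitting can be sketched as follows. This is an illustrative sketch only: the function `refine`, the predicate `touches_origin`, and the choice of refining towards the corner (0, 0) are hypothetical and not taken from the slides.

```python
def refine(x, y, size, level, needs_refinement, max_level, k=3):
    """Return the leaf cells (x, y, size, level) of an adaptive grid.

    A cell is split into k x k subcells (k = 3 gives nine, k = 2 four)
    whenever the user-supplied predicate asks for it.
    """
    if level >= max_level or not needs_refinement(x, y, size):
        return [(x, y, size, level)]
    leaves = []
    sub = size / k
    for j in range(k):
        for i in range(k):
            leaves += refine(x + i * sub, y + j * sub, sub, level + 1,
                             needs_refinement, max_level, k)
    return leaves

# Hypothetical criterion: refine the cell whose lower-left corner is the
# origin, e.g. because the solution has a singularity there.
def touches_origin(x, y, size):
    return x == 0.0 and y == 0.0

leaves = refine(0.0, 0.0, 1.0, 0, touches_origin, max_level=2)
print(len(leaves))  # 17 leaves instead of 81 on a uniform level-2 grid
```

Only the cell at the assumed singularity is split a second time, so high resolution is paid for in one place only.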
Use of Elements • Advantages of using elements instead of nodes: • The calculation of the stencil simplifies in the adaptive case. • Space-filling curves are easily applied. • Stack data structures are applicable.
Iterative solver: Jacobi • Jacobi General Formulation: • Jacobi Iterations: While residual is not sufficiently small: End while.
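A minimal sketch of the loop, assuming the textbook undamped Jacobi update u ← u + D⁻¹(b − Au) with D the diagonal of A. The tolerance, the iteration limit, and the small test system are made up for illustration; a damped variant would scale the update by a relaxation factor.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Solve A u = b with (undamped) Jacobi iterations."""
    d = np.diag(A)                    # diagonal of the system matrix
    u = np.zeros_like(b, dtype=float)
    for it in range(max_iter):
        r = b - A @ u                 # residual
        if np.linalg.norm(r) < tol:   # residual sufficiently small
            return u, it
        u = u + r / d                 # u <- u + D^{-1} r
    return u, max_iter

# Small diagonally dominant test system (made up for illustration).
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([3.0, 2.0, 3.0])
u, iterations = jacobi(A, b)
print(u)
```

For this system the exact solution is (1, 1, 1), so the printed vector should be close to it.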
Residual in element-wise view The stencil is split into element contributions. How do we treat the residual per element? • The residual of one node is equal to the right-hand side minus the stencil applied to u. • The residual of a node is the sum of the residual contributions of all elements adjacent to it.
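The accumulation can be sketched as below. The element matrix `K_E` is an assumption: it is the bilinear square-element matrix for the Laplace operator, scaled so that assembling the four elements around a node gives a stencil with centre 8 and all eight neighbours −1. The function name and the vertex ordering are likewise illustrative.

```python
import numpy as np

# Assumed element matrix (bilinear square element, Laplace operator),
# scaled so that assembly yields centre 8 and neighbours -1.
# Local vertices are numbered counter-clockwise.
K_E = 0.5 * np.array([[ 4, -1, -2, -1],
                      [-1,  4, -1, -2],
                      [-2, -1,  4, -1],
                      [-1, -2, -1,  4]], dtype=float)

def elementwise_residual(u, b):
    """Accumulate r = b - A u by visiting elements instead of nodes.

    u, b: (n, n) arrays of vertex values. Boundary vertices only receive
    the contributions of the elements that exist around them.
    """
    n = u.shape[0]
    r = b.astype(float).copy()
    for j in range(n - 1):
        for i in range(n - 1):
            verts = [(j, i), (j, i + 1), (j + 1, i + 1), (j + 1, i)]
            u_loc = np.array([u[v] for v in verts])
            # Each element subtracts its share of "stencil times u".
            for v, c in zip(verts, K_E @ u_loc):
                r[v] -= c
    return r

rng = np.random.default_rng(0)
u = rng.random((5, 5))
b = rng.random((5, 5))
r = elementwise_residual(u, b)
```

For an interior node, the four element contributions add up to the full node stencil, so `r` agrees there with the node-wise residual.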
Additional solvers to be used Not only Jacobi solver, but: • Gauss-Seidel • Red-black Gauss-Seidel • Conjugate Gradient • Multigrid with Jacobi smoother
Considerations of the Traversal of the Grid • We need to traverse the grid in an efficient way: • Saving processing time through cache efficiency. • Saving memory space. • Using parallel processing where available.
Cache: Temporal Locality • The goal is to ensure that the information referenced now will be referenced again in the near future. [Figure: CPU, cache, and RAM; data is copied from RAM into the cache]
Visiting all the elements: data is required more than once • Visiting element U6 requires loading vertex data that was already loaded. • Elements U1, U2 and U3 were referenced previously, and their vertices may still be in the cache. [Figure: element grid, rows from top to bottom: U3 U7 U11 | U2 U6 U10 | U1 U5 U9]
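To make the reuse of vertex data concrete, the sketch below counts how often a vertex has to be loaded for a given element order. It uses a simple LRU model on single vertices, which is an assumption for illustration (real caches operate on cache lines); the function name and the cache sizes are hypothetical.

```python
from collections import OrderedDict

def count_vertex_misses(element_order, cache_size):
    """Count cache misses of vertex loads for a given element order."""
    cache = OrderedDict()
    misses = 0
    for (i, j) in element_order:
        # The four vertices of element (i, j).
        for v in ((i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)):
            if v in cache:
                cache.move_to_end(v)           # mark as recently used
            else:
                misses += 1
                cache[v] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return misses

n = 3
# Column by column, bottom to top within each column.
column_wise = [(i, j) for i in range(n) for j in range(n)]
print(count_vertex_misses(column_wise, cache_size=4))
print(count_vertex_misses(column_wise, cache_size=16))
```

With a cache that holds all 16 vertices, every vertex is loaded exactly once; with a smaller cache, vertices shared with a neighbouring column may have been evicted before they are needed again, which is what a locality-preserving traversal order tries to avoid.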
Space-filling curves: Peano curve • The Cartesian grid is divided into nine elements. Elements that need higher resolution are divided again into nine. • The cells are traversed in a characteristic order: • Each cell is traversed only once. • Cells visited in succession must be neighbouring cells.
Peano Curve [Figures: Peano curve at Level 2, Level 3, and Level 4; Level 2 in 3D]
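A minimal sketch of one way to generate a Peano ordering on a regular 3^level × 3^level grid. It uses a recursive serpentine pattern with mirrored sub-blocks; the function name `peano` and the orientation (starting in the lower-left corner, first moving in y) are assumptions, and other equivalent orientations exist.

```python
def peano(level, fx=False, fy=False):
    """Yield the cells (x, y) of a 3**level x 3**level grid in Peano order.

    fx / fy mirror the curve in x / y; sub-blocks are mirrored so that
    consecutive cells always share an edge.
    """
    if level == 0:
        yield (0, 0)
        return
    s = 3 ** (level - 1)                     # edge length of one sub-block
    for a in range(3):
        i = 2 - a if fx else a               # block column
        for c in range(3):
            j = c if i % 2 == 0 else 2 - c   # serpentine within the column
            if fy:
                j = 2 - j
            sub = peano(level - 1, fx ^ (j % 2 == 1), fy ^ (i % 2 == 1))
            for (x, y) in sub:
                yield (i * s + x, j * s + y)

print(list(peano(1)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2)]
order = list(peano(2))
```

Both traversal rules can be checked directly on `order`: every cell appears exactly once, and each cell is an edge neighbour of its predecessor.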
Other space-filling curves • Hilbert curve • Sierpinski curve (uses a triangular grid)
Tree Representation of the Adaptive Grid • The representation of the adaptive grid can be done with a tree. • Data of a cell is stored at the corresponding node. • The root represents the lowest resolution level. • The leaves are the high resolution levels.
Levels of the Grid • Let's consider the following adaptive grid. • It contains 3 levels. • The highest resolution level occurs in two cells only.
Traversal of Each Level of the Adaptive Grid [Figures: traversal of the first level (cell 0), the second level, and the third level]
Peano tree: Numbering of Elements and Data Structure [Figure: tree with root 0; children of the root: 1, 2, 3, 4, 14, 15, 16, 17, 27; children of element 4: 5 ... 13; children of element 17: 18 ... 26]
Quadtree • Same principle as the Peano tree. • An element is split into four subcells. • A node has either 0 or 4 children. [Figure: tree with root 0; children of the root: 1, 2, 7, 8; children of element 2: 3, 4, 5, 6]
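The element numbers in both trees are consistent with a depth-first (pre-order) numbering, which the sketch below reproduces. Representing a node as the nested list of its children and the helper name `number_tree` are assumptions for illustration; which child is refined is inferred from the numbers in the figures.

```python
def number_tree(node, counter=None):
    """Return (id, [numbered children]) using depth-first pre-order ids.

    A node is the list of its children; a leaf is the empty list.
    """
    if counter is None:
        counter = [0]
    my_id = counter[0]
    counter[0] += 1
    return (my_id, [number_tree(child, counter) for child in node])

leaf = []
# Quadtree: root with four children, the second one refined.
quadtree = [leaf, [leaf] * 4, leaf, leaf]
_, quad_children = number_tree(quadtree)
print([cid for cid, _ in quad_children])        # [1, 2, 7, 8]
print([cid for cid, _ in quad_children[1][1]])  # [3, 4, 5, 6]

# Peano tree: root with nine children, the fourth and eighth refined.
peano_tree = [leaf] * 3 + [[leaf] * 9] + [leaf] * 3 + [[leaf] * 9] + [leaf]
_, peano_children = number_tree(peano_tree)
print([cid for cid, _ in peano_children])
# [1, 2, 3, 4, 14, 15, 16, 17, 27]
```

Because the numbers follow the traversal order, the elements can be stored in one linear array in exactly the order in which they are visited.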
The Refinement Extra Bit • Every element contains one refinement bit. • If an element's refinement bit is set, the node has children and these are visited. [Figure: quadtree with root 0; children of the root: 1, 2, 7, 8; children of element 2: 3, 4, 5, 6]
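A minimal sketch of how one bit per element is enough to store the tree structure: the bits are written in traversal order and the tree is rebuilt by reading them back in the same order. The nested-list representation and the names `encode` and `decode` are assumptions for illustration.

```python
def encode(node, bits=None):
    """Pre-order list of refinement bits: 1 = has children, 0 = leaf."""
    if bits is None:
        bits = []
    bits.append(1 if node else 0)
    for child in node:
        encode(child, bits)
    return bits

def decode(bits, k=4, pos=0):
    """Rebuild the nested-list tree; returns (node, next read position)."""
    refined = bits[pos]
    pos += 1
    node = []
    if refined:                      # bit set: the k children follow
        for _ in range(k):
            child, pos = decode(bits, k, pos)
            node.append(child)
    return node, pos

leaf = []
quadtree = [leaf, [leaf] * 4, leaf, leaf]   # second child is refined
bits = encode(quadtree)
print(bits)  # [1, 0, 1, 0, 0, 0, 0, 0, 0]
tree, consumed = decode(bits, k=4)
```

The nine elements of this quadtree need nine bits and no child pointers; for a Peano tree the same functions apply with `k=9`.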
Load per Processor • The load per processor should be balanced, although in adaptive grids this is not obvious. • Amount of workload per processor: number of elements / number of processors. • Dynamic load balancing is even more complicated.
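One simple way to reach this workload per processor is sketched below: the elements, listed in traversal order, are cut into contiguous chunks of nearly equal length. Cutting along the traversal order and treating every element as equally expensive are assumptions of this sketch; the function name `partition` is hypothetical.

```python
def partition(elements, num_processors):
    """Split an ordered element list into contiguous, balanced chunks."""
    base, extra = divmod(len(elements), num_processors)
    chunks, start = [], 0
    for p in range(num_processors):
        size = base + (1 if p < extra else 0)  # spread the remainder
        chunks.append(elements[start:start + size])
        start += size
    return chunks

elements = list(range(17))       # e.g. 17 leaf elements in traversal order
chunks = partition(elements, 4)
print([len(c) for c in chunks])  # [5, 4, 4, 4]
```

After a refinement step the element list changes, so the cut positions have to be recomputed, which is where dynamic load balancing becomes harder.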
Challenges for Domains • A balanced load for each processor • Domain decomposition • A small ratio between communication surface and volume • Compact domains • Not straightforward on adaptive grids
Summary • FEM: Main ingredients. • Adaptive Cartesian grid: low and high resolution. • Element-wise view of the grid. • Iterative methods to solve the system of equations. • Data structures: • Cache efficiency • Traversal of adaptive grid • Peano tree and quadtree • Minimal memory requirement • Parallelization