  1. Parallel Hierarchical 2D Unstructured Mesh Generation with General Cutting A thesis defense presented by Vincent C. Betro, July 16, 2007

  2. Outline • Introduction • Objective and Approach • Terminology and Data Structures • Cartesian Mesh Creation • General Cutting and Boundary Element Creation • Meshing Results • Parallelization Results • Conclusions • Future Work

  3. Introduction Solving modern CFD problems necessitates a rapid, efficient, and high quality mesh generation process; P_HUGG2D is a step in that direction. *Courtesy of Dr. Robert Wilson

  4. Objective and Approach • Develop an algorithm for generating a high-quality mesh • Create Hybrid and Delaunay meshes with body-conforming cut elements using closed loops • Allow user to define refinement spacing • Speed process by using MPI and grid partitioning • Implement various C++ class structures for compact communication during meshing • Validate mesh quality by testing on several geometries and optimizing the final mesh with optimization-based smoothing

  5. Terminology and Data Structures Types of meshing • Overset (structured) • Extrusion (used to make 3-D slices or insert viscous layers) • Delaunay (fully triangular or tetrahedral) • Cartesian (basis for P_HUGG2D)

  6. Terminology and Data Structures P_HUGG2D uses isotropic refinement to build the Quadtree structure. This uniformity makes the data structures more consistent and communication more efficient.

  7. Terminology and Data Structures • The building block of a hierarchical Cartesian mesh is the voxel, which is short for “volumetric pixel”. • Voxels are indexed using a processor-index pair, to aid in parallel communication • Each voxel contains information pointing to its relational location in the mesh, but no physical coordinates • cell-to-node hash table • parent index • neighbor indices • child indices • boundary segment list/boundary element loop list
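The voxel record described above can be sketched as a C++ struct; the field and type names here are illustrative assumptions, not P_HUGG2D's actual class layout:

```cpp
#include <cassert>
#include <vector>

// Hypothetical processor-index pair used for parallel addressing.
struct ProcIndex {
  int proc;    // owning processor
  long index;  // local index on that processor
};

// Sketch of a voxel: relational information only, no physical coordinates.
struct Voxel {
  ProcIndex parent;               // index of the parent voxel
  ProcIndex neighbors[4];         // edge neighbors in the 2-D Quadtree
  ProcIndex children[4];          // four children after isotropic refinement
  std::vector<ProcIndex> nodes;   // cell-to-node map (a hash table in the code)
  std::vector<int> boundarySegs;  // boundary segment / boundary element loops
};
```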

  8. Terminology and Data Structures Use of consistent, counterclockwise, C++-adapted CGNS numbering conventions for voxel relative location

  9. Terminology and Data Structures Physical nodes… • are also indexed using a processor-index pair • are assigned ownership by the lowest processor that owns a voxel which contains the node • contain the physical coordinates of each node created as part of refining a voxel • can be ignored until the general cutting process when tolerances dictate the snapping of nodes

  10. Terminology and Data Structures General mesh file format • Necessary for implementation with flow solver • Easily converted into crunch file, CGNS file, etc. • Supports multiple block meshes • Gives physical location of nodes, the FORTRAN-indexed map of which is used to give the location of each triangle, quadrilateral, and boundary/geometry edge

  11. Cartesian Mesh Creation Super Cell Creation In order to begin recursive refinement, a Cartesian super cell is created around the existing geometry. If the outer boundary is initially square, the super cell and the outer boundary coincide, and there are no “external” voxels to be turned off during cutting.
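A minimal sketch of the super cell construction, assuming a simple padded, squared-off bounding box (the padding factor here is an illustrative assumption, not P_HUGG2D's actual value):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pt  { double x, y; };
struct Box { double xlo, ylo, side; };   // a square Cartesian cell

// Build a square super cell enclosing the geometry's bounding box.
Box superCell(const std::vector<Pt>& geom, double pad = 0.05) {
  double xlo = geom[0].x, xhi = geom[0].x;
  double ylo = geom[0].y, yhi = geom[0].y;
  for (const Pt& p : geom) {
    xlo = std::min(xlo, p.x); xhi = std::max(xhi, p.x);
    ylo = std::min(ylo, p.y); yhi = std::max(yhi, p.y);
  }
  // Use the larger extent so the cell is square, then pad it slightly.
  double side = std::max(xhi - xlo, yhi - ylo) * (1.0 + 2.0 * pad);
  double cx = 0.5 * (xlo + xhi), cy = 0.5 * (ylo + yhi);
  return {cx - 0.5 * side, cy - 0.5 * side, side};
}
```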

  12. Cartesian Mesh Creation Spawning to Multiple Processors Once the super cell has been refined into at least as many voxels as there are processors, each processor receives one (or more) voxels. Ownership of each voxel is reassigned to the processor to which it is spawned, and the nodes’ processor-index pairs are then updated.

  13. Cartesian Mesh Creation Round-robin assignment of voxels to processors does not achieve perfect load balancing, but it does not affect the algorithm used to speed meshing in parallel. 4 procs / 8 procs / 16 procs The color coding corresponds to the domain owned by each processor. A disjoint domain is a distinct possibility when the number of processors is not a power of four.
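The round-robin spawning can be sketched in a few lines: voxel i simply goes to processor i mod nprocs, which is why the resulting domains can be disjoint when the voxel count is not a multiple of the processor count:

```cpp
#include <cassert>
#include <vector>

// Sketch of round-robin assignment, matching the slide's description
// that the scheme is simple rather than load-balanced.
std::vector<int> roundRobin(int nVoxels, int nProcs) {
  std::vector<int> owner(nVoxels);
  for (int i = 0; i < nVoxels; ++i)
    owner[i] = i % nProcs;   // voxel i belongs to processor i mod nProcs
  return owner;
}
```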

  14. Cartesian Mesh Creation Refinement occurs on each processor simultaneously and… • new nodes are created at the mid-edges and centroids of existing voxels • neighbors are re-calculated based on the tree structure • lineage of voxels is passed along in the processor-index pair
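The node creation step of one isotropic refinement can be sketched as follows, assuming a square voxel spanning (x0, y0) to (x1, y1); four mid-edge nodes and one centroid node are produced:

```cpp
#include <array>
#include <cassert>

struct P { double x, y; };

// New nodes created when one voxel is refined: mid-edge nodes on the
// south, east, north, and west edges, then the centroid.
std::array<P, 5> refineNodes(double x0, double y0, double x1, double y1) {
  double xm = 0.5 * (x0 + x1), ym = 0.5 * (y0 + y1);
  return {{ {xm, y0}, {x1, ym}, {xm, y1}, {x0, ym}, {xm, ym} }};
}
```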

  15. Cartesian Mesh Creation The user may specify spacing based on the size of the largest and smallest geometry segments. Finer far-field Coarser far-field

  16. Cartesian Mesh Creation Mesh quality is enforced by detecting unacceptable voxel configurations: • Opposite neighbors both at a higher level of refinement than the current voxel • One edge connecting more than three different levels of refinement
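The first rule can be sketched as a simple level comparison (the neighbor lookup and tree traversal are simplified away here):

```cpp
#include <cassert>

// Sketch of one quality rule from this slide: a voxel must be refined
// if both of its opposite neighbors sit at a higher refinement level.
bool needsRefinement(int myLevel, int leftLevel, int rightLevel) {
  return leftLevel > myLevel && rightLevel > myLevel;
}
```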

  17. Cartesian Mesh Creation • The progression of refinement is a consequence of both the cell size gradation parameter and the user-defined spacing supplied for this mesh. • The spacing was one order of magnitude less than the smallest geometric spacing to assure full detail. • In P_HUGG2D, the cell size gradation parameter is 1.5, implying that voxels should be refined to the same size in groups of four or five buffer elements.

  18. Cartesian Mesh Creation Ghost voxels… • are integral in assuring that refinement is consistent on borders between processors • are denoted by having a different processor set as owner in the processor-index pair than the processor on which they reside • allow new nodes created during refinement and cutting to be indexed correctly and not be duplicated • exist in the normal neighboring positions to a voxel as well as at the corners • contain no information about non-bordering children or the results of the cutting process

  19. Cartesian Mesh Creation • The voxels shaded in orange are in the upper left corner of a given processor. • The voxels shaded in green are the finest level ghost voxels used in the neighbor tables on that processor. • The voxels shaded in blue are the ghost parents of ghost voxels, but only show the children directly bordering the processor in question.

  20. General Cutting and Boundary Element Creation Once a mesh has been generated around a geometry, the elements (voxels and nodes) that are outside the computational domain must be “turned off”. Then, body conforming quadrilaterals and triangles are generated with the remainders of voxels that have been “cut” by the geometry.

  21. General Cutting and Boundary Element Creation

  22. General Cutting and Boundary Element Creation Parametric Coordinates If a collinear boundary edge is discovered, a check must occur to determine whether the boundary edge covers the original voxel edge completely and must be clipped on both sides, covers it partially and must be clipped on one side, or is completely contained in the voxel edge. To accomplish this, parametric coordinates are used.
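The slide's formula did not survive the transcript; a standard parametric-coordinate computation consistent with the description (assumed here, not taken from P_HUGG2D) projects the point onto the voxel edge, so a collinear geometry endpoint lies inside the edge exactly when its parameter t falls in [0, 1]:

```cpp
#include <cassert>

struct V { double x, y; };

// Parameter t of point p along the edge a->b:
//   t = (p - a) . (b - a) / |b - a|^2
// t < 0 or t > 1 means p lies beyond the edge and clipping is needed.
double paramCoord(V a, V b, V p) {
  double dx = b.x - a.x, dy = b.y - a.y;
  return ((p.x - a.x) * dx + (p.y - a.y) * dy) / (dx * dx + dy * dy);
}
```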

  23. General Cutting and Boundary Element Creation Possible scenarios for intersection of a collinear geometry edge with a boundary edge

  24. General Cutting and Boundary Element Creation Tolerance The tolerance used in P_HUGG2D is 0.001 times the length of a side of a voxel on the finest level. While the exact value can be debated, experiments have shown that this tolerance snaps cutting intersections to already created points without significant loss of accuracy in reconstructing the original geometry.
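A sketch of the snapping test this tolerance implies (function and field names are illustrative assumptions):

```cpp
#include <cassert>
#include <cmath>

struct N { double x, y; };

// Snap a candidate intersection point to an existing node when they are
// within tol = 0.001 * (side length of a finest-level voxel).
bool snap(N& candidate, const N& existing, double finestSide) {
  double tol = 0.001 * finestSide;
  if (std::hypot(candidate.x - existing.x, candidate.y - existing.y) <= tol) {
    candidate = existing;   // reuse the already created node
    return true;
  }
  return false;
}
```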

  25. General Cutting and Boundary Element Creation Possible scenarios for a geometry edge to “cut” a voxel (excluding the collinear case)

  26. General Cutting and Boundary Element Creation Geometry segments that are captured during the cutting process before and after being agglomerated within each voxel. Post-agglomeration Pre-agglomeration

  27. General Cutting and Boundary Element Creation Two lists of edges are created in each voxel: • a list of all the edges on the uncut version of the voxel (unless a collinear boundary edge was created over an existing edge) • a list of all the edges in the voxel’s boundary element list Closed loops are created using head-to-tail matching of edges, where boundary edges are given first priority. 11 edges to choose from / 3 boundary edges / Finished closed loop
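The head-to-tail matching can be sketched as follows, with edges reduced to (head, tail) node pairs and the boundary-edge priority rule omitted for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;   // (head node, tail node)

// Starting from the first edge, repeatedly append the edge whose head
// matches the current tail until the loop closes on its starting node.
std::vector<int> closeLoop(std::vector<Edge> edges) {
  std::vector<int> loop = {edges[0].first, edges[0].second};
  edges.erase(edges.begin());
  bool matched = true;
  while (matched && loop.back() != loop.front()) {
    matched = false;
    for (std::size_t i = 0; i < edges.size(); ++i) {
      if (edges[i].first == loop.back()) {   // head matches current tail
        loop.push_back(edges[i].second);
        edges.erase(edges.begin() + i);
        matched = true;
        break;
      }
    }
  }
  return loop;
}
```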

  28. General Cutting and Boundary Element Creation A loop which passes back through itself must be split A closed loop (begun at green node), which must be split since it passes back through itself The same closed loop, split into two quadrilateral boundary elements that can be triangulated

  29. General Cutting and Boundary Element Creation • Boundary element creation allows all the nodes near the boundary to be tagged as in or out • Flood fill is completed to turn off all outside nodes • The situation can occur wherein a voxel is outside the computational domain but all its nodes are inside the computational domain. Thus, a check is completed such that if no other cell shares an edge with the cell in question, it is turned off since it must be outside the computational domain.
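The flood fill step can be sketched as a breadth-first traversal that marks connected untagged nodes as outside and stops at nodes already tagged during boundary element creation (the tag values here are illustrative):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// tag values (assumed): 0 = untagged, 1 = inside, 2 = outside.
// Starting from a seed known to be outside, turn off every untagged
// node reachable without crossing an already tagged (inside) node.
void floodFillOutside(int seed,
                      const std::vector<std::vector<int>>& adj,
                      std::vector<int>& tag) {
  std::queue<int> q;
  q.push(seed);
  tag[seed] = 2;
  while (!q.empty()) {
    int n = q.front(); q.pop();
    for (int m : adj[n])
      if (tag[m] == 0) { tag[m] = 2; q.push(m); }
  }
}
```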

  30. General Cutting and Boundary Element Creation P_HUGG2D has the capability of generating hybrid meshes (quadrilaterals and triangles) or fully triangular, Delaunay meshes. • For a hybrid mesh, boundary elements and voxels with hanging nodes are triangulated and the remaining voxels are saved as quadrilateral cells • For a Delaunay mesh, the boundary edges are agglomerated into an outer boundary along with the processor boundaries, and the Delaunay mesher is run to populate the closed area with triangular cells

  31. Optimization-based Mesh Smoothing In order to remove high aspect ratio elements (sliver cells) and get improved results from the flow solver, optimization-based smoothing is performed on the mesh. • Each node is perturbed based on a cost function calculated using Jacobians and condition numbers of the surrounding elements. • If the perturbation improves the cost function for the node, the node is moved permanently to the new position • Smoothing continues until no perturbation can improve the cost function
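One smoothing pass can be sketched as the accept/reject scheme described above; the cost function below is a stand-in for the actual Jacobian/condition-number cost, and the fixed perturbation stencil is an illustrative assumption:

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Node { double x, y; };

// Perturb each node in four directions; keep a move only if it lowers
// that node's cost. Repeating such passes until no move improves the
// cost mirrors the termination criterion on the slide.
void smoothPass(std::vector<Node>& nodes,
                const std::function<double(const Node&)>& cost,
                double step) {
  const double dx[4] = {step, -step, 0.0, 0.0};
  const double dy[4] = {0.0, 0.0, step, -step};
  for (Node& n : nodes) {
    for (int k = 0; k < 4; ++k) {
      Node trial = {n.x + dx[k], n.y + dy[k]};
      if (cost(trial) < cost(n)) n = trial;   // keep only improving moves
    }
  }
}
```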

  32. Optimization-based Mesh Smoothing Once this smoothing has been performed a prescribed number of times and the maximum cost function has reached a steady level, the mesh is ready to be run through the flow solver, either directly or after extrusion to three dimensions. Smoothed Hybrid Mesh / Unsmoothed Hybrid Mesh

  33. Optimization-based Mesh Smoothing Once a Delaunay mesh has been run through the optimization-based smoother, it is no longer guaranteed to be a truly Delaunay mesh. However, this does not jeopardize the mesh quality; in fact, the important grid metrics (aspect ratio, condition number, etc.) are greatly improved. Smoothed Fully Triangular Mesh / Unsmoothed Fully Triangular Mesh

  34. Meshing Results The initial testing was done on two distinct NACA 0012 airfoils. The second airfoil was shifted such that the partitioning of the geometry would not be symmetric about processor borders and the robustness of the parallel communication could be better tested. Something to note about these meshes is that they do not change as the processor count changes; thus, the parallelization does not affect the quality of the mesh.

  35. Meshing Results Symmetric NACA 0012 Airfoil: Delaunay Mesh (1, 2, 4, 8, 16 processor cases)

  36. Meshing Results Symmetric NACA 0012 Airfoil: Hybrid Mesh (1, 2, 4, 8, 16 processor cases)

  37. Meshing Results Shifted NACA 0012 Airfoil: Delaunay Mesh (1, 2, 4, 8, 16 processor cases)

  38. Meshing Results Shifted NACA 0012 Airfoil: Hybrid Mesh (1, 2, 4, 8, 16 processor cases)

  39. Meshing Results Gulf of Mexico Meshes The Gulf of Mexico cases were run in order to test the robustness of P_HUGG2D as well as to display a case that is large enough to be usefully run in parallel and exhibit speed-up. Additionally, it should be noted that for this case, all the meshes came out to be identical except the 16 processor mesh, where, in areas of low solution gradient, the cell size gradation parameter was not observed across processor boundaries. Also, the eight processor case was too disjoint to run properly. These issues have been addressed in the 3-D code and are primarily the fault of the simplistic load-balancing scheme.

  40. Meshing Results Gulf of Mexico: Delaunay Mesh (1, 2, 4, 16 processor cases)

  41. Meshing Results Gulf of Mexico: Hybrid Mesh (1, 2, 4, 16 processor cases)

  42. Meshing Results Extruded NACA 0012 airfoil (α = 9°) This optimized, extruded mesh of a slice of an airfoil was used by Dr. Robert Wilson to reproduce the desired flow solver results for the paper “Simulation of a Surface Combatant with Dynamic Ship Maneuvers”.

  43. Meshing Results Extruded, Optimized Hybrid Mesh of the Side and Front Views of a Sea Fighter Surface Ship

  44. Parallelization Results The goal of writing P_HUGG2D was not to implement an algorithm with perfect load-balancing and parallel speed-up but rather to provide a mechanism to do very large meshes that simply could not be quickly or efficiently created in serial. However, speed-up was exhibited on the large Gulf of Mexico case and to some extent even on the smaller NACA 0012 airfoil cases.

  45. Parallelization Results While some speed-up was exhibited on the NACA 0012 airfoil, the sparse nature of the far-field and the idle time spent when too many processors were used are apparent in the slow-down encountered when using 16 processors.

  46. Parallelization Results The speed-up exhibited by the Gulf of Mexico cases was not linear and slackened as more processors were added due to load-balancing issues. However, unlike the airfoil case, some speed-up was exhibited each time more processors were added.

  47. Conclusions • The algorithm now exists to generate large, high-quality meshes on complex two-dimensional geometries in parallel • The ability to extrude two-dimensional meshes into three dimensions allows immediate, relevant use of P_HUGG2D • The meshes generated can be either fully triangular or hybrid in nature • The use of general cutting allows for very precise, body-conforming meshes • The use of the Cartesian, hierarchical Quadtree structure allows for ease of initial mesh generation • The combination of Cartesian meshing and general cutting along with optimization-based smoothing makes generating the perfect mesh attainable

  48. Future Work • By reversing the order of the geometry segments while processing, the goal of creating multiblock meshes is in sight. This coarse example of a mesh on a gas channel in a fuel cell is a perfect example of a situation where solving on one computational domain at a time is insufficient, due to interactions at the interfaces.

  49. Future Work • Another goal is to correct some of the load-balancing issues and improve functionality by spawning areas of the mesh based on the amount of geometry they contain using a cost function based on a geometry intersection test • Implementing the code in 64-bit is trivial in that it requires no re-writing. However, once the bigfrog cluster is capable of doing this and using mpich2, the limits on the size of mesh that may be generated are completely obliterated. Since individual allocations (such as voxel objects) can go over 2 GB, the only limitation becomes the number of processors available. • Components of P_HUGG2D, as well as lessons learned from it, are being implemented in P_HUGG, the three-dimensional version being written by Dr. Karman. Additionally, the three-dimensional optimization-based smoothing will be parallelized by October.
