Scalable System for Large Unstructured Mesh Simulation. Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate.
Overview • Introduction • Preparation and Simulation • More Efficient Partitioning • Parallel Element Splitting • Post Processing • Results Cache • Merging Many Partitions • Memory usage • Off-screen mode • Conclusions, Future lines and Acknowledgements
Introduction • Education: Masters in Numerical Methods, training courses, seminars, etc. • Publishers: magazines, books, etc. • Research: PhDs, congresses, projects, etc. • One of the International Centers of Excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation Based Engineering and Science. World Technology Evaluation Center (wtec.org), 2009].
Introduction • Simulation: structures
Introduction • CFD: Computational Fluid Dynamics
Introduction • Geomechanics • Industrial forming processes • Electromagnetism • Acoustics • Bio-medical engineering • Coupled problems • Earth sciences
Introduction • Simulation with GiD • [Figure: workflow from geometry description (provided by CAD or using GiD), through preparation of analysis data and computer analysis, to visualization of results]
Introduction • Analysis data generation • Read in and correct CAD data • Assignment of boundary conditions • Definition of analysis parameters • Generation of analysis data • Assignment of material properties, etc.
Introduction • Visualization of numerical results • Deformed shapes, temperature distributions, pressures, etc. • Vector and contour plots, graphs • Line diagrams, result surfaces • Animated sequences • Particle line flow diagrams
Introduction • Goal: run a CFD simulation with 100 million elements using in-house tools • Hardware: cluster with • Master node: 2 x Intel Quad Core E5410, 32 GB RAM • 3 TB disk with a dedicated Gigabit link to the master node • 10 nodes: 2 x Intel Quad Core E5410 and 16 GB RAM each • 2 nodes: 2 x AMD Opteron Quad Core 2356 and 32 GB RAM each • Total of 96 cores and 224 GB RAM available on the 12 compute nodes • InfiniBand 4x DDR, 20 Gbps
Introduction • Airflow around an F1 car model
Introduction • Kratos: • Multi-physics, open source framework • Parallelized for shared and distributed memory machines • GiD: • Geometry handling and data management • First coarse mesh • Merging and post-processing results
Introduction • [Figure: workflow diagram with labels Geometry, Conditions, Materials; Coarse mesh generation; Partition, Distribution, Communication plan; Refinement; Calculation; part 1 ... part n; res. 1 ... res. n; Merge, Visualize]
Meshing • Single workstation: limited memory and time • Three steps: • Single node: GiD generates a coarse mesh with 13 million tetrahedrons • Single node: Kratos + METIS divide and distribute the mesh • In parallel: Kratos refines the mesh locally
Preparation and simulation • [Figure: workflow diagram repeated from the Introduction]
Efficient partitioning: before • Rank 0 reads the model, partitions it and sends the partitions to the other ranks • [Figure: Rank 0, Rank 1, Rank 2, Rank 3]
Efficient partitioning: before • Requires a large amount of memory on node 0 • Uses cluster time for partitioning, which can be done elsewhere • Each rerun needs repartitioning • Same working procedure for OpenMP and MPI runs
Efficient partitioning: now • The model is divided and the partitions are written on another machine • Each rank reads its own data separately
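The two-stage procedure above can be sketched as follows. This is a minimal illustration, not the Kratos implementation: the file naming (`model_<rank>.json`), the JSON format and both function names are hypothetical, and the round-robin split is only a stand-in for a graph partitioner such as METIS.

```python
import json
import os
import tempfile


def write_partitions(elements, n_parts, out_dir):
    """Offline step, run on any machine: write one file per future rank."""
    for rank in range(n_parts):
        # Stand-in split (round-robin); a real run would call a graph
        # partitioner such as METIS here to minimize communication.
        part = elements[rank::n_parts]
        with open(os.path.join(out_dir, f"model_{rank}.json"), "w") as f:
            json.dump(part, f)


def read_partition(rank, out_dir):
    """Cluster step: each rank opens only its own file, so no rank
    ever has to hold the whole model in memory."""
    with open(os.path.join(out_dir, f"model_{rank}.json")) as f:
        return [tuple(e) for e in json.load(f)]
```

Because the partition files persist on disk, a rerun can presumably skip the partitioning step and start directly from `read_partition`.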
Local refinement: triangle • [Figure: splitting patterns for a triangle with nodes i, j, k and new edge nodes l, m, n; children numbered 1 to 4]
Local refinement: triangle • The splitting case is selected according to the node Ids • The decision does not aim for the best element quality! • It is very good for parallelization • OpenMP • MPI • [Figure: candidate splittings with nodes i, j, k and new nodes l, m]
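A minimal sketch of an Id-based splitting rule is given below. It assumes that new mid-edge nodes are stored in a dictionary keyed by the sorted pair of global node Ids, and that, when two edges are split, the remaining quadrilateral is cut by the diagonal starting at its lower-Id original node. The function names and the exact tie-breaking rule are illustrative assumptions, not necessarily what Kratos does.

```python
def edge(u, v):
    """Orientation-independent key for the edge between nodes u and v."""
    return (u, v) if u < v else (v, u)


def split_triangle(tri, mid):
    """Split triangle `tri` (three global node Ids) given `mid`, a dict
    mapping split edges to the Id of the new mid-edge node."""
    a, b, c = tri
    n_split = sum(edge(u, v) in mid for u, v in ((a, b), (b, c), (c, a)))
    if n_split == 0:
        return [tri]
    if n_split == 3:
        mab, mbc, mca = mid[edge(a, b)], mid[edge(b, c)], mid[edge(c, a)]
        return [(a, mab, mca), (mab, b, mbc), (mca, mbc, c), (mab, mbc, mca)]
    # Try the three rotations, which keep the triangle's orientation.
    for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
        if n_split == 1 and edge(p, q) in mid:
            m = mid[edge(p, q)]
            return [(p, m, r), (m, q, r)]
        if n_split == 2 and edge(p, q) not in mid:
            mqr, mrp = mid[edge(q, r)], mid[edge(r, p)]
            corner = (mrp, mqr, r)
            # The diagonal depends only on global Ids, never on local
            # ordering, so every thread or rank makes the same choice.
            if p < q:
                return [corner, (p, q, mqr), (p, mqr, mrp)]
            return [corner, (p, q, mrp), (q, mqr, mrp)]
    raise AssertionError("unreachable: one rotation always matches")
```

Since the choice uses only global Ids, two processes that see the same triangle with different local node orderings produce the same children without communicating, which is what makes such a rule convenient for OpenMP and MPI, at the cost of element quality.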
Local refinement: tetrahedron • [Figure: father element and its child elements]
Local refinement: uniform • A uniform refinement can be used to obtain a mesh with 8 times more elements • It does not improve the geometry representation
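As a rough consistency check, assuming the final mesh came from a single uniform refinement pass in which each tetrahedron is split into 8 children, the racing car mesh of 103,671,344 tetrahedrons corresponds to the coarse mesh of about 13 million tetrahedrons quoted in the Meshing slide:

```latex
\[
  \frac{103{,}671{,}344}{8} = 12{,}958{,}918 \approx 13 \times 10^{6}
\]
```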
Parallel calculation • Calculated using 12 x 8 MPI processes • Less than 1 day for 400 time steps • About 180 GB memory usage • Single volume mesh of 103 million tetrahedrons split into 96 files (each with a mesh portion and its results)
Post processing • [Figure: workflow diagram repeated from the Introduction]
Post-process • Challenges to face: • Single node • Big files: tens or hundreds of GB • Merging: Lots of files • Batch post-processing • Maintain generality
Big Files: results cache • Uses a defined memory pool to store results. • Used to cache results stored in files. • [Figure labels: Mesh information; User-definable memory pool; Results from files: single, multiple, merge; Temporal results; Created results: cuts, extrusions, Tcl]
Big Files: results cache • [Figure: data structures. Each Result carries an RC info record with its memory footprint and a list of (file, offset, type) entries. The results cache table holds RC entries, each with a timestamp and a Result. The open files table holds (file handle, type) entries. Granularity of result.]
Big Files: results cache • Verifies each result's file(s) and records the result's position in the file and its memory footprint. • Results of the latest analysis step are kept in memory. • Results are loaded on demand. • The oldest results are unloaded if needed. • Touch on use.
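The behaviour listed above resembles a least-recently-used cache with a byte budget. The sketch below illustrates that idea under simplifying assumptions: results are opaque byte ranges in a single file each, and the class and method names (`ResultsCache`, `register`, `get`) are hypothetical, not GiD's internal API.

```python
import os
import tempfile
from collections import OrderedDict


class ResultsCache:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes   # size of the user-defined memory pool
        self.used = 0
        self.info = {}               # name -> (path, offset, nbytes)
        self.loaded = OrderedDict()  # name -> data, least recently used first

    def register(self, name, path, offset, nbytes):
        """Verify the result's file and record its position and footprint."""
        if offset + nbytes > os.path.getsize(path):
            raise ValueError(f"result {name!r} does not fit in {path}")
        self.info[name] = (path, offset, nbytes)

    def get(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # touch on use
            return self.loaded[name]
        path, offset, nbytes = self.info[name]
        # Unload the oldest results until the new one fits in the pool.
        while self.loaded and self.used + nbytes > self.budget:
            _, old = self.loaded.popitem(last=False)
            self.used -= len(old)
        with open(path, "rb") as f:        # load on demand
            f.seek(offset)
            data = f.read(nbytes)
        self.loaded[name] = data
        self.used += nbytes
        return data
```

Only the small `info` table is kept for every result, so the memory footprint stays close to the budget no matter how large the results file is.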
Big Files: results cache • Chinese harbour: 104 GB results file • 7.6 million tetrahedrons • 2,292 time steps • 3.16 GB memory usage (2 GB results cache)
Merging many partitions • Before: 2, 4, ... 10 partitions • Now: 32, 64, 128, ... of a single volume mesh • Postpone any calculation: • Skin extraction • Finding boundary edges • Smoothed normals • Neighbour information • Graphical objects creation
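Postponing a calculation until it is first needed can be sketched with a lazily evaluated property. The example below uses skin extraction: it assumes the partitions share global node Ids and treats a face as boundary when it belongs to exactly one tetrahedron. The class name `MergedMesh` and the counter attribute are hypothetical, for illustration only.

```python
from collections import Counter
from functools import cached_property


class MergedMesh:
    def __init__(self, partitions):
        # Merging only concatenates the elements; nothing else is computed.
        self.tetrahedra = [t for part in partitions for t in part]
        self.skin_computations = 0

    @cached_property
    def skin(self):
        """Boundary faces, computed on first access and then reused."""
        self.skin_computations += 1
        counts = Counter()
        for a, b, c, d in self.tetrahedra:
            for face in ((a, b, c), (a, b, d), (a, c, d), (b, c, d)):
                counts[tuple(sorted(face))] += 1
        # Interior faces are shared by two tetrahedra; skin faces by one.
        return sorted(f for f, n in counts.items() if n == 1)
```

With this pattern the skin is extracted once for the merged mesh rather than once per partition, and not at all if no view ever asks for it.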
Merging many partitions • Telescope example: 23,870,544 tetrahedrons • Before: 32 partitions, 24 min 10 s • After: 32 partitions, 4 min 34 s; 128 partitions, 10 min 43 s • Single file: 2 min 16 s
Merging many partitions • Racing car example: 103,671,344 tetrahedrons • Before: 96 partitions, more than 5 hours • After: 96 partitions, 51 min 21 s • Single file: 13 min 25 s
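For reference, the speed-ups implied by the before and after timings of the two examples work out as follows (times converted to seconds):

```latex
\[
  \text{Telescope, 32 partitions: }
  \frac{24 \cdot 60 + 10}{4 \cdot 60 + 34} = \frac{1450}{274} \approx 5.3
\]
\[
  \text{Racing car, 96 partitions: }
  \frac{5 \cdot 3600}{51 \cdot 60 + 21} = \frac{18000}{3081} \approx 5.8
  \quad \text{(a lower bound, since the old time exceeded 5 hours)}
\]
```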
Memory usage • Around 12 GB of memory used, with a peak of 15 GB (MS Windows) or 17.5 GB (Linux), including: • Volume mesh (103 million tetrahedrons) • Skin mesh (6 million triangles) • Several surface and cut meshes • Stream line search tree • 2 GB of results cache • Animations
Batch post-processing: off-screen • GiD with no interaction and no window • Command line: gid -offscreen [WxH] -b+g batch_file_to_run • Useful to: • launch costly animations in the background or in a queue • use GiD as a template generator • use GiD behind a web server: Flash Video animation • Animation window: added a button that generates a batch file for off-screen GiD, to be sent to a batch queue.
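A script that submits such jobs might assemble the command line as sketched below. It uses only the flags shown on the slide and assumes that `-b+g` and the batch file are separate arguments and that `WxH` is written like `1280x720`. The function name and the batch file name are hypothetical.

```python
import subprocess


def offscreen_command(batch_file, size=None, gid="gid"):
    """Build the argument list for an off-screen GiD run."""
    cmd = [gid, "-offscreen"]
    if size is not None:
        width, height = size
        cmd.append(f"{width}x{height}")  # optional WxH argument
    cmd += ["-b+g", batch_file]
    return cmd


# Example (requires GiD on the PATH, so it is left commented out):
# subprocess.run(offscreen_command("animation.bch", (1280, 720)), check=True)
```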