Building a parallel implementation of Delaunay Triangulation in R3 with CGAL for shared memory parallel machines using OpenMP. The goal is to mesh billions of points efficiently while handling big data sets. Enhancements include compact container structures and efficient locking mechanisms.
A Parallel Delaunay Algorithm for CGAL • David Millman • Advisor: Sylvain Pion • July 26th 2007
Goal To create a parallel implementation of Delaunay Triangulation in R3 with CGAL for shared memory parallel machines using OpenMP.
Motivation • Delaunay’s many uses • Meshing in finite element theory • computational biology • geometric modeling • anything that can be done with a Voronoi diagram • Multi-Processor systems • more common • multi core systems
Motivation (cont.) • Big data sets • Robust algorithms to mesh billions of points • Sequential CGAL • 1 processor and 16GB RAM: 10 million points in ~120 seconds, using 5.5GB RAM • Blandford, Blelloch, Kadow ‘06 • 64 processors and 200GB RAM: 1 billion points in 5512 seconds, using 197GB RAM
Tools • CGAL - Computational Geometry Algorithms Library www.cgal.org • OpenMP - API for shared memory parallel programming www.openmp.org • Capricorne - 2 quad-core processors (8 cores), 16GB RAM
CGAL Delaunay Algorithm • Locate • Find Conflict Region • Remove invalid cells • Create New Cells
Steps to Parallelization • Compact Container • Locate • Find Conflict Region • Create New Cells
Locks • OpenMP provides • Test lock • Wait lock • Priority lock • A lock and priority pair • Operations: test lock, priority lock
CGAL Locks • Omp_lock_traits • Exported types • Lock_type • Priority_type • Constants • max_num_threads • is_parallel • Static functions wrapping the OpenMP calls • static void set_num_threads(int i) • static size_t get_num_threads() • static void wait_lock(Lock_type* lock) • Priority lock • bool priority_lock(Priority_type p) • bool test_lock(Priority_type p) • void unset_lock() • bool is_priority(Priority_type p) const • Omp_empty_lock_traits • Same interface
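The slides list the priority lock's member functions but not their semantics. The following is a minimal sketch of one plausible behavior, built on `std::atomic` instead of OpenMP locks so that it compiles without OpenMP. The class name `PriorityLock`, the convention that 0 means "free", and the rule that a lower-priority thread backs off while a higher-priority thread waits are all assumptions made for illustration; they are not taken from the CGAL implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the slide's priority lock, not CGAL code.
// Assumptions: priorities are positive and unique per thread; 0 means "free".
class PriorityLock {
public:
    typedef int Priority_type;

    // Single attempt: succeeds only if the lock is currently free.
    bool test_lock(Priority_type p) {
        assert(p > 0);
        Priority_type expected = 0;
        return owner_.compare_exchange_strong(expected, p);
    }

    // Waits while a lower-priority owner holds the lock, but gives up
    // (returns false) as soon as a higher-priority owner is observed.
    // Not re-entrant: calling it while already holding the lock spins forever.
    bool priority_lock(Priority_type p) {
        assert(p > 0);
        for (;;) {
            Priority_type expected = 0;
            if (owner_.compare_exchange_weak(expected, p)) return true;
            if (expected > p) return false;  // owner outranks us: back off
            // expected == 0 (spurious failure) or a lower-priority owner: retry
        }
    }

    void unset_lock() { owner_.store(0); }

    bool is_priority(Priority_type p) const { return owner_.load() == p; }

private:
    std::atomic<Priority_type> owner_{0};
};
```

Under these assumed rules a thread only ever waits on locks held by lower-priority threads, and those threads back off when they meet it, so no cycle of waiting threads can form.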
Compact Container Free List • STL like container • Pointers to 4 byte aligned objects • Iterators are not invalidated during insert and delete • Memory
MT-Compact Container • Each thread maintains its own free list • Insert • Delete • Allocate • Only lock for allocation • Size Formula • Memory Free List Where NT = number of threads
MT-Compact Container (cont.) • Old: • Compact_container<T, Allocator = Default_allocator> • New: • Compact_container<T, Allocator = Default_allocator, Lock_type = Omp_empty_lock_traits> • No new functions • Free list array is a boost array parameterized on lock_traits::max_num_threads
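The per-thread free list idea from the two slides above can be sketched as follows. This is an illustration under stated assumptions, not the Compact_container code: `PerThreadPool`, the explicit `thread` index argument (an OpenMP version would presumably obtain it from `omp_get_thread_num()`), the block size of 4, and the use of separate `std::vector` free lists rather than pointers embedded in the elements are all hypothetical. A `std::array` stands in for the boost array, and a `std::atomic_flag` spin lock stands in for the lock traits.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical sketch: one free list per thread, a lock only around allocation.
// T must be default-constructible and copy-assignable.
template <typename T, std::size_t MaxThreads>
class PerThreadPool {
public:
    // Stores `value` in a free slot and returns a pointer that stays valid
    // across later inserts and erases.
    T* insert(std::size_t thread, const T& value) {
        assert(thread < MaxThreads);
        std::vector<T*>& free_list = free_lists_[thread];
        if (free_list.empty()) refill(free_list);  // the only locked path
        T* slot = free_list.back();
        free_list.pop_back();
        *slot = value;
        return slot;
    }

    // Returns the slot to the calling thread's own free list; no lock needed.
    void erase(std::size_t thread, T* slot) {
        assert(thread < MaxThreads);
        free_lists_[thread].push_back(slot);
    }

    // Number of slots allocated so far (read without the lock; sketch only).
    std::size_t capacity() const { return storage_.size(); }

private:
    void refill(std::vector<T*>& free_list) {
        while (alloc_lock_.test_and_set(std::memory_order_acquire)) {}  // spin
        for (std::size_t i = 0; i < kBlock; ++i) {
            storage_.emplace_back();  // deque growth keeps old addresses valid
            free_list.push_back(&storage_.back());
        }
        alloc_lock_.clear(std::memory_order_release);
    }

    static constexpr std::size_t kBlock = 4;  // slots per allocation (arbitrary)
    std::deque<T> storage_;
    std::array<std::vector<T*>, MaxThreads> free_lists_;
    std::atomic_flag alloc_lock_ = ATOMIC_FLAG_INIT;
};
```

For example, after thread 0 erases a slot, its next insert reuses that same slot without touching the shared storage, so threads contend only when a free list runs dry.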
Locate point p • Start at some cell, c • Determine a face, f, of c that has p on its outer side • Repeat with the adjacent cell that shares f with c • Continue until p is contained in the current cell
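The walk described on this slide can be sketched as below. This is a simplified illustration, not CGAL's locate: the `Point` and `Cell` structs, the index-based neighbor array, and the plain `double` orientation test are assumptions (CGAL uses exact predicates and its own triangulation data structure), and a simple walk like this one is not guaranteed to terminate on arbitrary non-Delaunay meshes.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Point { double x, y, z; };

// Determinant of (b-a, c-a, d-a); its sign gives the orientation of a,b,c,d.
double orient3d(const Point& a, const Point& b, const Point& c, const Point& d) {
    const double bx = b.x - a.x, by = b.y - a.y, bz = b.z - a.z;
    const double cx = c.x - a.x, cy = c.y - a.y, cz = c.z - a.z;
    const double dx = d.x - a.x, dy = d.y - a.y, dz = d.z - a.z;
    return bx * (cy * dz - cz * dy) - by * (cx * dz - cz * dx)
         + bz * (cx * dy - cy * dx);
}

// Hypothetical cell layout: n[i] is the cell across the face opposite
// vertex v[i], or -1 if that face lies on the boundary.
struct Cell {
    std::array<int, 4> v;
    std::array<int, 4> n;
};

// Walks from cell `start` toward p. Returns the index of the cell that
// contains p, or -1 if the walk leaves the mesh.
int locate(const std::vector<Point>& pts, const std::vector<Cell>& cells,
           int start, const Point& p) {
    int c = start;
    while (c != -1) {
        const Cell& cell = cells[c];
        const std::array<Point, 4> q = {{pts[cell.v[0]], pts[cell.v[1]],
                                         pts[cell.v[2]], pts[cell.v[3]]}};
        const double ref = orient3d(q[0], q[1], q[2], q[3]);
        int next = c;
        for (int i = 0; i < 4; ++i) {
            // Replace vertex i by p: a sign flip means p lies beyond face i.
            std::array<Point, 4> r = q;
            r[i] = p;
            if (ref * orient3d(r[0], r[1], r[2], r[3]) < 0) {
                next = cell.n[i];
                break;
            }
        }
        if (next == c) return c;  // p is on the inner side of all four faces
        c = next;
    }
    return -1;
}
```

On a mesh of two tetrahedra sharing a face, a query inside the second cell that starts from the first crosses the shared face once and stops.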
MT-Locate • Same steps as Locate, but we must lock and unlock the vertices of the cells to avoid the cell being destroyed.
Find Conflict Region • Initialize c to be the cell containing p • If p is inside the circumsphere of the vertices of c, mark c as in conflict • Expand through adjacent cells until the whole conflict region is found
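The conflict test on this slide asks whether p lies inside the sphere through the four vertices of a cell. A minimal sketch of that test follows, assuming plain `double` arithmetic and a hypothetical `Vec3` type; CGAL itself evaluates this predicate exactly, so the code below only illustrates the geometry. It computes the circumcenter relative to the first vertex with the standard cross-product formula and compares squared distances.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// True if p lies strictly inside the sphere through a, b, c, d.
// Floating-point sketch: results near the sphere boundary are unreliable.
bool in_circumsphere(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& d,
                     const Vec3& p) {
    const Vec3 r1 = sub(b, a), r2 = sub(c, a), r3 = sub(d, a);
    const double det = dot(r1, cross(r2, r3));
    assert(det != 0.0);  // the four vertices must not be coplanar
    const Vec3 c23 = cross(r2, r3), c31 = cross(r3, r1), c12 = cross(r1, r2);
    const double k1 = dot(r1, r1), k2 = dot(r2, r2), k3 = dot(r3, r3);
    // Circumcenter relative to a; it satisfies u . r_i = |r_i|^2 / 2.
    const Vec3 u = {(k1 * c23.x + k2 * c31.x + k3 * c12.x) / (2.0 * det),
                    (k1 * c23.y + k2 * c31.y + k3 * c12.y) / (2.0 * det),
                    (k1 * c23.z + k2 * c31.z + k3 * c12.z) / (2.0 * det)};
    const Vec3 q = sub(sub(p, a), u);  // p relative to the circumcenter
    return dot(q, q) < dot(u, u);      // compare with the squared radius
}
```

A point can be outside a cell and still inside its circumsphere, which is why the conflict region usually extends beyond the cell that contains p.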
MT-Find Conflict Region • Once again, same steps, but we must lock and unlock vertices to avoid deadlocks
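One common way to keep vertex locking free of deadlocks is to acquire locks without ever waiting while holding others: try each vertex, and on the first failure release everything acquired in this attempt and report failure so the caller can retry later. The sketch below shows that pattern with `std::atomic`. It is an assumption about how such locking could be organized, not the scheme used in the talk; `try_lock_cell`, `unlock_cell`, and the owner table are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

const int kUnlocked = 0;  // assumption: thread ids start at 1

// Tries to own all four vertices of a cell for `thread`. All or nothing:
// on failure, vertices acquired during this call are released again, while
// vertices the thread already owned beforehand stay locked.
bool try_lock_cell(std::vector<std::atomic<int> >& owner,
                   const std::array<std::size_t, 4>& verts, int thread) {
    assert(thread != kUnlocked);
    std::array<bool, 4> acquired_here = {{false, false, false, false}};
    for (std::size_t i = 0; i < 4; ++i) {
        int expected = kUnlocked;
        if (owner[verts[i]].compare_exchange_strong(expected, thread)) {
            acquired_here[i] = true;
        } else if (expected != thread) {
            // Held by another thread: roll back and let the caller retry.
            for (std::size_t j = 0; j < i; ++j) {
                if (acquired_here[j]) owner[verts[j]].store(kUnlocked);
            }
            return false;
        }
    }
    return true;
}

// Releases every vertex of the cell that `thread` currently owns.
void unlock_cell(std::vector<std::atomic<int> >& owner,
                 const std::array<std::size_t, 4>& verts, int thread) {
    for (std::size_t i = 0; i < 4; ++i) {
        int expected = thread;
        owner[verts[i]].compare_exchange_strong(expected, kUnlocked);
    }
}
```

Because a failed attempt leaves no new locks behind, two threads expanding overlapping conflict regions cannot end up each holding a vertex the other is waiting for.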
Create New Cell • Remove cells which are in conflict creating a hole • Triangulate the hole with a star
MT-Create New Cell The same as Create New Cell • Remove cells which are in conflict creating a hole • Triangulate the hole with a star …Just release the locks at the end.
TDS • Vertex base • Old: TDS_vertex_base<TDS> • New: TDS_vertex_base<TDS, LT=Omp_empty_lock_traits> • Private derivation of Priority_lock • Functions for locking, unlocking, etc. • Cell base – no changes • TDS • Added functions to help with locking and unlocking • priority_lock_cell, priority_lock_mirror_vertex, • is_locked (vertex and cell) • lock (vertex and cell)
Triangulation_3 and Delaunay_3 • Triangulation_3 • parallel_locate(Point p, Vertex start) • vertex as hint • cell returned is locked • error_vertex • query and access functions (similar to infinite vertex) • Delaunay_3 • parallel_insert(Iterator begin, Iterator end, int num_threads)
Results Summary • Compact Container • Locate • Delaunay
Future work • Optimize • Optimize • Optimize • Optimize • Parallel mesh refinement • Mesh compression
Thank you • INRIA, NSF, REUSSI, Sylvain Pion, Chee Yap, and everyone responsible for putting this program together.