650 likes | 949 Views
Domain Decomposition and Parallel Finite Element Methods. Abhijit Bose Center for Advanced Computing, and Dept. of Electrical Eng.. and Computer Science University of Michigan Ann Arbor, MI 48109 (734) 615 – 1490 Email: abose@umich.edu. What is Domain Decomposition ?
E N D
Domain DecompositionandParallel Finite Element Methods Abhijit Bose Center for Advanced Computing, and Dept. of Electrical Eng.. and Computer Science University of Michigan Ann Arbor, MI 48109 (734) 615 – 1490 Email: abose@umich.edu
What is Domain Decomposition ? - divide a large problem into smaller problems so that each small problem can be solved on a single node of a multi-processor system - most commonly used in solving partial and ordinary differential equations in computational fluid dynamics, structural mechanics, electrical circuits, automotive engineering - often used for solving large systems of finite element and finite volume meshes on distributed memory supercomputers and clusters
Outline Part I: An introduction to Domain Decomposition via the Substructuring Method Part II: A practical implementation of some of the concepts in C++
Processor zero effectively serialized => Lack of scalability => Poor Performance!!
But, Can we solve the Schur complement problem in parallel ?
Part II : Implementation in C++ (a) Mesh Partitioning - data structure often requires in-house development (b) Parallel Linear Solvers - many publicly available packages available nowadays (c) Recursive Interface Partitioning - data structure often requires in-house development
•Examples are taken from the Pfinics Project: Scalable Parallel h-p Finite Element Methods •Code Developed by: Abhijit Bose (Univ. of Michigan) Valmor F. de Almeida (ORNL) •Some of the software are in public domain. Please send an email to: abose@umich.edu with requests for software. • A Globus-enabled version of Pfinics that can be installed in wide-area distributed Grid Computing environments will be available in 1Q/2003. The next few slides will introduce some basic C++ techniques before we move on to the Parallel data structures part.
Why Use Object-Oriented Programming for Developing Parallel Adaptive Finite Element/Volume Codes ? • Growing number of libraries for writing object-oriented scientific computing codes • OOP offers several advantages: • abstract data types • class hierarchies • inheritance • polymorphism • virtual functions • Optimizing compilers are increasingly available
Grid based methods (finite elements, finite volumes) offer a natural object-oriented programming model • Specific examples from Pfinics project • Support for hierarchical classes: Global Computational Domain Subdomains Surfaces Elements (h, p, h-p adaptivity)Nodes
Classes for Communication PEList NeighPE NeighPE NeighPE PE 0 NeighNode NeighNode NeighNode NeighNode PE 1 NeighNode NeighNode PE 2 NeighNode can be list of nodes or list of nodal objects to be communicated with neighbors
Processor-level Class : PEList class PEList { private: class NeighNodeList { Public: NeighNodeList(int NeighNodeIn):link(0), NeighNode(NeighNodeIn){} NeighNodeList *link; int NeighNode; }; public: NeighPEList(int NeighPEIn):link(0),NeighPE(NeighPEIn), first(0), last(0){} NeighPEList *link; NeighNodeList *first, *last; NeighNodeList *ptemp; int NeightPE; // functions to manipulate the linked-lists as follows
Processor-level Class : PEList (continued) void add (int NeighNodeIn); int CountNodes( ); void getNodeList(PfVecI &NodeList); }; public: NeighPEList *first, *last; NeighPEList *ptemp; PEList ( ); void add(int NeighPEIn, int NeighNodeIn); int GetprocIntfcList(PfVecI &ProcIntfcList); int MaxmIntfcNodes( ); int SendNodalObject(int PEIndx); };
Constructing Derived Classes - Very Useful for Grid-based Methods Base Base dc1 dcA1 dcB1 dc2 dcA2 dcB2 dc = derived class dcA2_1 dcA2_2 Virtual functions do NOT span multiple inheritance hierarchies Examples of derived classes: nodes, elements, patches, subdomains Advice : keep it simple - use only 2 levels whenever possible
Early versus Late Binding Early binding: same as Fortran subroutine and C function calls Relative function addresses are determined at compilation time Late binding: Relative addresses can only be determined at run-time e.g. function pointers. In many cases, function pointers can be replaced by virtual functions. Example (late binding): #include <iostream.h> #include <math.h> // function prototyping void CalcSin(double &a); void CalcCosin(double &a); void CalcTangent(double &a); // define a function pointer (with input type as double) void (*fun[ ]) (double &) = {CalcSin, CalcCosin, CalcTangent}; main( ) { double a; int ia; cin >> a; cin >> ia; fun[ia](a);} // write code for CalcSin, CalcCosin and CalcTangent below...
Virtual Functions Provides a late binding mechanism for resolving common namespaces Consider the following problem: (without virtual functions) Base class : Element nnodes gnodes printNode( ) inh. = inherited nnodes(inh.) gnodes(inh.) printNode()(inh.) n3Dfaces nnodes(inh.) gnodes(inh.) printNode()(inh.) nrefLevels nnodes(inh.) gnodes(inh.) printNode()(inh.) nfaces Elem3Dcube Elem2Dtriangle Elem2Dquad
Virtual Functions (continued) class Element{ protected : int nnodes, gnodes; public: void printNode( ); }; // define Element::printNode( ) also... Base Class Definition class Elem2Dquad : public Element { int nfaces; public: Elem2Dquad(int, int, int); void printNode( ); }; Derived Class Definition
Virtual Functions (continued) Next, if you define a main program as follows : main ( ){ Element *elements[ 3 ]; // define pointers to class objects elements[0] = &Elem2Dquad(2,3,4); // initialize elements elements[1] = &Elem3Dcube(3,4,5); // with constructors elements[0]->printNode( ); } Question: printNode( ) is defined in both the base class Element and the derived class Elem2Dquad…which printNode() gets printed in the above line ? Answer: printNode( ) from the base class ! Reason: Elem2Dquad made no attempt to override printNode( ) function of its parent and uses the one it inherits. Fix: Virtual Function fixes this problem !…….see next slide
Virtual Functions (continued) change the based class Element as follows: class Element{ protected : int nnodes, gnodes; public: virtual void printNode( ); }; // define Element::printNode( ) also... Base Class Definition Now, Elem2Dquad’s function printNode( ) will be used ! If an element type is changed => NO need to change code Possible cases for using virtual functions: quadrature rules (different for different element types) adaptivity rules (quads get divided into 4 quads, triangles into 3 triangles,….etc…)
Virtual Functions (continued) You can still use the Base class’s printNode( ) if you wish : elements[0]->Element::printNode( ); This is useful if you want to hide certain data defined for one of the derived classes. Possible scenario (credit reporting agency) Base Class : customer (data: credit rating, amount of loan) Derived Class: customerJoe( additional data: bad past credit history) You may want to use base class customer’s print function that only prints customerJoe’s credit rating and amount of loan while responding to a bank’s request but does not give away customerJoe’s past credit history.
Derived classes for various element types Define a basic element class called Element with the attributes: class Element { protected: int eltype; // element type ID int nnodes; // number of nodes for field variables int gnodes; // number of nodes for geometry variables int kbasis; // geometry basis function order int nedges; // number of edges (2D) or surfaces (3D) int nqpx,nqpy,nqpz; // quadrature points in x, y, z direction PfVecI plevels; // p-level descriptor for elements int refnlevel; // h-refinement level for an element real *xc, *yc, *zc; //coordinates node *nodes; // nodes for this element public: Element(int, int, int, int, int); ~Element( );};
Derived classes for various element types (continued) (file: Element.C) #include “Element.h” Element::Element(int A, int B, int C, int D, int E,): eltype(A), nnode(B), gnode(C), kbasis(D), nedges(E) { // …define common properties here via virtual functions public: void GaussTable( ); void DivideElement( ); }; An example program for mixing element types (increasingly popular) #include “ElementLibrary.h” //contains header files for element types main( ) { Elem2Dquad quad2d; Elem2Dtriangle triang2d; Element *elem = new Element [2]; // allocates memory for 2 elements elem[0] = quad2d; elem[1] = triang2d; cout << elem[0] << elem[1];}
Derived classes for various element types (continued) • protected access provides both data protection and inheritance !! Variables defined with protected specifier are only available to Element class AND any other class derived from it. • PfVecI is a class that dynamically constructs p-degree polynomials for higher order finite elements • 2D rectangular arbitrary p-degree finite element : (file: Element2Dquad.h) #ifndef _Element2Dquad_H #define _Element2Dquad_H #include <iostream.h> // C++ I/O class library #include “Element.h” // basic finite element class #include “../PFMemory/PfAlloc.h” // dynamic memory allocation #include “../PFSystem/PfMathF.h” // Math and Fortran interface .. (next page…)
Derived classes for various element types (continued) class Element2Dquad : public Element { protected : char myname[25]; // ascii name of an element public: Element2Dquad(int=21, int=9, int=8, int=2, int=4, char[ ] = “2D p-Quad”); friend ostream & operator << (ostream & out, Element2Dquad &em); void DerivePLevels(PfAlloc &P); …..other functions ….. ; int GenDofs(PfAlloc &P, PfVecD &pf, int nelem, int ktype, int npoin, int iprvr, int idsub, int numConstrt); friend void FortranShared(SubDomainPE &PE, Solver &solv, Element2Dquad &em); ~Element2Dquad( ); }; #endif
How about sending groups of data across processors ? An example for sending mesh-related information to other processors given in the following slides. (will go over the actual code) domain : a prototype of a class domain containing the entire mesh elregion, tempfield : temporary data structure to pack the data before message-passing Basic steps: find out type of data (can do hardware-probing) define a new datatype for holding different types of data pack data send data via the new datatype MPI 1.1 standard does not provide C++ bindings for message-passing MPI 2.0 does have bindings for C++ (All non-static member functions in class MPI will be virtual functions)
#include “mpicomm.h” void sendsubdom( domain &dom, PfVecI &ProcElemList, PfVecI &LocalNodes, PfVecI &Llnods, PfVecI &XLlnods,int nelem, int TotalProcElem, int ndimLlnods, int npoinLocal, int iproc) { elregion elt[TotalProcElem]; tempfield fields[nf]; int ie, blen1[ ] = { 1, 1}, blen2[ ] = { 1, 1, 1}; int nsize,je,lnode,ke,header[nheader], nN,ndata = 0; double val0; PfVecD flddata; MPI_Comm comm;
MPI_Datatype elregion_type, tempfield_type; MPI_Datatype type1[2] = {MPI_INT, MPI_INT}; MPI_Datatype type2[3] = {MPI_INT, MPI_INT, MPI_INT}; MPI_Aint disp1[2], disp2[3]; comm = MPI_COMM_WORLD; // gather element type,region index information, // tags = 4. Use hardware probing of lengths // - safe implementation. MPI_Address(&elt[0].reg, &disp1[0]); MPI_Address(&elt[0].type, &disp[1]); for (i=1; i >= 0; i--) disp1[i] -= disp1[0]; MPI_Type_struct(2, blen1, disp1, type1, &elregion_type); MPI_Type_commit(&elregion_type);
for ( ie = 0; ie < TotalProcElem; ie++ ) { elt[ie].type = dom->elt[ ProcElemList(ie) ].type; elt[ie].reg = dom->elt[ ProcElemList(ie) ].reg; } // gather field information - integer data MPI_Address(&fields[0].type, &disp2[0]); MPI_Address(&fields[0].dim, &disp2[1]); MPI_Address(&fields[0].size, &disp2[2]); for (i=2; i >= 0; i--) disp2[i] -= disp2[0]; MPI_Type_struct(3, blen2, disp2, type2, &tempfield_type); MPI_Type_commit(&tempfield_type);
for ( ie = 0; ie < dom->nf; ie++ ) { fields[ie].type = dom->fld[ ie].type; fields[ie].dim = dom->fld[ ie].dim; fields[ie].size = dom->fld[ ie].dim * npoinLocal; ndata += dom->fld[ ie].dim * npoinLocal; } // calculate headers as data packet 1 , tag = 1 ndimLnods =XLlnods(TotalProcElem)-XLlnods(0); header(0) =dom->dim; header(1) =dom->nf; header(2) =npoinLocal; header(3) =TotalProcElem; header(4) =ndimLnods; header(5) =ndata;
// gather the data associated with the fields… // align the data field-wise in a vector of // type double. flddata.resize(ndata); nsize = 0; for ( ie = 0; ie < dom->nf; ie++ ) { for ( je = 0; je < fields[ie].dim; je++ ) { for ( ke = 0; ke < npoinLocal; ke++ ) { lnode = LocalNodes(ke); nN = dom->nN; val0 = dom->fld[ ie].data[ je * (nN-1) + lnode ]; flddata(nsize+je*(npoinLocal-1)+ke) = val0;} }
nsize += fields[ie].size; } //finally, send the data packets.blocking call // here. MPI_Send(&header(0),nheader,MPI_INT,iproc,1,comm); MPI_Send(&XLlnods(0),TotalProcElem,MPI_INT,iproc,2,comm); MPI_Send(&Llnods(0),ndimLnods,MPI_INT,iproc,3,comm); MPI_Send(&elt[0],TotalProcElem,elregion_type,iproc,4,comm); MPI_Send(&fields[0],dom->nf,tempfield_type,iproc,5,comm); MPI_Send(&flddata(0,ndata,MPI_DOUBLE,iproc,6,comm); // deallocate everything here.… // all structure and sv++ objects ….code deleted…. }
Object-oriented scientific software on the web Directories: www.cs.bham.ac.uk/~jdm/cpp.html oonumerics.org/oon Public Domain Finite Element/Finite Volume Software: MOUSE (finite volume, fire8.vug.uni-duisburg.de/MOUSE PETSc (www-fp.mcs.anl.gov/petsc) Diffpack (used to be free…now the group has started a company. www.nobjects.com/Diffpack slightly old freeware may still be on the web ) Overture (www.llnl.gov/casc/Overture ) FEMLib (quattro.me.uiuc.edu/~tiller/femlib.html …no longer supported ) Pfinics++ (contact abose@umich.edu, webpage under development) C++ Scientific Library (and other excellent software) www.netlib.org and nhse.cs.utk.edu
Graph Partitioning Tools: METIS, hMETIS and ParMETIS Objective: Given a connectivity graph of vertices and edges, we want to partition it into k equal-sized parts such that the number of interface edges is minimized. METIS and ParMETIS are very popular graph partitioning tools that : o can also produce fill-reducing orderings for sparse matrices o have some support for partitioning finite element/volume meshes o consistently produce better partitions than spectral partitioning algorithms (such as Chaco) o are very fast (Metis can partition a graph with 1 million vertices in 256 parts in less than 20 seconds on a typical desktop PC) For very large graphs (e.g. an automobile or an aircraft), the connectivity matrix can be very large – for these, one needs to solve the partitioning problem in parallel (ParMETIS) hMETIS is a serial hypergraph partitioning tool (e.g. can be used in partitioning large VLSI circuits) More information and software are available at: http://www-users.cs.umn.edu/~karypis/metis/
Examples of Mesh Partitioning Using METIS and ParMETIS (Courtesy Dr. Valmor de Almeida, Oak Ridge National Laboratory)
(Courtesy Dr. Valmor de Almeida, Oak Ridge National Laboratory) It is often necessary to reorder the partition numbers as well the element numbers within a partition
(Courtesy Dr. Valmor de Almeida, Oak Ridge National Laboratory)
Part II(a) : Programming Techniques for Mesh Partitioning Given: • global domain with a finite element or finite volume discretization, • associated physical properties, • weights to different parts of the domain Objective: • generate a partition of the global domain into a specified number of subdomains such that > all subdomains have roughly equal number of unknowns to compute > the communication time among neighboring subdomains is minimized
Part II(a) : Programming Techniques for Mesh Partitioning Four components: • global domain data structure (includes physical properties and a given discretization) : GlobalDomain.h • adjacency graph for the given finite element discretization (requires knowledge of specific element types) : Element.h (base class) • graph partitioning software such as Metis, Chaco • generation of detailed communication lists