650 likes | 664 Views
Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations. Spring Semester 2005 Geoffrey Fox Community Grids Laboratory Indiana University 505 N Morton Suite 224 Bloomington IN gcf@indiana.edu. Abstract of Parallel Programming for Particle Dynamics.
E N D
Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations Spring Semester 2005 Geoffrey Fox Community Grids Laboratory Indiana University 505 N Morton Suite 224 Bloomington IN gcf@indiana.edu jsunbody05 gcf@indiana.edu
Abstract of Parallel Programming for Particle Dynamics • This uses the simple O(N2) Particle Dynamics Problem as a motivator to discuss solution of ordinary differential equations • We discuss Euler, Runge Kutta and predictor corrector methods • Various Message parallel O(N2) algorithms are described with performance comments • The initial discussion covers the type of problems that use these algorithms • Advanced topics include “fast multipole methods” and their application to earthquake simulations jsunbody05 gcf@indiana.edu
Classes of Physical Simulations • Mathematical (Numerical) formulations of simulations fall into a few key classes which have their own distinctive algorithmic and parallelism issues • Most common formalism is that of a field theory where quantities of interest are represented by densities defined over a 1,2,3 or 4 dimensional space. • Such a description could be “fundamental” as in electromagnetism or relativity for gravitational field or “approximate” as in CFD where a fluid density averages over a particle description. • Our Laplace example is of this form where field could either be fundamental (as in electrostatics) or approximate if comes from Euler equations for CFD jsunbody05 gcf@indiana.edu
Applications reducing to Coupled set of Ordinary Differential Equations • Another set of models of physical systems represent them as coupled set of discrete entities evolving over time • Instead of (x,t) one gets i(t) labeled by an index i • Discretizing x in continuous case leads to discrete case but in many cases, discrete formulation is fundamental • Within coupled discrete system class, one has two important approaches • Classic time stepped simulations -- loop over all i at fixed t updating to • Discrete event simulations -- loop over all events representing changes of state of i(t) jsunbody05 gcf@indiana.edu
Particle Dynamics or Equivalent Problems • Particles are sets of entities -- sometimes fixed (atoms in a crystal) or sometimes moving (galaxies in a universe) • They are characterized by force Fij on particle i due to particle j • Forces are characterized by their range r: Fij(xi,xj) is zero if distance |xi-xj| greater thanr • Examples: • The universe • A globular star cluster • The atoms in a crystal vibrating under interatomic forces • Molecules in a protein rotating and flexing under interatomic force • Laws of Motion are typically ordinary differential equations • Ordinary means differentiate wrt one variable -- typically time jsunbody05 gcf@indiana.edu
Classes of Particle Problems • If the range r is small (as in a crystal), the one gets numerical formulations and parallel computing considerations similar to those in Laplace example with local communication • We showed in Laplace module that efficiency increases as range of force increases • If r is infinite ( no cut-off for force) as in gravitational problem, one finds rather different issues which we will discuss in this module • There are several “non-particle” problems discussed later that reduce to long range force problem characterized by every entity interacting with every other entity • Characterized by a calculation where updating entity i involves all other entities j jsunbody05 gcf@indiana.edu
Circuit Simulations I • An electrical or electronic network has the same structure as a particle problem where “particles” are components (transistor, resistance, inductance etc.) and “force” between components i and j is nonzero if and only if i and j are linked in the circuit • For simulations of electrical transmission networks (the electrical grid), one would naturally use classic time stepped simulations updating each component i from state at time t to state at time t+t. • If one is simulating something like a chip, then time stepped approach is very wasteful as 99.99% of the components are doing nothing (i.e. remain in same state) at any given time step! • Here is where discrete event simulations (DES) are useful as one only computes where the action is • Biological simulations often are formulated as networks where each component (say a neuron or a cell) is described by an ODE and the network couples components jsunbody05 gcf@indiana.edu
Circuit Simulations II • Discrete Event Simulations are clearly preferable on sequential machines but parallel algorithms are hard due to need for dynamic load balancing (events are dynamic and not uniform throughout system) and synchronization (which events can be executed in parallel?) • There are several important approaches to DES of which best known is Time Warp method originally proposed by David Jefferson -- here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent • Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little paralellism • e.g. there is only one event with lowest global time • DES do not exhibit the classic loosely synchronous compute-communicate structure as there is no uniform global time • typically even with time warp, no scalable parallelism jsunbody05 gcf@indiana.edu
Discrete Event Simulations • Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1< t2. • We show the timelines of several(4) objects in the system and our two events E1 and E2 • If E1 generates no interfering events or one E*12 at a time greater than t2 then our parallel execution of E2 is consistent • However if E1 generates E12 before t2 then execution of E2 has to be rolled back and E12 should be executed first E1 E21 E11 Objects in System E22 E*12 E12 Time E2 jsunbody05 gcf@indiana.edu
Matrices and Graphs I • Especially in cases where the “force” is linear in the i(t) , it is convenient to think of force being specified by a matrix M whose elementsmijare nonzero if and only if the force between i and j is nonzero. A typical force law is: Fi = mij j(t) • In LaplaceEquation example, the matrix M is sparse ( most elements are zero) and this is a specially common case where one can and needs to develop efficient algorithms • We discuss in another talk the matrix formulation in the case of partial differential solvers jsunbody05 gcf@indiana.edu
Matrices and Graphs II • Another way of looking at these problems is as graphs G where the nodes of the graphs are labeled by the particles i, and one has edges linking i to j if and only if the force Fij is non zero • In these languages, long range force problems correspond to dense matrix M (all elements nonzero) and fully connected graphs G 10 1 3 4 7 11 9 5 2 6 8 12 jsunbody05 gcf@indiana.edu
Other N-Body Like Problems - I • The characteristic structure of N-body problem is an observable that depends on all pairs of entities from a set of N entities. • This structure is seen in diverse applications: • 1) Look at a database of items and calculate some form of correlation between all pairs of database entries • 2) This was first used in studies of measurements of a "chaotic dynamical system" with points xi which are vectors of length m Put rij = distance between xi and xj in m dimensional space Then probability p(rij = r) is proportional to r(d-1) • where d (not equal to m) is dynamical dimension of system • calculate by forming all the rij (for i and j running over observable points from our system -- usually a time series) and accumulating in a histogram of bins in r • Parallel algorithm in a nutshell: Store histograms replicated in all processors, distribute vectors equally in each processor and just pipeline xj through processors and as they pass through accumulate rij ; add histograms together at end. jsunbody05 gcf@indiana.edu
Other N-Body Like Problems - II • 3) Green's Function Approach to simple Partial Differential equations gives solutions as integrals of known Green's functions times "source" or "boundary" terms. • For the simulation of earthquakes in GEM project the source terms are strains in faults and the stresses in any fault segment are the integral over strains in all other segments • Compared to particle dynamics, Force law replaced by Green's function but in each case total stress/Force is sum over contributions associated with other entities in formulation • 4) In the so called vortex method in CFD (Computational Fluid Dynamics) one models the Navier Stokes Equation as the long range interactions between entities which are the vortices • 5) Chemistry uses molecular dynamics and so particles are molecules but force is not Newton's laws usually but rather Van der Waals forces which are long range but fall off faster than 1/r2 jsunbody05 gcf@indiana.edu
Euler’s Method for ODE’s Xi+1 1 = h f(ti,Xi) Xi = Yi Yi+1 ti ti+1 • Euler’s method involves a simple linear extrapolation to proceed from point to point with the ODE providing the slope • Gridti = t0 + h*iand X0 = Initial Value • Xi+1 = Xi + h * f(ti,Xi) • In practice not used as not accurate enough jsunbody05 gcf@indiana.edu
Estimate of Local Error in Euler’s Method • Let X(ti) be numerical solution and Y(ti) the exact solution. We have Taylor Expansion • And by assumption X(ti) = Y(ti) and in Euler’s method we use exact derivative at ti. ThusAnd so X(ti+h) = Y(ti+h) + O(h2) and so local error is O(h2) • Accumulating this error over 1/h steps gives a global error of order h jsunbody05 gcf@indiana.edu
Relationship of Error to Computational Approach • Whenever f satisfies certain smoothness conditions, there is always a sufficiently small step size h such that the difference between the real function value Yi+1 at ti+1 and the approximation Xi+1 is less than some required error magnitude . [e.g. Burden and Faires] • Euler's method: very quick as one computation of the derivative function f at each step. • Other methods require more computation per step size h but can produce the specified error with fewer steps as can use a larger value of h. Thus Euler’s method is not best. jsunbody05 gcf@indiana.edu
Simple Example of Euler’s Equation • The CSEP Book http://csep.hpcc.nectec.or.th/CSEP/ has lots of useful material. Consider simple example from there: • With the trivial and uninteresting (as step size too large) choice h=0.25, we get And so jsunbody05 gcf@indiana.edu
What’s wrong with Euler’s Method? Xi+1 -- Euler Tangent at Midpoint Exact Yi+1 Xi = Yi Tangent at Midpoint through ti ti ti+1 • To extrapolate from ti to ti+1 one should obviously best use (if just a straight line) the average slope (of tangent) and NOT the value at start • This immediately gives an error smaller by factorh but isn’t trivial because you don’t seem to know f(ti+0.5*h,X(ti+0.5*h)) jsunbody05 gcf@indiana.edu
Runge-Kutta Methods: Modified Euler I • In the Runge Kutta methods, one uses intermediate values to calculate such midpoint derivatives • Key idea is that use an approximation forX(ti+0.5*h) as this is an argument of f which is multiplied by h. Thus error is (at least) one factor of h better than approximation • So if one wishes just to do one factor of h better than Euler, one can use Euler to estimate value of X(ti+0.5*h) jsunbody05 gcf@indiana.edu
Runge-Kutta Methods: Modified Euler II Global Error is O(h2))Second Order Method • Midpoint Method is1=f(ti,X(ti))2=f(ti+0.5*h,X(ti+0.5*h))Xi+1 = Xi + h * 2 2Approximate Tangent at Midpoint 1 Tangentat ti Exact Yi+1 Midpoint Approximation Xi+1 Xi = Yi Approximate Tangent at Midpoint through ti ti ti+1 jsunbody05 gcf@indiana.edu
Time Discretization • In ordinary differential equations, one focuses on different ways of represent di(t)/dt in some difference form such as Euler’s formdi(t)/dt = (i(t + t) - i(t)) / t • Or more accurate Runge-Kutta and other formulae • The same time discretization methods are applicable to (time derivatives in) partial differential equations involving time and are used to approximatei(x,t) / t jsunbody05 gcf@indiana.edu
The Mathematical Problem • We have 3 long vectors of size N • X = [X1, X2, X3, XN] • V = [V1, V2, V3, VN] • M = [M1, M2, M3, MN] is constant • Critical Equation isdVi / dt = Sum over all other particles j of the force due to j th particle which is (Xj– Xi)/(distance between i and j )3 which we term Grav(X,M) • We call the message parallel (MPI) implementation of the calculation of Grav the function MPGrav jsunbody05 gcf@indiana.edu
The Problem • D(X,V)/dt = Function of X,V, M • Where each of X, V and M are N dimensional arrays • And there some scalars such as Gravitational Constant jsunbody05 gcf@indiana.edu
Essential Ideas • The array size N is from 100,000 (Chemistry) to 100,000,000 (cosmology) • In simplest case of Euler’s method • X(t+dt) = X(t) + h V(t) • V(t+dt) = V(t) + h Grav(Vectors X and M) • Runge-Kutta is more complicated but this doesn’t effect essential issues for parallelism • The X equation needs to be calculated for each component i – it takes time of order N • The V equation also needs to be calculated for each component i but now each component is complicted and needs a computation itself of O(N) – total time is of order N2 and dominates jsunbody05 gcf@indiana.edu
Form of parallel Computation • Computation of numerical method is inherently iterative: at each time step, the solution depends on the immediately preceding one. • At each time step, MPGrav is called (several times as using Runge Kutta): • For each particle i, one must sum over the forces due to all other particles j. • This requires communication as information about other particles are stored in other processors • We will use 4th order Runge Kutta to integrate in time and the program is designed around an overall routine looping over time with parallelism hidden in MPGrav routines and so identical between parallel and sequential versions jsunbody05 gcf@indiana.edu
Status of Parallelism in Various N Body Cases • Data Parallel approach is really only useful for the simple O(N2) case and even here it is quite tricky to express algorithm so that it is • both Space Efficient and • Captures factor of 2 savings from Newton's law of action and reaction Fij = - Fji • We have discussed these issues in a different foilset • The shared memory approach is effective for a modest number of processors in both algorithms. • It is only likely to scale properly for O(N2) case as the compiler will find it hard to capture clever ideas needed to make fast multipole efficient • Message Parallel approach gives you very efficient algorithms in both O(N2) and O(NlogN) • O(N2) case has very low communication • O(NlogN) has quite low communication jsunbody05 gcf@indiana.edu