
Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations

Spring Semester 2005. Geoffrey Fox, Community Grids Laboratory, Indiana University, 505 N Morton Suite 224, Bloomington IN. gcf@indiana.edu


Presentation Transcript


  1. Parallel Programming for Particle Dynamics and Systems of Ordinary Differential Equations. Spring Semester 2005. Geoffrey Fox, Community Grids Laboratory, Indiana University, 505 N Morton Suite 224, Bloomington IN. gcf@indiana.edu

  2. Abstract of Parallel Programming for Particle Dynamics • This uses the simple O(N²) particle dynamics problem as a motivator to discuss the solution of ordinary differential equations • We discuss Euler, Runge-Kutta and predictor-corrector methods • Various message-parallel O(N²) algorithms are described with performance comments • The initial discussion covers the type of problems that use these algorithms • Advanced topics include “fast multipole methods” and their application to earthquake simulations

  3. Classes of Physical Simulations • Mathematical (numerical) formulations of simulations fall into a few key classes which have their own distinctive algorithmic and parallelism issues • The most common formalism is that of a field theory, where quantities of interest are represented by densities defined over a 1, 2, 3 or 4 dimensional space • Such a description could be “fundamental” as in electromagnetism or relativity for the gravitational field, or “approximate” as in CFD where a fluid density averages over a particle description • Our Laplace example is of this form, where the field φ could either be fundamental (as in electrostatics) or approximate if it comes from the Euler equations for CFD

  4. Applications Reducing to a Coupled Set of Ordinary Differential Equations • Another set of models of physical systems represents them as a coupled set of discrete entities evolving over time • Instead of φ(x,t) one gets φi(t), labeled by an index i • Discretizing x in the continuous case leads to the discrete case, but in many cases the discrete formulation is fundamental • Within the coupled discrete system class, one has two important approaches • Classic time-stepped simulations -- loop over all i at fixed t, updating φi(t) to φi(t+Δt) • Discrete event simulations -- loop over all events representing changes of state of φi(t)

  5. Particle Dynamics or Equivalent Problems • Particles are sets of entities -- sometimes fixed (atoms in a crystal), sometimes moving (galaxies in a universe) • They are characterized by the force Fij on particle i due to particle j • Forces are characterized by their range r: Fij(xi,xj) is zero if the distance |xi-xj| is greater than r • Examples: • The universe • A globular star cluster • The atoms in a crystal vibrating under interatomic forces • Molecules in a protein rotating and flexing under interatomic forces • Laws of motion are typically ordinary differential equations • Ordinary means differentiation with respect to one variable -- typically time

  6. Classes of Particle Problems • If the range r is small (as in a crystal), then one gets numerical formulations and parallel computing considerations similar to those in the Laplace example, with local communication • We showed in the Laplace module that efficiency increases as the range of the force increases • If r is infinite (no cut-off for the force), as in the gravitational problem, one finds rather different issues, which we will discuss in this module • There are several “non-particle” problems, discussed later, that reduce to the long range force problem characterized by every entity interacting with every other entity • Characterized by a calculation where updating entity i involves all other entities j

  7. Circuit Simulations I • An electrical or electronic network has the same structure as a particle problem, where the “particles” are components (transistor, resistor, inductor etc.) and the “force” between components i and j is nonzero if and only if i and j are linked in the circuit • For simulations of electrical transmission networks (the electrical grid), one would naturally use classic time-stepped simulations, updating each component i from its state at time t to its state at time t+Δt • If one is simulating something like a chip, then the time-stepped approach is very wasteful, as 99.99% of the components are doing nothing (i.e. remain in the same state) at any given time step! • Here is where discrete event simulations (DES) are useful, as one only computes where the action is • Biological simulations are often formulated as networks where each component (say a neuron or a cell) is described by an ODE and the network couples the components

  8. Circuit Simulations II • Discrete event simulations are clearly preferable on sequential machines, but parallel algorithms are hard due to the need for dynamic load balancing (events are dynamic and not uniform throughout the system) and synchronization (which events can be executed in parallel?) • There are several important approaches to DES, of which the best known is the Time Warp method originally proposed by David Jefferson -- here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent • Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little parallelism • e.g. there is only one event with the lowest global time • DES do not exhibit the classic loosely synchronous compute-communicate structure, as there is no uniform global time • typically, even with Time Warp, there is no scalable parallelism

  9. Discrete Event Simulations • Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1 < t2 • We show the timelines of several (4) objects in the system and our two events E1 and E2 • If E1 generates no interfering events, or one E*12 at a time greater than t2, then our parallel execution of E2 is consistent • However, if E1 generates E12 before t2, then the execution of E2 has to be rolled back and E12 should be executed first • [Figure: timelines of four objects in the system showing events E1 and E2 and the generated events E11, E21, E22, E12 and E*12]
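Setting the parallel rollback issue aside, the sequential core of a discrete event simulation is just a priority queue of time-stamped events. A minimal sketch (component names and the event/handler shapes are illustrative, not from the slides) shows why "only computing where the action is" beats time stepping for a mostly-idle circuit:

```python
import heapq

# Minimal sequential discrete event simulation: events are
# (time, target, new_state) tuples kept in a priority queue, so we
# only touch components that actually change, never the idle ones.
def run_des(initial_events, handlers, t_end):
    queue = list(initial_events)
    heapq.heapify(queue)
    state = {}
    while queue:
        t, target, value = heapq.heappop(queue)
        if t > t_end:
            break
        state[target] = value
        # A handler may schedule follow-on events at later times.
        for new_event in handlers.get(target, lambda t, v: [])(t, value):
            heapq.heappush(queue, new_event)
    return state

# Toy circuit: gate "a" flips at t=1, which triggers gate "b" one time
# unit later; any component never touched costs nothing at all.
handlers = {"a": lambda t, v: [(t + 1.0, "b", not v)]}
final = run_des([(1.0, "a", True)], handlers, t_end=10.0)
```

Only two events are ever processed here, regardless of how many idle components the circuit contains; a time-stepped loop would visit every component at every step.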

  10. Matrices and Graphs I • Especially in cases where the “force” is linear in the φi(t), it is convenient to think of the force as being specified by a matrix M whose elements mij are nonzero if and only if the force between i and j is nonzero. A typical force law is: Fi = Σj mij φj(t) • In the Laplace equation example, the matrix M is sparse (most elements are zero), and this is an especially common case where one can and needs to develop efficient algorithms • We discuss in another talk the matrix formulation in the case of partial differential solvers

  11. Matrices and Graphs II • Another way of looking at these problems is as graphs G, where the nodes of the graph are labeled by the particles i and one has an edge linking i to j if and only if the force Fij is nonzero • In this language, long range force problems correspond to a dense matrix M (all elements nonzero) and a fully connected graph G • [Figure: an example graph with 12 numbered nodes and edges between linked particles]

  12. Other N-Body Like Problems - I • The characteristic structure of the N-body problem is an observable that depends on all pairs of entities from a set of N entities. • This structure is seen in diverse applications: • 1) Look at a database of items and calculate some form of correlation between all pairs of database entries • 2) This was first used in studies of measurements of a "chaotic dynamical system" with points xi which are vectors of length m. Put rij = distance between xi and xj in m dimensional space. Then the probability p(rij = r) is proportional to r^(d-1) • where d (not equal to m) is the dynamical dimension of the system • calculated by forming all the rij (for i and j running over observable points from our system -- usually a time series) and accumulating them in a histogram of bins in r • Parallel algorithm in a nutshell: store the histogram replicated in all processors, distribute the vectors equally among the processors, pipeline the xj through the processors, accumulating the rij as they pass through; add the histograms together at the end.
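The sequential kernel of that histogram-of-pair-distances calculation can be sketched as below (function and variable names are illustrative). In the parallel version described above, `hist` would be replicated on every processor, each processor would own a slice of `points`, the points would be pipelined past all processors, and the partial histograms summed at the end; here we show only the single-processor kernel.

```python
import numpy as np

# Accumulate all pairwise distances r_ij into a histogram of bins in r.
# This is the O(N^2) all-pairs structure common to the N-body examples.
def pair_distance_histogram(points, bins):
    hist = np.zeros(len(bins) - 1, dtype=int)
    n = len(points)
    for i in range(n):
        # distances from point i to every later point j (each pair once)
        r = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        h, _ = np.histogram(r, bins=bins)
        hist += h
    return hist

rng = np.random.default_rng(0)
pts = rng.random((100, 3))                    # 100 points in 3 dimensions
hist = pair_distance_histogram(pts, bins=np.linspace(0.0, 2.0, 21))
assert hist.sum() == 100 * 99 // 2            # every pair counted once
```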

  13. Other N-Body Like Problems - II • 3) The Green's function approach to simple partial differential equations gives solutions as integrals of known Green's functions times "source" or "boundary" terms. • For the simulation of earthquakes in the GEM project, the source terms are strains in the faults, and the stresses in any fault segment are the integral over strains in all other segments • Compared to particle dynamics, the force law is replaced by a Green's function, but in each case the total stress/force is a sum over contributions associated with the other entities in the formulation • 4) In the so-called vortex method in CFD (Computational Fluid Dynamics), one models the Navier-Stokes equations as the long range interactions between entities which are the vortices • 5) Chemistry uses molecular dynamics, so the particles are molecules, but the force is usually not Newton's law but rather Van der Waals forces, which are long range but fall off faster than 1/r²


  18. Euler’s Method for ODE’s • [Figure: one Euler step from (ti, Xi) to (ti+1, Xi+1) using the slope f(ti, Xi)] • Euler’s method involves a simple linear extrapolation to proceed from point to point, with the ODE providing the slope • Grid ti = t0 + h*i and X0 = initial value • Xi+1 = Xi + h * f(ti, Xi) • In practice it is not used, as it is not accurate enough
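The update rule on this slide translates directly into code; a minimal sketch (function names are illustrative):

```python
# Euler integrator implementing X_{i+1} = X_i + h * f(t_i, X_i)
# on the grid t_i = t0 + h*i.
def euler(f, t0, x0, h, n_steps):
    t, x = t0, x0
    history = [(t, x)]
    for _ in range(n_steps):
        x = x + h * f(t, x)        # linear extrapolation with slope f
        t = t + h
        history.append((t, x))
    return history

# dX/dt = X with X(0) = 1 has exact solution e^t; Euler undershoots:
# after 10 steps of h = 0.1 it gives 1.1^10 ~ 2.594 instead of e ~ 2.718.
path = euler(lambda t, x: x, t0=0.0, x0=1.0, h=0.1, n_steps=10)
```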

  19. Estimate of Local Error in Euler’s Method • Let X(ti) be the numerical solution and Y(ti) the exact solution. We have the Taylor expansion Y(ti+h) = Y(ti) + h f(ti, Y(ti)) + O(h²) • By assumption X(ti) = Y(ti), and in Euler’s method we use the exact derivative at ti: X(ti+h) = X(ti) + h f(ti, X(ti)). Thus X(ti+h) = Y(ti+h) + O(h²), and so the local error is O(h²) • Accumulating this error over 1/h steps gives a global error of order h
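The global O(h) behaviour derived above is easy to check numerically; the quick experiment below (illustrative, not from the slides) shows Euler's error at t = 1 roughly halving when the step size halves, for dX/dt = X with X(0) = 1:

```python
import math

# Euler's global error at t = 1 for dX/dt = X, X(0) = 1, exact value e.
def euler_error_at_1(h):
    n = round(1.0 / h)
    x = 1.0
    for _ in range(n):
        x += h * x
    return abs(x - math.e)

e1 = euler_error_at_1(0.1)    # about 0.125
e2 = euler_error_at_1(0.05)   # about 0.065 -- close to half of e1
```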

  20. Relationship of Error to Computational Approach • Whenever f satisfies certain smoothness conditions, there is always a sufficiently small step size h such that the difference between the real function value Yi+1 at ti+1 and the approximation Xi+1 is less than some required error magnitude ε. [e.g. Burden and Faires] • Euler's method: very quick, as there is one computation of the derivative function f at each step. • Other methods require more computation per step of size h but can produce the specified error ε with fewer steps, as they can use a larger value of h. Thus Euler’s method is not best.

  21. Simple Example of Euler’s Method • The CSEP book http://csep.hpcc.nectec.or.th/CSEP/ has lots of useful material. Consider a simple example from there • With the trivial and uninteresting (as the step size is too large) choice h = 0.25, we can step the approximate solution forward from the initial value


  23. What’s wrong with Euler’s Method? • [Figure: the Euler step vs. the tangent at the midpoint; the midpoint tangent drawn through (ti, Xi) lands much closer to the exact Yi+1] • To extrapolate from ti to ti+1 one should obviously best use (if just a straight line) the average slope (of the tangent) and NOT the value at the start • This immediately gives an error smaller by a factor of h, but it isn’t trivial because you don’t seem to know f(ti+0.5*h, X(ti+0.5*h))

  24. Runge-Kutta Methods: Modified Euler I • In the Runge-Kutta methods, one uses intermediate values to calculate such midpoint derivatives • The key idea is to use an approximation for X(ti+0.5*h), as this is an argument of f which is multiplied by h. Thus the error is (at least) one factor of h better than the approximation • So if one wishes just to do one factor of h better than Euler, one can use Euler to estimate the value of X(ti+0.5*h)

  25. Runge-Kutta Methods: Modified Euler II • Global error is O(h²): a second order method • The midpoint method is: k1 = f(ti, X(ti)); k2 = f(ti + 0.5*h, X(ti) + 0.5*h*k1); Xi+1 = Xi + h * k2 • [Figure: the tangent at ti (slope k1), the approximate tangent at the midpoint (slope k2), the midpoint approximation Xi+1, and the exact Yi+1]
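The midpoint method sketched above is a small change to the Euler code: take an Euler half-step to estimate X at the interval midpoint, then use the slope there for the full step. A minimal sketch (function names are illustrative):

```python
# Midpoint (second-order Runge-Kutta) step: an Euler half-step
# approximates X at the midpoint, whose slope is used for the full step.
def midpoint_step(f, t, x, h):
    k1 = f(t, x)                           # slope at the start
    k2 = f(t + 0.5 * h, x + 0.5 * h * k1)  # approximate midpoint slope
    return x + h * k2

def integrate(step, f, t0, x0, h, n_steps):
    t, x = t0, x0
    for _ in range(n_steps):
        x = step(f, t, x, h)
        t += h
    return x

# dX/dt = X, X(0) = 1: with h = 0.1 the midpoint method reaches about
# 2.714 at t = 1, far closer to e ~ 2.718 than Euler's 2.594.
x_mid = integrate(midpoint_step, lambda t, x: x, 0.0, 1.0, 0.1, 10)
```

Each step costs two evaluations of f instead of one, but the global error drops from O(h) to O(h²), so a much larger h achieves the same accuracy.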


  30. Time Discretization • In ordinary differential equations, one focuses on different ways of representing dφi(t)/dt in some difference form, such as Euler’s form dφi(t)/dt = (φi(t + Δt) - φi(t)) / Δt • Or more accurate Runge-Kutta and other formulae • The same time discretization methods are applicable to (time derivatives in) partial differential equations involving time and are used to approximate ∂φ(x,t)/∂t


  34. The Mathematical Problem • We have 3 long vectors of size N • X = [X1, X2, X3, ..., XN] • V = [V1, V2, V3, ..., VN] • M = [M1, M2, M3, ..., MN] is constant • The critical equation is dVi/dt = sum over all other particles j of the force due to the jth particle, which is Mj (Xj - Xi)/(distance between i and j)³, which we term Grav(X,M) • We call the message parallel (MPI) implementation of the calculation of Grav the function MPGrav
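A direct sequential sketch of the Grav(X, M) sum defined above (in units with G = 1; the function name follows the slides, the rest is illustrative). The O(N²) double loop is exactly the cost the later slides set out to parallelize:

```python
import numpy as np

# Acceleration on particle i: sum over j != i of
# M_j (X_j - X_i) / |X_j - X_i|^3.
def grav(X, M):
    N = len(X)
    acc = np.zeros_like(X)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = X[j] - X[i]
            acc[i] += M[j] * d / np.linalg.norm(d) ** 3
    return acc

# Two unit masses one unit apart attract each other equally and
# oppositely (Newton's third law, Fij = -Fji).
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
M = np.array([1.0, 1.0])
A = grav(X, M)
assert np.allclose(A[0], -A[1])
```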

  35. The Problem • d(X,V)/dt = function of X, V, M • where each of X, V and M is an N dimensional array • and there are some scalars, such as the gravitational constant

  36. Essential Ideas • The array size N ranges from 100,000 (chemistry) to 100,000,000 (cosmology) • In the simplest case of Euler’s method • X(t+dt) = X(t) + h V(t) • V(t+dt) = V(t) + h Grav(vectors X and M) • Runge-Kutta is more complicated, but this doesn’t affect the essential issues for parallelism • The X equation needs to be calculated for each component i -- it takes time of order N • The V equation also needs to be calculated for each component i, but now each component is complicated and needs a computation itself of O(N) -- the total time is of order N² and dominates

  37. Form of Parallel Computation • Computation of the numerical method is inherently iterative: at each time step, the solution depends on the immediately preceding one. • At each time step, MPGrav is called (several times, as we are using Runge-Kutta): • For each particle i, one must sum over the forces due to all other particles j. • This requires communication, as information about other particles is stored in other processors • We will use 4th order Runge-Kutta to integrate in time, and the program is designed around an overall routine looping over time with the parallelism hidden in the MPGrav routines, and so it is identical between the parallel and sequential versions
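The classic 4th order Runge-Kutta step that the outer loop wraps around the force routine can be sketched as follows; here `f` stands in for the derivative function (MPGrav plays that role for the velocities in the slides), and it is evaluated four times per step:

```python
# Standard 4th order Runge-Kutta step: four slope evaluations
# combined with weights 1, 2, 2, 1.
def rk4_step(f, t, x, h):
    k1 = f(t, x)
    k2 = f(t + 0.5 * h, x + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, x + 0.5 * h * k2)
    k4 = f(t + h, x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# dX/dt = X, ten steps of h = 0.1: the result at t = 1 agrees with
# e = 2.71828... to about five decimal places (global error O(h^4)).
t, x = 0.0, 1.0
for _ in range(10):
    x = rk4_step(lambda t, x: x, t, x, 0.1)
    t += 0.1
```

Whether `f` is a scalar function or an O(N²) force sum over all particles, the step logic is identical, which is why the parallelism can be hidden entirely inside the force routine.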


  39. Status of Parallelism in Various N Body Cases • Data Parallel approach is really only useful for the simple O(N2) case and even here it is quite tricky to express algorithm so that it is • both Space Efficient and • Captures factor of 2 savings from Newton's law of action and reaction Fij = - Fji • We have discussed these issues in a different foilset • The shared memory approach is effective for a modest number of processors in both algorithms. • It is only likely to scale properly for O(N2) case as the compiler will find it hard to capture clever ideas needed to make fast multipole efficient • Message Parallel approach gives you very efficient algorithms in both O(N2) and O(NlogN) • O(N2) case has very low communication • O(NlogN) has quite low communication jsunbody05 gcf@indiana.edu
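The communication pattern behind the message-parallel O(N²) algorithm is a ring (pipeline): each processor owns a block of particles, and blocks of (position, mass) data circulate around the ring so that every block meets every other block. The plain-Python sketch below simulates that structure with a list standing in for the ring; real code would use MPI point-to-point exchanges, and all names here are illustrative:

```python
import numpy as np

# Simulated ring: P "processors" each own a block of particle indices.
# In each of P rounds, every processor accumulates forces on its home
# particles from the block currently passing through, then all blocks
# shift one position around the ring.
def ring_forces(X, M, P):
    blocks = np.array_split(np.arange(len(X)), P)
    acc = np.zeros_like(X)
    travel = [(X[b], M[b]) for b in blocks]      # data circulating the ring
    for _ in range(P):
        for p, home in enumerate(blocks):
            Xv, Mv = travel[p]                   # block currently at proc p
            for i in home:
                d = Xv - X[i]
                r = np.linalg.norm(d, axis=1)
                mask = r > 0                     # skip self-interaction
                acc[i] += np.sum(Mv[mask, None] * d[mask] /
                                 r[mask, None] ** 3, axis=0)
        travel = travel[1:] + travel[:1]         # shift blocks one step
    return acc
```

After P shifts every block has visited every processor, so the result matches the direct O(N²) sum; only O(N/P) data moves per round, which is why the communication cost is so low compared to the O(N²/P) computation per processor.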

