
Ashok Srinivasan Florida State University cs.fsu/~asriniva

Long-Time Molecular Dynamics Simulations in Nano-Mechanics through Parallelization of the Time Domain. Ashok Srinivasan Florida State University http://www.cs.fsu.edu/~asriniva. Aim: Simulate for long time spans


Presentation Transcript


  1. Long-Time Molecular Dynamics Simulations in Nano-Mechanics through Parallelization of the Time Domain • Ashok Srinivasan, Florida State University, http://www.cs.fsu.edu/~asriniva • Aim: Simulate for long time spans • Solution features: Use data from prior simulations to parallelize the time domain • Acknowledgements: NSF, ORNL, NERSC, NCSA • Collaborators: Yanan Yu and Namas Chandra

  2. Outline • Background • Limitations of Conventional Parallelization • Example Application: Carbon Nanotube Tensile Test • Small Time Step Size in Molecular Dynamics Simulations • Other Time Parallelization Approaches • Data-Driven Time Parallelization • Experimental Results • Scaled efficiently to ~ 1000 processors, for a problem where conventional parallelization scales to just 2-3 processors • Conclusions

  3. Background • Limitations of Conventional Parallelization • Example Application: Carbon Nanotube Tensile Test • Molecular Dynamics Simulations

  4. Limitations of Conventional Parallelization • Conventional parallelization decomposes the state space across processors • It is effective for a large state space • It is not effective when the computational effort arises from a large number of time steps • … or when granularity becomes very fine due to a large number of processors

  5. Example Application: Carbon Nanotube Tensile Test • Pull the CNT at a constant velocity • Determine the stress-strain response and yield strain (the strain at which the CNT starts breaking) using MD • The response is strain-rate dependent

  6. A Drawback of Molecular Dynamics • Molecular dynamics • In each time step, the forces of atoms on each other are modeled using some potential • After the forces are computed, update positions • Repeat for the desired number of time steps • Time step size ~ $10^{-15}$ seconds, due to physical and numerical considerations • Desired time range is much larger • A million time steps are required to reach $10^{-9}$ s • Around a day of computing for a 3000-atom CNT • MD therefore uses unrealistically large strain rates
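The force-then-update loop described on this slide can be sketched in a few lines. This is only an illustration: the slide does not name an integrator, so velocity Verlet is an assumption here, and `velocity_verlet`, `steps_needed`, and the single-particle setup are hypothetical names chosen for the example.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle by n_steps time steps.

    force(x) stands in for the interatomic potential's force; the choice of
    velocity Verlet is an assumption, not taken from the slides.
    """
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # position update
        f = force(x)                       # recompute force at new position
        v = v_half + 0.5 * dt * f / mass   # finish velocity update
    return x, v


def steps_needed(total_time_s, dt_s):
    """Number of time steps required to cover total_time_s with step dt_s."""
    return round(total_time_s / dt_s)
```

With a step of 1e-15 s, `steps_needed(1e-9, 1e-15)` gives the million steps quoted on the slide, which is why long time spans are expensive regardless of how few atoms are simulated.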

  7. Other Time Parallelization Approaches • Waveform relaxation • Repeatedly solve for the entire time domain • Parallelizes well but convergence can be slow • Several variants to improve convergence • Parareal approach • Features similar to ours and to waveform relaxation • Precedes our approach • Not data-driven • Sequential phase for prediction • Not very effective in practice so far • Has much potential to be improved

  8. Waveform Relaxation • Special case: Picard iterations • Ex: $dy/dt = y$, $y(0) = 1$ becomes • $dy_{n+1}/dt = y_n(t)$, $y_0(t) = 1$ • In general • $dy/dt = f(y, t)$, $y(0) = y_0$ becomes • $dy_{n+1}/dt = g(y_n, y_{n+1}, t)$, $y_0(t) = y_0$ (the zeroth iterate is the constant initial value) • $g(u, u, t) = f(u, t)$ • $g(y_n, y_{n+1}, t) = f(y_n, t)$: Picard • $g(y_n, y_{n+1}, t) = f(y_{n+1}, t)$: Converges in 1 iteration • Jacobi, Gauss-Seidel, and SOR versions of $g$ defined • Many improvements • Ex: DIRM combines the above with reduced order modeling • [Figure: exact solution compared with iterates N = 1, 2, 3, 4]
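The Picard example on this slide can be worked through exactly. The sketch below stores each iterate as a list of polynomial coefficients and applies $y_{n+1}(t) = 1 + \int_0^t y_n(s)\,ds$, which is the integrated form of $dy_{n+1}/dt = y_n(t)$ with $y_{n+1}(0) = 1$. The function names are hypothetical.

```python
from fractions import Fraction


def picard_step(coeffs):
    """Map y_n to y_{n+1}(t) = 1 + integral_0^t y_n(s) ds.

    coeffs[k] is the coefficient of t**k in y_n.
    """
    return [Fraction(1)] + [c / (k + 1) for k, c in enumerate(coeffs)]


def picard_iterates(n):
    """Return [y_0, y_1, ..., y_n] for dy/dt = y, y(0) = 1, starting from y_0(t) = 1."""
    y = [Fraction(1)]
    iterates = [y]
    for _ in range(n):
        y = picard_step(y)
        iterates.append(y)
    return iterates


def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at time t."""
    return sum(float(c) * t**k for k, c in enumerate(coeffs))
```

Each iterate is the Taylor polynomial of $e^t$ of one higher degree, so every sweep covers the whole time domain but gains only one order of accuracy. That illustrates why convergence can be slow over long time spans.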

  9. Parareal approach • Based on an “approximate-verify-correct” sequence • An example of shooting methods for time-parallelization • Not shown to be effective in realistic situations • [Figure labels: initial prediction, initial computed result, correction, second prediction]

  10. Data-Driven Time Parallelization • Time Parallelization • Use Prior Data

  11. Time Parallelization • Each processor simulates a different time interval • The initial state is obtained by prediction, except for processor 0 • Verify whether the prediction for the end state is close to that computed by MD • Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results • If the time interval is sufficiently large, then the communication overhead is small
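A toy version of this predict-then-verify scheme is sketched below. Everything in it is a stand-in: `fine_solver` replaces MD with explicit Euler on $dy/dt = -y$, the predictor is passed in as a plain function rather than learned from a database, and the intervals run in a sequential loop where a real code would use one processor per interval.

```python
import math


def fine_solver(y, t0, t1, n_sub=1000):
    """Stand-in for MD: explicit Euler on dy/dt = -y with many small steps."""
    dt = (t1 - t0) / n_sub
    for _ in range(n_sub):
        y += dt * (-y)
    return y


def time_parallel_pass(y0, boundaries, predictor, tol):
    """One pass in which 'processor' i simulates [boundaries[i], boundaries[i+1]].

    Returns (end_states, n_verified), where n_verified counts the leading
    intervals whose predicted start state matched the computed end state of
    the previous interval to within tol.
    """
    n = len(boundaries) - 1
    # Processor 0 starts from the known initial state; the rest from predictions.
    starts = [y0] + [predictor(boundaries[i]) for i in range(1, n)]
    # In a real run these calls would execute concurrently.
    ends = [fine_solver(starts[i], boundaries[i], boundaries[i + 1]) for i in range(n)]
    n_verified = 1  # interval 0 needs no verification
    for i in range(1, n):
        if abs(starts[i] - ends[i - 1]) > tol:
            break  # prediction rejected; later intervals must be redone
        n_verified += 1
    return ends[:n_verified], n_verified
```

With an accurate predictor such as `lambda t: math.exp(-t)`, every interval is verified in one pass. With a poor one, only the first interval survives and the pass degenerates to sequential execution.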

  12. Problems with multiple time-scales • Fine-scale computations (such as MD) are more accurate, but more time consuming • Many of the details at the finer scale are unimportant, but some matter • [Figure: a simple schematic of multiple time scales]

  13. Use Prior Data • Results for an identical simulation exist: retrieve the results • Results for a slightly different parameter, with the same coarse-scale response, exist: retrieve the results; verify closeness, or pre-determine the acceptable parameter range • The current simulation behaves like different prior ones at different times: identify similar prior results, learn the relationship, verify the prediction • The simulation is not similar to prior results: try to identify coarse-scale behavior, and apply dynamic iterations to improve on predictions

  14. Experimental Results • CNT tensile test • CNT identical to prior results, but different strain rate • 1000-atom CNT, 300 K • Static and dynamic prediction • CNT identical to prior results, but different strain rate and temperature • CNT differs in size from prior result, and is simulated with a different strain rate

  15. Dimensionality Reduction • The movement of atoms in a 1000-atom CNT can be considered the motion of a point in 3000-dimensional space • Find a lower-dimensional subspace close to which the points lie • We use proper orthogonal decomposition (POD) • Find a low-dimensional affine subspace • Motion may, however, be complex in this subspace • Use results for different strain rates • Velocity = 10 m/s, 5 m/s, and 1 m/s • At five different time points • [U, S, V] = svd(Shifted Data) • Shifted Data = $U S V^T$ • States of the CNT are expressed as • $m + c_1 u_1 + c_2 u_2$ • [Figure: mean m and basis vectors u]
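The decomposition on this slide subtracts the mean state, takes an SVD of the shifted data, and writes states as the mean plus a few coefficients times basis vectors. The sketch below uses only the standard library, so it swaps the SVD for power iteration and recovers just the leading basis vector. That substitution, the function name `leading_pod_mode`, and the row-per-snapshot layout are all assumptions made for illustration.

```python
import math


def leading_pod_mode(snapshots, n_iter=200):
    """Return (mean, u1, coeffs) for a list of state vectors.

    u1 approximates the dominant direction of the mean-shifted data, found by
    power iteration instead of a full SVD. coeffs[i] is the coefficient c1 of
    snapshot i, so snapshot i is approximately mean + coeffs[i] * u1.
    Assumes the initial guess is not orthogonal to the dominant direction.
    """
    n, d = len(snapshots), len(snapshots[0])
    mean = [sum(s[j] for s in snapshots) / n for j in range(d)]
    shifted = [[s[j] - mean[j] for j in range(d)] for s in snapshots]
    u = [1.0 / math.sqrt(d)] * d  # initial guess
    for _ in range(n_iter):
        # Apply (shifted^T shifted) to u without forming the d x d matrix.
        proj = [sum(row[j] * u[j] for j in range(d)) for row in shifted]
        w = [sum(proj[i] * shifted[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # data has no variation along the current guess
        u = [x / norm for x in w]
    coeffs = [sum(row[j] * u[j] for j in range(d)) for row in shifted]
    return mean, u, coeffs
```

For a 1000-atom CNT each snapshot would have 3000 entries. A library SVD would return all basis vectors at once and is the more natural choice when one is available.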

  16. Basis Vectors from POD • CNT of ~100 Å with 1000 atoms at 300 K • [Figure legend: blue: z; green, red: x, y] • u1 (blue) and u2 (red) for z • u1 (green) for x is not “significant”

  17. Relate strain rate and time • Coefficients of u1 • Blue: 1 m/s • Red: 5 m/s • Green: 10 m/s • Dotted line: same strain • Suggests that behavior is similar at similar strains • In general, clustering similar coefficients can give parameter-time relationships

  18. Prediction when v is the only parameter • Static Predictor • Independently predict the change in each coordinate • Use precomputed results for 40 different time points each for three different velocities • To predict for (t; v) not in the database • Determine coefficients for nearby v at nearby strains • Fit a linear surface and interpolate/extrapolate to get coefficients c1 and c2 for (t; v) • Get the state as $m + c_1 u_1 + c_2 u_2$ • Dynamic Prediction • Correct the above coefficients by determining the error between the previously predicted and computed states • [Figure legend: green: 10 m/s; red: 5 m/s; blue: 1 m/s; magenta: 0.1 m/s; black: 0.1 m/s through direct prediction]

  19. Verification of prediction • Definition of equivalence of two states • Atoms vibrate around their mean position • Consider states equivalent if the differences in position, potential energy, and temperature are within the normal range of fluctuations • Max displacement ≈ 0.2 Å • Mean displacement ≈ 0.08 Å • Potential energy fluctuation = 0.35% • Temperature fluctuation = 12.5 K • [Figure labels: displacement (from mean), mean position]
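The equivalence test on this slide can be written as a simple predicate. The four thresholds are the fluctuation ranges quoted on the slide. How the authors combine them is not stated, so requiring all four to hold at once is an assumption, as are the function name and argument layout.

```python
import math


def states_equivalent(pos_a, pos_b, pe_a, pe_b, temp_a, temp_b,
                      max_disp=0.2, mean_disp=0.08, pe_frac=0.0035, temp_tol=12.5):
    """Return True if two states differ by no more than normal fluctuations.

    pos_a, pos_b: lists of (x, y, z) atom positions in angstroms, same ordering.
    pe_a, pe_b:   potential energies (any consistent unit).
    temp_a, temp_b: temperatures in kelvin.
    Requiring every criterion to pass is an assumption, not from the slides.
    """
    disps = [math.dist(p, q) for p, q in zip(pos_a, pos_b)]
    if max(disps) > max_disp:
        return False
    if sum(disps) / len(disps) > mean_disp:
        return False
    if abs(pe_a - pe_b) > pe_frac * abs(pe_a):
        return False
    return abs(temp_a - temp_b) <= temp_tol
```

A predicted start state that passes this check against the MD-computed end state of the preceding interval would be accepted. Otherwise the interval has to be recomputed from the corrected state.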

  20. Stress-strain response at 0.1 m/s • Blue: Exact result • Green: Direct prediction with interpolation / extrapolation • Points close to yield involve extrapolation in velocity and strain • Red: Time parallel results

  21. Speedup • Red line: Ideal speedup • Blue: v = 0.1 m/s • Green: A different predictor • v = 1 m/s, using v = 10 m/s • CNT with 1000 atoms • Xeon/Myrinet cluster

  22. Temperature and velocity vary • Use 1000-atom CNT results • Temperatures: 300 K, 600 K, 900 K, 1200 K • Velocities: 1 m/s, 5 m/s, 10 m/s • Dynamically choose the closest simulation for prediction • Speedup (plot legend) • __ 450 K, 2 m/s • … Linear • Stress-strain (plot legend) • Blue: Exact, 450 K • Red: 200 processors

  23. CNTs of varying sizes • Use a 1000-atom CNT, 10 m/s, 300 K result • Parallelize 1200, 1600, 2000-atom CNT runs • Observe that the dominant mode is approximately a linear function of the initial z-coordinate • Normalize coordinates to be in [0, 1] • $z_{t+\Delta t} = z_t + z'_{t+\Delta t}\,\Delta t$; predict $z'$ • Speedup (plot legend) • - 2000 atoms • .- 1600 atoms • __ 1200 atoms • … Linear • Stress-strain (plot legend) • Blue: Exact, 2000 atoms, 1 m/s • Red: 200 processors

  24. Predict change in coordinates • Express $x'$ in terms of basis functions • Example: • $x'_{t+\Delta t} = a_{0,t+\Delta t} + a_{1,t+\Delta t}\, x_t$ • $a_{0,t+\Delta t}$ and $a_{1,t+\Delta t}$ are unknown • Express changes, $y$, for the base (old) simulation similarly, in terms of coefficients $b$, and perform a least squares fit • Predict $a_{i,t+\Delta t}$ as $b_{i,t+\Delta t} + R_{t+\Delta t}$ • $R_{t+\Delta t} = (1-\beta) R_t + \beta (a_{i,t} - b_{i,t})$ • Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of previous differences • We use $\beta = 0.5$ • Gives more weight to the latest results • Does not let random fluctuations affect the predictor too much • Velocity is estimated from the latest accurate results known
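The correction term on this slide is an exponentially weighted average of past differences between the current and base coefficients. A minimal sketch follows. The weight (0.5 on the slide) is called `beta` here to keep it distinct from the base coefficients, and the function names are hypothetical.

```python
def update_correction(r_prev, a_t, b_t, beta=0.5):
    """R_{t+dt} = (1 - beta) * R_t + beta * (a_t - b_t).

    a_t is the current simulation's coefficient at time t (known once that
    interval has been computed); b_t is the base simulation's coefficient.
    """
    return (1.0 - beta) * r_prev + beta * (a_t - b_t)


def predict_coefficient(b_next, r_next):
    """Predicted a_{t+dt} = b_{t+dt} + R_{t+dt}."""
    return b_next + r_next


def run_predictor(a_history, b_history, b_next, beta=0.5, r0=0.0):
    """Fold the observed (a, b) pairs into R, then predict the next coefficient."""
    r = r0
    for a_t, b_t in zip(a_history, b_history):
        r = update_correction(r, a_t, b_t, beta)
    return predict_coefficient(b_next, r)
```

If the current coefficient sits a constant 0.2 above the base one, R moves from 0 to 0.1, 0.15, 0.175 and so on toward 0.2. With a weight of 0.5 the latest difference counts for half of the correction, while a single noisy difference cannot shift the prediction by more than half its size.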

  25. Conclusions • Data-driven time parallelization shows significant improvement in speed, without sacrificing accuracy significantly, in the CNT tensile test • The 980-processor simulation attained a flop rate of ~ 420 Gflops • Its flops per atom rate of 420 Mflops/atom is likely the largest flop per atom rate in classical MD simulations • References • See http://www.cs.fsu.edu/~asriniva/research.html

  26. Future Work • More complex problems • Better prediction • POD is good for representing data, but not necessarily for identifying patterns • Use better dimensionality reduction / reduced order modeling techniques • Better learning • Better verification
