
An FPGA Implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (RSCE)


Presentation Transcript


  1. An FPGA Implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (RSCE) Sam Lee

  2. What is this Thesis about? • Implementation • Reciprocal Sum Compute Engine (RSCE). • FPGA based. • Accelerate part of a Molecular Dynamics Simulation. • Smooth Particle Mesh Ewald. • Investigation • Precision requirement. • Speedup capability. • Parallelization strategy.

  3. Outline • What is Molecular Dynamics Simulation? • What calculations are involved? • How do we accelerate and parallelize the calculations? • What did we find out about precision? • What did we find out about speedup? • What is left to be done?

  4. Molecular Dynamics Simulation

  5. Molecular Dynamics Simulation • [Figure: pairwise forces among three charged particles A (1+), B (1+), and C (2−)] • E = −∇V (Electric Field = −Gradient of Potential) • F = QE (Force = Charge × Electric Field) • F = ma (Force = Mass × Acceleration) • Time integration => New positions and velocities
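The slide's chain (field → force → acceleration → time integration) can be sketched in a few lines. This is a generic velocity-Verlet step for one particle in a uniform field, not code from the thesis; all names and values are illustrative.

```python
# Sketch of one MD time-integration step (velocity Verlet) for a single
# charged particle in a uniform electric field. Illustrative only.

def velocity_verlet(pos, vel, q, m, efield, dt):
    """Advance position and velocity by one timestep of size dt.

    F = qE gives the force; a = F/m; positions and velocities are
    updated with the velocity-Verlet scheme (the force is constant
    here, so the two half-kicks collapse into one full kick).
    """
    a = q * efield / m                      # F = QE and F = ma => a = qE/m
    pos = pos + vel * dt + 0.5 * a * dt * dt
    vel = vel + a * dt                      # constant force: a_new == a
    return pos, vel

# One femtosecond step for a unit charge and mass in a unit field:
p, v = velocity_verlet(pos=0.0, vel=0.0, q=1.0, m=1.0, efield=1.0, dt=1e-15)
```

In a real MD engine this update runs once per particle per timestep, with the force recomputed from all interactions each step, which is exactly where the O(N²) cost on the next slide comes from.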

  6. MD Simulation • Problem scientists are facing: • SLOW! • O(N²). • N = 10⁵, time-span = 1 ns, timestep size = 1 fs => 10²² calculations. • A 3 GHz computer would take 5.8 × 10¹² days to finish!!

  7. Solution • Accelerate with an FPGA • Especially: • The O(N²) calculations. • To be more specific, the thesis addresses: • Reciprocal electrostatic energy and force calculations. • Smooth Particle Mesh Ewald algorithm.

  8. Previous Work • Software Implementations: • Original PME package written by Toukmaji. • NAMD2. • AMBER. • Hardware Implementations: • No previous hardware implementations of SPME. • MD-Grape & MD-Engine used Ewald Summation. • Ewald Summation is O(N²); SPME is O(N log N)!

  9. Smooth Particle Mesh Ewald Calculations Involved

  10. Electrostatic Interaction • Coulombic equation: U = Σ_{i<j} q_i q_j / r_ij. • Under the Periodic Boundary Condition, the summation is only… Conditionally Convergent.
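The pairwise Coulombic sum above is the O(N²) computation the thesis sets out to accelerate. A minimal open-boundary (non-periodic) sketch, with made-up charges and positions in Gaussian-style units:

```python
# A minimal sketch of the O(N^2) pairwise Coulomb energy in open
# (non-periodic) boundaries; charges and positions are illustrative.
import math

def coulomb_energy(charges, positions):
    """U = sum over i<j of q_i * q_j / r_ij (no cutoff, no images)."""
    n = len(charges)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):           # each pair counted once
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            dz = positions[i][2] - positions[j][2]
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            energy += charges[i] * charges[j] / r
    return energy

# Two unit charges a distance 2.0 apart: U = (+1)(+1)/2 = 0.5
u = coulomb_energy([1.0, 1.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

Under periodic boundary conditions this double loop would also run over all image cells, and the resulting infinite sum is only conditionally convergent, which is why Ewald-type methods are needed.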

  11. To combat the Surface Effect… • [Figure: Periodic Boundary Condition replication — a 3×3 grid of image cells A–I, each containing copies of the same five numbered particles]

  12. Ewald Summation Used for PBC • [Figure: decomposition of the point charges into a screened short-range part (Direct Sum) plus a smooth compensating part (Reciprocal Sum)] • To calculate the Coulombic interactions. • O(N²) Direct Sum + O(N²) Reciprocal Sum.
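The Ewald decomposition in the figure rests on splitting the Coulomb kernel with the error function: 1/r = erfc(βr)/r + erf(βr)/r, where the first term decays fast (direct sum) and the second is smooth (reciprocal sum). A sketch of the identity, with an arbitrary splitting parameter β:

```python
# The Ewald split of the Coulomb kernel:
#   1/r = erfc(beta*r)/r  (direct, short-ranged)
#       + erf(beta*r)/r   (reciprocal, smooth).
# beta is the usual Ewald splitting parameter; the identity holds
# for any beta > 0. Values below are illustrative.
import math

def ewald_split(r, beta):
    direct = math.erfc(beta * r) / r    # summed in real space
    recip = math.erf(beta * r) / r      # summed in Fourier space
    return direct, recip

d, k = ewald_split(r=1.5, beta=0.35)
```

Choosing β trades work between the two halves: a larger β shrinks the direct-sum range, pushing more of the load into the reciprocal sum, which is exactly the knob SPME exploits on the next slide.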

  13. Smooth Particle Mesh Ewald • Shift the workload to the Reciprocal Sum. • Use the Fast Fourier Transform. • O(N) Direct (real-space) Sum + O(N log N) Reciprocal Sum. • The RSCE calculates the Reciprocal Sum using the SPME algorithm.
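The step that makes the FFT applicable is spreading each point charge onto a regular mesh with cardinal B-spline weights. The following is a deliberately reduced sketch — order-2 (linear) B-splines on a 1-D mesh — whereas real SPME uses higher interpolation orders P on a 3-D K×K×K mesh; the function name and values are illustrative.

```python
# Sketch of SPME-style charge spreading onto a 1-D mesh using the
# order-2 (linear) cardinal B-spline. Real SPME uses order-P splines
# on a 3-D mesh, and the filled mesh is then transformed with an FFT.

def spread_charges(charges, fracs, K):
    """fracs are fractional coordinates in [0, 1); K is the mesh size."""
    mesh = [0.0] * K
    for q, u in zip(charges, fracs):
        x = u * K
        i = int(x)                       # left-hand mesh point
        w = x - i                        # linear B-spline weights: (1-w, w)
        mesh[i % K] += q * (1.0 - w)     # wrap-around for periodicity
        mesh[(i + 1) % K] += q * w
    return mesh

# A unit charge at fractional coordinate 0.25 on an 8-point mesh
# lands exactly on mesh point 2 (0.25 * 8 = 2.0):
mesh = spread_charges([1.0], [0.25], K=8)
```

Because the interpolation is local (each charge touches only P mesh points per dimension), spreading costs O(N), and the subsequent mesh FFT costs O(K³ log K³), giving the overall O(N log N) scaling quoted above.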

  14. SPME Reciprocal Energy • [Figure: reciprocal-energy equation and data flow, with the two FFT steps highlighted]

  15. SPME Reciprocal Force

  16. Reciprocal Sum Compute Engine(RSCE)

  17. RSCE Validation Environment

  18. RSCE Architecture

  19. RSCE Verification Testbench

  20. RSCE SystemC Model

  21. MD Simulations with theRSCE

  22. RSCE Precision Goal • Goal: relative error < 10⁻⁵. • Two major calculation steps: • B-Spline calculation. • 3D-FFT calculation. • Due to limited logic resources + the limited-precision FFT LogiCore => the precision goal CANNOT be achieved.
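For concreteness, the precision goal is the usual relative-error criterion, checked against a high-precision software reference. The helper name and values below are illustrative, not from the thesis:

```python
# Relative error as used for a precision goal of the form
# |computed - reference| / |reference| < 1e-5. Values are made up.

def relative_error(computed, reference):
    return abs(computed - reference) / abs(reference)

# An energy value off by 9.9e-6 of its reference just meets the goal:
ok = relative_error(1.0000099, 1.0) < 1e-5
```

In a fixed-point FPGA datapath this error budget has to be divided between the B-spline and 3D-FFT stages, which is why the limited-precision FFT core dominates whether the goal is reachable.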

  23. MD Simulation with RSCE • RMS Energy Error Fluctuation:

  24. FFT Precision Vs. Energy Fluctuation

  25. RSCE vs. Software Implementation Speedup Analysis

  26. RSCE Speedup • RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz. • [Table: RSCE computation time] • Speedup: 3x to 14x.

  27. RSCE Speedup • Why so modest? • QMM bandwidth limitation. • Sequential nature of the SPME algorithm. • Solution: • Use more QMM memories. • Slight design modifications required.

  28. Multi-QMM RSCE Speedup • NQ-QMM RSCE computation time: [equation] • The 4-QMM RSCE • Speedup: 14x to 20x. • Assuming N is of the same order as K×K×K: • Speedup: 3(NQ−1)x

  29. RSCE Speedup • [Figure: speedup plot]

  30. When Multiple RSCEs are Used Together Parallelization Strategy

  31. RSCE Parallelization Strategy • Assume a 2-D simulation. • Assume P = 2, K = 8, N = 6. • Assume NumP = 4. • [Figure: an 8×8×8 mesh decomposed into four 4×4×4 mini meshes]

  32. RSCE Parallelization Strategy • Mini-meshes composed -> 2D-IFFT. • 2D-IFFT = two passes of 1D-FFT (X and Y). • [Figure: X-direction FFT pass followed by Y-direction FFT pass]

  33. Parallelization Strategy • 2D-IFFT -> Energy Calculation -> 2D-FFT. • 2D-FFT -> Force Calculation. • [Figure: data flow through Energy Calculation, 2D-FFT, and Force Calculation]
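The decomposition on slides 32–33 works because a 2-D transform factors into two passes of independent 1-D transforms, one along each axis, so each pass can be distributed across compute engines. A dependency-free sketch (a naive DFT stands in for the FFT; function names are illustrative):

```python
# A 2-D DFT computed as two passes of 1-D DFTs: first along X (rows),
# then along Y (columns). A naive O(n^2) DFT stands in for the FFT to
# keep the sketch dependency-free; the factorization is identical.
import cmath

def dft1d(xs):
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, x in enumerate(xs)) for k in range(n)]

def dft2d(grid):
    rows = [dft1d(row) for row in grid]        # X-direction pass
    cols = list(zip(*rows))                    # transpose
    cols = [dft1d(list(c)) for c in cols]      # Y-direction pass
    return [list(r) for r in zip(*cols)]       # transpose back

# A single unit impulse transforms to a flat spectrum of ones:
out = dft2d([[1.0, 0.0], [0.0, 0.0]])
```

Each row (or column) transform touches only its own data, so the rows of a pass can go to different engines; the only global step is the transpose between the X and Y passes, which in a multi-RSCE system becomes inter-engine communication.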

  34. Multi-RSCE System

  35. Conclusion • Successful integration of the RSCE into NAMD2. • Single-QMM RSCE Speedup = 3x to 14x. • NQ-QMM RSCE Speedup = 14x to 20x. • When N ≈ K×K×K, NQ-QMM Speedup = 3(NQ−1)x. • A Multi-RSCE system is still a better alternative than the Multi-FPGA Ewald Summation system.

  36. Future Work • Input precision analysis. • More in-depth FFT precision analysis. • Implementation of a block floating-point FFT. • More investigation into how different simulation settings (K, P, and N) affect the RSCE speedup. • Investigate how to better parallelize the SPME algorithm.

  37. Questions?
