An FPGA Implementation of the Smooth Particle Mesh Ewald Reciprocal Sum Compute Engine (RSCE) Sam Lee
What is this Thesis about? • Implementation • Reciprocal Sum Compute Engine (RSCE). • FPGA based. • Accelerates part of the Molecular Dynamics Simulation. • Smooth Particle Mesh Ewald. • Investigation • Precision requirement. • Speedup capability. • Parallelization strategy.
Outline • What is Molecular Dynamics Simulation? • What calculations are involved? • How do we accelerate and parallelize the calculations? • What did we find out about precision? • What did we find out about speedup? • What is left to be done?
Molecular Dynamics Simulation [Figure: three charges A (1+), B (1+) and C (2-), with arrows for the force each charge exerts on the other two.] • E = -∇V (Electric Field = -Gradient of Potential) • F = QE (Force = Charge x Electric Field) • F = ma (Force = Mass x Acceleration) • Time integration => New Positions and Velocities
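The relations on this slide define one MD timestep: pairwise Coulomb forces, then F = ma, then time integration. A minimal sketch, assuming reduced units (Coulomb constant of 1), no periodic images, and velocity Verlet as the integrator (the slides do not name one); the function names are illustrative. The double loop over particle pairs is where the O(N^2) cost comes from.

```python
import math

# Minimal MD sketch in reduced units (Coulomb constant = 1, no periodic images).
# Function names are illustrative, not taken from the thesis.

def coulomb_forces(pos, q):
    """Direct O(N^2) Coulomb forces: F_i = sum_j q_i q_j (r_i - r_j) / |r_i - r_j|^3."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            s = q[i] * q[j] / r ** 3
            for k in range(3):
                forces[i][k] += s * d[k]   # force on i from j (F = qE, E = -grad V)
                forces[j][k] -= s * d[k]   # equal and opposite force on j
    return forces

def velocity_verlet_step(pos, vel, q, mass, dt):
    """One time-integration step: a = F / m, then new positions and velocities."""
    n = len(pos)
    f0 = coulomb_forces(pos, q)
    new_pos = [[pos[i][k] + vel[i][k] * dt + 0.5 * (f0[i][k] / mass[i]) * dt ** 2
                for k in range(3)] for i in range(n)]
    f1 = coulomb_forces(new_pos, q)
    new_vel = [[vel[i][k] + 0.5 * (f0[i][k] + f1[i][k]) / mass[i] * dt
                for k in range(3)] for i in range(n)]
    return new_pos, new_vel

# The three charges from the figure, A (1+), B (1+), C (2-), at made-up positions.
pos = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
q = [1.0, 1.0, -2.0]
pos, vel = velocity_verlet_step(pos, [[0.0] * 3 for _ in range(3)], q, [1.0] * 3, 0.01)
print(pos)
```

Velocity Verlet is used here only because it is a common, time-reversible choice; any integrator would show the same O(N^2) force cost.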
MD Simulation • Problem scientists are facing: • SLOW! • O(N^2). • N = 10^5, time-span = 1 ns, timestep size = 1 fs => 10^22 calculations. • A 3 GHz computer takes 5.8 x 10^12 days to finish!
Solution • Accelerate with FPGA • Especially: • The O(N^2) calculations. • To be more specific, the thesis addresses: • Reciprocal electrostatic energy and force calculations. • The Smooth Particle Mesh Ewald algorithm.
Previous Work • Software Implementations: • Original PME package written by Toukmaji. • NAMD2. • AMBER. • Hardware Implementations: • No previous hardware implementations of SPME. • MD-Grape & MD-Engine used Ewald Summation. • Ewald Summation is O(N^2); SPME is O(N log N)!
Smooth Particle Mesh Ewald Calculations Involved
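Electrostatic Interaction • Coulombic equation (Gaussian units): $E = \frac{1}{2}\sum_{\mathbf{n}}{}' \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{q_i q_j}{|\mathbf{r}_i - \mathbf{r}_j + \mathbf{n}|}$, where $\mathbf{n}$ runs over the lattice vectors of the periodic box images and the prime omits the $i = j$ terms when $\mathbf{n} = 0$. • Under the Periodic Boundary Condition, this summation is only conditionally convergent.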
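Conditional convergence means the value of the lattice sum depends on the order in which its terms are added. A hypothetical 1-D analogue (not from the slides): the potential at a +1 ion in an infinite chain of alternating unit charges with unit spacing. Summed shell by shell it tends to -2 ln 2; the same terms in a different order tend to -3 ln 2, a classic rearrangement result for the alternating harmonic series. The function names are illustrative.

```python
import math

# Hypothetical 1-D analogue of a periodic Coulomb lattice sum: the potential at
# an ion of charge +1 in an infinite chain of alternating unit charges with unit
# spacing. The ion at distance n (on either side) has charge (-1)**n.

def natural_order(n_shells):
    """Add neighbours shell by shell (distance 1, 2, 3, ... on both sides)."""
    return sum(2.0 * (-1) ** n / n for n in range(1, n_shells + 1))

def rearranged(n_blocks):
    """The same terms in another order: per block, one like-charged shell
    (even distance) followed by two unlike-charged shells (odd distances)."""
    total, even, odd = 0.0, 2, 1
    for _ in range(n_blocks):
        total += 2.0 / even
        even += 2
        for _ in range(2):
            total -= 2.0 / odd
            odd += 2
    return total

print(natural_order(200000), -2 * math.log(2))   # both close to -1.386
print(rearranged(200000), -3 * math.log(2))      # both close to -2.079
```

In 3-D the summation order corresponds to the shape in which the periodic images are added, which is the surface effect the next slide refers to.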
To Combat the Surface Effect: Periodic Boundary Condition Replication [Figure: the central simulation box E, containing particles 1 to 5, surrounded by its eight periodic images A to D and F to I.]
Ewald Summation Used for PBC • To calculate the Coulombic interactions. • O(N^2) Direct Sum + O(N^2) Reciprocal Sum. [Figure: the interaction between charges q at distance r, split into a Direct Sum part and a Reciprocal Sum part.]
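Ewald summation works by splitting the Coulomb kernel 1/r into two parts that are each easy to sum: a short-range part that decays fast enough to be summed directly in real space, and a smooth long-range part that is summed in reciprocal space. A minimal sketch of the standard erfc/erf split; the value of the splitting parameter `beta` is an arbitrary illustrative choice.

```python
import math

def direct_kernel(r, beta):
    """Short-range part, summed in real space (Direct Sum): erfc(beta*r)/r."""
    return math.erfc(beta * r) / r

def reciprocal_kernel(r, beta):
    """Smooth long-range part, summed in reciprocal space: erf(beta*r)/r."""
    return math.erf(beta * r) / r

beta = 0.35  # hypothetical splitting parameter, in 1/length units
for r in (1.0, 5.0, 10.0):
    total = direct_kernel(r, beta) + reciprocal_kernel(r, beta)
    # total * r stays at 1 because erf + erfc = 1 for every r
    print(f"r={r:5.1f}  direct={direct_kernel(r, beta):.3e}  "
          f"reciprocal={reciprocal_kernel(r, beta):.3e}  total*r={total * r:.6f}")
```

A larger `beta` makes the direct part vanish sooner (allowing a shorter real-space cutoff) at the price of more reciprocal-space work, which is the trade-off SPME exploits by making the reciprocal part cheap.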
Smooth Particle Mesh Ewald • Shifts the workload to the Reciprocal Sum. • Uses the Fast Fourier Transform. • O(N) Real (Direct) Sum + O(N log N) Reciprocal Sum. • The RSCE calculates the Reciprocal Sum using the SPME algorithm.
SPME Reciprocal Energy [Figure: the reciprocal energy equation, with its two FFT steps annotated.]
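In the standard SPME formulation (Gaussian units), the reciprocal energy is $E_{rec} = \frac{1}{2\pi V}\sum_{\mathbf{m}\neq 0}\frac{\exp(-\pi^2\mathbf{m}^2/\beta^2)}{\mathbf{m}^2}B(m_1,m_2,m_3)\,|F(Q)(m_1,m_2,m_3)|^2$, where $Q$ is the B-spline-interpolated charge mesh, $F(Q)$ its discrete Fourier transform, and $B$ the product of the B-spline moduli. The sketch below evaluates this for a cubic box, assuming the mesh Q has already been built; a naive DFT stands in for the 3D-FFT, so it is only practical for tiny meshes. The interpolation order `n = 4` and all function names are assumptions for illustration.

```python
import cmath
import math

def bspline_M(n, u):
    """Cardinal B-spline M_n(u), nonzero only for 0 < u < n."""
    if n == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * bspline_M(n - 1, u) + (n - u) * bspline_M(n - 1, u - 1.0)) / (n - 1)

def bspline_modulus_sq(m, K, n):
    """|b(m)|^2 = 1 / |sum_{k=0}^{n-2} M_n(k+1) exp(2 pi i m k / K)|^2."""
    s = sum(bspline_M(n, k + 1) * cmath.exp(2j * math.pi * m * k / K)
            for k in range(n - 1))
    return 1.0 / abs(s) ** 2

def reciprocal_energy(Q, L, beta, n=4):
    """E_rec for a cubic box of side L and a real K x K x K charge mesh Q."""
    K = len(Q)
    V = L ** 3
    bsq = [bspline_modulus_sq(m, K, n) for m in range(K)]
    energy = 0.0
    for m1 in range(K):
        for m2 in range(K):
            for m3 in range(K):
                if m1 == 0 and m2 == 0 and m3 == 0:
                    continue  # the m = 0 term is excluded
                # F(Q)(m1, m2, m3): naive DFT of the charge mesh
                F = sum(Q[k1][k2][k3]
                        * cmath.exp(2j * math.pi * (m1 * k1 + m2 * k2 + m3 * k3) / K)
                        for k1 in range(K) for k2 in range(K) for k3 in range(K))
                # signed frequencies in (-K/2, K/2]; reciprocal vector is m / L
                signed = [m if m <= K // 2 else m - K for m in (m1, m2, m3)]
                m_sq = sum(s * s for s in signed) / L ** 2
                B = bsq[m1] * bsq[m2] * bsq[m3]
                energy += math.exp(-math.pi ** 2 * m_sq / beta ** 2) / m_sq * B * abs(F) ** 2
    return energy / (2.0 * math.pi * V)

# Tiny 4 x 4 x 4 mesh holding a +1 and a -1 charge (made-up values).
Q = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
Q[0][0][0], Q[2][2][2] = 1.0, -1.0
print(reciprocal_energy(Q, L=10.0, beta=0.3))
```

An even order is used because for odd orders the modulus denominator vanishes at m = K/2. Every term in the sum is non-negative, so E_rec is never negative for a real mesh.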
RSCE Precision Goal • Goal: relative error < 10^-5. • Two major calculation steps: • B-Spline Calculation. • 3D-FFT Calculation. • Because of limited logic resources and the limited-precision FFT LogiCore, the precision goal CANNOT be achieved.
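The B-spline step gives, for each particle and each axis, the interpolation weights used to spread its charge onto the mesh, plus their derivatives, which are needed for forces. A minimal sketch, assuming the interpolation order `n` plays the role of the slides' P and that `u` is the particle's fractional coordinate scaled by the mesh size K; the function names are hypothetical. Because the weights form a partition of unity (they sum to 1, and their derivatives sum to 0), these sums make a quick sanity check when experimenting with reduced-precision arithmetic.

```python
import math

def bspline_M(n, u):
    """Cardinal B-spline M_n(u) by recursion from the linear hat M_2."""
    if n == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * bspline_M(n - 1, u) + (n - u) * bspline_M(n - 1, u - 1.0)) / (n - 1)

def bspline_dM(n, u):
    """Derivative dM_n/du = M_{n-1}(u) - M_{n-1}(u - 1), needed for forces."""
    return bspline_M(n - 1, u) - bspline_M(n - 1, u - 1.0)

def spread_weights(u, n, K):
    """(mesh index, weight, d weight/du) for the n mesh points along one axis
    that receive charge from a particle at scaled coordinate u, 0 <= u < K."""
    base = math.floor(u)
    # M_n(u - k) is nonzero only for u - n < k < u, i.e. k = base - n + 1 .. base;
    # indices wrap around because the mesh is periodic
    return [(k % K, bspline_M(n, u - k), bspline_dM(n, u - k))
            for k in range(base - n + 1, base + 1)]

for idx, w, dw in spread_weights(3.3, 4, 8):
    print(idx, round(w, 6), round(dw, 6))
```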
MD Simulation with RSCE • RMS Energy Error Fluctuation:
RSCE vs. Software Implementation Speedup Analysis
RSCE Speedup • RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz. • RSCE computation time: • Speedup: 3x to 14x.
RSCE Speedup • Why is the speedup so modest? • QMM bandwidth limitation. • Sequential nature of the SPME algorithm. • Solution: • Use more QMM memories. • Slight design modifications are required.
Multi-QMM RSCE Speedup • NQ-QMM RSCE computation time: • The 4-QMM RSCE • Speedup: 14x to 20x. • Assuming N is of the same order as K×K×K: • Speedup: 3(NQ-1)x
When Multiple RSCEs are Used Together Parallelization Strategy
RSCE Parallelization Strategy • Assume a 2-D simulation. • Assume P = 2, K = 8, N = 6. • Assume NumP = 4. [Figure: an 8×8 mesh divided into four 4×4 mini meshes.]
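The 2-D example can be made concrete with a small sketch: each of the NumP = 4 RSCEs is assumed to own one 4 × 4 mini mesh of the 8 × 8 mesh, and a particle spreads its charge onto P × P mesh points, which may belong to more than one RSCE. The row-major ownership layout, the wraparound rule, and all names below are assumptions for illustration, not the thesis's exact scheme.

```python
import math

# Hypothetical 2-D setup from the slide: K x K mesh, interpolation order P,
# NUM_P RSCEs arranged as a SIDE x SIDE grid of mini meshes.
K = 8
P = 2
NUM_P = 4
SIDE = math.isqrt(NUM_P)   # mini meshes per dimension
MINI = K // SIDE           # mesh points per mini-mesh side

def owner(ix, iy):
    """Index of the RSCE whose mini mesh contains mesh point (ix, iy)."""
    return (iy // MINI) * SIDE + (ix // MINI)

def touched_points(x, y):
    """Mesh points a particle at scaled coordinates (x, y) spreads charge onto:
    P points per dimension, wrapping around the periodic mesh."""
    bx, by = math.floor(x), math.floor(y)
    return [((bx - i) % K, (by - j) % K) for j in range(P) for i in range(P)]

def rsces_involved(x, y):
    """Set of RSCEs that must receive part of this particle's charge."""
    return {owner(ix, iy) for ix, iy in touched_points(x, y)}

print(rsces_involved(1.5, 1.5))  # interior particle: a single RSCE
print(rsces_involved(4.5, 4.5))  # on the corner shared by all four mini meshes
```

Particles near mini-mesh boundaries are what force communication between RSCEs; a larger P widens that boundary band.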
RSCE Parallelization Strategy • Mini meshes composed -> 2D-IFFT. • 2D-IFFT = two passes of 1D-FFT (X and Y). [Figure: a Y-direction FFT pass and an X-direction FFT pass over the mesh.]
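A 2D-IFFT factors into 1-D transforms: one pass along Y over every column, then one pass along X over every row, so each pass consists of independent 1-D transforms that can be distributed. A sketch checking that the two-pass result matches a direct 2-D transform; the naive 1-D DFT stands in for an FFT, and the unnormalized, positive-exponent convention for the inverse transform is an assumption.

```python
import cmath
import math

def idft_1d(seq):
    """Naive 1-D inverse DFT (unnormalized, + sign in the exponent)."""
    K = len(seq)
    return [sum(seq[k] * cmath.exp(2j * math.pi * m * k / K) for k in range(K))
            for m in range(K)]

def idft_2d_two_pass(mesh):
    """2-D inverse DFT of mesh[y][x] as a Y-direction pass then an X-direction pass."""
    ny, nx = len(mesh), len(mesh[0])
    # Pass 1: 1-D transform along Y for every column x
    cols = [idft_1d([mesh[y][x] for y in range(ny)]) for x in range(nx)]
    tmp = [[cols[x][my] for x in range(nx)] for my in range(ny)]
    # Pass 2: 1-D transform along X for every row
    return [idft_1d(row) for row in tmp]

def idft_2d_direct(mesh):
    """Reference: the same 2-D inverse DFT evaluated as one double sum."""
    ny, nx = len(mesh), len(mesh[0])
    return [[sum(mesh[y][x] * cmath.exp(2j * math.pi * (my * y / ny + mx * x / nx))
                 for y in range(ny) for x in range(nx))
             for mx in range(nx)] for my in range(ny)]

mesh = [[float((3 * y + x) % 5) for x in range(4)] for y in range(4)]
a, b = idft_2d_two_pass(mesh), idft_2d_direct(mesh)
print(max(abs(a[y][x] - b[y][x]) for y in range(4) for x in range(4)))
```

Replacing `idft_1d` with any 1-D FFT keeps the two-pass structure unchanged; only the per-pass cost drops from O(K^2) to O(K log K).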
Parallelization Strategy • 2D-IFFT -> Energy Calculation -> 2D-FFT. • 2D-FFT -> Force Calculation. [Figure: data flow from the Energy Calculation through the 2D-FFT to the Force Calculation.]
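The pipeline on this slide can be checked numerically in 2-D. After the 2D-IFFT, the energy is a weighted sum over the transformed mesh; multiplying by the same weights and applying a 2D-FFT brings the result back to mesh space, where it equals dE/dQ at every mesh point, so the forces follow from the B-spline derivatives. A sketch, assuming a made-up screened energy term in place of the real SPME one (which also includes the B-spline moduli and box volume) and naive DFTs in place of FFTs; the names are hypothetical.

```python
import cmath
import math

def dft_2d(mesh, sign):
    """Naive unnormalized 2-D DFT of a K x K mesh: sign=+1 inverse, -1 forward."""
    K = len(mesh)
    return [[sum(mesh[y][x] * cmath.exp(sign * 2j * math.pi * (my * y + mx * x) / K)
                 for y in range(K) for x in range(K))
             for mx in range(K)] for my in range(K)]

def energy_term(K, c=0.5):
    """Hypothetical stand-in for the SPME energy term: exp(-c m^2) / m^2, 0 at m = 0."""
    def signed(m):
        return m if m <= K // 2 else m - K
    theta = [[0.0] * K for _ in range(K)]
    for my in range(K):
        for mx in range(K):
            m_sq = signed(mx) ** 2 + signed(my) ** 2
            if m_sq:
                theta[my][mx] = math.exp(-c * m_sq) / m_sq
    return theta

def energy_and_force_mesh(Q, c=0.5):
    """2D-IFFT -> energy calculation -> 2D-FFT, for a real K x K charge mesh Q."""
    K = len(Q)
    theta = energy_term(K, c)
    q_hat = dft_2d(Q, +1)                                        # 2D-IFFT
    energy = 0.5 * sum(theta[my][mx] * abs(q_hat[my][mx]) ** 2
                       for my in range(K) for mx in range(K))   # energy calculation
    scaled = [[theta[my][mx] * q_hat[my][mx] for mx in range(K)] for my in range(K)]
    # 2D-FFT back to mesh space; theta is even in m, so the result is real
    conv = [[v.real for v in row] for row in dft_2d(scaled, -1)]
    # conv[k] = dE/dQ[k], so the force on particle i would be
    # -sum_k (dQ[k]/dr_i) * conv[k], using the B-spline derivatives (not done here)
    return energy, conv

Q = [[0.0] * 4 for _ in range(4)]
Q[0][0], Q[1][2] = 1.0, -1.0
e, conv = energy_and_force_mesh(Q)
print(e, 0.5 * sum(Q[y][x] * conv[y][x] for y in range(4) for x in range(4)))
```

The two printed values agree because E = ½ Σ Q(k)·conv(k); reusing the transformed mesh for both energy and forces is what keeps the extra cost of the force step to a single 2D-FFT.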
Conclusion • Successful integration of the RSCE into NAMD2. • Single-QMM RSCE speedup = 3x to 14x. • 4-QMM RSCE speedup = 14x to 20x. • When N ≈ K×K×K, NQ-QMM speedup = 3(NQ-1)x. • A multi-RSCE system is still a better alternative than a multi-FPGA Ewald Summation system.
Future Work • Input precision analysis. • More in-depth FFT precision analysis. • Implementation of a block-floating-point FFT. • More investigation of how different simulation settings (K, P, and N) affect the RSCE speedup. • Investigate how to better parallelize the SPME algorithm.