Gravitational N-body Simulation

Major Design Goals
  - Efficiency
  - Versatility (ability to use different numerical methods)
  - Scalability
Lesser Design Goals
  - Flexibility (control parameters must be configurable)
  - Persistence (pause and continue)
  - Visualization

Hardware
  - 6 GFlops: average desktop
  - 256 GFlops: top-line server
Single Computer Configuration
  - 1-4 CPUs
  - 1-4 cores
  - 3-4 GHz CPUs
  - 2-4 32-bit FP IPC
  - 1-2 64-bit FP IPC
  - Windows
Cluster Configurations
  - http://gears.aset.psu.edu/hpc/systems/
  - LION-XO (80x2xOpteron/8GB + 40x4xOpteron/16GB; 2.4 GHz)
  - 1.6 TFlops (32-bit); 800 GFlops (64-bit); single-core assumed
  - Gigabit Ethernet
  - GNU/Linux
  - Single or dual core CPUs? CPU model?

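The throughput figures on the Hardware slide are consistent with a simple peak estimate: CPUs x cores x clock (GHz) x floating-point operations per cycle. The sketch below reproduces them under that assumption; the function name `peak_gflops` is hypothetical, and which end of each range yields the 6 and 256 GFlops figures is an inference, not something the slide states.

```python
def peak_gflops(cpus, cores_per_cpu, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops; real code rarely reaches this."""
    return cpus * cores_per_cpu * clock_ghz * flops_per_cycle

# Assumed low end of the single-computer ranges: 1 CPU, 1 core, 3 GHz, 2 FP IPC.
desktop = peak_gflops(1, 1, 3.0, 2)

# Assumed high end: 4 CPUs, 4 cores each, 4 GHz, 4 FP IPC (32-bit).
server = peak_gflops(4, 4, 4.0, 4)

# LION-XO: 80 dual-CPU nodes plus 40 quad-CPU nodes, single-core assumed.
lion_xo_cpus = 80 * 2 + 40 * 4
cluster_32bit = peak_gflops(lion_xo_cpus, 1, 2.4, 2)  # about 1.5 TFlops
cluster_64bit = peak_gflops(lion_xo_cpus, 1, 2.4, 1)  # about 0.77 TFlops

print(desktop, server, cluster_32bit, cluster_64bit)
```

The cluster values land slightly below the slide's rounded 1.6 TFlops and 800 GFlops.
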
Algorithms
Direct Methods: O(N^2)
  + very simple
  + scalable
  - inefficient (~30,000 particles max @ 256 GFlops)
Treecode / Multipole: O(N log N)
  - more difficult to implement
  - scalability harder to achieve
  + efficient (10^6-10^10 particles)
Field Methods: O(N log N) or O(N)
    Involves solving Poisson's equation
    Area of active research

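To make the O(N^2) cost of the direct method concrete, here is a minimal pure-Python sketch of direct summation. It is an illustration rather than the implementation behind these slides: the function name `direct_accelerations`, the softening parameter `eps`, and the choice of units (`G = 1` by default) are all assumptions.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=0.0):
    """Gravitational acceleration on every particle by direct summation.

    pos  : list of [x, y, z] positions
    mass : list of masses
    eps  : optional softening length (hypothetical parameter) that avoids
           a singularity when two particles come very close
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    # Visit each unordered pair once: N(N-1)/2 interactions, hence O(N^2).
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Newton's third law: equal and opposite forces.
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc

# Two bodies 2 units apart on the x axis, masses 1 and 3.
acc = direct_accelerations([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 3.0])
print(acc)
```

Doubling N roughly quadruples the work in the inner loop, which is why the direct method runs out of steam at modest particle counts while a treecode does not.
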
Levels of Parallelization
1) SIMD: up to 4 threads
  - 4x32-bit flops/cycle
  - 2x64-bit flops/cycle
2) SMP/MPU: up to 4 threads
  - 1-4 cores
  - 1-4 CPUs
3) Cluster: up to N nodes

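Memory Requirements
  - Position: x, y, z
  - Velocity: vx, vy, vz
  - 6x4 = 24 bytes per point (32-bit fp)
  - 6x8 = 48 bytes per point (64-bit fp)
  - ~42 points per KB; about 2,500 points fit in a 64 KB L1 cache (32-bit)
  - ~21 points per KB; about 1,300 points fit in a 64 KB L1 cache (64-bit)
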
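The storage arithmetic can be checked with a few lines. This sketch assumes each particle stores only the six components listed (no mass, no padding) and that 1 KB = 1024 bytes; the function names are hypothetical. The 64 KB capacity is the L1 size quoted on the Levels of Memory slide.

```python
def bytes_per_particle(bytes_per_float, components=6):
    """Position (x, y, z) plus velocity (vx, vy, vz)."""
    return components * bytes_per_float

def particles_that_fit(capacity_bytes, bytes_per_float):
    """Whole particles that fit in a given amount of memory."""
    return capacity_bytes // bytes_per_particle(bytes_per_float)

KB = 1024
per_kb_32 = particles_that_fit(KB, 4)        # 32-bit floats
per_kb_64 = particles_that_fit(KB, 8)        # 64-bit floats
per_l1_32 = particles_that_fit(64 * KB, 4)   # 64 KB L1 cache
per_l1_64 = particles_that_fit(64 * KB, 8)

print(per_kb_32, per_kb_64, per_l1_32, per_l1_64)
```

This gives about 42 (32-bit) or 21 (64-bit) particles per KB, and roughly 2,700 or 1,350 in a 64 KB L1 cache if nothing else occupies it.
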
Levels of Memory
1) L1 cache: 64 KB
  - CPU clock-speed
  - no latency
2) L2 cache: 1 MB
  - CPU clock-speed
  - low latency
3) RAM: GBs
  - reduced speed (up to 12-24 GB/s)
  - huge latency
4) Network (weakest link)
  - 1 Gbit/sec

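A rough way to see why the network is the weakest link is to compare how long one pass over a 24 GB data set would take at each quoted bandwidth. The sketch below is a back-of-the-envelope estimate only: it ignores latency, protocol overhead, and caching, and the 24 GB figure (10^9 particles at 24 bytes each) is used purely as an illustrative payload.

```python
def transfer_seconds(n_bytes, bytes_per_sec):
    """Idealized transfer time: payload divided by bandwidth."""
    return n_bytes / bytes_per_sec

data_bytes = 24e9                       # illustrative payload: 24 GB

ram_fast = transfer_seconds(data_bytes, 24e9)     # RAM at 24 GB/s
ram_slow = transfer_seconds(data_bytes, 12e9)     # RAM at 12 GB/s
network = transfer_seconds(data_bytes, 1e9 / 8)   # 1 Gbit/s = 125 MB/s

print(f"RAM: {ram_fast:.0f}-{ram_slow:.0f} s, Gigabit Ethernet: {network:.0f} s")
```

Even in this idealized picture the network is about two orders of magnitude slower than RAM, so a cluster code has to keep inter-node traffic far below the full data set per iteration.
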
10^9 Particles Require...
  - Memory: 24 GB (32-bit)
  - Instructions per iteration: log2(10^9) x 10^9 x const ~ 3x10^12 ops (3 TFlop in total)
  - Time: ~12 sec @ 256 GFlops
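The estimate for 10^9 particles can be reproduced as follows. The slide does not give the value of the constant; `const=100` is an assumption, chosen because it is what the quoted 3x10^12 operations implies (log2(10^9) is about 30). The function name `treecode_estimate` is hypothetical.

```python
import math

def treecode_estimate(n, const, flops_per_sec, bytes_per_particle=24):
    """Rough per-iteration cost of an O(N log N) method at peak throughput."""
    ops = n * math.log2(n) * const
    return {
        "memory_bytes": n * bytes_per_particle,
        "ops": ops,
        "seconds": ops / flops_per_sec,
    }

# const=100 is assumed; 256 GFlops is the top-line server figure.
est = treecode_estimate(n=10**9, const=100, flops_per_sec=256e9)

print(f"memory  : {est['memory_bytes'] / 1e9:.0f} GB")
print(f"ops     : {est['ops']:.2e}")
print(f"seconds : {est['seconds']:.1f}")
```

Because the time assumes the machine sustains its theoretical peak, it is best read as a lower bound on the time per iteration.
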