“Cellular Automata using nVIDIA CUDA” and “Bridging the Gap between MPJ Express and CUDA” End Semester Project for the course Parallel Computing. Team members: • Bibrak Qamar NUST-2007-BIT9-105 • Jahanzaib Maqbool NUST-2007-BIT9-118 • Bilawal Sarwar NUST-2007-BIT9-11 • Muhammad Imran NUST-2007-BIT9-127 • Mahreen Nadeem NUST-2007-BIT9-
System Specification • Name: CUDA-TESTBED • Processor: Intel(R) Xeon(R) CPU W3520 @ 2.67 GHz • Cores = 4 • Hardware threads per core = 2 • GPU: 2 × NVIDIA GTX 285 • Memory = 8 GB
NVIDIA GTX 285 • GPU Engine Specs: • CUDA Cores: 240 • Graphics (core) Clock: 648 MHz • Processor (shader) Clock: 1476 MHz “The hot clock” • Memory Specs: • Memory Clock: 1242 MHz • Standard Memory: 1 GB GDDR3 • Memory Interface Width: 512-bit • Memory Bandwidth: 159.0 GB/sec
Implementation • Game of Life on CUDA • Fish and Shark on CUDA • Matrix Multiplication on a GPU-Accelerated Cluster using MPJ Express
Cellular Automata (Fish and Shark) Execution Flow • Initialize device • Allocate device- and host-side memory • Populate cells • Copy from host to device • Loop in Display() • Draw cells • Execute kernel • Copy result back to host • End loop • Free memory • End program
Kernel function • Get threadIdx.x and threadIdx.y • Fetch neighbors • Decide fate • Write result to the resultant cellular board
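A minimal sketch of such a kernel for Game of Life, assuming double-buffered boards and toroidal edges (identifiers and launch geometry are illustrative, not the project's actual code):

```cuda
// One thread per cell: reads the source board, writes the resultant board.
__global__ void lifeKernel(const unsigned char *src, unsigned char *dst,
                           int w, int h) {
    // Get thread coordinates from block and thread indices.
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Fetch the eight neighbors (toroidal wrap-around).
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0) continue;
            n += src[((y + dy + h) % h) * w + (x + dx + w) % w];
        }

    // Decide fate and write to the resultant board; double buffering
    // keeps reads and writes on separate boards.
    unsigned char alive = src[y * w + x];
    dst[y * w + x] = (n == 3) || (alive && n == 2);
}
```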
Execution Graph The maximum global memory throughput we achieved was 95 GB/s (roughly 60% of the GTX 285's 159 GB/s peak).
Speedup against the sequential CPU version Average speedup = 878.91×
Matrix Multiplication on a GPU-Accelerated Cluster using MPJ Express • Algorithm • Use MPJ Express to distribute the data • Call the cudaMatMultiply function • Allocate device memory • Execute the kernel • Copy the results back • Gather the results at the root We used JCuda (www.Jcuda.org), a Java binding for NVIDIA CUDA.