Parallelization of System Matrix generation code

Parallelization of System Matrix generation code Mahmoud Abdallah Antall Fernandes

SPECT System

Inverse Cone

Back Projection • Filtered Back Projection is applying a ramp filter on the back projected image. • Still widely used for its high speed and easy implementation. Ref figure: Tomographic Reconstruction of SPECT Data • Bill Amini, Magnus Björklund, Ron Dror, Anders Nygren oo

Maximum Likelihood-Expectation Maximization Algorithm • Is found to reduce noise in reconstruction iteratively • An iterative algorithm is used to solve the following linear problem • FX = P • P – vector of projection data • X – voxelized image • F – projection matrix operator • Needs a large number of iterations to reconstruct an image

EM Algorithm • The EM algorithm is given by • Summation over k is projection operation • Summation over j is the back projection operation

System Matrix • Maps the image space to the data space • Takes detector geometry as input • Generates detector data for every bin for each angle (usually there are 72 angles/frames)

System Matrix Algorithm for each angle DO // number of angles = 72 for each detector bin in U direction Do // bins: around 14 for each detector bin in V direction Do // bins: around 64 for each row in the inverse cone grid Do// <= 99 for each Column in the inverse cone grid Do //<= 99 for each voxel intersected the Ray Do calculate point response end end end end end end Number of loops = 72 x 14 x 64 x 99 x 99 = 632282112

System Matrix Parallelization Observation: At each angle, each bin’s calculations are independent from other bins’. Proposal: Parallelize all calculations for each angle. • E.g. use GPU.

System Matrix Parallelization on GPU

Parallelized System Matrix Algorithm Host Program: for each angle DO Run all kernels for all bins at the same time end GPU Kernel: for each voxel intersected the Ray Do calculate attenuation and store it in SysMat end

SIMD (Architecture of GPU) From: (AMD) Advanced Micro Devices INC 2010 (Introduction to OpenCL Programming)

OpenCL • Based on ISO C99 with some extensions & restrictions • provides parallel computing using task-based and data-based parallelism Architecture • Host Program • Kernel

Program Architecture Host Program Executes on the host system Sends kernels to execute on OpenCL™ devices using command queue. Kernels Similar to C function. Executed on OpenCL™ devices ( GPU).

Thank You

Parallelization of System Matrix generation code

Parallelization of System Matrix generation code

Presentation Transcript

Parallelization of a Non-Linear Analysis Code

Code Generation

Code generation

Code Generation

Parallelization of Expert System

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation

Code Generation