Image Reconstruction on Multicore Processors
Graduate Students: Eric Fontaine and Viraj Paropkari
Faculty Members: Ada Gavrilovska and Hsien-Hsin S. Lee
Agenda
• Background
• FDK algorithm
  • Overview
  • Parallelization Method
  • Current Results
• Katsevich Algorithm
  • Overview
  • Parallelization Method
  • Current Results
• Future Plans
Background
• 3-D CT scans are used to identify tumors and other defects inside the body.
• Two common methods
  • MRI
    • Complex math and physics
    • Main function: simple IFFT
  • Filtered back-projection
• Two common filtered back-projection algorithms
  • FDK
    • Approximate, fast
    • Uses projections taken on a circular path surrounding the object
    • More accurate on the plane containing the circle
  • Katsevich
    • More accurate, but also more compute-intensive
    • Uses projections taken on a helical path surrounding the object
    • Can reconstruct long objects, unlike the original FDK
• Both expose a large amount of data parallelism
FDK Algorithm Overview
• Cone-beam image reconstruction with the source on a helix, for a flat detector
• Reconstruction of a 3-D volume
  • Initialize the helix source parameters
  • Compute/load cone-beam data
  • Length correction weighting
  • 1-D horizontal filtering
  • Linear pre-interpolation
  • Back-projection
  • Compare results with a standard phantom
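The 1-D horizontal filtering step can be sketched as a row-wise convolution. The slides do not say which filter kernel was used; the sketch below assumes the textbook Ram-Lak (ramp) kernel for unit detector spacing, and the names `ramlak` and `filter_rows` are hypothetical. It uses a direct O(n²) convolution for clarity, where a production code would more likely use an FFT.

```c
#include <assert.h>
#include <stddef.h>

#define PI_CONST 3.14159265358979323846

/* Assumed kernel: discrete Ram-Lak (ramp) filter for unit sample spacing.
 * h[0] = 1/4, h[n] = 0 for even n, h[n] = -1/(pi^2 n^2) for odd n. */
double ramlak(long n)
{
    if (n == 0) return 0.25;
    if (n % 2 == 0) return 0.0;
    return -1.0 / (PI_CONST * PI_CONST * (double)n * (double)n);
}

/* Filter each detector row of one projection independently.
 * in and out are rows x cols, row-major, and must not alias. */
void filter_rows(const double *in, double *out, size_t rows, size_t cols)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t r = 0; r < rows; ++r) {
        const double *src = in + r * cols;
        double *dst = out + r * cols;
        for (size_t u = 0; u < cols; ++u) {
            double acc = 0.0;
            /* Direct convolution along the horizontal detector axis. */
            for (size_t k = 0; k < cols; ++k)
                acc += src[k] * ramlak((long)u - (long)k);
            dst[u] = acc;
        }
    }
}
```

Because every row is filtered independently, this step is one of the places where the data parallelism mentioned in the Background slide shows up.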
Parallelization Strategy
• Based on the FDK algorithm generalized to scanning paths such as a helix.*
• Each thread is assigned a subset of the projections and performs length correction weighting, filtering, and back-projection for its assigned projections.
• After all threads are done, an implicit barrier synchronizes them. Then each thread is assigned a subset of the total volume to reconstruct.
• We use OpenMP
  • Reconstruct subsets of the total volume in parallel (so each subset fits into an individual cache)
  • Piece the image together at the end (reduces inter-core communication)
[Figure: Assign Projections → per-thread "Length correction weighting, filtering, back-projection" → barrier → Reconstructed Image]
*Ge Wang, Tein-Hsiang Lin, Ping-chin Cheng, and Douglas M. Shinozaki. A general cone-beam reconstruction algorithm. IEEE Transactions on Medical Imaging, 12(3):486-496, September 1993.
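A minimal sketch of this two-phase scheme in C with OpenMP follows. It is not the authors' code: `preprocess_projection` and `backproject_voxel` are hypothetical stand-ins that do trivial arithmetic, and the sketch places back-projection in the second, volume-partitioned phase, which is only one reading of the slide. The point is the structure: a worksharing loop over projections, the implicit barrier at its end, then a worksharing loop over the volume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for weighting + filtering of one projection;
 * it only halves the samples so the sketch stays self-contained. */
static void preprocess_projection(double *proj, size_t n_samples)
{
    for (size_t i = 0; i < n_samples; ++i)
        proj[i] *= 0.5;
}

/* Hypothetical stand-in for back-projection into one voxel: sums one
 * sample from every projection instead of doing real geometry. */
static double backproject_voxel(const double *projs, size_t n_proj,
                                size_t n_samples, size_t voxel)
{
    double acc = 0.0;
    for (size_t p = 0; p < n_proj; ++p)
        acc += projs[p * n_samples + voxel % n_samples];
    return acc;
}

void reconstruct(double *projs, size_t n_proj, size_t n_samples,
                 double *volume, size_t n_voxels)
{
    assert(projs != NULL && volume != NULL && n_samples > 0);
    #pragma omp parallel
    {
        /* Phase 1: projections are divided among the threads. */
        #pragma omp for schedule(static)
        for (long p = 0; p < (long)n_proj; ++p)
            preprocess_projection(projs + (size_t)p * n_samples, n_samples);
        /* The "omp for" above ends with an implicit barrier, so every
         * projection is ready before any thread starts phase 2. */

        /* Phase 2: voxels are divided among the threads; each thread
         * writes only its own part of the volume. */
        #pragma omp for schedule(static)
        for (long v = 0; v < (long)n_voxels; ++v)
            volume[v] = backproject_voxel(projs, n_proj, n_samples, (size_t)v);
    }
}
```

With a compiler that has OpenMP enabled (for example `-fopenmp` in GCC) the two loops run across threads; without it the pragmas are ignored and the code runs serially with the same result.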
Single and Dual-Thread Performance
[Charts: Performance (seconds); Speedup of dual-thread OpenMP code; annotation: Slowdown]
FDK Analysis for Memory Behavior
[Tables: Statistics of Single Thread; Statistics of Two Threads]
Katsevich Algorithm Overview
• Reconstructs a 3-D cylindrical volume exactly from 2-D projections. [1]
• The inputs are projections (b) taken from a helical path surrounding the volume of interest (a).
• We implemented the Noo method [2]:
  • The projections are differentiated and weighted appropriately (c).
  • They then undergo a 1-D Hilbert transform along the κ-lines:
    • First remap to κ-line coordinates (d).
    • Perform a 1-D convolution with the filter kernel (e).
    • Remap back to projection coordinates (f).
  • To reconstruct the 3-D volume (g), each voxel’s coordinates are back-projected onto the source projections.
  • The cumulative sum is taken over all projections belonging to the PI-interval containing that voxel.
• Uses a parallelization strategy similar to FDK
  • Each thread processes a subset of the projections.
  • After synchronization, each thread reconstructs a subset of the total volume.
[Figure panels: (a) (b) (c) (d) (e) (f) (g)]
[1] Alexander Katsevich, "Theoretically exact FBP-type inversion algorithm for spiral CT," SIAM Journal on Applied Mathematics, 62:2012-2026, 2002.
[2] F. Noo, J. Pack, and D. Heuscher, "Exact helical reconstruction using native cone-beam geometries," Physics in Medicine and Biology, vol. 48, pp. 3787–3818, 2003.
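The 1-D convolution step (e) can be sketched as follows, for samples that have already been remapped onto one κ-line. This assumes the ideal discrete Hilbert transformer kernel, h[n] = 2/(πn) for odd n and 0 for even n; the discretization actually used in [2] may differ (for example by a half-sample shift), and `hilbert_kernel` and `hilbert_filter_line` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PI_CONST 3.14159265358979323846

/* Assumed kernel: ideal discrete Hilbert transformer.
 * h[n] = 2/(pi n) for odd n, 0 for even n (including n = 0). */
double hilbert_kernel(long n)
{
    if (n % 2 == 0) return 0.0;
    return 2.0 / (PI_CONST * (double)n);
}

/* Convolve the samples along one kappa-line with the Hilbert kernel.
 * in and out hold len samples each and must not alias. */
void hilbert_filter_line(const double *in, double *out, size_t len)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t u = 0; u < len; ++u) {
        double acc = 0.0;
        for (size_t k = 0; k < len; ++k)
            acc += in[k] * hilbert_kernel((long)u - (long)k);
        out[u] = acc;
    }
}
```

The kernel is odd, so an impulse in the input produces equal and opposite values on either side of it and zero at the impulse itself.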
Results
• Measured on an Intel Core 2 Duo at 2.66 GHz.
• Close to 2x speedup
Image Quality
[Figure: 512^3 reconstruction (512 projections per turn, 512x64 projections)]
[Figure: 512^3 original phantom]
Benchmark
• Compared against the published timing results in [3], which used 64-bit AMD Opteron processors.
• We were unable to determine the exact parameters used by the authors of [3], so the comparison may be questionable.
[3] Deng, J., Yu, H., Ni, J., He, T., Zhao, S., Wang, L., and Wang, G. 2006. A Parallel Implementation of the Katsevich Algorithm for 3-D CT Image Reconstruction. J. Supercomput. 38, 1 (Oct. 2006), 35-47.
Optimizations Used
• The majority of the time is spent in back-projection and in determining the PI-intervals.
• PI-intervals are constant for a particular helix.
  • PI-intervals are precomputed and saved to a file.
  • It is only necessary to precompute PI-intervals for one horizontal slice.
  • PI-intervals for other horizontal slices can be determined by rotation.
  • Easy ~25% speedup
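The rotation trick follows from the screw symmetry of the helix: rotating by Δθ about the axis and shifting along it by (pitch · Δθ / 2π) maps the helix onto itself, so a voxel's PI-interval on another slice equals the reference-slice interval of the correspondingly rotated voxel, offset by Δθ. The sketch below is only an illustration under assumptions not stated in the slides: the reference table is stored on a polar grid (`n_radius` × `n_angle`), and the slice spacing corresponds to a whole number of angular steps (`steps_per_slice`), so the rotation becomes an index shift. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PI_CONST 3.14159265358979323846

/* Source-angle range [s_bottom, s_top] of the PI-line through a voxel. */
typedef struct { double s_bottom, s_top; } PiInterval;

/* Look up the PI-interval of polar grid point (r, a) on a given slice
 * from a table precomputed for slice 0 only.
 * Assumption: moving up one slice equals a rotation by steps_per_slice
 * angular grid steps (fixed by helix pitch and slice spacing). */
PiInterval pi_interval_for_slice(const PiInterval *ref,
                                 size_t n_radius, size_t n_angle,
                                 size_t r, size_t a,
                                 long slice, long steps_per_slice)
{
    assert(ref != NULL && r < n_radius && a < n_angle);
    long na = (long)n_angle;
    long shift = slice * steps_per_slice;
    /* Rotate the voxel back onto the reference slice (wraps around). */
    long src = ((((long)a - shift) % na) + na) % na;
    /* The same rotation angle offsets both ends of the interval. */
    double dtheta = 2.0 * PI_CONST * (double)shift / (double)n_angle;
    PiInterval out = ref[r * n_angle + (size_t)src];
    out.s_bottom += dtheta;
    out.s_top += dtheta;
    return out;
}
```

On a Cartesian voxel grid the same idea needs an explicit coordinate rotation and an interpolated table lookup instead of the index shift used here.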
Optimizations Used (continued)
• Next focused on the back-projection inner loop.
• Removed trivial lookup tables to save cache space.
  • ~10% speedup.
• Used sin and cos lookup tables.
  • ~15% speedup.
• Moved the if statements that smooth the ends of the PI-interval outside the loop.
  • Duplicated the inner loop code.
  • ~10% speedup.
• Removed the if statements that bounds-test the back-projected coordinates.
  • Needed to add an extra slack row and column to the projection data.
  • ~3% speedup.
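Two of these inner-loop changes can be sketched in a few lines. The code below is an illustration rather than the authors' implementation: `sample_padded` shows how one slack row and column lets bilinear interpolation skip bounds tests, and `accumulate_pi_interval` shows the end-of-interval weights peeled out of the loop so the interior iterations are branch-free. The weights `w_first` and `w_last` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bilinear sample from a projection stored with one slack row and one
 * slack column (stride = cols + 1). For 0 <= u < cols and 0 <= v < rows
 * the neighbors at +1 always exist, so no bounds test is needed. */
double sample_padded(const double *proj, size_t stride, double u, double v)
{
    size_t iu = (size_t)u, iv = (size_t)v; /* truncation = floor for u, v >= 0 */
    double fu = u - (double)iu, fv = v - (double)iv;
    const double *p = proj + iv * stride + iu;
    return (1.0 - fv) * ((1.0 - fu) * p[0] + fu * p[1])
         + fv * ((1.0 - fu) * p[stride] + fu * p[stride + 1]);
}

/* Sum per-projection contributions over the PI-interval first..last
 * (inclusive). The two end projections get smoothing weights; handling
 * them outside the loop removes the per-iteration if statements. */
double accumulate_pi_interval(const double *contrib, size_t first, size_t last,
                              double w_first, double w_last)
{
    assert(contrib != NULL && first < last);
    double acc = w_first * contrib[first];   /* peeled first end */
    for (size_t p = first + 1; p < last; ++p)
        acc += contrib[p];                   /* branch-free interior */
    acc += w_last * contrib[last];           /* peeled last end */
    return acc;
}
```

The padding trades a small amount of memory for the removed branches, which matches the modest gain reported on the slide.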
Future Work
• Explore memory layouts that reduce cache misses and page faults.
• Implement the same algorithms on the Cell processor for a comparative analysis.