
Parallel Beam Back Projection: Implementation

Srdjan Coric, Miriam Leeser, Eric Miller. Outline: Annapolis Wildstar; “Simple Architecture” (algorithm, datapath); Performance Results; Parallelism extraction; “Advanced Architecture 4x” (datapath); Performance Results; Implementation issues.


Presentation Transcript


  1. Parallel Beam Back Projection: Implementation. Srdjan Coric, Miriam Leeser, Eric Miller

  2. Outline
     • Annapolis Wildstar
     • “Simple Architecture”
        • algorithm
        • datapath
     • Performance Results
     • Parallelism extraction
     • “Advanced Architecture 4x”
        • datapath
     • Performance Results
     • Implementation issues
     • Future directions

  3. Data Flow
     • Sinogram data address generation
     • Sinogram data retrieval
     • Sinogram data prefetch
     • Linear interpolation
     • Data accumulation
     • Data read
     • Data write
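The stages named on this slide can be sketched in software as follows. This is a minimal floating-point sketch, not the authors' hardware datapath: the function name `back_project`, the evenly spaced angles over 180 degrees, and the centering convention are all assumptions made for illustration.

```python
import math

def back_project(sinogram, size):
    """Back-project a parallel-beam sinogram onto a size x size image.

    sinogram[p][s] is sample s of projection p. Angles are ASSUMED to be
    evenly spaced over 180 degrees (not stated in the slides).
    """
    num_proj = len(sinogram)
    num_samples = len(sinogram[0])
    center = (size - 1) / 2.0             # image center (assumed convention)
    det_center = (num_samples - 1) / 2.0  # detector center (assumed convention)
    image = [[0.0] * size for _ in range(size)]
    for p, proj in enumerate(sinogram):
        theta = math.pi * p / num_proj
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            for col in range(size):
                # Sinogram data address generation: detector position of this pixel
                t = (col - center) * c + (row - center) * s + det_center
                i = math.floor(t)
                if 0 <= i < num_samples - 1:
                    # Sinogram data retrieval: the two neighbouring samples
                    s0, s1 = proj[i], proj[i + 1]
                    # Linear interpolation between them
                    value = s0 + (t - i) * (s1 - s0)
                    # Data accumulation: read, add, write back
                    image[row][col] += value
    return image

# Usage: a constant sinogram contributes 1.0 per projection to every pixel.
demo = back_project([[1.0] * 8 for _ in range(4)], 4)
print(demo[0][0])
```

The prefetch stage has no software counterpart here; in hardware it would hide the latency of the sinogram memory reads.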

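  4. Critical error-accumulation path (figure). Labeled error sources: LUT1 quantization error, bit reduction error, LUT2 quantization error, LUT3 quantization error, interpolation factor error. Other labels: LUT1 starting position, corner starting position. Numeric annotations attached to LUT1, LUT2, and LUT3 (5, 10, 15, 1, 15, 2) are not legible in this transcript.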
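To make one of the error sources on this slide concrete, here is a small sketch of the interpolation factor error. It assumes the 3-bit interpolation factor and integer (9-bit) sinogram samples quoted with the performance results on slide 6; the truncation scheme and the function names `interp_fixed` and `interp_exact` are illustrative assumptions, not the authors' circuit.

```python
FRAC_BITS = 3  # interpolation factor width quoted on slide 6

def interp_fixed(s0, s1, t_frac):
    """Interpolate between integer samples with a factor truncated to 3 bits."""
    k = int(t_frac * (1 << FRAC_BITS))         # ASSUMED truncation: k in 0..7
    return s0 + (((s1 - s0) * k) >> FRAC_BITS)  # integer-only arithmetic

def interp_exact(s0, s1, t_frac):
    """Reference interpolation with the unquantized factor."""
    return s0 + (s1 - s0) * t_frac

# Usage: two 9-bit samples and a fractional position of 0.3.
approx = interp_fixed(100, 180, 0.3)   # factor 0.3 is truncated to 2/8
exact = interp_exact(100, 180, 0.3)
print(approx, exact, abs(exact - approx))
```

With truncation, the factor is off by less than 1/8, so the error of a single interpolated value is bounded by roughly |s1 - s0| / 8 plus one unit from the integer shift.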

  5. “Simple Architecture” Datapath

  6. Performance Results: Software vs. FPGA Hardware
     • Software, floating point, 450 MHz Pentium: ~240 s
     • Software, floating point, 1 GHz Dual Pentium: ~94 s
     • Software, fixed point, 450 MHz Pentium: ~50 s
     • Software, fixed point, 1 GHz Dual Pentium: ~28 s
     • Hardware, 50 MHz: ~5.4 s
     Parameters: 1024 projections; 1024 samples per projection; 512 × 512 pixel image; 9-bit sinogram data; 3-bit interpolation factor
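As a sanity check on the hardware figure, the following back-of-the-envelope sketch counts the pixel updates implied by the parameters. It assumes the datapath completes one pixel update per clock cycle, which the slides do not state; under that assumption the estimate lands close to the reported ~5.4 s.

```python
projections = 1024
image_side = 512
clock_hz = 50e6  # 50 MHz, from the slide

# Every projection contributes once to every pixel of the image.
updates = projections * image_side * image_side

# ASSUMPTION: one pixel update per clock cycle, no stalls.
seconds = updates / clock_hz

print(f"{updates} pixel updates -> {seconds:.2f} s")  # 268435456 -> 5.37 s
```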

  7. Original image vs. hardware output image. Zoom: ~200%. Grayscale range < pixel value range (heart features in focus)

  8. Original image vs. hardware output image. Zoom: ~200%. Grayscale range < pixel value range (lung features in focus)

  9. Difference image: original image minus hardware output image

  10. Parallelism Issues
     • Case 1: no parallelism extracted
     • Case 2: pixel-level parallelism extracted
     • Case 3: projection-level parallelism extracted
     Memory bandwidth requirements at 50 MHz (for data accumulation):
     • Case 1: 0.4 GB/s
     • Case 2: 1.6 GB/s
     • Case 3: 0.4 GB/s
     • Memory bandwidth limit: 1.2 GB/s
     Figure: processing volumes V1, V2, V3 drawn over the axes projections, image columns, and image rows.
     Execution time: Case 1: T ~ k1*V1; Case 2: T ~ k1*V2; Case 3: T ~ k2*V3, where k1 < k2, V2 = V3 = V1/4, and T = execution time.
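The bandwidth figures on this slide can be reproduced with a short calculation. The sketch assumes each accumulator update moves 8 bytes (for example one 4-byte read plus one 4-byte write); that width is inferred from 0.4 GB/s at 50 MHz and is not stated in the slides. The 4-way factor follows from V2 = V3 = V1/4.

```python
CLOCK_HZ = 50e6          # from the slide
LIMIT_GBPS = 1.2         # memory bandwidth limit from the slide
BYTES_PER_UPDATE = 8     # ASSUMPTION: 4-byte read + 4-byte write per update

def bandwidth_gbps(accumulator_updates_per_cycle):
    """Accumulation memory traffic in GB/s for a given update rate."""
    return CLOCK_HZ * BYTES_PER_UPDATE * accumulator_updates_per_cycle / 1e9

cases = {
    "Case 1: no parallelism": 1,
    "Case 2: 4 pixels per cycle": 4,
    # Four projections are summed before touching memory, so the pixel
    # accumulator is still read and written once per cycle.
    "Case 3: 4 projections per cycle": 1,
}

for name, rate in cases.items():
    bw = bandwidth_gbps(rate)
    verdict = "fits" if bw <= LIMIT_GBPS else "exceeds limit"
    print(f"{name}: {bw:.1f} GB/s ({verdict})")
```

Under these assumptions only Case 2 exceeds the 1.2 GB/s limit, which is consistent with projection-level parallelism being the one extracted in the advanced architecture.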

  11. Simple Architecture vs. Advanced Architecture: datapath (projection parallelism extracted)
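A software analogy for projection-level parallelism: contributions from several projections to the same pixel are summed first, and the pixel accumulator is then updated once. The sketch below is illustrative only; the function names and the default of four ways are assumptions (the 4-way figure is taken from the "Advanced Architecture 4x" name), and it models access counts rather than the actual datapath.

```python
def accumulate_sequential(contrib):
    """One accumulator read-modify-write per projection.

    contrib[p] is the interpolated value projection p adds to one pixel.
    """
    acc, accesses = 0, 0
    for value in contrib:
        acc += value
        accesses += 1
    return acc, accesses

def accumulate_grouped(contrib, ways=4):
    """Sum `ways` projections first, then update the accumulator once."""
    acc, accesses = 0, 0
    for start in range(0, len(contrib), ways):
        partial = sum(contrib[start:start + ways])  # stands in for an adder tree
        acc += partial
        accesses += 1
    return acc, accesses

# Usage: 1024 projections contributing to a single pixel.
contrib = list(range(1024))
print(accumulate_sequential(contrib))  # (523776, 1024)
print(accumulate_grouped(contrib))     # (523776, 256)
```

Both versions produce the same pixel value, but the grouped version touches the accumulator a quarter as often.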

  12. Performance Results: Software vs. FPGA Hardware
     • Software, floating point, 450 MHz Pentium: ~240 s
     • Software, floating point, 1 GHz Dual Pentium: ~94 s
     • Software, fixed point, 450 MHz Pentium: ~50 s
     • Software, fixed point, 1 GHz Dual Pentium: ~28 s
     • Hardware, 50 MHz: ~5.4 s
     • Hardware (Advanced Architecture), 50 MHz: ~1.3 s
     Parameters: 1024 projections; 1024 samples per projection; 512 × 512 pixel image; 9-bit sinogram data; 3-bit interpolation factor
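The speedups implied by these timings can be computed directly. The numbers are the approximate values from the slide; the dictionary keys and the `speedup` helper are names introduced here for illustration.

```python
# Approximate run times in seconds, as reported on the slide.
times_s = {
    "sw_float_450mhz": 240.0,
    "sw_float_1ghz_dual": 94.0,
    "sw_fixed_450mhz": 50.0,
    "sw_fixed_1ghz_dual": 28.0,
    "hw_simple_50mhz": 5.4,
    "hw_advanced_50mhz": 1.3,
}

def speedup(baseline, target):
    """How many times faster `target` is than `baseline`."""
    return times_s[baseline] / times_s[target]

for name in times_s:
    if name != "hw_advanced_50mhz":
        print(f"{name}: {speedup(name, 'hw_advanced_50mhz'):.1f}x slower than hw_advanced_50mhz")
```

Since the inputs are rounded ("~"), the resulting ratios are indicative rather than exact; for instance, the advanced architecture comes out roughly 4 times faster than the simple one.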

  13. Implementation Issues: fanout. Signal prj_num(3): fanout = 1565; routing delay = 7.913 ns (~39.99%)

  14. Implementation Issues: fanout. Signal odd_2_A_4[4]: fanout = 144

  15. Memory Bridges
     Three architectures implemented:
     • A: “Simple Architecture” = non-parallel (slide 6)
     • B: “Advanced Architecture” = 4-way parallel (slide 12)
     • C: “Bridge Free Advanced Architecture” = as B, but contains no memory bridges (all design buffers in BlockRAMs)
     Memory bridges run from the PCI bus to the memory banks and are required for host-memory communication. For design C, the bridges are a separate design that is downloaded before (after) design C is downloaded, so that input data can be stored to (output data read from) the memories on the WildStar board.
     Virtex1000 resource utilization:
     • 11% logic, 90% BlockRAMs (with bridges)
     • 39% logic, 100% BlockRAMs
     • 21% logic, 100% BlockRAMs

  16. Floorplan of the “Bridge Free Advanced Architecture” (design C on the previous slide)

  17. Future Directions • Graduate
