Backprojection and Synthetic Aperture Radar Processing on a HHPC
Albert Conti, Ben Cordes, Prof. Miriam Leeser, Prof. Eric Miller
aconti@ece.neu.edu, bcordes@ece.neu.edu, mel@ece.neu.edu, elmiller@ece.neu.edu
[Figure: CenSSIS research framework diagram. Labels: Bio-Med, Enviro-Civil, Validating TestBEDs, Fundamental Science; L1-L3, R1-R3, S1-S5]

This work was supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award Number EEC-9986821). Funded by DOD High Performance Computing Modernization Program, Grant #PET SIP-K4-003.
www.ece.neu.edu/groups/rpl

Abstract
Synthetic Aperture Radar (SAR) is a process by which high-resolution images can be formed by processing a series of radar reflections taken by a single transceiver. Backprojection is one method for post-processing these reflections; it is a highly parallel algorithm, which makes it suitable for translation into hardware. This poster explores the difficulties involved in achieving maximum speedup from a hardware implementation on a parallel computing system, including memory bandwidth, communication bottlenecks, and others.

What is SAR?
• SAR: Synthetic Aperture Radar
• Aperture (width of radar dish) directly affects the resolution of the image
• Many radar pulses taken and processed
• Aperture is synthetically increased by accumulating the results
• ‘Stripmap’ and ‘Spotlight’ modes
For More Detail:
• Soumekh, M. “Synthetic Aperture Radar Signal Processing with MATLAB Algorithms”, ISBN 0-471-29706-2
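To put the aperture bullets above in numbers, the short sketch below compares the cross-range resolution of a conventional (real-aperture) radar with the best case for stripmap SAR. It uses the standard textbook approximations (wavelength × range / aperture for the real aperture, half the antenna length for stripmap SAR). The wavelength, range, and antenna size are illustrative assumptions, not values from this poster.

```python
def real_aperture_resolution(wavelength_m, range_m, aperture_m):
    """Approximate cross-range resolution of a conventional radar:
    beamwidth (wavelength / aperture) multiplied by the range."""
    return wavelength_m * range_m / aperture_m


def stripmap_sar_resolution(aperture_m):
    """Approximate best-case stripmap SAR cross-range resolution:
    half the physical antenna length, independent of range."""
    return aperture_m / 2.0


# Illustrative values only (assumptions, not taken from the poster).
WAVELENGTH_M = 0.03    # roughly a 10 GHz radar
RANGE_M = 10_000.0     # 10 km from antenna to target
APERTURE_M = 1.0       # 1 m physical antenna

real_res = real_aperture_resolution(WAVELENGTH_M, RANGE_M, APERTURE_M)
sar_res = stripmap_sar_resolution(APERTURE_M)
print(f"real aperture: {real_res:.1f} m, stripmap SAR: {sar_res:.1f} m")
```

Under these assumed numbers the real aperture resolves features on the order of hundreds of metres, while the synthetic aperture built from many pulses resolves about half a metre.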
What is Backprojection?
• SAR output: array of radar response ‘projections’
• Filter out physical effects of radar
• Correlate pixels to time, index into projection data
• Interpolate between indices to increase accuracy

Stripmap Mode SAR
• Plane flies past target in straight line
• Multiple radar pulses are taken at right angle to flight path
• Each pulse covers some portion of the target area

Coarse-grained Parallelism
• Process several projections on each system
• Size and available space determine ratio of projections per board

Fine-grained Parallelism
• Work for each set of projections can be pipelined and parallelized
• Memory bandwidth determines number of parallel pipelines

Exploiting Parallelism
• Parallel operations can provide performance gains
• Data dependencies reduce parallelism
• Few dependencies exist in SAR/BP

Previous Work
• Medical Imaging
• Spotlight mode with backprojection

Previous Work: Results
• Used Annapolis Firebird board
• Precursor to WildStar II board
• 65MHz clock, 16-way pipeline
For More Detail:
• Haiqian Yu, “Memory Architecture for Data Intensive Image Processing Algorithms in Reconfigurable Hardware”, Master’s Thesis; Northeastern University, Boston MA

Hybrid Implementation Performance
• The hybrid implementation achieved a 40x speedup over a software-only solution on a single node of the HHPC (no coarse-grained parallelism).
• Preliminary results from the parallel version of the hybrid implementation show a drastic speedup in the processing stage of the algorithm, but a slowdown in reconstruction of the final image due to inter-process communication.
• Work is under way to determine the number of processing nodes that reconstructs images most efficiently on the HHPC.
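The backprojection steps listed earlier (correlate each pixel to a time, index into the projection data, interpolate between indices) can be sketched in a few lines of NumPy. This is a minimal software illustration under stated assumptions, not the hardware design described on this poster: the function name `backproject`, its parameters, and the use of linear interpolation are all hypothetical choices, and the filtering step is omitted.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def backproject(projections, antenna_xy, pixel_xy, t0, dt):
    """Accumulate every projection into an image (hypothetical sketch).

    projections: (n_pulses, n_samples) radar responses, assumed already filtered
    antenna_xy:  (n_pulses, 2) antenna position for each pulse, in metres
    pixel_xy:    (n_pixels, 2) pixel positions in the target area, in metres
    t0, dt:      time of sample 0 and sample spacing, in seconds
    """
    n_pulses, n_samples = projections.shape
    sample_idx = np.arange(n_samples)
    image = np.zeros(len(pixel_xy))
    for p in range(n_pulses):
        # Correlate pixels to time: round-trip delay from antenna to pixel.
        distance = np.linalg.norm(pixel_xy - antenna_xy[p], axis=1)
        frac_idx = (2.0 * distance / C - t0) / dt
        # Index into the projection, interpolating between samples.
        # Pixels whose delay falls outside the recorded window add 0.
        image += np.interp(frac_idx, sample_idx, projections[p],
                           left=0.0, right=0.0)
    return image


# Tiny usage example: one pulse, one pixel whose delay lands halfway
# between samples 2 and 3, so the result is close to their average (15.0).
dt = 1e-8
one_pulse = np.array([[0.0, 0.0, 10.0, 20.0, 0.0]])
antenna = np.array([[0.0, 0.0]])
pixel = np.array([[0.0, 1.25 * dt * C]])
print(backproject(one_pulse, antenna, pixel, t0=0.0, dt=dt))
```

Each pixel is computed independently of every other pixel, and each pulse only adds to the running sum, which is the lack of data dependencies that the parallelism bullets refer to.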
HHPC Architecture
• 48-node Beowulf cluster
• Dual 2.2GHz Xeons
• Linux OS
• Annapolis MicroSystems WildStar II FPGA boards
• Champ LVDS systolic interconnect
• Gigabit Ethernet cards
• Myrinet MPI cards

[Figure: FPGA design block diagram. Labels: PCI, Input BRAM, Staging BRAM, SWATH LUT, Target Memory 1, Target Memory 2]

Research Level 1, Thrust R3
Methods of serial computing are slow and cannot take advantage of the inherent parallelism of the algorithm for processing SAR data. This work is focused on developing a high-speed computation engine that will enable image reconstruction in a small fraction of the time possible with serial computing.

[Figure: data flow across PC and FPGA nodes. 1. Input Data Loaded from Disk; 2. Data Broadcast to Processor Nodes; 3. Parallel Backprojection Processing; 4. Target Images Merged; 5. Aggregate Image Stored to Disk]
• In stage 1, data from the separate projections are fetched from storage and made ready to distribute among the processing nodes.
• In stage 2, the projection data is broadcast to all of the processor nodes via Myrinet. Processor nodes listen and accept data that contributes to their respective sections of the target area.
• In stage 3, distinct regions of the target area are reconstructed in parallel.
• In stage 4, the smaller regions generated in stage 3 are merged to form the final target image.
• In stage 5, the final image is stored on disk.

Future Optimizations
• Overlap processing and communication in an effort to make use of inherent communication latency
• Overlap file I/O and communication to minimize end-to-end run time
• Utilize processing nodes for intermediate merging
• Stagger processing stages to avoid communication collisions
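The five-stage flow described above (load, broadcast, reconstruct regions in parallel, merge, store) can be mimicked on a single machine. The sketch below is only an analogy under stated assumptions: worker threads stand in for the processing nodes, shared memory stands in for the Myrinet broadcast, and `reconstruct_region` is a simplified stand-in kernel that takes precomputed fractional sample indices. None of these names come from the poster.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def reconstruct_region(projections, frac_idx):
    """Stage 3 stand-in: sum every projection, sampled at the fractional
    sample index of each pixel in one region of the target area."""
    samples = np.arange(projections.shape[1])
    region = np.zeros(frac_idx.shape[1])
    for proj, idx in zip(projections, frac_idx):
        region += np.interp(idx, samples, proj)
    return region


def run_pipeline(projections, frac_idx, n_nodes):
    """Hypothetical single-machine analogue of stages 1 to 4."""
    # Stage 1: input is assumed to be loaded already (the two arrays).
    # Stage 2: every worker can see all projections, but only receives
    # the pixel columns belonging to its own region of the target area.
    regions = np.array_split(np.arange(frac_idx.shape[1]), n_nodes)
    # Stage 3: regions are reconstructed in parallel.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        parts = list(pool.map(
            lambda cols: reconstruct_region(projections, frac_idx[:, cols]),
            regions))
    # Stage 4: merge the regions, in order, into the final image.
    # Stage 5 (storing to disk) is left to the caller.
    return np.concatenate(parts)


# Usage with made-up data: 8 pulses, 32 samples each, 50 pixels.
rng = np.random.default_rng(0)
projections = rng.random((8, 32))
frac_idx = rng.uniform(0.0, 31.0, size=(8, 50))
image = run_pipeline(projections, frac_idx, n_nodes=4)
print(image.shape)
```

Because the regions do not overlap, the merge here is a plain concatenation. On a real cluster the merge involves inter-process communication, which is where the poster reports the slowdown.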