Accelerating Image Processing Pipelines in a Hardware/Software Environment

Accelerating Image Processing Pipelines in a Hardware/Software Environment Heather Quinn, Dr. Miriam Leeser, Northeastern University Dr. Laurie Smith King College of the Holy Cross

Outline • Background • Image processing and hardware • The cost of codesign systems • Image processing pipelines • An example • The Pipeline Assignment Problem • Solving Pipeline Assignment

Goal of Project • Accelerate image processing tasks through efficient use of FPGAs • Combine already designed components at runtime to implement series of transformations (pipelines)

Image Processing Tasks • Good candidates for FPGAs • Algorithms are often explicitly parallel • Input and output data large but individual pixels are small • Data access regular

Hardware Systems • Using hardware incurs execution costs not present in software systems • hardware initialization • communicating image • reprogramming

Efficient Use of FPGAs • Software algorithm’s runtime for small images less than the hardware costs • Profiling the hardware and software runtimes for different image sizes determines the crossover point • Deciding at runtime to execute in software or hardware is simple for one algorithm processing one image

Image Processing Pipelines • Series of image processing algorithms applied to an image • Each algorithm has a software and hardware implementation • Finding the crossover point for a pipeline is complicated • Exponential number of implementations • Reprogramming costs • Need a strategy to find a fast pipeline implementation at runtime

Median Filter & Edge Detection Median Repgm Edge Det Get Data Display Start App HW Init Send Data 300 ms .00105 ms per pixel 70 ms .00105 ms per pixel An Example Need pipeline implementations that minimize reprogramming and communication costs

Possible Implementations • Blue boxes are hw/sw boundaries • Red boxes are fixing image edges • Green Boxes are reprogramming

Median Filter and Edge Detection Profiles

Median FilterEdge Detection Profiles

Problem Statement • Inputs: a profiled library of image processing components, a pipeline, and an image • Output: an assignment of each component to a hardware or software implementation

The Library of Components • Each component has two implementations: hardware and software • Each implementation has known runtimes for a set of images • Interpolation used for rest of images • Each hardware implementation has a known area size • Each component interface is image in/image out

Assumptions • Reprogramming and communication costs incurred at sw/hw boundaries • Might need to fix image edge in between components • Problems sizes of 20 or fewer stages • 500 ms to make a decision

Solving Pipeline Assignment • Exhaustively • ILP • Greedy • Local Search • Experiments

Related Problems • Codesign partitioning problem • ILP: Niemann and Marwedel • Simulated Annealing: Cosyma • Iterative: Ptolemy • Parallel computing scheduling • Local Search: UNM

Exhaustive • Find: Optimal solutions • How: Search entire problem space • Algorithm Runtime: O(2N), where N is the number of pipeline stages

ILP • Find: Optimal solutions • How: AMPL model running on CPLEX • Need: ILP formulation of the problem statement • Algorithm Runtime: Unknown

Greedy • Find: Sub-optimal solutions • How: Make optimal decisions for each pipeline stage based on hardware area usage and speedup values • Algorithm Runtime: O(N), where N is the number of pipeline stages

Local Search • Find: Sub-optimal solutions • How: Improve upon initial solutions (found through greedy or randomly) • Algorithm Runtime: Runs for user supplied amount of time

Experiments • Synthetic components arranged into pipelines of length 1 to 20 • Exhaustive algorithm run to completion • Used as a baseline for solution quality • Timed to find 500 ms boundary • ILP solver constrained to 500 ms • Ability to solve dependent on components • Local Search returns best solution found within time limit

Results • Optimal solutions in 500 ms • Exhaustive: up to 11 stages • ILP: all pipelines up to 13 stages, some pipelines up to 18 stages, and none larger • Sub-optimal solutions in 500 ms • Greedy and local search: all problem sizes • Strategy • Exhaustive and ILP up to 13 stages • Greedy or local search for more than 13 stages

Optimal Solutions in 500 ms

Conclusion • Defined pipeline assignment • Introduced 4 possible ways to solve the problem at runtime • Found that 3 algorithms to efficiently solve different problem sizes

Future Work • ADAPT: Algorithm that calls exhaustive, ILP and local search algorithms to solve pipeline assignment problem based on problem size • Decision Time: Study how the amount of time allotted affects ADAPT results • Virtex II Pro: Add support for using embedded Power PC cores

References [1] R. Niemann and P. Marwedel, Hardware/Software Partitioning using Integer Programming, Proceedings of the European Design and Test Conference, Paris, France, 1996, pp. 473-480. [2] J. Henkel, R. Ernst, U. Holtmann, and T. Benner, Adaptation of Partitioning and High-Level Synthesis in Hardware/Software Co-synthesis, Proceedings of International Conference on Computer-Aided Design, 1994. [3] A. Kalavade and E. A. Lee, A Gobal Criticality/Local Phase Driven Algorithm for the Constrainted Hardware/Software Partitioning Problem, Proceedings of CODES/CASHE 1994, Third International Workshop on Hardware/Software Codesign, Grenoble, France, Sept. 22-24, 1994, pp 42-48. [4] M.-Y. Wu,W. Shu and J. Gu, Efficient Local Search for DAG Scheduling, IEEE Transactions on Parallel and Distributed Systems, vol 12, num 6

Accelerating Image Processing Pipelines in a Hardware/Software Environment

Accelerating Image Processing Pipelines in a Hardware/Software Environment

Presentation Transcript

How Computers Work

Smoothing Techniques in Image Processing

BURIED FLEXIBLE PIPELINES

Digital Image Processing

Digital Image Processing

Digital Image Processing

Image Processing

MetaMorph Workshop

The Challenges of Using An Embedded MPI for Hardware-Based Processing Nodes

Digital Image Processing

Landmarks of the UK Round

Image Processing and Analysis

Tutorial on Neural Network Models for Speech and Image Processing

KEK activities on CLIC X-band Accelerating Structures

Digital Image Processing Chapter 8: Image Compression 11 August 2006

Image analysis and computer vision

Introduction to Computer Vision Image Texture Analysis

CS589-04 Digital Image Processing Lecture 2. Intensity Transformation and Spatial Filtering

Have We Set the Bar Too High?

Fourier Transform in Image Processing