ENGG6530 Reconfigurable Computing Systems Instructor: Dr. Shawki Areibi. “Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection” Presenter: Ziad Abuowaimer Date: March 17, 2014. Outline. AdaBoost real-time object detection algorithm: Haar-like features.
ENGG6530 Reconfigurable Computing Systems Instructor: Dr. Shawki Areibi “Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection” Presenter: Ziad Abuowaimer Date: March 17, 2014
Outline • AdaBoost real-time object detection algorithm: • Haar-like features. • Integral Image. • Complete flow of the algorithm. • Hardware Architecture of AdaBoost Algorithm: • Image Pyramid Generation. • Integral Image Computation. • Systolic Array Computation. • Implementation and Evaluation. • Summary.
What is Object Detection? • Real-time object detection is critical in several domains. For example, face detection is followed by face recognition. • The goal is to determine the location (x, y) and scale of the object.
Haar-like Features • Four basic types: • They are easy to calculate. • The white areas are subtracted from the black ones. • A special representation of the sample called the integral image makes feature extraction faster.
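As a rough illustration of the bullets above, a two-rectangle (edge-type) feature can be computed directly from pixel values. This is a minimal sketch: the names `region_sum` and `two_rect_feature` are hypothetical, and the choice of which half is white is an assumption. It follows the slide's convention of subtracting the white area from the black one.

```python
def region_sum(img, top, left, height, width):
    """Sum of pixels in a rectangle by brute force (no integral image)."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def two_rect_feature(img, top, left, height, width):
    """Edge-type feature: black (right half) minus white (left half).

    Which half is white is an assumption made for this sketch.
    """
    half = width // 2
    white = region_sum(img, top, left, height, half)
    black = region_sum(img, top, left + half, height, half)
    return black - white


img = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
value = two_rect_feature(img, 0, 0, 2, 4)
```

The brute-force sums cost time proportional to the rectangle area, which is the cost the integral image representation is meant to avoid.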
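Structure of the Detector Cascade • 2 features: reject 50% of non-faces, detect 100% of faces. • 10 features: reject 80% of non-faces, detect 100% of faces. • 25 features, 50 features, and so on, as selected by the algorithm. • Each window passes through stages 1, 2, 3, ..., 8, ..., 32: a true (T) result forwards the window to the next stage, and a false (F) result rejects the sub-window. • A window that passes every stage is classified as a face.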
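The early-rejection behaviour of the cascade can be sketched as a loop that stops at the first failing stage. This is an illustrative sketch only: `run_cascade` and the toy stage functions are hypothetical stand-ins for trained stage classifiers.

```python
def run_cascade(window, stages):
    """Return True only if every stage accepts the window.

    `stages` is a list of callables, each returning True (pass) or
    False (reject); evaluation stops at the first rejection.
    """
    for stage in stages:
        if not stage(window):
            return False  # rejected sub-window, later stages never run
    return True  # survived all stages: reported as a detection


# Toy stages standing in for trained classifiers (hypothetical).
stages = [
    lambda w: sum(w) > 10,   # cheap first stage
    lambda w: max(w) < 200,  # more selective later stage
]
accepted = run_cascade([50, 60, 70], stages)
rejected = run_cascade([1, 2, 3], stages)
```

Because most sub-windows are rejected by the first few cheap stages, the expensive later stages run on only a small fraction of the image.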
Hardware Architecture issues • Image scaling. • Integral image computation. • Feature and stage computation. • Identification of regions that contain the objects of interest.
Image Pyramid Generation • The IPG unit receives the input video frame and generates the search windows to be processed by the systolic array. • The unit receives pixels row-wise, and generates search windows, which are then buffered and fed row-wise in parallel into the systolic array. • The size of the generated search windows is determined by the size of the systolic array.
Image Pyramid Generation • The IPG and the systolic array operate in a pipelined fashion, where the systolic computation starts as soon as a single search window is generated. • Meanwhile, the IPG continues to generate search window pixel data while the systolic array is computing, preparing the next search window(s) to be used. • The IPG unit also downscales the original image, ensuring that objects bigger than the search window size are downscaled and can eventually fit into a search window as well.
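A software analogue of the pyramid generation might look like the sketch below. It is not the IPG hardware: the nearest-neighbour downscaling, the `scale_step` of 1.25, and the `stride` are all assumptions chosen for illustration, and `downscale` and `search_windows` are hypothetical names. The parameter `win` plays the role of the systolic array size.

```python
def downscale(img, factor):
    """Nearest-neighbour downscale by a factor > 1 (an assumed method)."""
    rows = int(len(img) / factor)
    cols = int(len(img[0]) / factor)
    return [[img[int(r * factor)][int(c * factor)] for c in range(cols)]
            for r in range(rows)]


def search_windows(img, win, scale_step=1.25, stride=1):
    """Yield (scale, top, left, window) for every pyramid level.

    `win` plays the role of the systolic array size; `scale_step` and
    `stride` are illustrative values, not taken from the paper.
    """
    scale = 1.0
    level = img
    while len(level) >= win and len(level[0]) >= win:
        for top in range(0, len(level) - win + 1, stride):
            for left in range(0, len(level[0]) - win + 1, stride):
                window = [row[left:left + win]
                          for row in level[top:top + win]]
                yield scale, top, left, window
        scale *= scale_step
        level = downscale(img, scale)
```

For example, `list(search_windows(frame, 24))` would enumerate every 24x24 window at every scale of a 2-D list `frame`, so an object larger than 24x24 in the original frame eventually fits a window at a coarser level.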
Systolic Array • The systolic array performs the bulk of the computation: • it computes the integral image. • collects and computes the rectangle points. • computes and evaluates the feature and stage sums. • determines whether a region passes a stage so that it can be considered for further search. • The array consists of two types of processing elements (PEs): • the collection and computation units (CCUs). • the evaluation units (EUs).
Integral Image Computation • The computation consists of horizontal and vertical shifts and additions. • Incoming pixels are shifted inside the array on each row. • Depending on the current pixel column, each of the computation units performs one of three operations: • it adds the incoming pixel value into the stored sum; • it propagates the incoming value to the next-in-row processing element while shifting and adding the accumulated sum in the vertical dimension (downwards); • or it propagates the incoming value to the next-in-row processing element while doing nothing in the vertical dimension.
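For reference, the result the array produces can be reproduced in software with one horizontal and one vertical accumulation per pixel. This is a plain sequential sketch, not the systolic schedule described above, and `integral_image` is a hypothetical name.

```python
def integral_image(img):
    """ii[r][c] = sum of img[0..r][0..c], inclusive on both ends."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        row_sum = 0  # horizontal accumulation along the current row
        for c in range(cols):
            row_sum += img[r][c]
            above = ii[r - 1][c] if r > 0 else 0  # vertical accumulation
            ii[r][c] = row_sum + above
    return ii


img = [
    [1, 2, 3],
    [4, 5, 6],
]
ii = integral_image(img)
```

The same function applied to the squared pixel values, for instance `integral_image([[p * p for p in row] for row in img])`, gives the squared integral image.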
Squared Integral Image Computation • For the squared integral image, the same procedure is followed: • The incoming pixel passes through the multiplier in the EU, which computes the square of the pixel value, • and that squared value then alternates with the original pixel value as input to the array.
Integral Image Computation (figure sequence: steps 1 to 3 of pixel values shifting through the array)
Integral Image Computation • For an image of n rows by m columns, the entire computation takes 2 * [m + (m-1) + (n-1)] cycles.
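The cycle count is easy to evaluate for a concrete window size. In the sketch below, the function name `integral_image_cycles` is hypothetical and the 24x24 window is only an example size, not a figure taken from the slides.

```python
def integral_image_cycles(n_rows, m_cols):
    """Cycle count from the slide: 2 * [m + (m - 1) + (n - 1)]."""
    return 2 * (m_cols + (m_cols - 1) + (n_rows - 1))


cycles = integral_image_cycles(24, 24)  # 24x24 window chosen as an example
```

For the 24x24 example this gives 2 * (24 + 23 + 23) = 140 cycles, so the cost grows linearly with the window dimensions rather than with the number of pixels.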
Rectangle Computation • The CCUs at the corners of each feature convolved with the search window are responsible for collecting rectangle data for that feature. • The 4 CCUs at the corners of each rectangle (P1, P2, P3, P4) hold the integral image values required for the computation of the rectangle sum. • P1 starts by sending its integral image value to the top-left-most CCU. • P2 sends its value next, followed by P3. • P4 sends its value last. • The top-left-most CCU receives the integral image values and computes the rectangle sum. • The process then repeats for the next rectangle.
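Once the four corner values have been collected, the rectangle sum takes only three additions and subtractions. The sketch below assumes the usual integral image convention, with P1 top-left, P2 top-right, P3 bottom-left and P4 bottom-right, and with P1, P2 and P3 taken just outside the rectangle. The slides do not state this layout explicitly, and `rectangle_sum` is a hypothetical name.

```python
def rectangle_sum(p1, p2, p3, p4):
    """Pixel sum of a rectangle from its four integral image corner values.

    Assumed layout: p1 top-left, p2 top-right, p3 bottom-left,
    p4 bottom-right, with p1, p2 and p3 taken just outside the rectangle.
    """
    return p4 - p2 - p3 + p1


# Integral image of [[1, 2, 3], [4, 5, 6], [7, 8, 9]].
ii = [
    [1, 3, 6],
    [5, 12, 21],
    [12, 27, 45],
]
# Rectangle covering the bottom-right 2x2 block (5, 6, 8, 9).
total = rectangle_sum(ii[0][0], ii[0][2], ii[2][0], ii[2][2])
```

Here `total` equals 5 + 6 + 8 + 9 = 28, obtained without touching the individual pixels.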
Feature & Stage Sum Computation When all rectangles of a feature have been computed, the CCU sends the sums to its corresponding EU.
Send It Back to CCU The EU performs the feature and stage computation, compares the result with the threshold value and sends the accumulated stage sum back to the CCUs through a wrapped link.
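The EU's role can be approximated in software as follows. This is a simplified sketch under stated assumptions: each feature is a weighted sum of its rectangle sums, a feature contributes a fixed vote when it reaches its threshold, and the stage passes when the accumulated votes reach the stage threshold. The exact voting rule, the weights, and the names `feature_sum` and `evaluate_stage` are illustrative and are not taken from the slides.

```python
def feature_sum(rect_sums, weights):
    """Weighted sum of the rectangle sums that make up one feature."""
    return sum(w * s for w, s in zip(weights, rect_sums))


def evaluate_stage(features, stage_threshold):
    """Accumulate per-feature votes and compare against the stage threshold.

    Each entry of `features` is (rect_sums, weights, threshold, vote).
    The vote-if-above-threshold rule is a simplifying assumption.
    """
    stage_sum = 0.0
    for rect_sums, weights, threshold, vote in features:
        if feature_sum(rect_sums, weights) >= threshold:
            stage_sum += vote
    return stage_sum, stage_sum >= stage_threshold


features = [
    ([40, 10], [1, -1], 20, 0.5),   # 40 - 10 = 30 >= 20, votes 0.5
    ([5, 30], [1, -1], 0, 0.75),    # 5 - 30 = -25 < 0, no vote
]
stage_sum, passed = evaluate_stage(features, stage_threshold=0.5)
```

Returning the accumulated stage sum together with the pass decision mirrors the slide, where the EU sends the stage sum back to the CCUs after the comparison.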
Parallel Computations Over The Window Each feature rectangle can be computed in parallel; this is possible because data flows through the array in a systolic manner and always in the same direction.
Parallel Rectangle Computation (figure sequence: two rectangles, each with corner CCUs P1 to P4, computed at the same time)
Implementations & Evaluations • Table II: Related work implementations on FPGAs, results comparison. Notes: (a) using three classification modules; (b) implementation of a cycle-accurate simulator; (c) using only 52 features and 1 stage. • Table IV: Detection applications training data.
Evaluation • The paper is well written and provides detailed and sufficient information to re-implement the hardware architecture. • The details of the training set and the threshold values are not mentioned in the paper, but this information can be found in the original algorithm paper. • The only concern is that the paper does not explain well how the window is generated from the original image.
Summary • Object detection is an important step in multiple applications related to computer vision and image processing, and real-time detection is critical in several domains. • In this paper, a flexible parallel architecture for implementation of the AdaBoost object detection algorithm is proposed. • The architecture combines an image pyramid generation process, along with highly parallel systolic computation, to offer a flexible design that is suitable for several types of applications and budgets.
References • C. Kyrkou and T. Theocharides, “A flexible parallel hardware architecture for AdaBoost-based real-time object detection,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 6, pp. 1034–1047, Jun. 2011. • P. Viola and M. Jones, “Real-time object detection,” Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, May 2004.
Extra Slides Variance • Variance Computation: • Squared Integral Image is needed
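The reason the squared integral image is needed follows from the identity Var(x) = E[x^2] - (E[x])^2: the first term requires the sum of squared pixels over the window. A minimal sketch, with `window_variance` as a hypothetical name and the two sums computed directly rather than read from integral images:

```python
def window_variance(pixel_sum, squared_sum, n_pixels):
    """Variance = E[x^2] - (E[x])^2 over a window of n_pixels."""
    mean = pixel_sum / n_pixels
    return squared_sum / n_pixels - mean * mean


window = [2, 4, 4, 4, 5, 5, 7, 9]
# In hardware these two sums would come from the integral image and the
# squared integral image; here they are computed directly for brevity.
pixel_sum = sum(window)
squared_sum = sum(p * p for p in window)
variance = window_variance(pixel_sum, squared_sum, len(window))
```

With both integral images available, each of the two sums costs a single four-corner rectangle lookup, so the variance of any window can be obtained in constant time.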