
ENGG6530 Reconfigurable Computing Systems Instructor: Dr. Shawki Areibi

ENGG6530 Reconfigurable Computing Systems Instructor: Dr. Shawki Areibi. “Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection” Presenter: Ziad Abuowaimer Date: March 17, 2014. Outline. AdaBoost real-time object detection algorithm: Haar-like features.


Presentation Transcript


  1. ENGG6530 Reconfigurable Computing Systems Instructor: Dr. Shawki Areibi “Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection” Presenter: Ziad Abuowaimer Date: March 17, 2014

  2. Outline • AdaBoost real-time object detection algorithm: • Haar-like features. • Integral Image. • Complete flow of the algorithm. • Hardware Architecture of AdaBoost Algorithm: • Image Pyramid Generation. • Integral Image Computation. • Systolic Array Computation. • Implementation and Evaluation. • Summary.

  3. What is Object Detection? • Real-time object detection is critical in several domains; for example, face detection is the step that precedes face recognition. • The goal is to determine the location (x, y) and scale of the object.

  4. Haar-like Features • There are four basic types. • They are easy to calculate. • The sum of the pixels in the white areas is subtracted from the sum of the pixels in the black areas. • A special representation of the sample, called the integral image, makes feature extraction faster.
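
The "white subtracted from black" rule can be illustrated with a minimal sketch. It sums pixels directly, without the integral image, and assumes a two-rectangle feature whose left half is white and whose right half is black; that layout and the function names are choices made for this example, not taken from the slides.

```python
def region_sum(img, top, left, height, width):
    """Sum the pixels of a rectangle by visiting every pixel."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def two_rect_feature(img, top, left, height, width):
    """Two-rectangle feature value: black sum minus white sum.

    Assumption for this example: left half is white, right half is black.
    """
    half = width // 2
    white = region_sum(img, top, left, height, half)
    black = region_sum(img, top, left + half, height, half)
    return black - white


# A dark-to-bright vertical edge gives a large response.
img = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
print(two_rect_feature(img, 0, 0, 2, 4))  # 36 - 4 = 32
```

A uniform region yields 0, which is why such features respond to edges and bars rather than to flat areas.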

  5. Integral Image
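
As a rough software illustration of the integral image, the sketch below builds the table and then reads any rectangle sum with four lookups. The extra zero row and column and the function names are conventions assumed for this example.

```python
def integral_image(img):
    """Return ii with one extra zero row and column, so that
    ii[r][c] is the sum of img over rows 0..r-1 and columns 0..c-1."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = (img[r][c] + ii[r][c + 1]
                                + ii[r + 1][c] - ii[r][c])
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes Haar-like features cheap to evaluate.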

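  6. Structure of the Detector Cascade • Annotations on successive stages: 2 features, reject 50% non-faces, detect 100% faces; 10 features, reject 80% non-faces, detect 100% faces; 25 features; 50 features; by algorithm. • [Figure: a window passes through cascade stages 1, 2, 3, ..., 8, ..., 32. A T (true) result passes the window to the next stage and finally to "Face"; an F (false) result at any stage leads to "Reject Sub-Window".]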
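
The early-rejection idea behind the cascade can be sketched as follows. The stage representation (a scoring function paired with a threshold) and the toy scoring functions are hypothetical; real stages would be sums of trained weak classifiers.

```python
def run_cascade(window, stages):
    """Evaluate stages in order and stop at the first failure.

    stages: list of (score_fn, threshold) pairs (hypothetical structure).
    Returns (is_object, number_of_stages_evaluated).
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, index      # F: reject the sub-window
    return True, len(stages)         # T at every stage: object found


# Toy stages standing in for trained stage classifiers.
stages = [(sum, 10), (max, 5)]

print(run_cascade([1, 2, 3], stages))  # rejected by the first stage
print(run_cascade([4, 4, 4], stages))  # rejected by the second stage
print(run_cascade([6, 6], stages))     # passes both stages
```

Because most sub-windows are rejected by the first, cheapest stages, the average work per window stays far below the cost of evaluating every stage.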

  7. Flow of AdaBoost Object Detector

  8. Hardware Architecture issues • Image scaling. • Integral image computation. • Feature and stage computation. • Identification of regions that contain the objects of interest.

  9. Image Pyramid Generation • The IPG unit receives the input video frame and generates the search windows to be processed by the systolic array. • The unit receives pixels row-wise and generates search windows, which are then buffered and fed row-wise, in parallel, into the systolic array. • The size of the generated search windows is determined by the size of the systolic array.

  10. Image Pyramid Generation • The IPG and the systolic array operate in a pipelined fashion: the systolic computation starts as soon as a single search window has been generated. • The IPG continues to generate search window pixel data while the systolic array is computing, preparing the next search window(s) to be used. • The IPG unit also downscales the original image, so that objects bigger than the search window are eventually downscaled enough to fit into a search window.
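
A software analogue of the window generation and downscaling might look like the sketch below. The nearest-neighbour downscaling, the default scale factor of 1.25, and the step size are assumptions made for illustration; the slides do not state which values the IPG unit uses.

```python
def downscale(img, factor):
    """Nearest-neighbour downscale by `factor` (> 1); an assumed method."""
    rows = int(len(img) / factor)
    cols = int(len(img[0]) / factor)
    return [[img[int(r * factor)][int(c * factor)] for c in range(cols)]
            for r in range(rows)]


def pyramid_windows(img, win, factor=1.25, step=1):
    """Yield (scale, top, left, window) for each search window at each level."""
    scale, level = 1.0, img
    while len(level) >= win and len(level[0]) >= win:
        for top in range(0, len(level) - win + 1, step):
            for left in range(0, len(level[0]) - win + 1, step):
                window = [row[left:left + win] for row in level[top:top + win]]
                yield scale, top, left, window
        # Shrink the image so that larger objects fit the fixed window size.
        level = downscale(level, factor)
        scale *= factor


img = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = list(pyramid_windows(img, win=2, factor=2))
print(len(windows))  # 9 windows at scale 1.0 plus 1 window at scale 2.0
```

The window size stays fixed while the image shrinks, which mirrors a search window whose size is tied to the size of the array.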

  11. Systolic Array • The systolic array performs the bulk of the computation: • it computes the integral image. • collects and computes the rectangle points. • computes and evaluates the feature and stage sums. • determines whether a region passes a stage so that it can be considered for further search. • The array consists of two types of processing elements (PEs): • the collection and computation units (CCUs). • the evaluation units (EUs).

  12. Systolic Array

  13. Integral Image Computation • The computation consists of horizontal and vertical shifts and additions. • Incoming pixels are shifted into the array along each row. • Depending on the current pixel column, each computation unit performs one of three operations: • it adds the incoming pixel value to its stored sum; • it propagates the incoming value to the next processing element in the row while shifting the accumulated sum downwards and adding it in the vertical dimension; • or it propagates the incoming value to the next processing element in the row and does nothing in the vertical dimension.
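
The split into horizontal and vertical additions can be mimicked in software with two passes. This is a sequential sketch under that assumption, not a cycle-accurate model of the array, and the function name is hypothetical.

```python
def integral_two_pass(img):
    """Integral image via a horizontal pass followed by a vertical pass."""
    # Horizontal pass: running sum along each row.
    sums = []
    for row in img:
        acc, out = 0, []
        for pixel in row:
            acc += pixel
            out.append(acc)
        sums.append(out)
    # Vertical pass: add each row's sums to the row below it.
    for r in range(1, len(sums)):
        for c in range(len(sums[r])):
            sums[r][c] += sums[r - 1][c]
    return sums


print(integral_two_pass([[1, 2, 3],
                         [4, 5, 6]]))  # [[1, 3, 6], [5, 12, 21]]
```

In the hardware described above the two directions overlap in time; the sketch only shows that the same result follows from row-wise and column-wise additions.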

  14. Squared Integral Image Computation • For the squared integral image, the same procedure is followed: • The incoming pixel passes through the multiplier in the EU, which computes the square of the pixel value. • The squared value then alternates with the original pixel value as input to the array.
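
The alternation of pixel values and squared values can be pictured as one interleaved stream. The sketch below is an illustrative software model with hypothetical function names; it accumulates plain totals rather than full integral images to keep the idea visible.

```python
def interleave_with_squares(pixels):
    """Emit each pixel followed by its square, as one alternating stream."""
    for p in pixels:
        yield p
        yield p * p


def sums_from_stream(stream):
    """Even positions feed the plain sum, odd positions the squared sum."""
    total, total_sq = 0, 0
    for i, value in enumerate(stream):
        if i % 2 == 0:
            total += value
        else:
            total_sq += value
    return total, total_sq


stream = list(interleave_with_squares([1, 2, 3]))
print(stream)                    # [1, 1, 2, 4, 3, 9]
print(sums_from_stream(stream))  # (6, 14)
```

Interleaving lets the same datapath serve both accumulations, at the cost of twice as many input steps.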

  15. Integral Image Computation

  16. Integral Image Computation • Step #1

  17. Integral Image Computation • Step #1 [Figure: array contents during Step #1]

  18. Integral Image Computation • Step #1, Step #2 [Figure: array contents]

  19. Integral Image Computation • Step #2 [Figure: array contents during Step #2]

  20. Integral Image Computation • Step #2, Step #3 [Figure: array contents]

  21. Integral Image Computation • For an image of n rows by m columns, the entire computation takes 2 * [m + (m - 1) + (n - 1)] cycles.
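
The cycle-count formula is easy to evaluate for a given window size. The 24 x 24 size used below is only an example value, not a figure taken from the slides.

```python
def integral_image_cycles(n_rows, m_cols):
    """Cycles for the integral image computation: 2 * [m + (m - 1) + (n - 1)]."""
    return 2 * (m_cols + (m_cols - 1) + (n_rows - 1))


# Example: a hypothetical 24 x 24 search window.
print(integral_image_cycles(24, 24))  # 2 * (24 + 23 + 23) = 140
```

The count grows linearly with the window dimensions, whereas the number of pixels grows with their product.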

  22. Systolic Array

  23. Rectangle Computation The CCUs at the corners of each feature convolved with the search window are responsible for collecting the rectangle data for that feature. The four CCUs at the corners of each rectangle (P1, P2, P3, P4) hold the integral image values required to compute the rectangle sum. [Figure: rectangle with corners P1, P2, P3, P4]

  24. Rectangle Computation P1 sends its integral image value to the top-left-most CCU first. [Figure: rectangle with corners P1, P2, P3, P4]

  25. Rectangle Computation P2 sends its value next. [Figure: rectangle with corners P1, P2, P3, P4]

  26. Rectangle Computation P3 sends its value next. [Figure: rectangle with corners P1, P2, P3, P4]

  27. Rectangle Computation P4 sends its value last. The top-left-most CCU receives the integral image values and computes the rectangle sum. [Figure: rectangle with corners P1, P2, P3, P4]
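
The accumulation in the receiving CCU can be sketched as a signed running sum over the four arriving values. The corner layout assumed here (P1 top-left, P2 top-right, P3 bottom-left, P4 bottom-right) is read from the order of the figure labels and is an assumption, as is the function name.

```python
# Signs applied to the values in arrival order P1, P2, P3, P4.
# Assumed layout: P1 top-left, P2 top-right, P3 bottom-left, P4 bottom-right.
SIGNS = (+1, -1, -1, +1)


def accumulate_rectangle(values_in_arrival_order):
    """Combine four integral-image corner values into a rectangle sum."""
    acc = 0
    for sign, value in zip(SIGNS, values_in_arrival_order):
        acc += sign * value
    return acc


# Corner values for the lower-right 2 x 2 block of [[1,2,3],[4,5,6],[7,8,9]]:
# P1 = 1, P2 = 6, P3 = 12, P4 = 45.
print(accumulate_rectangle([1, 6, 12, 45]))  # 1 - 6 - 12 + 45 = 28
```

Only one adder/subtractor is needed, since the values arrive one at a time.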

  28. Next Rectangle Computation [Figure: rectangle with corners P1, P2, P3, P4]

  29. Feature & Stage Sum Computation When all rectangles of a feature have been computed, the CCU sends the rectangle sums to its corresponding EU.

  30. Send It Back to CCU The EU performs the feature and stage computation, compares the result with the threshold value and sends the accumulated stage sum back to the CCUs through a wrapped link.
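
One common way to organise the feature and stage computation is shown below as a sketch. The data layout (weights, per-feature threshold, and the two values `above` and `below` added to the stage sum) is an assumption modelled on typical Viola-Jones style cascades, and variance normalisation is left out; the slides do not give the EU's exact arithmetic.

```python
def evaluate_stage(features, stage_threshold):
    """Accumulate a stage sum from weighted rectangle sums.

    features: list of dicts with keys 'rect_sums', 'weights', 'threshold',
    'above' and 'below' (hypothetical field names for this example).
    Returns (stage_sum, passed).
    """
    stage_sum = 0.0
    for f in features:
        feature_sum = sum(w * s for w, s in zip(f["weights"], f["rect_sums"]))
        # Each feature contributes one of two trained values to the stage sum.
        stage_sum += f["above"] if feature_sum >= f["threshold"] else f["below"]
    return stage_sum, stage_sum >= stage_threshold


features = [
    {"rect_sums": [100, 40], "weights": [-1, 2], "threshold": -30,
     "above": 0.5, "below": 0.0},
    {"rect_sums": [50, 30], "weights": [-1, 3], "threshold": 50,
     "above": 1.0, "below": 0.25},
]
print(evaluate_stage(features, stage_threshold=0.6))  # (0.75, True)
```

A window whose stage sum falls below the stage threshold would be dropped; otherwise the accumulated result is what gets returned for the next stage.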

  31. Parallel Computations Over The Window Feature rectangles can be computed in parallel, because data flows through the array in a systolic manner and always in the same direction. [Figure: two rectangles, each with corners P1, P2, P3, P4]

  32. Parallel Rectangle Computation [Figure: two rectangles, each with corners P1, P2, P3, P4]

  33. Parallel Rectangle Computation [Figure: two rectangles, each with corners P1, P2, P3, P4]

  34. Parallel Rectangle Computation [Figure: two rectangles, each with corners P1, P2, P3, P4]

  35. Parallel Rectangle Computation [Figure: two rectangles, each with corners P1, P2, P3, P4]

  36. Feature & Stage Sum Computation

  37. Feature & Stage Sum Computation

  38. Implementations & Evaluations • Table II: Related work implementation on FPGAs, results comparison. Notes: (a) Using three classification modules. (b) Implementation of a cycle-accurate simulator. (c) Using only 52 features and 1 stage. • Table IV: Detection applications training data. [Table contents not present in the transcript]

  39. Results

  40. Evaluation • The paper is well written and gives enough detail to re-implement the hardware architecture. • The training set and threshold values are not given in the paper, but this information can be found in the original algorithm paper. • The only concern is that the paper does not explain well how the search window is generated from the original image.

  41. Summary • Object detection is an important step in multiple applications related to computer vision and image processing, and real-time detection is critical in several domains. • In this paper, a flexible parallel architecture for implementation of the AdaBoost object detection algorithm is proposed. • The architecture combines an image pyramid generation process, along with highly parallel systolic computation, to offer a flexible design that is suitable for several types of applications and budgets.

  42. References • C. Kyrkou and T. Theocharides, “A Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 6, pp. 1034–1047, June 2011. • P. Viola and M. Jones, “Real-time object detection,” Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, May 2004.

  43. Questions

  44. Extra Slides Variance • Variance Computation: • Squared Integral Image is needed
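
The reason the squared integral image is needed can be seen from the usual variance identity, variance = mean of squares minus square of the mean. The sketch below assumes the pixel sum and the squared-pixel sum of a window have already been read from the two integral images; the function name is hypothetical.

```python
def window_variance(pixel_sum, squared_sum, n_pixels):
    """Variance of a window from its pixel sum and squared-pixel sum."""
    mean = pixel_sum / n_pixels
    return squared_sum / n_pixels - mean * mean


# Window with pixels 1, 2, 3, 4: sum = 10, sum of squares = 30.
print(window_variance(10, 30, 4))  # 7.5 - 6.25 = 1.25
```

Both inputs are rectangle sums, so the variance of any window costs a constant number of lookups.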

  45. Extra Slides FPGA Impl.

  46. Extra Slides ASIC Impl.
