
Interleaved Pixel Lookup for Embedded Computer Vision

Kota Yamaguchi, Yoshihiro Watanabe, Takashi Komuro, Masatoshi Ishikawa


Presentation Transcript


  1. Interleaved Pixel Lookup for Embedded Computer Vision Kota Yamaguchi, Yoshihiro Watanabe, Takashi Komuro, Masatoshi Ishikawa

  2. Outline • Introduction • Problems in applying interleaving • Techniques • Example: Lucas-Kanade • Conclusion

  3. Purpose • To find a technique for efficiently implementing a parallel memory for pixel lookup operations (Figure labels: camera captures, images, image processing [interleaving], computer vision tasks, model objects, feature space, e.g. pose, shape)

  4. Motivation • Strong influence on downstream performance • Massive memory operations • Always a headache for embedded designers (Figure labels: camera captures, images, image processing, computer vision tasks, model objects, feature space, e.g. pose, shape)

  5. Motivation • Interleaving in graphics hardware: Texram [Schilling, 96]; texture memory in recent GPUs • Is it also beneficial to embedded computer vision hardware? • Yes, if appropriately implemented

  6. Pixel lookup operations • Geometry-to-pixel conversion • Input images serve as a lookup table (Figure: a geometry stream …, x_{k+2}, x_{k+1}, x_k goes in; a pixel stream …, I(x_{k+2}), I(x_{k+1}), I(x_k) comes out)
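
The geometry-to-pixel conversion on this slide can be pictured as a table lookup. The following is a minimal sketch in Python, not the authors' hardware: the function name `pixel_lookup`, the sample image, and the border clamping are all assumptions made for illustration (the slides do not say how out-of-range coordinates are handled).

```python
def pixel_lookup(image, coords):
    """Convert a stream of (x, y) coordinates into a stream of pixel values.

    `image` is a list of rows and plays the role of the lookup table.
    """
    height, width = len(image), len(image[0])
    pixels = []
    for x, y in coords:
        # Assumption: clamp coordinates to the image border.
        xc = min(max(x, 0), width - 1)
        yc = min(max(y, 0), height - 1)
        pixels.append(image[yc][xc])
    return pixels


# Hypothetical 3-row by 4-column image whose pixel value encodes its position.
image = [[10 * r + c for c in range(4)] for r in range(3)]
geometry_stream = [(0, 0), (3, 1), (2, 2), (5, 0)]
pixel_stream = pixel_lookup(image, geometry_stream)
```

Each coordinate costs one independent memory access here, which is the behaviour the following slides try to improve on.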

  7. Straightforward implementation • Random access memory • Expensive and slow (Figure labels: geometry stream, RAM holding the input images, pixel stream)

  8. Interleaved implementation • Higher throughput with the same capacity • But it suffers from partitioning and alignment issues (Figure labels: geometry stream, interleaved memory holding the input images as packed words, pixel stream)

  9. Partitioning issue • The parallel word does not match the access pattern of the operation • e.g. a word packs 1x4 neighboring pixels, but each operation requires 4x1 pixels (Figure labels: pixel, read, read, read, align, read)
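
To make the mismatch concrete, the sketch below counts how many packed words one request touches under two packings. It is an illustration only: `words_touched` is a hypothetical helper, and reading "1x4" as one row of four pixels and "4x1" as one column of four is an assumption about the slide's notation.

```python
def words_touched(pixels, word_w, word_h):
    """Return the set of packed words covering the requested pixels.

    A word packs a word_w-wide by word_h-tall block of pixels and is
    identified here by its block coordinates.
    """
    return {(x // word_w, y // word_h) for x, y in pixels}


# Request: four vertically adjacent pixels (a column), aligned to y = 8.
column = [(5, y) for y in range(8, 12)]

# Words pack a row of four pixels: every pixel of the column is in a different word.
reads_row_packed = len(words_touched(column, word_w=4, word_h=1))

# Words pack a column of four pixels: the whole request sits in one word.
reads_col_packed = len(words_touched(column, word_w=1, word_h=4))
```

With the mismatched packing the request needs four reads, and three of the four pixels in every fetched word are wasted.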

  10. Misalignment issue • An unaligned access requires multiple reads and sub-word alignment (Figure labels: word boundary, read, align, read)
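
The cost of an unaligned access can be sketched with a one-dimensional packed memory. This is a simplified software model under assumed parameters (four pixels per word, hypothetical helper names), not a description of the actual memory controller.

```python
WORD = 4  # assumed number of pixels packed into one memory word


def reads_for_access(start, length=WORD, word=WORD):
    """Number of word reads needed to fetch `length` pixels from `start`."""
    first = start // word
    last = (start + length - 1) // word
    return last - first + 1


def unaligned_read(memory_words, start, word=WORD):
    """Fetch `word` consecutive pixels starting at pixel index `start`."""
    first, offset = divmod(start, word)
    if offset == 0:
        return list(memory_words[first])          # aligned: one read
    low = memory_words[first]                      # first read
    high = memory_words[first + 1]                 # second read
    # Sub-word alignment: splice the tail of one word onto the head of the next.
    return list(low[offset:]) + list(high[:offset])


pixels = list(range(16))
memory_words = [pixels[i:i + WORD] for i in range(0, len(pixels), WORD)]
window = unaligned_read(memory_words, 6)
```

An access starting at pixel 6 straddles the boundary between words 1 and 2, so it costs two reads plus the splice.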

  11. Techniques • 2D partitioning • Indirect addressing • Data switching

  12. 2D partitioning • Treat the entire image as a tiling of spatial patterns • Packed word = the spatial pattern required by the operation • Avoids the partitioning issue (Figure labels: memory banks, spatial pattern, packed word)

  13. Spatial pattern • A certain pattern is present in a lookup sequence, e.g. a 2x2 block for interpolation or a 3x3 block for convolution (Figure: a lookup sequence of 2x2 blocks in the input images, (i, j), (i+1, j), (i, j+1), (i+1, j+1), then (i', j'), (i'+1, j'), (i', j'+1), (i'+1, j'+1), …)

  14. 2D partitioning and misalignment • Tiled patterns guarantee that the data elements of a requested word are always distributed over different banks, even if the access overlaps address boundaries (Figure labels: Bank 1 to Bank 4, tile elements numbered 1 to 4)
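
The guarantee on this slide can be checked with a small model. The sketch assumes a 2x2 spatial pattern and a simple modulo bank assignment; both the tile size and the `bank_of` mapping are illustrative choices, not details confirmed by the slides.

```python
TILE_W, TILE_H = 2, 2  # assumed 2x2 spatial pattern, hence 4 banks


def bank_of(x, y):
    """Bank that stores pixel (x, y): its position inside its tile."""
    return (y % TILE_H) * TILE_W + (x % TILE_W)


def banks_for_window(x0, y0):
    """Banks hit by a TILE_W x TILE_H window whose top-left pixel is (x0, y0)."""
    return [bank_of(x0 + dx, y0 + dy)
            for dy in range(TILE_H) for dx in range(TILE_W)]


# Every window position, aligned or not, touches each bank exactly once.
conflict_free = all(
    len(set(banks_for_window(x0, y0))) == TILE_W * TILE_H
    for y0 in range(8) for x0 in range(8)
)
```

Because a window of the tile's size always covers each residue class of x and y once, no two of its pixels can land in the same bank, so all of them can be fetched in parallel.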

  15. Indirect addressing • Generating a patterned address for each bank removes the multiple reads otherwise needed for a misaligned access (Figure labels: address generator, Bank 1 to Bank 4, tile elements numbered 1 to 4)
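
One way to picture the address generator is a function that gives each bank its own address for a requested window. The sketch below assumes a 2x2 pattern, row-major tile addressing, and an image width that is a multiple of the tile width; `bank_addresses` is a hypothetical name and the layout is an assumption.

```python
TILE_W, TILE_H = 2, 2  # assumed 2x2 spatial pattern, hence 4 banks


def bank_addresses(x0, y0, width):
    """Per-bank addresses for the window whose top-left pixel is (x0, y0).

    Bank b stores the pixel at offset (b % TILE_W, b // TILE_W) of every tile,
    and the address inside a bank is the row-major index of the tile.
    """
    tiles_per_row = width // TILE_W
    addresses = []
    for bank in range(TILE_W * TILE_H):
        bx, by = bank % TILE_W, bank // TILE_W
        # The one pixel of the window that lives in this bank.
        x = x0 + (bx - x0) % TILE_W
        y = y0 + (by - y0) % TILE_H
        addresses.append((y // TILE_H) * tiles_per_row + (x // TILE_W))
    return addresses


aligned = bank_addresses(0, 0, width=8)
misaligned = bank_addresses(1, 1, width=8)
```

For an aligned window every bank receives the same address. For a misaligned window the banks receive different addresses, but each bank is still read only once.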

  16. Data switching • A data switch removes the throughput loss caused by sub-word alignment (Figure labels: address generator, Bank 1 to Bank 4, tile elements numbered 1 to 4)
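
The role of the switch can be sketched as a permutation of the bank outputs into window order. The model below is self-contained and assumes a 2x2 pattern, modulo bank assignment, and row-major tile addresses; the function names and the test image are hypothetical.

```python
TILE_W, TILE_H = 2, 2  # assumed 2x2 spatial pattern, hence 4 banks


def build_banks(image):
    """Distribute an image (even width and height) over the banks."""
    height, width = len(image), len(image[0])
    tiles_per_row = width // TILE_W
    depth = tiles_per_row * (height // TILE_H)
    banks = [[None] * depth for _ in range(TILE_W * TILE_H)]
    for y in range(height):
        for x in range(width):
            bank = (y % TILE_H) * TILE_W + (x % TILE_W)
            banks[bank][(y // TILE_H) * tiles_per_row + (x // TILE_W)] = image[y][x]
    return banks, tiles_per_row


def read_window(banks, tiles_per_row, x0, y0):
    """Read a TILE_W x TILE_H window with exactly one read per bank."""
    raw = []
    for bank in range(TILE_W * TILE_H):
        bx, by = bank % TILE_W, bank // TILE_W
        x = x0 + (bx - x0) % TILE_W
        y = y0 + (by - y0) % TILE_H
        raw.append(banks[bank][(y // TILE_H) * tiles_per_row + (x // TILE_W)])
    # Data switch: route each bank output to its position in the window.
    window = [raw[((y0 + dy) % TILE_H) * TILE_W + ((x0 + dx) % TILE_W)]
              for dy in range(TILE_H) for dx in range(TILE_W)]
    return raw, window


image = [[100 * y + x for x in range(8)] for y in range(8)]
banks, tiles_per_row = build_banks(image)
raw, window = read_window(banks, tiles_per_row, 1, 1)
```

The bank outputs arrive in bank order, and the switch only reorders them, so an unaligned window costs the same single parallel access as an aligned one.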

  17. Techniques overview (Figure labels: geometry stream, address generator [indirect addressing], memory banks holding the input images [2D partitioning], data switching, pixel stream)

  18. Example: Lucas-Kanade • Image registration algorithm • Non-linear least squares that solves for the parameters of an affine transformation between the input and the template [Baker & Matthews, 04] (Figure labels: input image, template image, Gauss-Newton method, affine parameters)
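
For reference, the least-squares problem and its Gauss-Newton step can be written as below. This is the forwards-additive formulation described by Baker & Matthews; the slides do not state which Lucas-Kanade variant the pipeline implements, so reading it as this variant is an assumption. Here I is the input image, T the template, W(x; p) the affine warp with parameters p, and H the Gauss-Newton approximation of the Hessian.

```latex
\begin{align}
  \Delta p^{*} &= \arg\min_{\Delta p} \sum_{x} \bigl[ I(W(x;\, p + \Delta p)) - T(x) \bigr]^{2} \\
  H &= \sum_{x} \left[ \nabla I \,\frac{\partial W}{\partial p} \right]^{\top}
                \left[ \nabla I \,\frac{\partial W}{\partial p} \right] \\
  \Delta p &= H^{-1} \sum_{x} \left[ \nabla I \,\frac{\partial W}{\partial p} \right]^{\top}
              \bigl[ T(x) - I(W(x;\, p)) \bigr] \\
  p &\leftarrow p + \Delta p
\end{align}
```

Both I(W(x; p)) and the gradient of I at the warped position must be evaluated at every template pixel x in every iteration, which is why pixel lookup dominates the inner loop.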

  19. LK data flow • Bottleneck: the nested for-each-iteration / for-each-x loop • This loop includes the pixel lookup (Figure labels: For each iteration, For each)

  20. Pixel lookup in LK • Conversion from affine-warped coordinates to pixels • Looks up the neighboring 4x4 pixels for each output (Figure labels: warped coordinates, pixel lookup table holding the input images, raw pixels, interpolation, warped input pixels, warped gradient pixels)
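
The coordinate side of this lookup can be sketched as follows. The affine parameterization follows the convention used by Baker & Matthews; the placement of the 4x4 block (one pixel before the floor of the warped coordinate, two after) is an assumption borrowed from bicubic-style kernels, since the slides do not specify it, and the function names are hypothetical.

```python
import math


def affine_warp(x, y, p):
    """Warp (x, y) with affine parameters p = (p1, ..., p6)."""
    p1, p2, p3, p4, p5, p6 = p
    return ((1 + p1) * x + p3 * y + p5,
            p2 * x + (1 + p4) * y + p6)


def neighborhood_4x4(wx, wy):
    """Top-left pixel of the 4x4 block around (wx, wy) and the fractional offsets.

    Assumption: the block covers floor - 1 .. floor + 2 on each axis.
    """
    ix, iy = math.floor(wx), math.floor(wy)
    return (ix - 1, iy - 1), (wx - ix, wy - iy)


# Hypothetical pure translation by (2.5, 1.25).
p = (0.0, 0.0, 0.0, 0.0, 2.5, 1.25)
wx, wy = affine_warp(10, 20, p)
top_left, frac = neighborhood_4x4(wx, wy)
```

The top-left coordinate is what a memory would receive as the window position, while the fractional offsets select the filter kernel weights.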

  21. Straightforward implementation (Figure labels: RAM holding the input images, raw pixels, filter kernels, multiply-adds)

  22. Interleaved implementation (Figure labels: address generator, interleaved memory with memory banks holding the input images in 4x4 block partitioning, raw pixels, filter kernels, multiply-adds)

  23. Comparison of memory configurations • It is easier to implement the peripherals than to increase the memory capacity

  24. FPGA implementation of LK pipeline • Interleaving alone gives the dedicated pipeline 16x higher throughput (Figure labels: dedicated hardware pipeline, FPU, Affine Warp Calculator, Filter Kernel Generator, Gradient / Interpolation Filter, Jacobian Filter, Hessian Matrix Calculator, FP ALU, Input Pixel Table, SDPU Calculator, Error Calculator, FP Register, Template Pixel Table, For each x, For each iteration)

  25. HDL synthesis • 16x higher throughput with the same capacity requirement and feasible hardware cost • Estimated performance: 200 fps when registering five 64x64 8-bit image patches at 100 MHz • Assumption: every registration converges within 10 iterations
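
The estimate can be turned into a rough cycle budget using only the figures on this slide. The calculation below is a back-of-the-envelope sketch: it assumes that all five patches are registered in every frame, that each iteration visits every patch pixel once, and that each output needs the 4x4 raw-pixel neighborhood from the earlier slide. None of these readings is confirmed by the slides.

```python
CLOCK_HZ = 100_000_000      # 100 MHz clock
FPS = 200                   # estimated frame rate
PATCHES = 5                 # patches registered per frame (assumed reading)
PATCH_W = PATCH_H = 64      # patch size in pixels
ITERATIONS = 10             # worst-case iterations per registration
NEIGHBORHOOD = 4 * 4        # raw pixels looked up per output pixel

# Warped output pixels the pipeline must produce each second.
outputs_per_second = FPS * PATCHES * PATCH_W * PATCH_H * ITERATIONS

# Clock cycles available per output pixel.
cycles_per_output = CLOCK_HZ / outputs_per_second

# Raw pixel reads per second implied by the 4x4 neighborhood.
raw_pixels_per_second = outputs_per_second * NEIGHBORHOOD
```

Under these assumptions the pipeline has about 2.4 cycles per output pixel, while the raw pixel demand is roughly 655 million reads per second. A memory that delivers one pixel per cycle at 100 MHz could not keep up, whereas one that delivers a full 4x4 block per access could.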

  26. Summary • Interleaved pixel lookup: sub-word parallel memory operations that exploit spatial patterns in lookup sequences • Techniques: 2D partitioning, indirect addressing, data switching • Example: Lucas-Kanade, with 16x higher throughput at the same memory capacity and feasible hardware cost
