
Mapping Irregular Algorithms in a Custom Computing Image Processing Framework

Frédéric Planque, Ivan C. Kraljic, Yvon Savaria. MiroTech Microsystems Inc., 395 Ste-Croix, suite 202, St-Laurent, Qc H4N 2L3, Canada. research@mirotech.com, www.mirotech.com



  1. Mapping Irregular Algorithms in a Custom Computing Image Processing Framework Frédéric Planque Ivan C. Kraljic Yvon Savaria MiroTech Microsystems Inc. 395 Ste-Croix suite 202 St-Laurent, Qc H4N 2L3 Canada research@mirotech.com www.mirotech.com

  2. Contents • RTIP Framework • Basic paradigms • Application development • Operator library • Connected Components Labeling • Algorithm • Implementation • Image Warping • Affine inverse transformation • Application: rotation • Conclusion

  3. Real-time image processing (RTIP) • Processing power • Billions of instructions per second • Bandwidth • External: 10-100 MBytes/s; internal: 100-1000 MBytes/s • Embedded memories • Frame, line and pixel delays • Informal specification • Experimentation and heuristics • Adaptive behavior • Changing environment and scenario

  4. A real-time image processing framework: Foundations • Execution model: Hardwired dataflow • An operation is “fired” as soon as all its operands are available (J.B. Dennis, Data flow supercomputers, Computer (13), 1980). • Hardwired dataflow: hardware operators statically connected by physical links • On-the-fly processing of incoming data (raster-scan data flows) • Programming model: Multiple Instruction, Single Data (MISD) • All the operations for a single pixel are executed in one clock cycle (pipelined) • Functional parallelism

  5. Hardwired dataflow paradigm • [Figure: a dataflow graph, in which the operations Median, Edge and Sub are linked by data dependencies, is mapped onto an operator graph, in which the physical operators Median, Edge and Sub are linked by physical connections] (J. Sérot, G. Quénot, B. Zavidovique, Functional Programming on a dataflow architecture, Machine Vision and Applications, 7(1), 1993.)

  6. MISD paradigm • If all operators have a throughput of one pixel/clock cycle, any cascade will have the same throughput • On-the-fly performance guaranteed • Execution time is constant (as long as there is enough hardware) • State-of-the-art million-gate reconfigurable device • Latency is affected
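
The throughput and latency claims of this slide can be illustrated with a small cycle-level software model. This is only a sketch under stated assumptions, not the framework's hardware: each operator is modeled as a pure per-pixel function followed by one pipeline register, and the names `run_cascade` and `stages` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Stage = Callable[[int], int]

def run_cascade(stages: List[Stage], pixels: List[int]) -> List[Tuple[int, int]]:
    """Clock a cascade of registered stages; return (cycle, value) per output pixel."""
    # Assumption: one pipeline register at the output of each stage.
    regs: List[Optional[int]] = [None] * len(stages)
    outputs: List[Tuple[int, int]] = []
    # Idle cycles are appended so the last pixels can drain out of the pipeline.
    stream: List[Optional[int]] = list(pixels) + [None] * len(stages)
    for cycle, px in enumerate(stream):
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
        # Update from the last stage to the first so that every stage
        # consumes the value its predecessor held during the previous cycle.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not None else None
        regs[0] = stages[0](px) if px is not None else None
    return outputs

# Three hypothetical point operators cascaded on a four-pixel stream.
stages = [lambda p: p + 1, lambda p: p * 2, lambda p: 255 - p]
result = run_cascade(stages, [1, 2, 3, 4])
for cycle, value in result:
    print(cycle, value)
```

In this model the first result appears after as many cycles as there are stages, and one result follows on every cycle after that: adding a stage lengthens the latency but leaves the throughput unchanged.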

  7. Application development • Library of configurable image processing operators • Convolvers, filters, edge detectors... • Better suited than register-transfer level for real applications • An application is decomposed into a cascade of library operators • One operation requires one operator • Physical operators are connected according to the static schedule of operations in the dataflow graph • Physical cascading of operators can be automated • Leverages reconfigurable computing • Application on demand • State-of-the-art 1M+ gate FPGAs

  8. Framework features • Uniform • Encapsulated operators with uniform interfaces for data and control • Modular • Data-driven control local to each operator • Stand-alone library operators • Adaptive • Video format, image size and resolution • Open • Support for user-specific proprietary operators • No support for resource sharing • One-to-one mapping between operation and operator

  9. RTIP operator library • Linear • 3x3/5x5/7x7/9x9/11x11 asymmetric convolutions • 3x3/5x5 Kirsch, Sobel, Laplacian, Prewitt filters • 3x3 sharpening, smoothing, mean, variance • Non-linear • 3x3/5x5 median, minimum, maximum, gradient • Noise filtering • Morphological • 3x3 erosion, dilation, closing, opening • Binary • 3x3 erosion, dilation, closing, opening, pruning, skeletonization • Other • Maximum tracking, motion detection, histogram, distance map
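
A 3x3 neighborhood operator from this library, such as the median, can be modeled in software on a raster-scan stream with two line delays plus a few pixel delays. The sketch below is an illustration only: the function names are hypothetical, border pixels are simply skipped (an assumption, since the slides do not describe border handling), and a `deque` stands in for the hardware delay lines.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

def windows_3x3(stream: Iterable[int], width: int) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Yield (x, y, window) for each interior pixel of a raster-scan stream."""
    depth = 2 * width + 3            # two full lines plus three pixels
    buf: deque = deque(maxlen=depth)
    for i, px in enumerate(stream):
        buf.append(px)
        if len(buf) < depth:
            continue                 # delay lines are not filled yet
        x, y = i % width, i // width # position of the newest pixel
        if x < 2:
            continue                 # window would wrap around a line boundary
        # buf[0] is pixel (x-2, y-2); consecutive rows are `width` entries apart.
        window = [[buf[r * width + c] for c in range(3)] for r in range(3)]
        yield x - 1, y - 1, window   # coordinates of the window center

def median_3x3(stream: Iterable[int], width: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, median) for each interior pixel."""
    for x, y, window in windows_3x3(stream, width):
        values = sorted(v for row in window for v in row)
        yield x, y, values[4]        # middle of nine sorted values

# 4x4 test image whose pixel values equal their raster index.
image = list(range(16))
medians = list(median_3x3(image, 4))
print(medians)
```

One window is produced per incoming pixel once the delay lines are full, which is the software counterpart of the one-pixel-per-clock-cycle throughput that the framework requires of each operator.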

  10. Connected components labeling • Grouping operation (map pixels to blobs) • All connected “foreground” pixels are given the same label (= one blob) • Algorithms • Iterative • Two-pass • Applications • Blob analysis, machine vision, target tracking...

  11. Two-pass algorithm • [Figure: an input binary image is turned into an image of temporary labels together with the equivalence table {3 <=> 2, 4 <=> 3}, and then into the output labeled image]

  12. CCL: first pass • Left-to-right, top-to-bottom label propagation • If the current pixel P(x, y) is in the foreground: • If neither its top neighbor P(x, y-1) nor its left neighbor P(x-1, y) is labeled, create a new label and assign it to the pixel. • If only one of the two neighbors is labeled, give the pixel that neighbor's label. • If both neighbors have the same label, give the pixel that label. • If the two neighbors have different labels, assign the minimum of the two labels to the pixel and record in an equivalence table that the two labels are equivalent. • [Figure: 4-connectivity L-type mask covering P(x, y-1), P(x-1, y) and P(x, y)]
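
The first-pass rules of this slide can be sketched in software as follows. This is a plain sequential illustration, not the hardware operator: the image is assumed to be a list of rows with nonzero foreground pixels, label 0 is assumed to mean "unlabeled", and `first_pass` is a hypothetical name.

```python
from typing import List, Tuple

def first_pass(image: List[List[int]]) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Return (temporary label image, list of equivalent label pairs)."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    equivalences: List[Tuple[int, int]] = []
    next_label = 1
    for y in range(height):              # top to bottom
        for x in range(width):           # left to right
            if not image[y][x]:
                continue                 # background pixel stays 0
            top = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if top == 0 and left == 0:
                labels[y][x] = next_label        # no labeled neighbor: new label
                next_label += 1
            elif top == 0 or left == 0:
                labels[y][x] = top or left       # exactly one labeled neighbor
            elif top == left:
                labels[y][x] = top               # both neighbors agree
            else:
                labels[y][x] = min(top, left)    # conflict: keep the minimum
                pair = (min(top, left), max(top, left))
                if pair not in equivalences:
                    equivalences.append(pair)    # record the equivalence
    return labels, equivalences

# U-shaped blob: its two arms receive different labels until the bottom row joins them.
u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
temp_labels, pairs = first_pass(u_shape)
print(temp_labels, pairs)
```

The U shape is the classic case that a single raster scan cannot resolve on its own, which is why the equivalence table and a second pass are needed.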

  13. CCL: Equivalence resolution & 2nd pass • Determine equivalence classes from all pairs of equivalent labels • Assign a unique label to each equivalence class • Rescan image of temporary labels, and assign the final unique label to each temporary label
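
The three steps of this slide can be sketched as a depth-first search over the equivalence pairs followed by a remapping pass. The sketch assumes that final labels are numbered consecutively from 1, which the slides do not specify; the function names and the small temporary-label image are hypothetical, while the equivalence pairs are those of the two-pass example ({3 <=> 2, 4 <=> 3}).

```python
from typing import Dict, List, Tuple

def resolve_equivalences(num_labels: int, equivalences: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map every temporary label 1..num_labels to a final label, one per equivalence class."""
    neighbors: Dict[int, List[int]] = {label: [] for label in range(1, num_labels + 1)}
    for a, b in equivalences:
        neighbors[a].append(b)
        neighbors[b].append(a)
    final: Dict[int, int] = {}
    next_final = 1                         # assumption: consecutive final labels
    for start in range(1, num_labels + 1):
        if start in final:
            continue                       # already reached from an earlier label
        final[start] = next_final
        stack = [start]
        while stack:                       # depth-first search of one class
            current = stack.pop()
            for other in neighbors[current]:
                if other not in final:
                    final[other] = next_final
                    stack.append(other)
        next_final += 1
    return final

def second_pass(labels: List[List[int]], final: Dict[int, int]) -> List[List[int]]:
    """Rescan the temporary-label image and substitute the final labels."""
    return [[final[label] if label else 0 for label in row] for row in labels]

mapping = resolve_equivalences(4, [(3, 2), (4, 3)])
# Hand-written temporary labels, used only to exercise the remapping.
temporary = [[1, 0, 2],
             [0, 3, 3],
             [4, 4, 0]]
labeled = second_pass(temporary, mapping)
print(mapping, labeled)
```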

  14. Equivalence resolution: Implementation • Content-addressable memory (CAM) • All labels equivalent to one label can be found in one cycle • + Fast equivalence resolution: O(n) • - High memory consumption (Xilinx Virtex: one 4k block RAM implements a 16x8 CAM; Virtex 1000 has one 512x8 CAM) • [Figure: M x N CAM with Label, Data and Addr ports; all addresses that contain “Label” are found in one cycle] • Depth-first search with RAM • + Low memory consumption • - Slow equivalence resolution: O(n^2)
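
The CAM lookup described on this slide can be modeled in a few lines. This is a behavioral sketch only: the table layout (address = temporary label, data = a label it is equivalent to) is an assumption made for illustration, the `Cam` class is hypothetical, and the Python loop performs sequentially what the hardware does for all addresses in a single cycle.

```python
from typing import List, Optional

class Cam:
    """Behavioral model of a content-addressable memory with `depth` entries."""

    def __init__(self, depth: int) -> None:
        self.cells: List[Optional[int]] = [None] * depth

    def write(self, addr: int, data: int) -> None:
        """Store `data` at `addr`, as in an ordinary RAM write."""
        self.cells[addr] = data

    def match(self, data: int) -> List[int]:
        """Return every address whose content equals `data`."""
        return [addr for addr, cell in enumerate(self.cells) if cell == data]

# Equivalences {3 <=> 2, 4 <=> 3} stored with the assumed layout.
cam = Cam(depth=16)
cam.write(3, 2)   # label 3 is equivalent to label 2
cam.write(4, 3)   # label 4 is equivalent to label 3
print(cam.match(2), cam.match(3), cam.match(7))
```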

  15. CCL architecture • First pass • Generates image with temporary labels • Stores equivalent labels in the equivalence table • Frame delay • Delays the image of temporary labels • Equivalence resolution • Depth-first search of equivalent labels • Second pass • Remaps temporary labels into final unique labels • [Figure: block diagram with RTIP framework compatible I/O and custom I/O linking the first pass, frame delay, equivalence resolution and second pass blocks]

  16. Parallelism for on-the-fly processing • [Figure: a switch routes the incoming image stream alternately to an odd labeler (images 1, 3, 5...) and an even labeler (images 2, 4, 6...), and a mux merges their outputs into the labeled image stream] • [Figure: timing diagram for images 1 to 4; each labeler runs the first pass, equivalence resolution and second pass for its own images, so that one labeler resolves equivalences for an image while the other runs the first pass on the next image]

  17. Limitations • Worst case for equivalence resolution: N_equ x (N_equ - 1), where N_equ is the maximum number of equivalences • Worst case for 1st/2nd pass: X x Y (image size) • On-the-fly processing fails if N_equ x (N_equ - 1) > X x Y • [Figure: two timing diagrams; on-the-fly processing is correct when the equivalence resolution of image 1, of duration N_equ x (N_equ - 1), fits within one image period X x Y, and fails when it runs longer than that period]
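
The condition on this slide is easy to check numerically. The sketch below assumes that one resolution step and one pixel each take one clock cycle, so that both worst cases can be compared directly; the function names are hypothetical.

```python
def resolution_steps(n_equ: int) -> int:
    """Worst-case number of equivalence resolution steps for n_equ equivalences."""
    return n_equ * (n_equ - 1)

def on_the_fly_ok(n_equ: int, width: int, height: int) -> bool:
    """True if worst-case resolution fits within one image period of width x height pixels."""
    return resolution_steps(n_equ) <= width * height

# 512 equivalences against two image sizes.
print(resolution_steps(512))
print(on_the_fly_ok(512, 512, 511), on_the_fly_ok(512, 512, 510))
```

With 512 equivalences the worst case is 512 x 511 = 261,632 steps, which is exactly the pixel count of a 512 x 511 image; any smaller image no longer leaves enough time.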

  18. CCL: status and future work • Status: • 512 x 1024 image size maximum • 254 temporary and final labels maximum (254 blobs) • Label 255 used for removing blobs touching image borders (optional) • 512 equivalences between temporary labels maximum after first pass • On-the-fly processing for images 512 x 511 and up • Add-on compatible cores: blob area, blob centroid, blob bounding box… • Future work: • Larger images • More labels (temporary, final and equivalences) • Optimize equivalence resolution for faster processing

  19. Image warping • Geometric transformation • Input image pixel coordinates [u, v]; warped image pixel coord. [x, y]: [x, y] = [f(u, v), g(u, v)] (forward mapping) or [u, v] = [h(x, y), k(x, y)] (inverse mapping) • Affine transformation:
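
For reference, a general affine transformation and its inverse can be written as below, using the [u, v] and [x, y] coordinates defined on this slide. The coefficient names a to f are placeholders and not necessarily the authors' notation.

```latex
% Forward mapping: input coordinates [u, v] to warped coordinates [x, y]
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
+
\begin{bmatrix} e \\ f \end{bmatrix}

% Inverse mapping, defined when ad - bc \neq 0
\begin{bmatrix} u \\ v \end{bmatrix}
=
\frac{1}{ad - bc}
\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\left(
\begin{bmatrix} x \\ y \end{bmatrix}
-
\begin{bmatrix} e \\ f \end{bmatrix}
\right)
```

The inverse form is the one an inverse-mapping architecture evaluates: for each output pixel [x, y] it computes the input location [u, v] to read from.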

  20. Inverse mapping: Architecture • Nearest-neighbor interpolation • Transformation matrix stored in dynamically reconfigurable LUT • Duplicate operator for on-the-fly processing • Same architecture as connected component labeler • Higher-order interpolation (bilinear, bicubic) • Inverse mapping may have reduced performance • Better suited to a forward mapping architecture (in development) • [Figure: block diagram with RTIP framework compatible I/O and custom I/O; an affine transformation block produces the [u, v] coordinates used to read the warped pixel from a frame buffer]

  21. Application: Rotation • Affine transformation • Inverse mapping
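
Rotation by inverse mapping with nearest-neighbor interpolation can be sketched in software as follows. The rotation center (the image center), the fill value for pixels that map outside the input, and the name `rotate_nearest` are assumptions for illustration; a positive angle rotates from the +x axis toward the +y axis.

```python
import math
from typing import List

def rotate_nearest(src: List[List[int]], angle: float, fill: int = 0) -> List[List[int]]:
    """Rotate `src` by `angle` radians about its center using inverse mapping."""
    height, width = len(src), len(src[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2     # assumed rotation center
    c, s = math.cos(angle), math.sin(angle)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse mapping: apply the rotation by -angle to the output position.
            u = cx + dx * c + dy * s
            v = cy - dx * s + dy * c
            ui, vi = round(u), round(v)            # nearest-neighbor interpolation
            if 0 <= ui < width and 0 <= vi < height:
                out[y][x] = src[vi][ui]            # read the input pixel
    return out

source = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
quarter_turn = rotate_nearest(source, math.pi / 2)
for row in quarter_turn:
    print(row)
```

Scanning the output image and reading from the input guarantees that every output pixel receives exactly one value, which is why inverse mapping pairs naturally with nearest-neighbor interpolation.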

  22. Application: Blob analysis • Transparent cascading thanks to RTIP framework • 50 Mpixels/s throughput • [Figure: cascade of operators with RTIP framework compatible I/O: rotation (camera angle correction, Angle parameter), binarization (Threshold parameter), connected components labeling, and blob feature operators (area, centroid, bounding box)]
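
The blob features named on this slide (area, centroid, bounding box) can all be accumulated in a single scan of the labeled image. The sketch below is a software illustration with a hypothetical function name and output layout; label 0 is assumed to be background.

```python
from typing import Dict, List

def blob_features(labels: List[List[int]]) -> Dict[int, dict]:
    """Return area, centroid (x, y) and bounding box (min_x, min_y, max_x, max_y) per label."""
    acc: Dict[int, dict] = {}
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == 0:
                continue                       # background
            a = acc.setdefault(label, {"area": 0, "sum_x": 0, "sum_y": 0,
                                       "min_x": x, "min_y": y, "max_x": x, "max_y": y})
            a["area"] += 1
            a["sum_x"] += x
            a["sum_y"] += y
            a["min_x"], a["max_x"] = min(a["min_x"], x), max(a["max_x"], x)
            a["min_y"], a["max_y"] = min(a["min_y"], y), max(a["max_y"], y)
    return {
        label: {
            "area": a["area"],
            "centroid": (a["sum_x"] / a["area"], a["sum_y"] / a["area"]),
            "bbox": (a["min_x"], a["min_y"], a["max_x"], a["max_y"]),
        }
        for label, a in acc.items()
    }

labeled_image = [[1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 2]]
features = blob_features(labeled_image)
print(features)
```

Because every feature is a running sum, minimum or maximum, the accumulation needs no second look at any pixel, which suits a raster-scan stream.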

  23. Conclusion • Hardwired dataflow now feasible on single-chip reconfigurable devices • Operator library • Adaptive framework (frame rate, image size, pixel resolution) • 70+ operators and growing • Irregular algorithms (labeling, warping) feasible • Fast application development time thanks to modularity • Vision system on a chip • 20 to 90 basic RTIP operators on today’s state-of-the-art FPGA • Hundreds of instructions per pixel (3x3 convolution: 9 instructions/pixel) • 50 to 100 Mpixels per second throughput • Billions of instructions per second
