Research on Graph-Cut for Stereo Vision Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Outline • Research Overview • Brief Review of Stereo Vision • Hierarchical Exhaustive Search • Partitioned Graph-Cut for Stereo Vision • Hierarchical Parallel Graph-Cut
Our Research • A fast vision system for robotics • Stereo vision • Local block-based + diffusion (M) • Graph-cut (PhD) • Belief propagation (PhD) • Segmentation • Watershed (M) • Meanshift • Approaches • Embedded solutions • DSP (U) • ASIC • PC-based solutions • Dual webcam stereo (U) [Figure: HRP-2 Tri-Camera Head]
My Research • A fast graph-cut VLSI engine for stereo vision • ASIC approach • Goal: 256x256 pixels, 30 depth labels, 30 fps • Stereo vision system prototypes • PC-based • DSP-based • FPGA/ASIC-based
Review on Stereo Vision Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Concept of Stereo Vision • Computational Stereo – determining the 3-D structure of a scene from 2 or more images taken from distinct viewpoints • Triangulation of non-verged geometry: d = disparity, Z = depth, T = baseline, f = focal length • M. Z. Brown et al., “Advances in Computational Stereo,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 8, August 2003.
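For the non-verged (parallel-axis) geometry above, similar triangles give the standard relation between the quantities listed on the slide (the formula itself is not reproduced there):

$$ Z = \frac{f\,T}{d} $$

Depth is inversely proportional to disparity, which is why d = 255 corresponds to the nearest points on the next slide and d = 0 to the farthest.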
Disparity Image • Disparity Map/Image • The disparities of all the pixels in the image • Example: [Figure: left/right camera views with a 110-pixel shift; left and right disparity maps of a 4x4 block; grayscale disparity scale from d = 0 (farthest) to d = 255 (nearest)]
How to find the disparity of a pixel? (1/2) • Simple Local Method • Block Matching • SAD: Sum of Absolute Difference, ∑|IL-IR| • Find the candidate disparity with minimal SAD • Assumption • Disparities within a block should be the same • Limitations • Works poorly in texture-less regions • Works poorly on repeating patterns [Figure: matching a left block against right blocks, giving SAD = 400 at d = k-1, SAD = 0 at d = k, SAD = 600 at d = k+1]
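A minimal sketch of SAD block matching along a horizontal scanline, assuming rectified grayscale images stored as NumPy arrays; the window size and disparity range are illustrative parameters, not values from the slides:

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, half_win=2):
    """Brute-force SAD block matching on rectified grayscale images."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            block_l = left[y-half_win:y+half_win+1, x-half_win:x+half_win+1]
            best_d, best_sad = 0, np.inf
            for d in range(max_disp + 1):          # candidate disparities
                block_r = right[y-half_win:y+half_win+1,
                                x-d-half_win:x-d+half_win+1]
                sad = np.abs(block_l.astype(np.int64)
                             - block_r.astype(np.int64)).sum()
                if sad < best_sad:                 # keep minimal-SAD candidate
                    best_sad, best_d = sad, d
            disp[y, x] = best_d
    return disp
```

Because each pixel is decided independently by its local window, the failure modes on the slide follow directly: a texture-less window gives near-identical SADs for all candidates, and a repeating pattern gives several near-zero SAD minima.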
How to find the disparity of a pixel? (2/2) • Complex Global Method • Graph-cut, Belief Propagation • Disparity Estimation = Optimal Labeling Problem • Assign the label (disparity) of each pixel such that a given global energy is minimal • Energy is a function of the label set (disparity map/image) • The energy considers • Intensity similarity of the corresponding pixels • Example: Absolute Difference (AD), D=|IL-IR| • Disparity smoothness of neighboring pixels • Example: Potts model: if (dL≠dR), V=K; else V=0 [Figure: a pixel with neighbor disparities 0, 0, 16, 32; candidate labels cost V=2K at d=0, V=3K at d=16, V=3K at d=32, V=4K at d=2]
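Putting the two terms on this slide together, the energy minimized over a disparity map d has the usual form (written here with T(·) as the indicator used later in the experiment-setting slide):

$$ E(d) = \sum_{p} D_p(d_p) \;+\; \sum_{(p,q)\in\mathcal{N}} V(d_p, d_q), \qquad V(d_p, d_q) = K \cdot T(d_p \neq d_q) $$

The figure's example follows from this: a candidate label that disagrees with all four 4-connected neighbors pays 4K in smoothness cost, one that disagrees with two of them pays 2K.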
Swap and Expansion Moves • Weak move (standard move) • Modifies 1 label at a time • Strong move (the α-β swap and α-expansion moves) • Modifies multiple labels at a time • More chances of finding a better local minimum of the energy E [Figure: energy landscape comparing the initial labeling, a weak standard move, and strong α-β swap / α-expansion moves]
4-connected structure • The most common graph/MRF structure in stereo, for both graph-cut and belief propagation (BP) [Figure: 2-variable graph-cut with source α and sink α', observable nodes carrying D costs and hidden nodes linked by V edges; the equivalent MRF in belief propagation, where D and V are vectors]
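A minimal sketch of a 2-variable (binary) cut on this 4-connected structure, assuming the third-party PyMaxflow package (a Python wrapper around the Boykov-Kolmogorov max-flow code discussed at the end of this deck); the 4x4 grid and capacity values are toy data:

```python
import numpy as np
import maxflow  # third-party PyMaxflow package (pip install PyMaxflow)

# Toy 4x4 data costs for the two labels alpha and alpha'
rng = np.random.default_rng(0)
D_alpha = rng.integers(0, 100, (4, 4)).astype(float)
D_alpha_p = rng.integers(0, 100, (4, 4)).astype(float)
K = 20.0                                      # Potts smoothness cost

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes((4, 4))              # one hidden node per pixel
g.add_grid_edges(nodes, K)                    # 4-connected V edges
g.add_grid_tedges(nodes, D_alpha, D_alpha_p)  # D edges to source/sink

energy = g.maxflow()                          # min cut = minimal energy
labels = g.get_grid_segments(nodes)           # side of the cut per node
print(energy)
print(labels.astype(int))
```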
Hierarchical Exhaustive Search Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Outline • Combinatorial Optimization • Graph-Cut • Exhaustive Search • Iterated Conditional Modes • Hierarchical Exhaustive Search • Result • Summary & Next Step
Combinatorial Optimization • Determine a combination (pattern, set of labels) such that the energy of this combination is minimum • Example: 4-bit binary label problem • Find a label set which yields the minimal energy • Each individual bit can be set to 0 or 1 • Each label corresponds to an energy cost • Each neighboring bit pair is better off with the same label (smoothness) • Per-bit costs from the figure: label 0 costs 99, 92, 100, 101 for bits 0-3; label 1 costs 100, 79, 114, 98; smoothness cost 10 per differing neighboring pair • Energy(0000) = 99+92+100+101 = 392 • Energy(0001) = 99+92+100+98+10 = 399
Graph-Cut • Formulate the previous problem as a graph-cut problem • Find the cut with minimum total capacity (cost, energy) • Solving the graph-cut: the Ford-Fulkerson method • Total flow pushed = 99+79+100+98+1+10+3 = 390 = max flow (the energy of the cut 1100) [Figure: the 4-bit graph with source/sink links and the minimum cut corresponding to labeling 1100]
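A compact sketch of the Ford-Fulkerson idea with BFS path search (the Edmonds-Karp variant) on a dict-based residual graph; the node names and capacities in the usage example are illustrative, not the slide's graph:

```python
from collections import deque

def max_flow(cap, s, t):
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).
    `cap` maps u -> {v: residual capacity}; it is modified in place."""
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:            # BFS: shortest augmenting path
            u = q.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                         # no augmenting path left: done
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)     # bottleneck along the path
        for u, v in path:
            cap[u][v] -= b                      # push flow forward
            cap.setdefault(v, {}).setdefault(u, 0)
            cap[v][u] += b                      # add residual (reverse) capacity
        flow += b

# Tiny illustrative graph
cap = {'s': {'a': 10, 'b': 5}, 'a': {'b': 4, 't': 8}, 'b': {'t': 9}}
print(max_flow(cap, 's', 't'))  # -> 15
```

After termination, nodes still reachable from s in the residual graph form the source side of the minimum cut, which is how the cut (labeling) is read off.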
Exhaustive Search • List all the combinations and their corresponding energies • Example: 1100 has the minimal energy of 390 [Figure: the same 4-bit chain with per-bit costs and smoothness links]
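A brute-force version of the 4-bit example; the per-bit costs below are the values reconstructed earlier from the slide's quoted energies (E(0000)=392, E(0001)=399, minimum E(1100)=390), so treat them as a plausible reconstruction rather than ground truth:

```python
from itertools import product

D = {0: [99, 92, 100, 101],   # cost of labeling bit i as 0
     1: [100, 79, 114, 98]}   # cost of labeling bit i as 1
V = 10                        # penalty per differing neighboring pair

def energy(bits):
    data = sum(D[b][i] for i, b in enumerate(bits))
    smooth = V * sum(a != b for a, b in zip(bits, bits[1:]))
    return data + smooth

# List all 2^4 combinations and keep the one with minimal energy
best = min(product((0, 1), repeat=4), key=energy)
print(best, energy(best))  # -> (1, 1, 0, 0) with energy 390
```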
Iterated Conditional Modes • Iteratively finds the best label under the currently given condition • Greedy • Different starting decisions (initial conditions) lead to different results • Can get stuck in local minima • Example: • Start with bit 1 because it is more reliable • Iteration order: bit1 → bit0 → bit2 → bit3 • bit1: 79 (label 1) < 92 (label 0) → 1 • bit0: 100 (1) < 99+10 (0) → 1 • bit2: 100+10 (0) < 114 (1) → 0 • bit3: 101 (0) < 98+10 (1) → 0 • Final solution: 1100
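A sketch of the ICM variant traced on the slide, where each bit is decided greedily and still-undecided neighbors contribute no smoothness cost; D and V are the reconstructed values from the exhaustive-search sketch above:

```python
# Reconstructed costs, as in the exhaustive-search sketch above
D = {0: [99, 92, 100, 101], 1: [100, 79, 114, 98]}
V = 10

def icm(D, V, order):
    """Greedy ICM that ignores still-undecided neighbors."""
    bits = [None] * len(order)
    for i in order:
        def local(b):
            c = D[b][i]
            for j in (i - 1, i + 1):                 # chain neighbors
                if 0 <= j < len(bits) and bits[j] is not None:
                    c += V * (b != bits[j])          # smoothness vs. decided bits
            return c
        bits[i] = min((0, 1), key=local)             # greedy local choice
    return tuple(bits)

# bit1 first (most reliable), then bit0 -> bit2 -> bit3, as on the slide
print(icm(D, V, order=[1, 0, 2, 3]))  # -> (1, 1, 0, 0), i.e. 1100
```

With a different initialization or visiting order the greedy updates can settle on a worse labeling, which is the local-minimum caveat on the slide.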
Exhaustive Search Engine • Exhaustive search can be implemented in hardware • Less sequential dependency • Not suitable for graphs larger than 4x4 • Note: the result shown is for a fully connected graph, NOT a 4-connected graph
Hierarchical Graph-Cut? • Solve a large-n graph with multiple small-n GCEs hierarchically • Example: solve n=16 with 4+1 n=4 graph-cuts (see the sketch below) • For each sub-graph, find the best 2 label sets • For each sub-graph vertex: label 0 = 1st label set, label 1 = 2nd label set • Assumption: !! The optimal solution must be within the combinations of sub-graph label sets !! [Figure: 4x4 graph partitioned into sub-graphs 0-3]
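A toy sketch of the two-level idea on a 1-D chain (a 1-D analogue of the slide's 2-D partition, with random illustrative costs): each sub-chain is solved exhaustively for its best 2 labelings, and a top-level pass picks one labeling per sub-graph. Note that it bakes in the slide's assumption that the optimum survives the best-2 pruning, the assumption that later fails:

```python
from itertools import product
import random

random.seed(0)
N, SUB = 16, 4
D = [[random.randint(0, 200) for _ in range(N)] for _ in (0, 1)]  # D[b][i]
V = 20

def chain_energy(bits, lo):
    data = sum(D[b][lo + i] for i, b in enumerate(bits))
    smooth = V * sum(a != b for a, b in zip(bits, bits[1:]))
    return data + smooth

# Level 0: best-2 labelings per sub-chain (internal energy only)
best2 = []
for s in range(0, N, SUB):
    ranked = sorted(product((0, 1), repeat=SUB),
                    key=lambda bits: chain_energy(bits, s))
    best2.append(ranked[:2])                  # pat0, pat1 of this sub-graph

def assemble(choice):                         # one pat choice per sub-graph
    bits = []
    for sg, c in enumerate(choice):
        bits.extend(best2[sg][c])
    return tuple(bits)

# Level 1: pick pat0/pat1 per sub-graph, boundary V terms now included
hier = min((assemble(c) for c in product((0, 1), repeat=N // SUB)),
           key=lambda bits: chain_energy(bits, 0))
exact = min(product((0, 1), repeat=N), key=lambda bits: chain_energy(bits, 0))
print(chain_energy(hier, 0), chain_energy(exact, 0))  # hierarchical vs. optimal
```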
HGC Speed-up Evaluation • For an 8-point GCE with 8 sets of ECUs • Cost: 300 eq. adders • Latency: 41 cycles per graph • If only 1 GCE is used to compute a 64-point 2-variable graph-cut: Latency = 41 cycles x 8 + 41 cycles + TV = 369 cycles + TV • If V is computed for each pixel: TV = (8x8) x (8x7/2) x 2 = 3584 cycles • Total latency ~ 3953 cycles • Question: is this solution the optimal label set for n=64?
Hierarchical Exhaustive Search • 64x64 nodes • 4x4-based pyramid structure • 3 levels • pat0 = best candidate pattern, pat1 = 2nd-best candidate pattern • Level 2: D@lv2 = E0/E1@lv1; Label0@lv2 = pat0@lv1; Label1@lv2 = pat1@lv1 • Level 1: D@lv1 = E0/E1@lv0; Label0@lv1 = pat0@lv0; Label1@lv1 = pat1@lv0 • Level 0: D@lv0 = D0/D1@lv0; Label0@lv0 = Label0; Label1@lv0 = Label1
Computing the V term at Level 1 • For 1st-order neighboring sub-graphs Gi and Gj, there are 4 possible neighboring pair combinations: • (pat0i, pat0j) • (pat0i, pat1j) • (pat1i, pat0j) • (pat1i, pat1j) • Compute V(patXi, patXj) with the original neighboring cost • Example: • V(pat0i, pat0j) = K • V(pat0i, pat1j) = K+K+K = 3K [Figure: boundary columns of Gi and Gj under pat0/pat1, showing the pixel pairs that disagree across the shared boundary]
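A small helper matching the slide's example: the V cost between two neighboring sub-graph patterns is the Potts penalty summed over the pixel pairs straddling the shared boundary. The boundary columns below are illustrative stand-ins for the figure's patterns:

```python
def boundary_V(border_i, border_j, K):
    """Potts cost across a sub-graph boundary: K per disagreeing pair.
    border_i: labels of Gi's boundary column facing Gj;
    border_j: labels of Gj's facing boundary column."""
    return K * sum(a != b for a, b in zip(border_i, border_j))

K = 20
print(boundary_V([0, 0, 0, 1], [0, 0, 1, 1], K))  # one disagreement -> K
print(boundary_V([0, 0, 0, 1], [1, 1, 0, 1], K))  # three disagreements -> 3K
```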
Result of 16x16 (256-node) 2-level HES • 100 randomly generated graphs • D/V ~ 10 • Symmetric V = 20 • Error rate • Max: 17/256 ~ 6.6% • Average: 7/256 ~ 2.8% • Min: 2/256 ~ 0.8%
Result of 64x64 (4096-node) 3-level HES • 100 randomly generated graphs • D/V ~ 10 • Symmetric V = 20 • Error rate • Max: 185/4096 ~ 4.5% • Average: 146/4096 ~ 3.6% • Min: 115/4096 ~ 2.8%
Death Sentence to HES Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Error Rate vs. Graph Size • (D,V) = (~163:20) • Average error rate: 3.63% vs. 3.65% • The error rate did not increase significantly with graph size • The error-rate range became smaller
Impact of Different V Cost • 64x64 (3-level) HES • 100 patterns per V cost value • D cost (averaged over the s-link capacities of 10 patterns, 2 for each V) • Average: 162.8 • Std. Dev.: 94.4 • V cost: 10, 20, 40, 60, 80 [Figure: 256x256 1-pattern result]
Stereo Matching Case • Stereo pair: Tsukuba • Expansion moves with random label order • 15 labels → 15 graph-cut computations • Graph size: 256 x 256 • D term: truncated Sum of Squared Error (tSSE), truncated at AD = 20 • V term: Potts model with K = 20
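A sketch of the data term described above, assuming grayscale intensities and one plausible reading of "truncated at AD = 20": the squared error is capped where the absolute difference reaches 20, i.e. at 20² = 400:

```python
def tsse(i_left, i_right, trunc_ad=20):
    """Truncated squared-error data cost: SE capped once |AD| exceeds trunc_ad."""
    diff = float(i_left) - float(i_right)
    return min(diff * diff, float(trunc_ad * trunc_ad))

print(tsse(100, 110))  # |AD| = 10 -> 100
print(tsse(100, 180))  # |AD| = 80 -> capped at 400
```

Truncation bounds the penalty from occlusions and outliers so a single bad pixel cannot dominate the energy.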
1st Iteration Result • Error rate might exceed 20% for important expansion moves [Figure: HES 1st-iteration result vs. BnK's expansion result, with the important expansions highlighted]
Reason for Failure • The best 2 local candidates do NOT always include the final optimal solution • The best 2 patterns of one sub-graph are chosen without considering the patterns of its neighboring sub-graphs • Errors often happen near lv2 and lv3 block boundaries • The majority of nodes have zero source AND sink link capacities • Such nodes depend more on their neighboring nodes' labels • D:V ratio ~ 56:20 = 2.8:1 • Similar to the D:V = 163:60 case • Error rate for random patterns ~ 15%
Partitioned (Block) Graph-Cut Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Motivation • Global methods • Consider the whole picture • More information • Local methods • Consider a limited region of the picture • Less information • Is it really necessary to use that much information in global methods?
Concept • Original full GC: 1 big graph • Partitioned GC: N smaller graphs • What is the smallest possible partition that achieves the same performance?
Experiment Setting • Energy • D term • Luma only • Birchfield-Tomasi cost (takes the best match at half-pel positions) • Squared Error • V term • Potts model: V = K x T(di ≠ dj) • The K constant is the same for all partitions • Partition sizes: 4x4, 16x16, 32x32, 64x64, 128x128 • Stereo pairs: Tsukuba, Teddy, Cones, Venus
Tsukuba 4x4, 16x16, 32x32, 64x64 [Figure: disparity maps for partition sizes 4x4, 16x16, 32x32, 64x64]
Tsukuba 96x96, 128x128 [Figure: disparity maps for 96x96, 128x128, and full GC]
Venus 32x32, 64x64 [Figure: disparity maps for 32x32, 64x64]
Venus 96x96, 128x128 [Figure: disparity maps for 96x96, 128x128, and full GC]
Teddy 32x32, 64x64 [Figure: disparity maps for 32x32, 64x64]
Teddy 96x96, 128x128 [Figure: disparity maps for 96x96, 128x128, and full GC]
Cones 32x32, 64x64 [Figure: disparity maps for 32x32, 64x64]
Cones 96x96, 128x128 [Figure: disparity maps for 96x96, 128x128, and full GC]
Middlebury Result • Evaluation web page: http://cat.middlebury.edu/stereo/ • Best: full GC with the best parameters • Full: full GC with K = 20 (Tsukuba) and K = 60 (others)
Summary • Smallest possible partition size (within a 2% accuracy drop) • Tsukuba → 64x64 • Teddy & Cones → 96x96 • Venus → larger than 128x128 • Benefits • Possible complexity or storage reduction • Increased parallelism • Drawbacks • Performance (disparity accuracy) drop • PC computation time becomes longer
Hierarchical Parallel Graph-Cut Presenter: Nelson Chang Institute of Electronics, National Chiao Tung University
Concept of Hierarchical Parallel GC • Bottom up • Solve graph-cut for smaller subgraphs • Then solve graph-cut for larger subgraphs • A larger subgraph = a set of neighboring smaller subgraphs (e.g., larger subgraph = sg0+sg1+sg2+sg3) • !! Each subgraph is temporarily independent !! [Figure: four level-0 subgraphs sg0-sg3 merging into one level-1 subgraph]
HPGC for Solving a 256x256 Graph • Step 1: 64 32x32 Lv0 subgraphs • Step 2: 16 64x64 Lv1 subgraphs • Step 3: 4 128x128 Lv2 subgraphs • Step 4: 1 256x256 Lv3 graph • Total graph-cut computations = 64+16+4+1 = 85 • !! HPGC must use Ford-Fulkerson-based methods !! (see the sketch below)
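A sketch of why the bottom-up order requires (and works with) Ford-Fulkerson-style solvers: flow pushed inside a sub-graph remains a valid flow in the residual graph, so each level simply continues augmenting over a larger node set, and the final level yields the full max flow. This toy version uses a 1-D chain of 8 pixel nodes as an analogue of the 32x32 → 256x256 pyramid; the capacities are illustrative:

```python
from collections import deque
import random

def augment_within(cap, s, t, allowed):
    """Push flow along s->t paths whose interior nodes lie in `allowed`.
    `cap` is a dict-of-dicts of residual capacities, updated in place."""
    pushed = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:                 # BFS in residual graph
            u = q.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent and (v == t or v in allowed):
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return pushed                            # sub-graph is saturated
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)          # bottleneck capacity
        for u, v in path:                            # augment + reverse edges
            cap[u][v] -= b
            cap.setdefault(v, {}).setdefault(u, 0)
            cap[v][u] += b
        pushed += b

# Toy chain of 8 pixel nodes with s/t links and neighbor (V) edges
random.seed(1)
cap = {'s': {}}
for i in range(8):
    cap.setdefault(i, {})['t'] = random.randint(1, 9)   # sink link
    cap['s'][i] = random.randint(1, 9)                  # source link
    if i > 0:
        cap[i - 1][i] = cap[i][i - 1] = 5               # smoothness edges

# Bottom up: 4 subgraphs of 2 nodes, then 2 of 4, then the full graph
total = 0
for size in (2, 4, 8):
    for lo in range(0, 8, size):
        total += augment_within(cap, 's', 't', set(range(lo, lo + size)))
print(total)  # equals the max flow of solving the whole graph at once
```

The 4+2+1 = 7 solves here mirror the 64+16+4+1 = 85 computations above; lower-level solves can run in parallel because their node sets are disjoint.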
Boykov and Kolmogorov's Motivation • Dinic's method • Searches for the shortest augmenting path • Uses Breadth-First Search (BFS) • Example: • Search for the shortest path (length = k): use BFS, expand the search tree, find all paths of length k • Search for the shortest path (length = k+1): use BFS, RE-expand the search tree again, find all paths of length k+1 • Search for the shortest path (length = k+2): use BFS, RE-RE-expand the search tree again • ... • Why not REUSE the expanded tree?
BnK's Method • Concept • Reuse the already-expanded trees • Avoid re-expanding the trees from scratch • 3 stages • Growth: grow the search trees • Augmentation: Ford-Fulkerson-style augmentation along the found path, saturating a critical edge • Adoption: reconnect the disconnected sub-trees by adopting orphans to new parents [Figure: augmenting path, saturated critical edge, orphan adoption]
Features of the BnK Method • Based on Ford-Fulkerson • Bidirectional search tree construction (a source tree and a sink tree) • Reuses the searched trees • Determines a node's label (source or sink) from its tree connectivity