BEYOND SLIDING WINDOW: Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann Object Localization by Efficient Subwindow Search
Motivations • To localize the object without exhaustive search • observation : often, only a small portion of the image contains the object of interest • To find a global optimum in a huge search space • Object detection and retrieval
Contributions • Efficient (n^2 vs. n^4) • exhaustive search faces on the order of n^4 rectangles in an n × n image • n × n possible centers • n possible widths and n possible heights • Optimal • Versatile • arbitrary objects vs. simple parametric objects in line drawings [4] • flexible choice of cost function vs. L2 distance only [13] • Challenge • To find bounds that are both valid and tight
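The rectangle count on this slide can be checked with a few lines of arithmetic. The sketch below is not from the slides: it assumes rectangles are axis-aligned and given by inclusive pixel rows and columns, and the function names are made up for illustration. Under that assumption the exact count is (n(n+1)/2)^2, which grows like n^4 / 4.

```python
def count_rectangles(n: int) -> int:
    """Number of axis-aligned pixel rectangles in an n x n image (closed form)."""
    # Choosing top <= bottom among n rows gives n(n+1)/2 options; same for columns.
    per_axis = n * (n + 1) // 2
    return per_axis * per_axis


def count_rectangles_brute(n: int) -> int:
    """Same count by explicit enumeration, only practical for tiny n."""
    return sum(
        1
        for top in range(n)
        for bottom in range(top, n)
        for left in range(n)
        for right in range(left, n)
    )


if __name__ == "__main__":
    for n in (4, 32, 640):
        print(n, count_rectangles(n))
```

Even a modest image therefore has far too many candidate boxes to score one by one, which is the motivation for the search strategy on the following slides.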
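Branch and Bound • first proposed by A. H. Land and A. G. Doig in 1960 for discrete (integer) programming • a “divide and conquer” approach to optimizing a cost function f(x) over a search space S • recursive branching and bounding • branch: split S into subsets S_i so that min f(x) over S = min_i v_i, where v_i is the minimum of f over S_i • bound: compute lower and upper bounds on f(x) within each S_i • prune: discard any S_i whose lower bound is worse than the best upper bound found so far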
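The branching and bounding loop can be sketched on a toy problem. This is a minimal illustration, not the authors' code: it minimizes a one-dimensional function over a range of integers, the names `branch_and_bound_min`, `f`, and `lower_bound` are hypothetical, and it uses a best-first priority queue so that pruning is implicit (subsets with poor bounds are simply never expanded).

```python
import heapq


def branch_and_bound_min(f, lower_bound, lo, hi):
    """Minimise f over the integers lo..hi with best-first branch and bound.

    lower_bound(a, b) must never exceed min f(x) for x in a..b, and must
    equal f(a) when a == b. Returns (argmin, minimum, subsets_expanded).
    """
    heap = [(lower_bound(lo, hi), lo, hi)]
    expanded = 0
    while heap:
        bound, a, b = heapq.heappop(heap)
        expanded += 1
        if a == b:
            # A single candidate with the smallest bound of all: no other
            # subset can contain anything better, so this is the optimum.
            return a, f(a), expanded
        mid = (a + b) // 2
        for start, end in ((a, mid), (mid + 1, b)):  # branch
            heapq.heappush(heap, (lower_bound(start, end), start, end))  # bound
    raise ValueError("empty search space")


def f(x):
    return (x - 37) ** 2 + 5


def lower_bound(a, b):
    # f is smallest at the point of [a, b] closest to 37.
    nearest = min(max(37, a), b)
    return f(nearest)


if __name__ == "__main__":
    print(branch_and_bound_min(f, lower_bound, 0, 1000))
```

With a tight bound the search expands only a handful of subsets out of 1001 candidates; with a loose bound it would still return the optimum but visit more of the space.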
Methodology • Cost function • Parameter space • Bounds
Bounding I • a bag of visual words for non-rigid objects • histograms of SIFT prototypes • SVM decision function • bounds • upper bound for a set of boxes: the sum of the positive weights inside the largest box plus the sum of the negative weights inside the smallest box • integral images make each bound evaluation O(1)
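The bound and the search can be combined into a short sketch. It rests on assumptions that go beyond the slide: the SVM score of a box is taken to be the sum of per-pixel weights inside it (a dense `weights` array stands in for the per-feature weights), a set of boxes is stored as four inclusive intervals for top, bottom, left, and right, and the function names are illustrative rather than taken from the authors' implementation.

```python
import heapq

import numpy as np


def integral_image(a):
    """ii[i, j] holds the sum of a[:i, :j], so any box sum needs four lookups."""
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1))
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii, t, b, l, r):
    """Sum over rows t..b and columns l..r (inclusive); 0 for an empty box."""
    if t > b or l > r:
        return 0.0
    return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]


def ess(weights):
    """Return ((top, bottom, left, right), score) of the best-scoring box."""
    h, w = weights.shape
    pos = integral_image(np.maximum(weights, 0))  # positive weights only
    neg = integral_image(np.minimum(weights, 0))  # negative weights only

    def upper_bound(s):
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = s
        # All positives of the largest box plus all negatives of the smallest.
        return box_sum(pos, t0, b1, l0, r1) + box_sum(neg, t1, b0, l1, r0)

    root = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-upper_bound(root), root)]  # max-heap via negated bounds
    while heap:
        neg_bound, s = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in s]
        k = int(np.argmax(widths))
        if widths[k] == 0:  # one box left: its bound is its exact score
            return tuple(lo for lo, _ in s), float(-neg_bound)
        lo, hi = s[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):  # split the widest interval
            child = s[:k] + (part,) + s[k + 1:]
            (t0, _), (_, b1), (l0, _), (_, r1) = child
            if t0 > b1 or l0 > r1:
                continue  # the set contains no valid box
            heapq.heappush(heap, (-upper_bound(child), child))
    raise ValueError("empty image")


if __name__ == "__main__":
    demo = -np.ones((6, 8))
    demo[2:4, 3:6] = 2.0
    print(ess(demo))
```

Splitting the widest interval first is one reasonable choice; the result is the global optimum regardless of split order, since only the number of expanded sets changes.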
Results • PASCAL VOC 2006 • 5,304 images with 9,507 objects from 10 categories • 1000 visual words from 50,000 SURF descriptors • a detection counts as a match when it overlaps the ground-truth bounding box by more than 50% • PASCAL VOC 2007 • 9,963 images with 24,640 objects
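The matching criterion can be written down in a few lines. The sketch assumes that "overlap" means area of intersection divided by area of union, as in the PASCAL VOC evaluation protocol, and that boxes are `(x_min, y_min, x_max, y_max)` in inclusive pixel coordinates; the function names are illustrative.

```python
def overlap(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not intersect
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def is_match(detection, ground_truth, threshold=0.5):
    """True when the detection overlaps the ground truth by more than threshold."""
    return overlap(detection, ground_truth) > threshold


if __name__ == "__main__":
    print(is_match((0, 0, 9, 9), (2, 0, 11, 9)))
    print(is_match((0, 0, 9, 9), (5, 0, 14, 9)))
```

Note that a detection shifted by half its width already fails the test, because the union grows while the intersection shrinks.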
Speed • 40ms per image on a 2.4 GHz PC
Bounding II • spatial pyramid for rigid objects • histograms with spatial information • Extensions with ESS (fine-grained pyramids) • SVM decision function
Results • UIUC Car database (side-view, one car per image) • 1050 training (550 positive images) • 277 test (170 single scale + 107 multi scale) • 1000 visual words from 50,000 SURF descriptors
Image part retrieval • query-by-example • localized similarity measure • bounds
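One way to bound a localized similarity is sketched below. The slide does not name the measure, so this assumes the similarity is the negative χ² distance between the query's bag-of-words histogram and the (unnormalized) histogram of the candidate box. For a set of boxes, every bin count lies between the count in the smallest box (`h_min`) and the count in the largest box (`h_max`), so the distance is bounded from below by moving each bin as close to the query as those limits allow. All names here are hypothetical.

```python
import numpy as np


def chi2(hq, hr):
    """Chi-squared distance between two histograms; empty bins contribute 0."""
    hq = np.asarray(hq, dtype=float)
    hr = np.asarray(hr, dtype=float)
    num = (hq - hr) ** 2
    den = hq + hr
    return float(np.sum(np.divide(num, den, out=np.zeros_like(num), where=den > 0)))


def chi2_lower_bound(hq, h_min, h_max):
    """Lower bound on chi2(hq, hr) over all hr with h_min <= hr <= h_max per bin.

    Each per-bin term shrinks as hr approaches hq, so the minimum is reached
    at the point of [h_min, h_max] closest to the query count.
    """
    hq = np.asarray(hq, dtype=float)
    nearest = np.clip(hq, h_min, h_max)
    return chi2(hq, nearest)


def similarity_upper_bound(hq, h_min, h_max):
    """Upper bound on the similarity -chi2 for every box in the set."""
    return -chi2_lower_bound(hq, h_min, h_max)


if __name__ == "__main__":
    query = [3, 0, 5, 2]
    print(similarity_upper_bound(query, [1, 1, 0, 2], [2, 4, 7, 6]))
```

The two limiting histograms can each be read off in constant time per bin from one integral image per visual word, in the same way as the weight sums in the detection setting.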
Results • 10,143 keyframes of a movie • returns the 100 most relevant images for a query • 2 s per returned image
Conclusions • high speed with a guaranteed global optimum • can be extended to multiple detections per image, other shapes, and different cost functions