Towards Sublinear Time Multiclass Object Detection. Sam Davies. The Challenge. Recognize objects in images Many object classes Many 3D views Feasible on consumer hardware. Applications. Cars that drive themselves Other robots… Assistive devices for the blind. This Talk.
Towards Sublinear Time Multiclass Object Detection Sam Davies
The Challenge • Recognize objects in images • Many object classes • Many 3D views • Feasible on consumer hardware
Applications • Cars that drive themselves • Other robots… • Assistive devices for the blind
This Talk • Use an existing object representation [Crandall ’05] • Propose a faster detection algorithm • equivalent accuracy • Present initial experiments that suggest • It scales well with #classes x #views • Empirically sublinear
Talk Overview • Past Work • Part-based detection • 1-Fan/Star Model • Proposed Algorithm • Results • Next Steps • Feature Sharing
Past Work: State of the Art • Part-based • Shape • Appearance • Relatively high accuracy • (for this presentation, assume good enough) • Mostly single view, single class • Linear running time in C (#classes x #views) • (or parallelize with N processors -- $$$!) • Multiclass part sharing [Torralba 2004] • Improve running time – empirically O(log C) • Restricted shape model
Past Work: Part-Based Detection • Rigid pieces held together by “springs.” • The springs joining the rigid pieces • Constrain relative movement • Measure the cost of the movement • Cost of an embedding: • Measure the “tension” on each spring, and • A local evaluation of how well each coherent piece is embedded [Fischler, Elschlager 1973]
Past Work: Part-Based Detection • Global measurement (shape) • Constellation / arrangement of part positions • Spring stretching / compressing • Cost / energy associated with relative positions of pairs of parts • Local measurement (appearance) • Rigid local part from image information • Independently measured for each part
Past Work: Part-Based Detection • Find best location of all the parts (highest sum of weighted votes) • minimize spring tension and part matching energies • MAP estimation: maximum probability of part locations for a test image
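One common way to write the objective on this slide, assuming the usual pictorial-structures notation (the symbols are not from the talk): let $l_i$ be the location of part $i$, $m_i(l_i)$ its appearance mismatch cost, and $d_{ij}(l_i, l_j)$ the spring cost between connected parts $(i, j) \in E$.

```latex
L^{*} = \arg\min_{L = (l_1, \dots, l_P)}
        \left( \sum_{i=1}^{P} m_i(l_i) \;+\; \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right)
```

If the energy is read as a negative log posterior over part locations, minimizing it is the same as the MAP estimate mentioned above.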
Past Work: 1-Fan/Star Model • Restrict all parts to only be connected to the center part • More efficient detection (dynamic programming) • Shown to be reasonably accurate [Crandall 2005, Fergus 2005]
Past Work: 1-Fan/Star Model • Hough Transform • Each part “votes” for location of the center part • Votes are weighted according to spring definitions
Past Work: 1-Fan/Star Model Use Gaussians for shape models [Crandall 2005, Fergus 2005]
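The voting idea can be sketched in a few lines of Python. This is a toy illustration, not the implementation from the cited papers: it assumes an isotropic Gaussian spring with a single `sigma`, precomputed per-part appearance log-scores, and it replaces the fast distance transform with a brute-force maximization that costs O(N^2) per part. The function name and parameters are hypothetical.

```python
import numpy as np

def star_model_scores(appearance, offsets, sigma):
    """Score every candidate center location under a toy star model.

    appearance: array (P, H, W) of per-part log-scores (higher is better).
    offsets:    integer array (P, 2), ideal (row, col) offset of each part
                from the center part.
    sigma:      std. dev. of an isotropic Gaussian spring (an assumption).
    """
    P, H, W = appearance.shape
    rows, cols = np.mgrid[0:H, 0:W]
    locs = np.stack([rows.ravel(), cols.ravel()], axis=1)  # (N, 2) pixel coords
    total = np.zeros(H * W)
    for p in range(P):
        # Where part p would ideally sit for each candidate center.
        ideal = locs + offsets[p]                           # (N, 2)
        # Displacement between ideal and actual part location, all pairs.
        diff = ideal[:, None, :] - locs[None, :, :]         # (N, N, 2)
        # Log of a Gaussian spring (up to a constant): always <= 0.
        spring = -(diff ** 2).sum(axis=2) / (2.0 * sigma ** 2)
        # Each part location votes for each center; keep the best vote.
        votes = spring + appearance[p].ravel()[None, :]
        total += votes.max(axis=1)
    return total.reshape(H, W)
```

For example, with two parts whose appearance maps each peak at exactly the ideal offset from pixel (2, 2), the returned map peaks at (2, 2). The inner maximization is the step that a distance transform would perform in O(N) rather than O(N^2).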
Past Work: 1-Fan/Star Model • O(N) • O(N) + O(N^2) • O(N) • O(N) x O(P) • O(PN) + O(PN) (sum) + O(N) (max) • O(PN) x O(C) • O(CPN) • N: # pixels • P: # parts • C: # classes x # views
Proposed Algorithm • Idea: • Run max, sum, distance transform computations all together • Adaptively • Divide into image pyramids
Proposed Algorithm • Key observation: • We can quickly calculate an upper bound of the distance transform in a desired image pyramid cell • Then refine in the most promising areas
Proposed Algorithm • Start with a coarse approximation • Ignore shape information altogether • Think: largest cell in the image pyramid groups all pixels into one • Equivalent to bag-of-words (0-fan)
Proposed Algorithm • For the object that looks most promising, descend to a finer resolution in the hierarchy and re-estimate the distance transform. • Based on a hierarchical A* framework [McAllester ’07] • Admissible heuristic based on upper-bound estimates at coarse resolutions
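The refine-the-most-promising-cell loop can be sketched as a best-first search over a quadtree of image cells, with one root cell per class. This is a simplified illustration rather than the talk's algorithm: `best_first_detect` and the `bound` callback are hypothetical names, and the sketch assumes the caller can supply an admissible upper bound for any cell cheaply (which is the part the talk's pyramid computation is meant to provide).

```python
import heapq

def best_first_detect(bound, num_classes, height, width):
    """Return (score, class_id, row, col) of the single best detection.

    bound(c, r0, r1, c0, c1) is assumed to return an upper bound on the
    score of class c over rows [r0, r1) and cols [c0, c1), and the exact
    score when the cell is a single pixel.
    """
    # Coarsest level: one cell per class covering the whole image.
    heap = [(-bound(c, 0, height, 0, width), c, 0, height, 0, width)
            for c in range(num_classes)]
    heapq.heapify(heap)
    while heap:
        neg, c, r0, r1, c0, c1 = heapq.heappop(heap)
        if r1 - r0 == 1 and c1 - c0 == 1:
            # Exact score on top of the queue: every remaining entry is an
            # upper bound that is no larger, so this detection is optimal.
            return -neg, c, r0, c0
        # Split the most promising cell into up to four children.
        rm, cm = (r0 + r1 + 1) // 2, (c0 + c1 + 1) // 2
        for a0, a1 in ((r0, rm), (rm, r1)):
            for b0, b1 in ((c0, cm), (cm, c1)):
                if a1 > a0 and b1 > b0:  # skip empty halves
                    child = (-bound(c, a0, a1, b0, b1), c, a0, a1, b0, b1)
                    heapq.heappush(heap, child)
    return None
```

Classes whose coarse bound never reaches the top of the queue are never refined. That is the intuition behind the hoped-for sublinear scaling in C, although the sketch itself still evaluates one root bound per class.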
Next Steps • Recall: • Appearance correlation is still O(PC) • P = # parts, C = #classes x # views • Even if shape matching is sublinear, we still have: O(PC) + o(C) = O(PC) • Need to make correlation sublinear as well.
Past Work: Feature Sharing [Torralba 2004]
Past Work: Feature Sharing empirically “O(log(C))”
Next Steps • Combine • Sublinear appearance correlation (via feature sharing) with • Sublinear shape searching (described here) • We get: • o(C) + o(C) = o(C)