Robust Place and Object Recognition using Local Appearance based Methods
Gregory Dudek and Deeptiman Jugessur
Center for Intelligent Machines, McGill University
Dudek & Jugessur
Outline
• Applications
• PCA: shortcomings
• Objectives
• Approach
• Background
• System Overview
• Results
• Conclusion
Two Applications
• Object recognition: what is that thing?
  • Recognizing a known object from its visual appearance.
  • Landmarks, grasping targets, etc.
• Place recognition (coarse localization): what room am I in?
  • Recognizing the current waypoint on a trajectory, validating the current locale for the application of a precise localization method, topological navigation.
PCA-based recognition
• Has now become a well-established method for image recognition.
• PCA-based recognition: a global transform of an image with N degrees of freedom into an eigenspace with M << N degrees of freedom.
• The M degrees of freedom capture the “most important” characteristics of the set of images being memorized.
• Avoids having to segment the image into object and background by using the whole image.
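A minimal sketch of the eigenspace step described above, assuming NumPy and grayscale images stored as a (num_images, height, width) array. It finds the principal directions with an SVD of the mean-centred data, which is equivalent to eigen-decomposing the covariance matrix. The function names, the random test images and the value of m are hypothetical, not taken from the authors' system.

```python
import numpy as np

def build_eigenspace(images: np.ndarray, m: int):
    """Compute a PCA eigenspace from a stack of images.

    images: shape (num_images, height, width); each image is one point in an
    N = height * width dimensional space.
    m: number of eigenvectors to keep (M << N).
    Returns (mean, basis), where basis has shape (m, N).
    """
    x = images.reshape(len(images), -1).astype(float)  # flatten to (num_images, N)
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions (eigenvectors of the covariance),
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:m]

def project(image: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map one image to its M eigenspace coordinates."""
    return basis @ (image.reshape(-1).astype(float) - mean)

rng = np.random.default_rng(0)
training = rng.random((5, 32, 32))  # five hypothetical 32x32 images
mean, basis = build_eigenspace(training, m=3)
coords = project(training[0], mean, basis)
print(basis.shape, coords.shape)  # (3, 1024) (3,)
```

Recognition then compares these short coordinate vectors instead of full images, which is where the efficiency of the method comes from.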
Observations
• Using the whole image means recognizing the combination of object AND background.
• Segmenting the object from the background would avoid dependence on the background, but it is too difficult.
• Using a small sub-region gives a less precise recognition (i.e. the sub-window could come from more than one image), but it is efficient.
• Many sub-windows together can “vote” for an unambiguous recognition.
• If the sub-windows are suitably chosen, they may ignore the background entirely.
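A sketch of the voting idea, assuming each sub-window has already been matched to the list of candidate images it is consistent with. The function `vote` and its `min_margin` ambiguity rule are illustrative assumptions, not the authors' decision rule.

```python
from collections import Counter

def vote(window_labels, min_margin=1):
    """Combine per-sub-window guesses into one recognition result.

    window_labels: one list per sub-window holding the image labels that the
    sub-window is consistent with (a small window may match several images).
    Returns the winning label, or None if the top two labels are closer than
    min_margin votes (still ambiguous).
    """
    tally = Counter()
    for candidates in window_labels:
        tally.update(set(candidates))  # each sub-window votes at most once per label
    ranked = tally.most_common(2)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None
    return ranked[0][0]

votes = [["kitchen", "lab"], ["lab"], ["lab", "office"], ["kitchen"]]
print(vote(votes))  # lab
```

No single window here is decisive, but together they favour one image, which is the point of using many sub-windows.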
Problem Statement
• Improving the performance of classic PCA-based recognition by accounting for:
  • Varying backgrounds
  • Planar rotations
  • Occlusions
• Also (discussed in less detail):
  • Changes in object pose
  • Non-rigid deformation
Our key idea(s)
• Use sub-windows: several together uniquely accomplish recognition.
• Sub-windows are selected by an attention operator (several kinds can be used).
• Each sub-window is sampled non-uniformly to weight it towards its center.
• Use only the amplitude spectrum to obtain rotational invariance.
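A sketch of the last two ideas, assuming NumPy and square, odd-sized sub-windows. The geometric ring spacing, the ring and angle counts, and nearest-neighbour pixel lookup are hypothetical choices made for brevity; the slides do not specify the authors' foveal sampling scheme.

```python
import numpy as np

def foveal_polar_sample(window: np.ndarray, n_rings: int = 8, n_angles: int = 16) -> np.ndarray:
    """Resample a square sub-window on a polar grid centred on its middle pixel.

    Ring radii grow geometrically, so rings are packed more densely near the
    centre and the samples are weighted toward it. Returns an array of shape
    (n_rings, n_angles): rows index radius, columns index angle.
    """
    h, w = window.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    radii = max_r * (np.geomspace(1.0, 2.0, n_rings) - 1.0)  # 0 .. max_r, denser near 0
    angles = 2 * np.pi * np.arange(n_angles) / n_angles
    ys = cy + radii[:, None] * np.sin(angles)[None, :]
    xs = cx + radii[:, None] * np.cos(angles)[None, :]
    rows = np.clip(np.rint(ys).astype(int), 0, h - 1)  # nearest-neighbour lookup
    cols = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return window[rows, cols]

def amplitude_spectrum(polar: np.ndarray) -> np.ndarray:
    """Magnitude of the 2D FFT; the phase, which a rotation changes, is discarded."""
    return np.abs(np.fft.fft2(polar))

window = np.arange(225.0).reshape(15, 15)  # hypothetical 15x15 sub-window
spec = amplitude_spectrum(foveal_polar_sample(window))
spec_rot90 = amplitude_spectrum(foveal_polar_sample(np.rot90(window)))
print(spec.shape, np.allclose(spec, spec_rot90))  # (8, 16) True
```

Because columns index angle, rotating the sub-window about its centre cyclically shifts the columns, and the amplitude spectrum ignores such shifts. In this sketch the match is exact for 90-degree rotations, which map the pixel grid onto itself; other angles hold only approximately because of resampling.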
Background
• Standard appearance-based recognition:
  • M. Turk and A. Pentland 1991
  • S.K. Nayar, H. Murase, S.A. Nene 1994
  • H. Murase, S.K. Nayar 1995
• Shortcomings (due to the global approach):
  • Background
  • Scale
  • Rotations
  • Local changes of the image or object
  • Occlusion
Background (part 2)
• “Enhanced” local sub-window methods:
  • D. Lowe 1999: scale invariance, simple features.
  • C. Schmid 1999: probabilistic approach based on sub-windows extracted using the Harris operator.
  • C. Schmid & R. Mohr 1997: numerous sub-windows extracted using the Harris operator for database image retrieval (a simpler problem).
  • K. Ohba & K. Ikeuchi 1997: KLT operator used to extract sub-windows for building an eigenspace. Only handles occlusion.
• Interest operator of choice:
  • D. Reisfeld, H. Wolfson, Y. Yeshurun 1995: local symmetry operator
Approach
• Two phases:
  • Training (off-line), for the entire database of recognizable images:
    • Run an interest operator to obtain a saliency map for each image.
    • Choose sub-windows around the salient points of each image.
    • Select the most informative sub-windows and apply foveal sampling.
    • Create the eigenspace from the processed sub-windows.
  • Testing (on-line), for a candidate test image:
    • Run the same interest operator to obtain the saliency map.
    • Choose the sub-windows and process the information within them.
    • Project the sub-windows onto the eigenspace.
    • Perform classification based on nearest-neighbor rules.
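A sketch of the final on-line step, assuming NumPy and that both the stored training sub-windows and the test image's sub-windows have already been projected to M-dimensional eigenspace coordinates. Plain Euclidean distance, the function name and the toy labels are assumptions; the slides only say “nearest neighbor rules”.

```python
import numpy as np

def nearest_neighbor_labels(query_coords, train_coords, train_labels):
    """Label each test sub-window with the label of its closest training sub-window.

    query_coords: (num_test_windows, M) eigenspace coordinates of the test sub-windows
    train_coords: (num_train_windows, M) coordinates of all stored training sub-windows
    train_labels: sequence of length num_train_windows naming each window's source image
    """
    # Pairwise squared Euclidean distances, shape (num_test_windows, num_train_windows).
    d2 = ((query_coords[:, None, :] - train_coords[None, :, :]) ** 2).sum(axis=2)
    return [train_labels[i] for i in d2.argmin(axis=1)]

train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
labels = ["mug", "mug", "stapler"]
query = np.array([[0.9, 0.1], [4.0, 4.5]])
print(nearest_neighbor_labels(query, train, labels))  # ['mug', 'stapler']
```

The per-window labels can then be combined, for example by the sub-window voting described under Observations.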
Recognition Model (diagram)
Off-line: Database of recognizable images → run all images through the interest operator → extract sub-windows based on interest operator saliency values and information content → 2D FFT: obtain amplitude spectra for the sub-windows → create low-dimensional eigenspace → eigenspace for classification.
On-line: Candidate test image → run the image through the interest operator → 2D FFT → project onto eigenspace.
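A sketch of the “extract sub-windows” box in the diagram, assuming NumPy. The authors' interest operator of choice is the local symmetry operator of Reisfeld, Wolfson and Yeshurun; this sketch substitutes plain gradient magnitude as a stand-in saliency map and ignores the information-content criterion. The greedy minimum-distance rule and the parameter values (k=10, 15x15 windows, min_dist=8) are hypothetical.

```python
import numpy as np

def gradient_saliency(image: np.ndarray) -> np.ndarray:
    """Stand-in saliency map: gradient magnitude (NOT the symmetry operator)."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)

def extract_subwindows(image, saliency, k=10, size=15, min_dist=8):
    """Greedily pick the k most salient points that lie at least min_dist apart
    (Chebyshev distance) and far enough from the border, and cut a size x size
    window around each one."""
    half = size // 2
    h, w = image.shape
    order = np.argsort(saliency, axis=None)[::-1]  # flat indices, most salient first
    points = []
    for flat in order:
        y, x = divmod(int(flat), w)
        if not (half <= y < h - half and half <= x < w - half):
            continue  # the window would fall off the image
        if any(max(abs(y - py), abs(x - px)) < min_dist for py, px in points):
            continue  # too close to an already chosen point
        points.append((y, x))
        if len(points) == k:
            break
    windows = [image[y - half:y + half + 1, x - half:x + half + 1] for y, x in points]
    return points, windows

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # hypothetical grayscale image
points, windows = extract_subwindows(img, gradient_saliency(img))
print(len(points), windows[0].shape)  # 10 (15, 15)
```

The minimum-distance rule keeps the windows from piling up on one salient region, so that they sample several parts of the object.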
Polar Sampling and 2D FFT
[Figure: two sub-windows, each polar-sampled and transformed by a 2D FFT, yield the same amplitude spectrum (in theory).]
Shift Theorem
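In discrete form, for a polar-sampled array f[r, a] with A angle bins, a circular shift by s bins, g[r, a] = f[r, (a - s) mod A], changes the 2D DFT only by a phase factor: G[u, v] = F[u, v] · exp(-2πi·v·s/A), so |G| = |F|. Rotating a sub-window about its centre by s·360/A degrees ideally produces exactly such a shift, which is why the amplitude spectrum is rotation invariant “in theory”. A short NumPy check, using a random array as a hypothetical polar-sampled sub-window:

```python
import numpy as np

rng = np.random.default_rng(1)
polar = rng.random((8, 16))          # hypothetical polar-sampled sub-window: 8 radii x 16 angles
rotated = np.roll(polar, 3, axis=1)  # rotation by 3 * 360/16 degrees = circular shift of 3 angle bins

spec = np.fft.fft2(polar)
spec_rot = np.fft.fft2(rotated)

# Shift theorem: only the phase changes, by exp(-2j*pi*v*s/A) along the angular frequency axis.
v = np.fft.fftfreq(16) * 16          # integer angular frequencies 0..7, -8..-1
phase = np.exp(-2j * np.pi * v * 3 / 16)
print(np.allclose(spec_rot, spec * phase[None, :]))  # True
print(np.allclose(np.abs(spec_rot), np.abs(spec)))   # True
```

In practice the real image must be resampled after a rotation, so the two spectra agree only approximately.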
Place Recognition
[Figure: test images shown with their best-matching training images.]
Place Recognition (2)
[Figure: test images shown with their best-matching training images.]
Object Recognition
[Figure: a test image and the training image it was recognized as.]
Object Recognition (2)
[Figure: a test image, its training image, and the best matches.]
Note: background variation and occlusion.
Performance metrics
• On-line performance:
  • 15x15 pixel sub-windows: 90% recognition with 10 sub-windows (10 interest points).
  • 15x15 pixel sub-windows: 100% recognition using 15 more sub-windows.
  • The interest operator can take 1/30 s to 10 min (depending on the operator, image size, etc.).
  • Classification in eigenspace takes well under 1 s (can be performed in real time).
Performance vs. Number of Interest Points
[Plot: recognition rate vs. number of features, reaching 100%.]
Note: 10 windows of size 15x15 means using only 0.7% of the total image content.
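The 0.7% figure follows from simple pixel counting. The slides do not state the image size; assuming a 640x480 frame (an assumption, not a number from the source), the fraction of pixels covered by the 10 sub-windows is:

```latex
\frac{10 \times 15 \times 15}{640 \times 480} = \frac{2250}{307200} \approx 0.0073 \approx 0.7\%
```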
Conclusion & Extensions
• An approach to object and place recognition from single video images that works despite planar rotation, occlusion, or other deformations.
• Highly robust.
• Recognition rates of up to 100% with 20 test images.
• Improved robustness to background can be achieved using “masking” [Jugessur & Dudek CVPR 2000].
• Ongoing work seeks to exploit the geometry of interest points.
• Could filter in eigenspace during training to select only “useful” features.
That’s all
Questions you could ask
• Have you considered the use of alternative interest/attention operators? Does the operator matter?
• What if the background is much more interesting (to the operator) than the object?
• How much does color information matter?
• What is the consequence of not using geometric information (and what does that really mean)?
Performance metrics
• Training time: roughly 64 windows (15x15), 17 objects, 3 views per object: 24 hours.
  • This uses MATLAB and highly non-optimized code.
  • Using similar methods on global images, other groups have reported times on the order of minutes for similar tasks.
• On-line performance:
  • The interest operator can take 1/30 s to 10 min (depending on the operator, image size, etc.).
  • Classification in eigenspace takes well under 1 s (can be performed in real time).