Learning Image Statistics for Bayesian Tracking
Hedvig Sidenbladh, KTH, Sweden, hedvig@nada.kth.se
Michael Black, Brown University, RI, USA, black@cs.brown.edu

1. Introduction
• Goal: Tracking and reconstruction of people in 3D from a monocular video sequence, using an articulated human model.
• Approach: Learn models of the appearance of humans in image sequences.
• What are the image statistics of human limbs? How do they differ from those of general scenes?

2. Why is it Hard?
• 2D-3D projection ambiguities and self occlusion.
• Similarity between different limbs; limbs can appear in unexpected positions.
• People have different appearance due to clothing, shape etc.
• Need to discriminate between "people" and "non-people", but allow for variation in human appearance.

3. Previous Approaches
• Image flow, templates, color, stereo, ...
• Problems: under/over-segmentation, thresholds, ... Where is the probabilistic model?
• Instead, model the appearance of people in terms of filter responses.

4. Key idea #1
• Use the 3D model to predict the location of limb boundaries in the scene.
• Compute various filter responses steered to the predicted orientation of the limb.
• Compute the likelihood of the filter responses given the 3D model, using a statistical model learned from examples.
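The steering in Key idea #1 can be illustrated with Gaussian derivative filters. The following is a minimal sketch, not the authors' implementation: it assumes a grayscale frame as a NumPy array, a limb orientation theta obtained from the projected 3D model (hypothetical input), and SciPy's Gaussian derivative filters combined with the standard first-derivative steering identity. The poster's own filter definitions (edge, ridge, motion) follow in sections 5-9.

```python
import numpy as np
from scipy import ndimage


def steered_edge_response(image, theta, sigma=1.0):
    """First-derivative (edge) response steered to orientation theta.

    image : 2D grayscale array (float).
    theta : predicted limb orientation in radians (hypothetical input, obtained
            by projecting the articulated 3D model into the image).
    sigma : scale of the Gaussian derivative filters.

    Uses the standard steerable-filter identity for first derivatives:
        f_theta = cos(theta) * f_x + sin(theta) * f_y
    """
    fx = ndimage.gaussian_filter(image, sigma, order=(0, 1))  # d/dx (x = columns)
    fy = ndimage.gaussian_filter(image, sigma, order=(1, 0))  # d/dy (y = rows)
    return np.cos(theta) * fx + np.sin(theta) * fy


if __name__ == "__main__":
    img = np.random.rand(240, 320)          # stand-in for a video frame
    resp = steered_edge_response(img, theta=np.pi / 4, sigma=2.0)
    print(resp.shape)
```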
5. Learning Distributions
• Pon = probability of a filter response on limbs.
• Poff = probability of a filter response on the general background.
• Similar to Konishi et al. 00, but object specific: responses may be present or not.
• Training data: images with marked foreground (limb) and background regions.

6. Normalized Derivatives
• Image derivatives fx, fy, fxx, fxy, fyy are normalized with a non-linear transfer function that maps high responses towards 1 and low responses towards 0.
• Konishi et al. 00: parameters of the transfer function are optimized to maximize the Bhattacharyya distance between Pon and Poff.
• [Figure: vertical derivative vs. normalized vertical derivative; edge detection example.]

7. Edge
• Steered edge response: first-derivative response steered to the predicted limb orientation.
• The distributions show clear statistical differences: Pon/Poff can be used for discriminating foreground from background.
• [Figure: Pon, Poff and the ratio Pon/Poff for the edge response.]

8. Ridge
• Steered ridge response: |2nd derivative ⊥ arm| − |2nd derivative ∥ arm|. Scale specific!
• [Figure: Pon, Poff and the ratio Pon/Poff for the ridge response.]

9. Motion
• Motion response = I(x, t+1) − I(x+u, t), where u is the displacement of point x predicted by the model motion.
• Training data: consecutive images from the sequence.
• [Figure: Pon, Poff and the ratio Pon/Poff for the motion response.]

10. Key idea #2
• "Explain" the entire image with a model of both foreground and background.
• p(image | foreground, background) = const × p(foreground part of image | foreground) / p(foreground part of image | background)
• The likelihood thus depends on the ratio between the foreground and background probabilities of the filter responses.

11. Bayesian Formulation
• Posterior ∝ Likelihood × Temporal prior.
• Likelihood: independence assumption over the steered edge, ridge and motion responses.
• Posterior propagated over time with Condensation (Isard & Blake 96).
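To make the combination of the learned distributions and the ratio formulation concrete, here is a hedged sketch (hypothetical function and variable names, not the authors' code). It assumes Pon and Poff have already been learned as normalized histograms over quantized filter responses, and that each Condensation particle is a candidate 3D pose whose projected limbs select the pixels at which responses are sampled. Under the independence assumption, the image likelihood is proportional to the product of Pon/Poff over those responses; the sketch works in the log domain for numerical stability.

```python
import numpy as np


def log_likelihood_ratio(responses, p_on, p_off, bin_edges, eps=1e-8):
    """Sum of log(Pon / Poff) over filter responses at predicted foreground pixels.

    responses : 1D array of responses sampled where the projected 3D model
                predicts limbs (hypothetical input).
    p_on, p_off : learned, normalized histograms of responses on limbs / background.
    bin_edges : shared histogram bin edges, len(bin_edges) == len(p_on) + 1.
    """
    bins = np.clip(np.digitize(responses, bin_edges) - 1, 0, len(p_on) - 1)
    return np.sum(np.log(p_on[bins] + eps) - np.log(p_off[bins] + eps))


def weight_particles(per_particle_responses, p_on, p_off, bin_edges):
    """Condensation-style weighting: one log-ratio score per candidate 3D pose."""
    log_w = np.array([log_likelihood_ratio(r, p_on, p_off, bin_edges)
                      for r in per_particle_responses])
    w = np.exp(log_w - log_w.max())   # subtract max before exponentiating
    return w / w.sum()                # normalized particle weights
```

In a full tracker these weights would be combined across the edge, ridge and motion cues and used to resample candidate poses, with the temporal prior acting as the proposal distribution.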
12. Results
• Tracking is improved by using multiple cues (see video).
• Tracking with self occlusion (see video).
• [Figure: last tracked frame using edge, ridge, motion and all cues combined.]

13. Conclusions
• A generic, learned model of appearance: combines multiple cues and exploits work on image statistics.
• The 3D model is used to predict the position and orientation of features.
• A model of both foreground and background: exploiting the ratio between foreground and background likelihood improves tracking.

14. Related Work
Statistics of natural images:
• S. Konishi, A. Yuille, J. Coughlan, and S. Zhu. Fundamental bounds on edge detection: An information theoretic evaluation of different edge cues. Submitted to PAMI.
• T. Lindeberg. Edge detection and ridge detection with automatic scale selection. IJCV, 30(2):117-156, 1998.
• J. Sullivan, A. Blake, and J. Rittscher. Statistical foreground modelling for object localization. In ECCV, pp. 307-323, 2000.
• S. Zhu and D. Mumford. Prior learning and Gibbs reaction-diffusion. PAMI, 19(11), 1997.
Tracking:
• J. Deutscher, A. Blake, and I. Reid. Articulated motion capture by annealed particle filtering. In CVPR, vol. 2, pp. 126-133, 2000.
• M. Isard and J. MacCormick. BraMBLe: A Bayesian multi-blob tracker. In ICCV, 2001.
• M. Isard and A. Blake. Contour tracking by stochastic propagation of conditional density. In ECCV, pp. 343-356, 1996.