LOCUS (Learning Object Classes with Unsupervised Segmentation) A variational approach to learning model-based segmentation. John Winn, Microsoft Research Cambridge, with Nebojsa Jojic, MSR Redmond. 7th July 2006
Overview • Learning object models • The LOCUS model • Experiments & results • Extensions to LOCUS
Goal Long Term Goal Recognise ~10,000 object classes.
Learning from ‘buckets’ of images Learning algorithm Horse model • Object Segmentation • Object Recognition • Object Detection
LOCUS Horse model + Object segmentation
Constellation models • Weakly supervised • Probabilistic framework • Sparse • No segmentation Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003 A Bayesian approach to unsupervised One-Shot learning of Object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003
Fragment-based • Dense model • Supervised • Non-probabilistic • No global shape model Learning to segment. E. Borenstein and S. Ullman. ECCV 2004 Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004
Codebook-based • Probabilistic • Dense model • Supervised • Ad-hoc inference Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV ‘04
OBJ CUT • Probabilistic • Dense model • Supervised • Requires video
LOCUS overview • Weakly supervised learning: buckets of images, no annotation required. • Probabilistic generative model of both object and background. • Dense model: all pixels modelled, not just at interest points. • Combines global and local cues: models global shape and local appearance + edges. • Iterative inference process: simultaneous localisation, segmentation, pose estimation.
LOCUS model Shared between images Class shape π Class edge sprite μo,σo Deformation field D Position & size T Different for each image Mask m Edge image e Background appearance λ0 Object appearance λ1 Image
LOCUS model: appearance Shared mixture components. Background mixture coefficients λ0, object mixture coefficients λ1. Mask m, image z.
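The appearance slide can be illustrated with a small sketch: one set of mixture components is shared, and each image has its own mixture coefficients for background (λ0) and object (λ1). The 1-D Gaussian components, the function names and the flat prior on the mask are assumptions made for illustration only; the slide does not specify the component form.

```python
import math

def gaussian(x, mean, std):
    """Density of a 1-D Gaussian (assumed component form, not from the slides)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def pixel_likelihood(z, coeffs, components):
    """Mixture likelihood of pixel value z under per-image coefficients."""
    return sum(w * gaussian(z, mu, sd) for w, (mu, sd) in zip(coeffs, components))

def object_posterior(z, lambda0, lambda1, components, prior_object=0.5):
    """Posterior probability that mask m = object for one pixel, via Bayes' rule."""
    p1 = prior_object * pixel_likelihood(z, lambda1, components)
    p0 = (1.0 - prior_object) * pixel_likelihood(z, lambda0, components)
    return p1 / (p0 + p1)

# Hypothetical values: two shared components (dark, bright) as (mean, std).
components = [(0.2, 0.1), (0.8, 0.1)]
lambda0 = [0.9, 0.1]   # background is mostly dark in this image
lambda1 = [0.1, 0.9]   # object is mostly bright in this image
```

With these made-up values, a bright pixel is more likely to be object and a dark pixel more likely to be background; the components themselves are reused across images while only λ0 and λ1 change.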
LOCUS model: mask 8-neighbour Markov Random Field over background/object labels (as used in GrabCut); favours segmentation along contrast edges.
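A minimal sketch of a contrast-sensitive 8-neighbour pairwise term, written in the style of the GrabCut smoothness term. Whether LOCUS uses exactly this form is an assumption; `gamma`, the greyscale input and all function names are illustrative.

```python
import math

# Offsets that visit each undirected 8-neighbour pair exactly once.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def neighbour_pairs(height, width):
    """Yield each undirected 8-neighbour pixel pair once."""
    for r in range(height):
        for c in range(width):
            for dr, dc in OFFSETS:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < height and 0 <= c2 < width:
                    yield (r, c), (r2, c2)

def contrast_weights(image, gamma=1.0):
    """Return {(p, q): weight} for a greyscale image given as a list of rows."""
    pairs = list(neighbour_pairs(len(image), len(image[0])))
    sq_diffs = [(image[p[0]][p[1]] - image[q[0]][q[1]]) ** 2 for p, q in pairs]
    mean_sq = sum(sq_diffs) / len(sq_diffs)
    # beta normalises by the average squared contrast over the image.
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    weights = {}
    for (p, q), d2 in zip(pairs, sq_diffs):
        dist = math.hypot(p[0] - q[0], p[1] - q[1])
        weights[(p, q)] = gamma / dist * math.exp(-beta * d2)
    return weights

def mask_energy(mask, weights):
    """Sum the weights of neighbour pairs whose mask labels differ."""
    return sum(w for (p, q), w in weights.items()
               if mask[p[0]][p[1]] != mask[q[0]][q[1]])
```

Because the weight decays with squared contrast, a label boundary that follows a strong image edge costs less than one that cuts through a uniform region, which is the behaviour the slide describes.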
LOCUS model: shape/position Class shape π, with one transformation per image: T1, T2, T3, T4, …, TN
Iterative inference Class shape π and per-image transformations T1, T2, T3, T4, …, TN, shown at iterations #1, #2, #3, #5, #8 and #12.
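The alternation between the shared class shape and the per-image transformations can be pictured with a toy example. This is not the variational scheme used in the talk: it assumes the masks are already known, works in 1-D, uses circular shifts as the only transformation, and takes hard maximum-overlap estimates. All names are hypothetical.

```python
def shift(signal, offset):
    """Circularly shift a 1-D list right by `offset` positions."""
    n = len(signal)
    return [signal[(i - offset) % n] for i in range(n)]

def best_offset(shape, mask):
    """Offset t that maximises the overlap between shift(shape, t) and mask."""
    def score(t):
        return sum(s * m for s, m in zip(shift(shape, t), mask))
    return max(range(len(mask)), key=score)

def align(masks, iterations=5):
    """Alternate between per-image offsets and the shared shape."""
    shape = [float(v) for v in masks[0]]   # initialise from the first mask
    offsets = [0] * len(masks)
    for _ in range(iterations):
        # Step 1: with the shape fixed, pick the best offset for each image.
        offsets = [best_offset(shape, m) for m in masks]
        # Step 2: undo each offset, then average into a soft shape in [0, 1].
        aligned = [shift(m, -t) for m, t in zip(masks, offsets)]
        shape = [sum(col) / len(masks) for col in zip(*aligned)]
    return shape, offsets
```

For example, `align` applied to three copies of one template shifted by 0, 2 and 5 positions recovers those offsets and the template itself as the shared shape.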
Non-rigid objects Class shape π. Translation and scale are not enough.
LOCUS model: pose Class shape π, transformation T, deformation field D (5x5 blocks). Prior ensures smoothness.
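One way to picture a smoothness prior on a block-wise deformation field is a penalty on differences between neighbouring block displacements. The quadratic form, the 4-neighbour structure and the function name below are assumptions; the slide only states that the prior ensures smoothness.

```python
def smoothness_penalty(field):
    """Sum of squared differences between 4-neighbouring block displacements.

    `field` is a grid (list of rows) of (dx, dy) displacements, one per block.
    """
    rows, cols = len(field), len(field[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            dx, dy = field[r][c]
            # Visit the neighbour below and the neighbour to the right,
            # so each neighbouring pair is counted once.
            for r2, c2 in ((r + 1, c), (r, c + 1)):
                if r2 < rows and c2 < cols:
                    ex, ey = field[r2][c2]
                    total += (dx - ex) ** 2 + (dy - ey) ** 2
    return total
```

Under this assumed penalty, a field that moves every block by the same amount costs nothing, while a single block that moves differently from its neighbours is penalised. A prior proportional to `exp(-penalty)` would then favour smooth deformations.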
LOCUS model: pose Class shape π, with one combined transformation and deformation per image: TD1, TD2, TD3, …, TDN
LOCUS model: edge Class edge sprite μo,σo; per-image transformations and deformations TD1, TD2, TD3, …, TDN; edge images e; original images.
LOCUS model: overview Shared between images Class shape π Class edge sprite μo,σo Deformation field D Position & size T Different for each image Mask m Edge image e Background appearance λ0 Object appearance λ1 Image
Inference • Aim to infer all latent variables: • For each image: background appearance λ0, object appearance λ1, deformation D, transformation T, mask m. • Class variables: shape π, edge sprite μo, σo. • Bayesian inference is carried out using variational message passing with a fully factorised variational distribution. • Optimisation of grid-structured variational free energy terms (relating to the deformation field D and the mask m) is achieved using graph cuts.
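For reference, the generic mean-field equations behind a fully factorised variational distribution can be written as follows. This is the standard textbook form, not the specific LOCUS update equations; here V stands for the observed variables (the images and edge images) and H for the latent variables listed above, and the symbols F, L and H_i are notation introduced for this note.

```latex
% Fully factorised variational distribution over the latent variables
q(H) = \prod_i q_i(H_i)

% Variational free energy: negative of the lower bound on the log evidence
F(q) = -\mathcal{L}(q), \qquad
\mathcal{L}(q) = \mathbb{E}_{q}\big[\ln p(V, H)\big] - \mathbb{E}_{q}\big[\ln q(H)\big] \le \ln p(V)

% Update for one factor, holding all other factors fixed
\ln q_j^{*}(H_j) = \mathbb{E}_{\prod_{i \ne j} q_i}\big[\ln p(V, H)\big] + \text{const}
```

Each update lowers F(q), so cycling through the factors gives an iterative inference procedure of the kind described on the slide.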
Experiments LOCUS was applied to 8 sets of 20 images, each set containing objects of the same class. • Horses • Faces • Cars (rear) • Cars (side) • Motorbikes • Aeroplanes • Cows • Trees For each class, we ran separate experiments for color and texture appearance models.
Results: remaining classes Faces, cars (rear), motorbikes, planes, cows, trees.
Segmentation accuracy To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side).
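A minimal sketch of one way to score an inferred mask against a hand segmentation, assuming accuracy means the fraction of pixels whose label agrees with the reference. The slide does not state the metric, so this choice and the function name are assumptions.

```python
def pixel_accuracy(predicted, reference):
    """Fraction of pixels whose predicted label matches the hand-labelled mask.

    Both masks are lists of rows containing 0 (background) or 1 (object).
    """
    flat_pred = [v for row in predicted for v in row]
    flat_ref = [v for row in reference for v in row]
    if len(flat_pred) != len(flat_ref):
        raise ValueError("masks must have the same number of pixels")
    agree = sum(p == r for p, r in zip(flat_pred, flat_ref))
    return agree / len(flat_ref)
```

A mask that differs from the reference in one pixel out of six scores 5/6 under this measure.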
Object registration Transformation + deformation field registers object outlines (and some internal edges).
Recognition + segmentation Object recognition using only global shape: Overall: 88% accuracy.
Probabilistic Index Maps 2 indices 9 indices Each image has a ‘palette’ of appearance models – palette invariance.
Learning objects from video Object shape Object edge sprite
Locumotion Add flow and track constraints to achieve motion segmentation: Tracking/flow estimation by Larry Zitnick
Conclusions • LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods. • General-purpose model allows: • Object localisation • Pose estimation • Object segmentation • Motion segmentation/object tracking • Object recognition/detection (in combination with discriminative model)