LOCUS (Learning Object Classes with Unsupervised Segmentation)

A variational approach to learning model-based segmentation.
John Winn, Microsoft Research Cambridge, with Nebojsa Jojic, MSR Redmond
7th July 2006

Overview
• Learning object models
• The LOCUS model
• Experiments & results
• Extensions to LOCUS

Goal
Long-term goal: recognise ~10,000 object classes.

Learning from ‘buckets’ of images
[Diagram: bucket of images, learning algorithm, horse model]
• Object segmentation
• Object recognition
• Object detection

[Diagram: LOCUS, horse model + object segmentation]

Related work

Constellation models
• Weakly supervised
• Probabilistic framework
• Sparse
• No segmentation
Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003.
A Bayesian approach to unsupervised one-shot learning of object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003.

Fragment-based
• Dense model
• Supervised
• Non-probabilistic
• No global shape model
Learning to segment. E. Borenstein and S. Ullman. ECCV 2004.
Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004.

Codebook-based
• Probabilistic
• Dense model
• Supervised
• Ad-hoc inference
Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV 2004.

OBJ CUT
• Probabilistic
• Dense model
• Supervised
• Requires video

LOCUS overview
• Weakly supervised learning: buckets of images, no annotation required.
• Probabilistic generative model of both object and background.
• Dense model: all pixels modelled, not just at interest points.
• Combines global and local cues: models global shape and local appearance + edges.
• Iterative inference process: simultaneous localisation, segmentation, pose estimation.

The LOCUS model

LOCUS model
Shared between images:
• Class shape π
• Class edge sprite μo,σo
Different for each image:
• Deformation field D
• Position & size T
• Mask m
• Edge image e
• Background appearance λ0
• Object appearance λ1
• Image

LOCUS model: appearance
[Diagram: shared mixture components; background mixture coefficients λ0; object mixture coefficients λ1; mask m; image z]
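As a rough illustration of the appearance slide, the sketch below evaluates each pixel under two mixtures that share the same components but use different coefficients, λ0 for background and λ1 for object. The function name `pixel_log_likelihoods` and the array layout are assumptions made for this example; the slides do not specify the component densities, so the per-component log-likelihoods are taken as a given input.

```python
import numpy as np

def pixel_log_likelihoods(component_ll, lambda_bg, lambda_obj):
    """Per-pixel log-likelihood under the background and object mixtures.

    component_ll : (H, W, K) array, log p(z_i | component k) for the K
                   shared components (assumed to be precomputed).
    lambda_bg    : (K,) background mixture coefficients (lambda_0).
    lambda_obj   : (K,) object mixture coefficients (lambda_1).
    Returns two (H, W) arrays: log p(z_i | background), log p(z_i | object).
    """
    component_ll = np.asarray(component_ll, dtype=float)

    def mix(lam):
        lam = np.asarray(lam, dtype=float)
        with np.errstate(divide="ignore"):  # allow zero coefficients
            weighted = component_ll + np.log(lam)
        # log sum_k lam_k * p(z_i | k), computed stably
        return np.logaddexp.reduce(weighted, axis=-1)

    return mix(lambda_bg), mix(lambda_obj)
```

A pixel is better explained by the object when the second array exceeds the first; in the full model this evidence is combined with the shape and edge terms rather than used on its own.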

LOCUS model: mask
8-neighbour Markov Random Field (as used in GrabCut) over background/object labels; favours segmentation along contrast edges.
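One way to see why such a prior favours cuts along contrast edges is to compute GrabCut-style pairwise weights, which shrink where neighbouring pixels differ strongly. The sketch below follows the published GrabCut form, γ · dist⁻¹ · exp(−β‖z_i − z_j‖²) with β set from the mean squared difference; the function name, the dictionary output, and the default `gamma=50.0` are assumptions for this example, and the exact term used in LOCUS may differ.

```python
import numpy as np

# Offsets (dy, dx) that visit every 8-neighbour pair exactly once:
# right, down, down-right, down-left.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def contrast_weights(img, gamma=50.0):
    """Contrast-sensitive pairwise weights for an 8-neighbour grid.

    img : (H, W) or (H, W, C) array of pixel values.
    Returns a dict mapping each offset to an array of weights, one per
    pixel pair at that offset. A low weight means a cheap place to cut.
    """
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[..., None]
    H, W = img.shape[:2]

    sq_diff = {}
    for dy, dx in OFFSETS:
        a = img[0:H - dy, max(0, -dx):W - max(0, dx)]
        b = img[dy:H, max(0, dx):W - max(0, -dx)]
        sq_diff[(dy, dx)] = ((a - b) ** 2).sum(axis=-1)

    mean_sq = np.mean(np.concatenate([v.ravel() for v in sq_diff.values()]))
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0

    return {
        off: gamma / np.hypot(*off) * np.exp(-beta * v)
        for off, v in sq_diff.items()
    }
```

These weights would then serve as the edge capacities between neighbouring pixels when the mask is optimised with graph cuts.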

LOCUS model: shape/position
[Diagram: class shape π placed in each image by transformations T1, T2, T3, T4, …, TN]
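The role of the per-image transformation T can be sketched as resampling the shared class shape π into image coordinates. The example assumes T is a translation plus an isotropic scale and that π is stored as a grid of object probabilities; `place_shape` and its parameters are hypothetical names, not part of LOCUS itself.

```python
import numpy as np

def place_shape(pi, out_shape, top, left, scale):
    """Map the class shape prior into one image's coordinate frame.

    pi        : (h, w) array of per-pixel object probabilities.
    out_shape : (H, W) size of the target image.
    top, left : translation of the shape's origin in the image.
    scale     : isotropic scale factor (> 0).
    Pixels not covered by the transformed shape get prior 0.
    """
    pi = np.asarray(pi, dtype=float)
    H, W = out_shape
    h, w = pi.shape
    ys, xs = np.mgrid[0:H, 0:W]

    # Inverse-map each image pixel to class-shape coordinates
    # (nearest-neighbour lookup).
    sy = np.floor((ys - top) / scale).astype(int)
    sx = np.floor((xs - left) / scale).astype(int)
    inside = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)

    out = np.zeros((H, W), dtype=float)
    out[inside] = pi[sy[inside], sx[inside]]
    return out
```

Inference over T then amounts to searching for the position and size under which this transformed prior best agrees with the image evidence.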

Iterative inference
[Diagram: iteration #1, class shape π with transformations T1, T2, T3, T4, …, TN]

Non-rigid objects
Translation and scale are not enough.
[Diagram: class shape π]

LOCUS model: pose
Deformation field D over 5x5 blocks, applied together with T; prior ensures smoothness.
[Diagram: class shape π]
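A block-wise deformation field and its smoothness prior can be sketched as follows. Two assumptions are made here that the slides do not confirm: that "5x5 blocks" means blocks of 5×5 pixels, and that the smoothness prior is a quadratic penalty on differences between neighbouring blocks. The names `smoothness_energy` and `warp_with_blocks` are illustrative.

```python
import numpy as np

def smoothness_energy(D):
    """Sum of squared differences between 4-neighbouring block offsets.

    D : (By, Bx, 2) array, one (dy, dx) displacement per block.
    A constant field has zero energy; abrupt changes are penalised.
    """
    D = np.asarray(D, dtype=float)
    vertical = D[1:, :] - D[:-1, :]
    horizontal = D[:, 1:] - D[:, :-1]
    return float((vertical ** 2).sum() + (horizontal ** 2).sum())

def warp_with_blocks(pi, D, block=5):
    """Shift every pixel by the displacement of the block containing it.

    pi    : (H, W) array (for example, the transformed class shape).
    D     : (By, Bx, 2) integer displacements per block.
    block : block side length in pixels (assumed to be 5).
    Source coordinates are clamped at the image border.
    """
    pi = np.asarray(pi)
    D = np.asarray(D)
    H, W = pi.shape
    ys, xs = np.mgrid[0:H, 0:W]
    by = np.minimum(ys // block, D.shape[0] - 1)
    bx = np.minimum(xs // block, D.shape[1] - 1)
    src_y = np.clip(ys - D[by, bx, 0], 0, H - 1).astype(int)
    src_x = np.clip(xs - D[by, bx, 1], 0, W - 1).astype(int)
    return pi[src_y, src_x]
```

With a penalty of this kind, neighbouring blocks are encouraged to move together, which lets the outline bend without tearing apart.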

LOCUS model: pose
[Diagram: class shape π with combined transformations and deformations TD1, TD2, TD3, …, TDN]

LOCUS model: edge
[Diagram: class edge sprite μo,σo mapped through TD1, TD2, TD3, …, TDN to edge images e, shown alongside the original images]

LOCUS model: overview
Shared between images:
• Class shape π
• Class edge sprite μo,σo
Different for each image:
• Deformation field D
• Position & size T
• Mask m
• Edge image e
• Background appearance λ0
• Object appearance λ1
• Image

Inference
• Aim to infer all latent variables:
  • For each image: background appearance λ0, object appearance λ1, deformation D, transformation T, mask m.
  • Class variables: shape π, edge sprite μo, σo.
• Bayesian inference is carried out using variational message passing with a fully factorised variational distribution.
• Optimisation of grid-structured variational free energy terms (relating to the deformation field D and the mask m) is achieved using graph cuts.
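To give a feel for how evidence is combined for the mask, the sketch below computes a per-pixel posterior from a shape prior and the object/background appearance log-likelihoods. This is a deliberate simplification and not the inference described above: it ignores the MRF, edge, and deformation terms, treats pixels independently, and uses a hypothetical function name `mask_posterior`.

```python
import numpy as np

def mask_posterior(shape_prior, ll_obj, ll_bg, eps=1e-6):
    """Independent per-pixel estimate of q(m_i = object).

    shape_prior : (H, W) prior probability that each pixel is object
                  (for example, the transformed class shape).
    ll_obj      : (H, W) log p(z_i | object).
    ll_bg       : (H, W) log p(z_i | background).
    eps         : clipping constant to keep the log-odds finite.
    """
    p = np.clip(np.asarray(shape_prior, dtype=float), eps, 1.0 - eps)
    # Log-odds = prior log-odds + appearance log-likelihood ratio.
    logit = (np.log(p) - np.log1p(-p)
             + np.asarray(ll_obj, dtype=float)
             - np.asarray(ll_bg, dtype=float))
    return 1.0 / (1.0 + np.exp(-logit))
```

In an iterative scheme, an update like this would alternate with re-estimating the appearance coefficients and the pose, each step using the current estimates of the others.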

Experiments & results

Experiments
LOCUS applied to 8 sets of 20 images, each containing objects of the same class.
• Horses
• Faces
• Cars (rear)
• Cars (side)
• Motorbikes
• Aeroplanes
• Cows
• Trees
For each class, we ran separate experiments for color and texture appearance models.

Results: horses

Results: cars

Results: remaining classes
[Figure panels: Faces, Cars (rear), Motorbikes, Planes, Cows, Trees]

Segmentation accuracy
To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side).
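The slide does not say which accuracy measure was used. A common choice, assumed here purely for illustration, is the fraction of pixels whose inferred label matches the hand segmentation; `pixel_accuracy` is a hypothetical helper.

```python
import numpy as np

def pixel_accuracy(predicted_mask, hand_mask):
    """Fraction of pixels labelled identically in both masks.

    predicted_mask : (H, W) array, nonzero where the model says object.
    hand_mask      : (H, W) array, nonzero where the annotator says object.
    """
    pred = np.asarray(predicted_mask).astype(bool)
    truth = np.asarray(hand_mask).astype(bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    return float((pred == truth).mean())
```

For example, `pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])` compares four pixels, three of which agree.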

Object registration Transformation + deformation field registers object outlines (and some internal edges).

Object registration

Extensions to LOCUS

Recognition + segmentation Object recognition using only global shape: Overall: 88% accuracy.

Probabilistic Index Maps
Each image has a ‘palette’ of appearance models – palette invariance.
[Figure panels: 2 indices, 9 indices]

Probabilistic Index Maps

Learning objects from video
[Figure panels: object shape, object edge sprite]

Locumotion
Add flow and track constraints to achieve motion segmentation.
Tracking/flow estimation by Larry Zitnick.

Conclusions
• LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods.
• General-purpose model allows:
  • Object localisation
  • Pose estimation
  • Object segmentation
  • Motion segmentation/object tracking
  • Object recognition/detection (in combination with discriminative model)

Questions?
