Outline

Outline • Scene Understanding • Motivation for Context • Fixed order model (and constellation) • Known bag of objects WITH context • Independent TDP • Unknown bag of objects WITHOUT context • CASPER Distribution • Preliminary Experiments • Experimental Plan

Scene Understanding - JL • Localize and classify all objects in an image • Not necessarily “segment” at the pixel level • A “description” of a scene is a pair of vectors l and rho where l gives the class for each instance and rho gives the location • We are trying to find P(l,rho)

Why Context? • Represent “context” • Show LOOPS picture • How do you use context? • (l,rho) version

Fixed Order Model • Known set of objects • Joint Gaussian over all centroids • P(l,rho) = 1{l == l_fixed_order} P(rho | l) • Problems: • We don't always know the exact set of objects • Facebook example • What if there are two instances from the same class? • Bedroom scene example

TDP • Unknown set of objects • Gaussian over centroid for each • Centroids are independent • P(l,rho) = P(l) prod_i P(rho_i | l_i) • Problems: • This doesn't take pairwise constraints into account • We have lost context
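The independent factorization above can be sketched in a few lines. This is a minimal illustration only, not the TDP itself: it assumes isotropic 2-D Gaussians per class and an IID multinomial for P(l), and the names `CLASS_PARAMS`, `CLASS_PROBS`, and `log_p_independent` are hypothetical.

```python
import math

def log_gauss_2d(x, mean, var):
    """Log density of an isotropic 2-D Gaussian with per-axis variance `var`."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return -math.log(2.0 * math.pi * var) - (dx * dx + dy * dy) / (2.0 * var)

def log_p_independent(labels, centroids, log_prior, class_params):
    """log P(l, rho) = log P(l) + sum_i log P(rho_i | l_i)."""
    total = log_prior(labels)
    for label, rho in zip(labels, centroids):
        mean, var = class_params[label]
        total += log_gauss_2d(rho, mean, var)
    return total

# Hypothetical parameters: (mean centroid, variance) in normalized image coordinates.
CLASS_PARAMS = {"bed": ((0.5, 0.7), 0.02), "lamp": ((0.2, 0.4), 0.05)}
CLASS_PROBS = {"bed": 0.6, "lamp": 0.4}

def iid_log_prior(labels):
    """Assumed IID multinomial over labels, conditioned on the number of instances."""
    return sum(math.log(CLASS_PROBS[label]) for label in labels)

# The lamp sits at its class mean in `score` and far from it in `moved`.
score = log_p_independent(["bed", "lamp"], [(0.5, 0.7), (0.2, 0.4)],
                          iid_log_prior, CLASS_PARAMS)
moved = log_p_independent(["bed", "lamp"], [(0.5, 0.7), (0.8, 0.4)],
                          iid_log_prior, CLASS_PARAMS)
```

Because each centroid term depends only on its own label, moving the lamp changes the score by the same amount wherever the bed is. That is the lost context the slide refers to.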

CASPER • Unknown set of objects • Joint Gaussian over centroids given ANY set • P(l,rho) = P(l) P(rho | l) • Questions: • How do we represent P(l)? • How do we represent P(rho | l)? • How do we learn? • How do we infer?

P(l) • Options: • Dirichlet Process • IID Multinomial • Other smart things

P(rho | l) - GH • Desiderata: • Correlations between rho's • Sharing of parameters between l's • ... • Options: • Independent • Learn a different Gaussian for every l • Can't share parameters, large number of l's • Gaussian Process • Correlation is not the natural space to represent these relationships • Product of Experts • Each “expert” represents a Gaussian offset between objects • This is where we have spent the most time

CASPER P(rho|l) - JL • Some math and examples: • P(rho,d|l) = (1/Z) prod_ij P_c(rho_i - rho_j)^(d_ijc) • P(d|l) = Multinomial • P(rho|d,l) = Gaussian • Precision space view
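One way to read the "precision space view" is to expand the product of experts for a fixed d. The derivation below is a sketch under an assumption not stated on the slide: each expert P_c is a Gaussian on the offset rho_i - rho_j with mean mu_c and precision Lambda_c (both symbols are introduced here for illustration).

```latex
\begin{aligned}
\log P(\rho \mid d, l)
  &= -\tfrac{1}{2} \sum_{i,j,c} d_{ijc}\,
     (\rho_i - \rho_j - \mu_c)^{\top} \Lambda_c\, (\rho_i - \rho_j - \mu_c)
     + \text{const} \\
  &= -\tfrac{1}{2}\, \rho^{\top} \Lambda\, \rho + \eta^{\top} \rho + \text{const},
\end{aligned}
\qquad
\text{where each active edge } (d_{ijc} = 1) \text{ contributes}
\quad
\begin{aligned}
\Lambda_{ii} &\leftarrow \Lambda_{ii} + \Lambda_c, &
\Lambda_{jj} &\leftarrow \Lambda_{jj} + \Lambda_c, \\
\Lambda_{ij} &\leftarrow \Lambda_{ij} - \Lambda_c, &
\Lambda_{ji} &\leftarrow \Lambda_{ji} - \Lambda_c, \\
\eta_i &\leftarrow \eta_i + \Lambda_c \mu_c, &
\eta_j &\leftarrow \eta_j - \Lambda_c \mu_c .
\end{aligned}
```

Under this assumption every active expert adds a sparse block to the joint precision matrix, so sharing experts across label sets shares parameters. Offset-only terms leave the joint precision singular, since shifting all centroids together changes nothing, so a proper density would presumably need some additional absolute-location term.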

Learning the Experts • Training set with (l,rho) pairs per image • Gibbs over the hidden variables: • Graph for the image (d's) • Loop over edges
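The "loop over edges" step might look like the sketch below. It is a simplified stand-in, not the authors' sampler: centroids are 1-D, each pair is either off (state 0) or explained by one expert, and any dependence of the normalizer Z on d is ignored. The names `edge_posterior` and `resample_edges` and the `off_weight` and `off_density` parameters are hypothetical.

```python
import math
import random

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def edge_posterior(offset, experts, off_weight, off_density):
    """Normalized probabilities over [no edge, expert 0, expert 1, ...] for one pair.

    `experts` is a list of (prior weight, mean offset, variance) triples.
    """
    weights = [off_weight * off_density]
    weights += [w * gauss_pdf(offset, m, v) for (w, m, v) in experts]
    total = sum(weights)
    return [w / total for w in weights]

def resample_edges(rho, experts, rng, off_weight=0.5, off_density=0.1):
    """One sweep over all ordered pairs (i, j) with i != j; returns {(i, j): state}."""
    d = {}
    for i in range(len(rho)):
        for j in range(len(rho)):
            if i == j:
                continue
            probs = edge_posterior(rho[i] - rho[j], experts, off_weight, off_density)
            # State 0 means "no edge"; state k > 0 means expert k - 1 explains the pair.
            d[(i, j)] = rng.choices(range(len(probs)), weights=probs)[0]
    return d

# Hypothetical single expert: "object i sits about 1.0 to the right of object j".
EXPERTS = [(0.5, 1.0, 0.01)]
sweep = resample_edges([0.0, 1.0], EXPERTS, random.Random(0))
```

A full sampler would alternate sweeps like this with updates to the expert parameters given the sampled graphs.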

Generative Process • Show some synthetic generated images

Preliminary Experiments - GH • Bedroom and Street Scenes from LabelMe: • Features: SIFT • Features: (x,w)

Learning/Inference in Full Model • Three stage Gibbs: • Features to Instances • Black box: previous algorithm • Graph for the image (d's) • Loop over edges • Instances to Classes • Training • Supervise feature-to-instance and instance-to-class assignments • Testing • Introduce new images and Gibbs away

Results • Show Sucky Pictures

Problems • It's hard to evaluate • If your goal is detection you lose to discriminative approaches • Context is not the main reason why TDP is failing • [If you evaluate based on discovered structure, then context is a lower order consideration]

New Framework • Detectors for a set of object classes • Turn down the threshold • Each detection gets an l_i variable and has a centroid rho_i • Goal is to assign l_i's to every detection in a way that uses both the “detection strength” and the context of other detections
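A toy version of this labeling problem can be written as a brute-force search. The sketch is only an illustration under stated assumptions: the score adds per-detection log-scores to Gaussian context terms on vertical offsets, a "bg" (background) label absorbs false detections, and all classes, numbers, and function names are hypothetical.

```python
import itertools
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def assignment_score(labels, detections, context):
    """Detector log-scores plus pairwise context terms on vertical offsets."""
    score = sum(det["log_scores"][lab] for lab, det in zip(labels, detections))
    for (i, a), (j, b) in itertools.permutations(enumerate(labels), 2):
        if (a, b) in context:
            mean, var = context[(a, b)]
            score += log_gauss(detections[i]["y"] - detections[j]["y"], mean, var)
    return score

def best_assignment(detections, classes, context):
    """Exhaustive search over label tuples; only feasible for a handful of detections."""
    return max(itertools.product(classes, repeat=len(detections)),
               key=lambda labels: assignment_score(labels, detections, context))

CLASSES = ["pillow", "bed", "bg"]
DETECTIONS = [
    {"y": 0.3, "log_scores": {"pillow": -0.2, "bed": -2.0, "bg": -1.5}},  # strong pillow
    {"y": 0.5, "log_scores": {"pillow": -1.6, "bed": -1.2, "bg": -1.0}},  # weak detection
]
# Assumed context: y_pillow - y_bed is about -0.2 with variance 0.01.
CONTEXT = {("pillow", "bed"): (-0.2, 0.01)}

with_context = best_assignment(DETECTIONS, CLASSES, CONTEXT)
without_context = best_assignment(DETECTIONS, CLASSES, {})
```

In this toy setup the weak second detection is labeled background on detector strength alone, but the context term from the confident pillow tips it to "bed". Exhaustive search is exponential in the number of detections, so anything larger would need sampling or another approximate method.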

Possible Datasets • Bedrooms • Faces • Overhead Traffic
