Outline • Scene Understanding • Motivation for Context • Fixed order model (and constellation) • Known bag of objects WITH context • Independent TDP • Unknown bag of objects WITHOUT context • CASPER Distribution • Preliminary Experiments • Experimental Plan
Scene Understanding - JL • Localize and classify all objects in an image • Not necessarily “segment” at the pixel level • A “description” of a scene is a pair of vectors l and rho where l gives the class for each instance and rho gives the location • We are trying to find P(l,rho)
Why Context? • Represent “context” • Show LOOPS picture • How do you use context? • (l,rho) version
Fixed Order Model • Known set of objects • Joint Gaussian over all centroids • P(l,rho) = 1{l == l_fixed_order} P(rho | l) • Problems: • We don't always know the exact set of objects • Facebook example • What if there are two instances from the same class? • Bedroom scene example
TDP • Unknown set of objects • Gaussian over centroid for each • Centroids are independent • P(l,rho) = P(l) prod_i P(rho_i | l_i) • Problems: • This doesn't take pairwise constraints into account • We have lost context
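The factorization P(l,rho) = P(l) prod_i P(rho_i | l_i) can be sketched in a few lines. This is only an illustration under assumptions not in the slides: P(l) is taken to be an IID categorical over class labels (one of the options listed for P(l)), each class has an isotropic 2-D Gaussian over its centroid, and all names (`log_p_independent`, `class_prior`, `class_mean`, `class_var`) are hypothetical.

```python
import math

def log_gauss_2d(point, mean, var):
    """Log density of an isotropic 2-D Gaussian with variance `var` per axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return -math.log(2 * math.pi * var) - (dx * dx + dy * dy) / (2 * var)

def log_p_independent(labels, centroids, class_prior, class_mean, class_var):
    """log P(l, rho) when every centroid depends only on its own class.

    Assumption: P(l) is an IID categorical over labels; the distribution
    over the number of objects is left out of this sketch.
    """
    total = 0.0
    for l_i, rho_i in zip(labels, centroids):
        total += math.log(class_prior[l_i])
        total += log_gauss_2d(rho_i, class_mean[l_i], class_var[l_i])
    return total

# Hypothetical two-class bedroom example.
prior = {"bed": 0.5, "lamp": 0.5}
mean = {"bed": (0.5, 0.7), "lamp": (0.2, 0.3)}
var = {"bed": 0.05, "lamp": 0.05}
both = log_p_independent(["bed", "lamp"], [(0.5, 0.7), (0.2, 0.3)], prior, mean, var)
```

Because the score is a plain sum over objects, moving the lamp never changes the bed's term, which is the "lost context" problem the slide names.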
CASPER • Unknown set of objects • Joint Gaussian over centroids given ANY set • P(l,rho) = P(l) P(rho | l) • Questions: • How do we represent P(l)? • How do we represent P(rho | l)? • How do we learn? • How do we infer?
P(l) • Options: • Dirichlet Process • IID Multinomial • Other smart things
P(rho | l) - GH • Desiderata: • Correlations between rho's • Sharing of parameters between l's • ... • Options: • Independent • Learn a different Gaussian for every l • Can't share parameters, large number of l's • Gaussian Process • Correlation is not the natural space to represent these relationships • Product of Experts • Each “expert” represents a Gaussian offset between objects • This is where we have spent the most time
CASPER P(rho|l) - JL • Some math and examples: • P(rho,d|l) = 1/Z prod_{i,j,c} P_c(rho_i - rho_j)^{d_ijc} • P(d|l) = Multinomial • P(rho|d,l) = Gaussian • Precision space view
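One way to read the "precision space view": once the edge assignments d are fixed, each active expert contributes a quadratic term in rho_i - rho_j, and those terms add up into a single precision matrix. The sketch below assumes 1-D centroids and experts of the form rho_i - rho_j ~ N(mu, var); the function names and the edge format are hypothetical, not taken from the slides.

```python
def build_precision(n, edges):
    """Accumulate the precision matrix and linear term for active experts.

    edges: list of (i, j, mu, var), meaning rho_i - rho_j ~ N(mu, var).
    Returns (prec, h) such that, up to a constant, the energy is
    0.5 * rho^T prec rho - h^T rho.
    """
    prec = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    for i, j, mu, var in edges:
        w = 1.0 / var
        prec[i][i] += w
        prec[j][j] += w
        prec[i][j] -= w
        prec[j][i] -= w
        h[i] += w * mu
        h[j] -= w * mu
    return prec, h

def energy_from_experts(rho, edges):
    """Sum of the negative log expert potentials, normalizers dropped."""
    return sum((rho[i] - rho[j] - mu) ** 2 / (2 * var) for i, j, mu, var in edges)

def energy_from_precision(rho, prec, h, edges):
    """Same energy, evaluated through the precision-space form."""
    n = len(rho)
    quad = sum(rho[a] * prec[a][b] * rho[b] for a in range(n) for b in range(n))
    lin = sum(h[a] * rho[a] for a in range(n))
    const = sum(mu * mu / (2 * var) for _, _, mu, var in edges)
    return 0.5 * quad - lin + const
```

Each expert touches only four entries of the precision matrix, so the graph structure shows up directly as its sparsity pattern. With offset experts alone every row sums to zero (the model is translation invariant), so a proper Gaussian would need some additional anchoring term.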
Learning the Experts • Training set with (l,rho) pairs per image • Gibbs over the hidden variables: • Graph for the image (d's) • Loop over edges
Generative Process • Show some synthetic generated images
Preliminary Experiments - GH • Bedroom and Street Scenes from LabelMe: • Features SIFT • Features (x,w)
Learning/Inference in Full Model • Three stage Gibbs: • Features to Instances • Black box: previous algorithm • Graph for the image (d's) • Loop over edges • Instances to Classes • Training • Supervise feature-to-instance and instance-to-class assignments • Testing • Introduce new images and Gibbs away
Results • Show Sucky Pictures
Problems • It's hard to evaluate • If your goal is detection you lose to discriminative approaches • Context is not the main reason why TDP is failing • [If you evaluate based on discovered structure, then context is a lower order consideration]
New Framework • Detectors for a set of object classes • Turn down the threshold • Each detection gets an l_i variable and has a centroid rho_i • Goal is to assign l_i's to every detection in a way that uses both the “detection strength” and the context of other detections
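A rough sketch of what such an assignment could look like, under several assumptions that are not in the slides: each detection comes with per-class log scores, context is a Gaussian over the centroid offset for some ordered class pairs, pairs without an expert contribute nothing, and the best labeling is found by brute-force enumeration (only feasible for a handful of detections). All names and numbers are hypothetical.

```python
import itertools
import math

def log_gauss_2d(point, mean, var):
    """Log density of an isotropic 2-D Gaussian with variance `var` per axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return -math.log(2 * math.pi * var) - (dx * dx + dy * dy) / (2 * var)

def score(labels, centroids, det_logp, pair_offset):
    """Detection strength plus pairwise context for one labeling.

    pair_offset maps an ordered class pair (a, b) to (mean, var) for
    rho_a - rho_b; it is assumed to hold one entry per class pair.
    """
    total = sum(det_logp[i][l] for i, l in enumerate(labels))
    for i, j in itertools.combinations(range(len(labels)), 2):
        for a, b in ((i, j), (j, i)):
            key = (labels[a], labels[b])
            if key in pair_offset:
                mean, var = pair_offset[key]
                offset = (centroids[a][0] - centroids[b][0],
                          centroids[a][1] - centroids[b][1])
                total += log_gauss_2d(offset, mean, var)
    return total

def best_assignment(classes, centroids, det_logp, pair_offset):
    """Exhaustively search all labelings and return the highest-scoring one."""
    candidates = itertools.product(classes, repeat=len(centroids))
    return max(candidates, key=lambda ls: score(ls, centroids, det_logp, pair_offset))

# Two detections, the second one directly below the first (y grows downward).
classes = ["monitor", "keyboard"]
centroids = [(0.0, 0.0), (0.0, 1.0)]
det_logp = [
    {"monitor": math.log(0.6), "keyboard": math.log(0.4)},
    {"monitor": math.log(0.55), "keyboard": math.log(0.45)},
]
# Hypothetical expert: a monitor sits about one unit above a keyboard.
context = {("monitor", "keyboard"): ((0.0, -1.0), 0.1)}
```

In this toy setup the detector alone would call both detections monitors, while adding the offset expert relabels the lower one as a keyboard.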
Possible Datasets • Bedrooms • Faces • Overhead Traffic