Learning Spatial Context: Using stuff to find things. Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008. Things vs. Stuff. From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.
Learning Spatial Context: Using stuff to find things Geremy Heitz Daphne Koller Stanford University October 13, 2008 ECCV 2008
Things vs. Stuff From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996. Thing (n): An object with a specific size and shape. Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.
Finding Things Context is key!
Outline • What is Context? • The Things and Stuff (TAS) model • Results
Satellite Detection Example D(W) = 0.8 D(W) = 0.8
Error Analysis Typically… • True Positives are IN CONTEXT • False Positives are OUT OF CONTEXT • We need to look outside the bounding box!
Types of Context • Scene-Thing: gist of the scene makes a car “likely” and a keyboard “unlikely” [ Torralba et al., LNCS 2005 ] • Stuff-Stuff: [ Gould et al., IJCV 2008 ] • Thing-Thing: [ Rabinovich et al., ICCV 2007 ]
Types of Context • Stuff-Thing: • Based on spatial relationships • Intuition: “Cars drive on roads”, “Cows graze on grass”, “Boats sail on water” • Road = cars here • Trees = no cars • Houses = cars nearby • Goal: Unsupervised
Outline • What is Context? • The Things and Stuff (TAS) model • Results
Things • Detection “candidates” • Low detector threshold -> “over-detect” • Each candidate has a detector score
Things • Candidate detections • Image Window Wi + Score • Boolean R.V. Ti • Ti = 1: Candidate is a positive detection • Thing model [Diagram: Image Window Wi, Ti]
Stuff • Coherent image regions • Coarse “superpixels” • Feature vector Fj in R^n • Cluster label Sj in {1…C} • Stuff model • Naïve Bayes [Diagram: Sj, Fj]
Relationships • Descriptive Relations • “Near”, “Above”, “In front of”, etc. • Choose set R = {r1…rK} • Rijk = 1: Detection i and region j have relation k • Relationship model [Diagram: Ti, Sj, Rijk. Example: candidate T1 with regions S72 = Trees, S10 = Road, S4 = Houses; R1,10,in = 1]
The TAS Model • Wi: Window • Ti: Object Presence • Sj: Region Label • Fj: Region Features • Rijk: Relationship [Diagram: plate model over Image Window Wi, Ti, Rijk, Sj, Fj with plates N, J, K. Legend: Always Observed, Always Hidden, Supervised in Training Set]
Unrolled Model [Diagram: Candidate Windows T1, T2, T3 and Image Regions S1, S2, S3, S4, S5, linked by R1,1,left = 1, R2,1,above = 0, R3,1,left = 1, R1,3,near = 0, R3,3,in = 1]
Learning the Parameters • Assume we know R • Sj is hidden • Everything else observed • Expectation-Maximization • “Contextual clustering” • Parameters are readily interpretable
Which Relationships to Use? • Rijk = spatial relationship between candidate i and region j • Rij1 = candidate in region • Rij2 = candidate closer than 2 bounding boxes (BBs) to region • Rij3 = candidate closer than 4 BBs to region • Rij4 = candidate farther than 8 BBs from region • Rij5 = candidate 2 BBs left of region • Rij6 = candidate 2 BBs right of region • Rij7 = candidate 2 BBs below region • Rij8 = candidate more than 2 and less than 4 BBs from region • … • RijK = candidate near region boundary • How do we avoid overfitting?
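A few of the distance-based relationships listed above can be sketched in Python. This is a minimal sketch under assumptions that are not from the talk: a candidate is a box (x, y, w, h), a region is a list of pixel coordinates, one "BB" is taken to be the longer side of the box, distance is measured from the box centre to the nearest region pixel, and "in" means the region contains the box centre. The names `bb_distance` and `relationships` are hypothetical, and the directional relations (left, right, below) are omitted.

```python
import math

def bb_distance(box, region_pixels):
    """Distance from the box centre to the nearest region pixel, in BB units."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    size = max(w, h)  # assumption: one "BB" is the longer side of the box
    nearest = min(math.hypot(px - cx, py - cy) for px, py in region_pixels)
    return nearest / size

def relationships(box, region_pixels):
    """Binary R_ijk values for one candidate/region pair (illustrative subset)."""
    x, y, w, h = box
    centre = (int(x + w / 2.0), int(y + h / 2.0))
    d = bb_distance(box, region_pixels)
    return {
        "in": int(centre in set(region_pixels)),  # assumption: centre-in-region
        "closer_than_2bb": int(d < 2),
        "closer_than_4bb": int(d < 4),
        "farther_than_8bb": int(d > 8),
        "between_2_and_4bb": int(2 < d < 4),
    }
```

For example, a 4x4 candidate centred on a pixel of a road region gets "in" = 1, while a region 12 box-lengths away only activates "farther_than_8bb".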
Learning the Relationships • Intuition • “Detached” Rijk = inactive relationship • Structural EM iterates: • Learn parameters • Decide which edge to toggle • Evaluate with l(T|F,W,R) • Requires inference • Better results than using standard E[l(T,S,F,W,R)] [Diagram: Rij1, Rij2, RijK, Ti, Sj, Fj]
Inference • Goal: • Block Gibbs Sampling • Easy to sample Ti’s given Sj’s and vice versa
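The alternation between sampling the Ti's given the Sj's and the Sj's given the Ti's can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes a single binary relationship per candidate/region pair and a factorisation P(Ti | Wi) P(Sj) P(Fj | Sj) P(Rij | Ti, Sj). The parameter names (`det_prob`, `prior_s`, `feat_lik`, `p_rel`) and all function names are hypothetical.

```python
import random

def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for idx, w in enumerate(weights):
        acc += w
        if r < acc:
            return idx
    return len(weights) - 1

def rel_lik(r, t, s, p_rel):
    """P(R = r | T = t, S = s), where p_rel[t][s] = P(R = 1 | t, s)."""
    p = p_rel[t][s]
    return p if r else 1.0 - p

def t_weights(i, S, R, det_prob, p_rel):
    """Unnormalised P(T_i = t | S, R, W_i) for t = 0, 1."""
    out = []
    for t in (0, 1):
        w = det_prob[i] if t else 1.0 - det_prob[i]
        for j, s in enumerate(S):
            w *= rel_lik(R[i][j], t, s, p_rel)
        out.append(w)
    return out

def s_weights(j, T, R, prior_s, feat_lik, p_rel):
    """Unnormalised P(S_j = s | T, R, F_j) for each cluster s."""
    out = []
    for s in range(len(prior_s)):
        w = prior_s[s] * feat_lik[j][s]
        for i, t in enumerate(T):
            w *= rel_lik(R[i][j], t, s, p_rel)
        out.append(w)
    return out

def block_gibbs(R, det_prob, prior_s, feat_lik, p_rel,
                sweeps=500, burn_in=100, seed=0):
    """Alternate the two blocks; return the estimated P(T_i = 1) per candidate."""
    rng = random.Random(seed)
    n, m = len(det_prob), len(feat_lik)
    S = [sample_index(prior_s, rng) for _ in range(m)]
    counts = [0] * n
    for sweep in range(sweeps):
        # Block 1: given S, the T_i are conditionally independent.
        T = [sample_index(t_weights(i, S, R, det_prob, p_rel), rng) for i in range(n)]
        # Block 2: given T, the S_j are conditionally independent.
        S = [sample_index(s_weights(j, T, R, prior_s, feat_lik, p_rel), rng) for j in range(m)]
        if sweep >= burn_in:
            for i in range(n):
                counts[i] += T[i]
    return [c / (sweeps - burn_in) for c in counts]
```

Each block is cheap because, once the other block is fixed, every variable in it can be sampled independently from a small normalised table.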
Outline • What is Context? • The Things and Stuff (TAS) model • Results
Base Detector - HOG • HOG Detector: [ Dalal & Triggs, CVPR 2005 ] • Feature Vector X • SVM Classifier
Results - Satellite [Figure panels: Prior: Detector Only; Posterior: Detections; Posterior: Region Labels]
Results - Satellite [Plot: Recall Rate vs. False Positives Per Image; curves: TAS Model, Base Detector] ~10% improvement in recall at 40 fppi
PASCAL VOC Challenge • 2005 Challenge • 2232 images split into {train, val, test} • Cars, Bikes, People, and Motorbikes • 2006 Challenge • 5304 images split into {train, test} • 12 classes, we use Cows and Sheep
Discovered Context - Bicycles Bicycles Cluster #3
TAS Results – Bicycles • Examples • Discover “true positives” • Remove “false positives” [Figure: example windows labeled “BIKE” and “?”]
Conclusions • Detectors can benefit from context • The TAS model captures an important type of context • We can improve any sliding window detector using TAS • The TAS model can be interpreted and matches our intuitions • We can learn which relationships to use
Object Detection • Task: Find the things • Example: Find all the cars in this image • Return a “bounding box” for each • Evaluation: • Maximize true positives • Minimize false positives
Sliding Window Detection • Consider every bounding box • All shifts • All scales • Possibly all rotations • Each such window gets a score: • D(W) • Detections: Local peaks in D(W) • Pros: • Covers the entire image • Flexible to allow variety of D(W)’s • Cons: • Brute force – can be slow • Only considers features in box [Figure: example windows with D = 1.5 and D = -0.3]
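The window enumeration and scoring described on this slide can be sketched in Python. This is a minimal sketch, assuming an axis-aligned base window, integer scale factors, and a fixed stride. `sliding_windows`, `detect`, and the `score_fn` callback standing in for D(W) are hypothetical names. Rotations are omitted, and so is the local-peak selection: the sketch simply thresholds D(W).

```python
def sliding_windows(img_w, img_h, base=(8, 8), scales=(1, 2), stride=4):
    """Yield (x, y, w, h) for every shift and scale that fits in the image."""
    for s in scales:
        w, h = base[0] * s, base[1] * s
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)

def detect(img_w, img_h, score_fn, threshold, **kwargs):
    """Score each window with D(W) and keep those above the threshold,
    sorted from highest to lowest score."""
    hits = []
    for win in sliding_windows(img_w, img_h, **kwargs):
        d = score_fn(win)
        if d > threshold:
            hits.append((d, win))
    return sorted(hits, reverse=True)
```

Lowering `threshold` produces the deliberately over-complete set of candidates that a context model can then re-score.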
Sliding Window Results • PASCAL Visual Object Classes Challenge, Cows 2006 • Detection: D(W) > T • Recall(T) = TP / (TP + FN) • Precision(T) = TP / (TP + FP) • Overlap of boxes A and B: score(A,B) = |A∩B| / |A∪B| • score(A,B) > 0.5: TRUE POSITIVE • score(A,B) ≤ 0.5: FALSE POSITIVE
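The overlap score and the recall/precision definitions on this slide can be written out directly. This is a minimal sketch, assuming boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2. The function names are hypothetical.

```python
def overlap_score(a, b):
    """score(A,B) = |A ∩ B| / |A ∪ B| for boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(detection, ground_truth):
    """A detection counts as a true positive when the overlap exceeds 0.5."""
    return overlap_score(detection, ground_truth) > 0.5

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Two 2x2 boxes shifted by one unit share half their area yet score only 1/3, so the 0.5 criterion is stricter than it may first appear.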
Quantitative Evaluation [Plot: Recall Rate (0 to 1) vs. False Positives Per Image (0 to 160)]
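A recall versus false-positives-per-image curve of this kind can be computed by sweeping the detector threshold from high to low. This is a minimal sketch, assuming detections from all images are pooled into (score, is_true_positive) pairs. `recall_vs_fppi` and `recall_at_fppi` are hypothetical names.

```python
def recall_vs_fppi(detections, num_ground_truth, num_images):
    """detections: list of (score, is_true_positive) pooled over all images.
    Returns one (fppi, recall) point per detection, sweeping the threshold
    from the highest score down to the lowest."""
    curve = []
    tp = fp = 0
    for score, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((fp / num_images, tp / num_ground_truth))
    return curve

def recall_at_fppi(curve, target_fppi):
    """Best recall among points with at most target_fppi false positives per image."""
    eligible = [r for f, r in curve if f <= target_fppi]
    return max(eligible) if eligible else 0.0
```

Comparing two detectors at a fixed operating point, such as 40 fppi, then reduces to calling `recall_at_fppi` on each curve.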
Detections in Context • Task: Identify all cars in the satellite image • Idea: The surrounding context adds info to the local window detector [Figure: Prior: Detector Only + Region Labels (Houses, Road) = Posterior: TAS Model]
Features: Haar wavelets Haar filters and integral image Viola and Jones, ICCV 2001 The average intensity in the block is computed with four sums independently of the block size. BOOSTING!
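The constant-time block average can be illustrated with a small integral image. This is a minimal sketch, assuming the image is a list of rows of numbers and using a zero-padded table so that any block sum needs four table lookups. The function names, including the two-rectangle feature, are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def block_sum(ii, x0, y0, x1, y1):
    """Sum over rows y0..y1-1 and columns x0..x1-1 from four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def block_mean(ii, x0, y0, x1, y1):
    """Average intensity of the block, independent of the block size."""
    return block_sum(ii, x0, y0, x1, y1) / ((x1 - x0) * (y1 - y0))

def haar_two_rect_horizontal(ii, x0, y0, w, h):
    """Left-half sum minus right-half sum of a w x h window (w assumed even)."""
    half = w // 2
    left = block_sum(ii, x0, y0, x0 + half, y0 + h)
    right = block_sum(ii, x0 + half, y0, x0 + w, y0 + h)
    return left - right
```

The table is built once per image, after which every Haar-like feature at every position and scale costs a fixed handful of lookups.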
Features: Edge fragments Opelt, Pinz, Zisserman, ECCV 2006 Weak detector = Match of edge chain(s) from training image to edgemap of test image BOOSTING!
Histograms of oriented gradients • SIFT, D. Lowe, ICCV 1999 • Dalal & Triggs, CVPR 2005 SVM!
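The core of a histogram-of-oriented-gradients feature is a magnitude-weighted orientation histogram over a small cell. This is a simplified sketch, not the full descriptor from the cited papers. It assumes 9 unsigned orientation bins over 0 to 180 degrees, central-difference gradients, and hard bin assignment without interpolation. The function names are hypothetical.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    over the interior pixels of a 2D intensity grid (list of rows)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle / bin_width), bins - 1)] += mag
    return hist

def l2_normalise(vec, eps=1e-6):
    """Scale a histogram to roughly unit length; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]
```

Concatenating normalised histograms from a grid of cells gives the feature vector that is then passed to the SVM classifier.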