
Learning Spatial Context: Can stuff help us find things?

This presentation examines how surrounding context ("stuff") helps object detectors locate objects ("things"). It reviews sliding window detection, describes several kinds of context, and introduces the Things and Stuff (TAS) model, which relates candidate detections to image regions to improve detection accuracy.


Presentation Transcript


  1. Learning Spatial Context: Can stuff help us find things? Geremy Heitz, Daphne Koller • April 14, 2008 • DAGS • Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape. • Thing (n): An object with a specific size and shape.

  2. Outline • Sliding window object detection • What is context? • The Things and Stuff (TAS) model • Results

  3. Object Detection • Task: Find all the cars in this image. • Return a “bounding box” for each • Evaluation: • Maximize true positives • Minimize false positives • Precision-Recall tradeoff

  4. Sliding Window Detection • Consider every bounding box • All shifts • All scales • Possibly all rotations • Each box gets a score: D(x,y,s,Θ) • Detections: Local peaks in D() • Pros: • Covers the entire image • Flexible to allow variety of D()’s • Cons: • Brute force – can be slow • Only considers features in box [Figure: example windows with scores D = 1.5 and D = -0.3]
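The sliding window procedure on this slide can be sketched in a few lines. This is a minimal illustration, not the detector used in the talk: it handles a single scale with no rotations, and `score_fn` is a hypothetical stand-in for D(). The function names, window size, and threshold are all assumptions made for the example.

```python
import numpy as np

def sliding_window_scores(image, score_fn, window=(4, 4), stride=1):
    """Score every window position at one scale.

    score_fn is a hypothetical stand-in for D(); it maps a patch to a float.
    """
    h, w = image.shape
    wh, ww = window
    rows = (h - wh) // stride + 1
    cols = (w - ww) // stride + 1
    scores = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            y, x = r * stride, c * stride
            scores[r, c] = score_fn(image[y:y + wh, x:x + ww])
    return scores

def local_peaks(scores, threshold=0.0):
    """Return (row, col) cells that exceed the threshold and are >= all 8 neighbours."""
    peaks = []
    rows, cols = scores.shape
    for r in range(rows):
        for c in range(cols):
            s = scores[r, c]
            if s <= threshold:
                continue
            # Slice is clipped at the borders, so edge cells compare against fewer neighbours.
            neighbourhood = scores[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if s >= neighbourhood.max():
                peaks.append((r, c))
    return peaks
```

To cover all scales, one common approach is to rerun the same loop on resized copies of the image. Note that the score of each window depends only on the pixels inside it, which is exactly the limitation listed under "Cons".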

  5. Features: Haar wavelets • Haar filters and integral image (Viola and Jones, ICCV 2001) • The average intensity in the block is computed with four sums independently of the block size. • BOOSTING!
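The constant-cost block computation mentioned on this slide comes from the integral image. The sketch below is an illustrative NumPy version, not code from the talk; the function names are assumptions. It returns block sums, and dividing by `height * width` gives the average intensity.

```python
import numpy as np

def integral_image(image):
    """Cumulative-sum table with a zero row and column prepended for easy indexing."""
    ii = np.zeros((image.shape[0] + 1, image.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii, top, left, height, width):
    """Sum of image[top:top+height, left:left+width] using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half: a simple two-rectangle Haar-like feature."""
    half = width // 2
    return (block_sum(ii, top, left, height, half)
            - block_sum(ii, top, left + half, height, half))
```

Because each block costs four lookups regardless of its size, a boosted classifier can evaluate many such features per window cheaply.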

  6. Features: Edge fragments (Opelt, Pinz, Zisserman, ECCV 2006) • Weak detector = Match of edge chain(s) from training image to edgemap of test image • BOOSTING!

  7. Histograms of oriented gradients • SIFT, D. Lowe, ICCV 1999 • Dalal & Triggs, 2006 • SVM!

  8. Sliding Window Results • PASCAL Visual Object Classes Challenge, Cows 2006 • Recall = TP / (TP + FN) • Precision = TP / (TP + FP) • Overlap of boxes A and B: score(A,B) = |A∩B| / |A∪B| • True Pos: B s.t. score(A,B) > 0.5 for some A • False Pos: B s.t. score(A,B) < 0.5 for all A • False Neg: A s.t. score(A,B) < 0.5 for all B
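The overlap score and the precision and recall definitions on this slide translate directly into code. The sketch below reads A as a ground-truth box and B as a detected box, which is what the true positive and false negative definitions imply. The `(x1, y1, x2, y2)` box format and the function names are assumptions. The slide does not say how a score of exactly 0.5 is handled; here it counts as a miss.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(ground_truth, detections, threshold=0.5):
    """Precision and recall using the slide's TP / FP / FN definitions."""
    # True positive: a detection B that overlaps some ground-truth box A.
    tp = sum(1 for b in detections
             if any(iou(a, b) > threshold for a in ground_truth))
    fp = len(detections) - tp
    # False negative: a ground-truth box A that no detection overlaps.
    fn = sum(1 for a in ground_truth
             if all(iou(a, b) <= threshold for b in detections))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under these definitions, two detections that overlap the same ground-truth box both count as true positives. Evaluation protocols that penalize duplicate detections would need an extra matching step that this sketch omits.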

  9. Satellite Detection Example

  10. Quantitative Evaluation [Plot: Recall Rate vs. False Positives Per Image]

  11. Why does this suck? • True Positives in Context • False Positives in Context • False Positives out of Context • Context!

  12. What is Context?

  13. What is Context? • Thing-Thing • Scene-Thing (gist: car “likely”, keyboard “unlikely”) • Stuff-Stuff • Cited work: Torralba et al., 2005; Gould et al., 2008; Rabinovich et al., 2007

  14. What is Context? • Stuff-Thing: • Based on intuitive “relationships” • Road = cars here • Trees = no cars • Houses = cars nearby

  15. Things • Candidate detections • Bounding Box + Score • Boolean R.V. Ti • Ti = 1: Candidate is a positive detection • Thing-only model [Diagram: Image Window Wi, Ti]

  16. Stuff • Coherent image regions • Coarse “superpixels” • Feature vector Fj in R^n • Cluster label Sj • Stuff-only model • Naïve Bayes [Diagram: Sj, Fj]

  17. Relationships • Example region labels: S72 = Trees, S10 = Road, S4 = Houses • Descriptive Relations • “Near”, “Above”, “In front of”, etc. • Choose a set R • Rij: Relation between detection i and region j • Relationship model [Diagram: Sj, Ti, Rij]

  18. The TAS Model • Wi: Window • Ti: Object Presence • Sj: Region Label • Fj: Region Features • Rij: Relationship [Plate diagram: Image Window Wi, Ti, Rij, Sj, Fj; plates N and J]
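The three sub-models introduced on the preceding slides (thing-only, stuff-only, relationship) suggest how the joint distribution of the combined model factorizes. The expression below is a reading of the diagram, not an equation taken from the slides. It assumes N candidate windows and J regions, that Wi is the parent of Ti, that Sj is the parent of Fj, and that Ti and Sj are both parents of Rij.

```latex
P(T, S, F, R \mid W)
  = \prod_{i=1}^{N} P(T_i \mid W_i)
    \prod_{j=1}^{J} P(S_j)\, P(F_j \mid S_j)
    \prod_{i=1}^{N} \prod_{j=1}^{J} P(R_{ij} \mid T_i, S_j)
```

Under this reading, the first product is the detector acting alone, the second is the Naïve Bayes region model, and the third is the only term that couples things to stuff.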

  19. Unrolled Model • Candidate Windows: T1, T2, T3 • Image Regions: S1, S2, S3, S4, S5 • R11 = “Left” • R21 = “Above” • R31 = “Left” • R13 = “In” • R33 = “In”

  20. Learning • Everything observed except Sj’s • Expectation-Maximization • Mostly discrete variables • Like Mixture-of-Gaussians • An ode to directed models: “Oh directed probabilistic models / You are so beautiful and palatable / Because unlike your undirected friends / Your parameters are so very interpretable” - Unknown Russian Mathematician (Translated by Geremy Heitz)

  21. Learned Satellite Clusters

  22. Inference • Goal: • Gibbs Sampling • Easy to sample Ti’s given Sj’s and vice versa • Could do distributional particles
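The alternating sampling scheme on this slide can be sketched as follows. The goal formula did not survive in the transcript, so this example assumes the target is the posterior marginal P(Ti = 1 | observations). It also assumes a fully discrete model with strictly positive probabilities, precomputed region likelihoods, and relation variables Rij that are observed. All array layouts and names are hypothetical.

```python
import numpy as np

def gibbs_tas(p_t_given_w, p_s, lik_f, p_r, rel, n_iter=500, burn_in=100, seed=0):
    """Estimate P(T_i = 1 | observations) by Gibbs sampling (illustrative only).

    p_t_given_w: (N,)       detector prior P(T_i = 1 | W_i), strictly in (0, 1)
    p_s:         (K,)       prior over region cluster labels
    lik_f:       (J, K)     P(F_j | S_j = k), precomputed per region
    p_r:         (2, K, R)  P(R_ij = r | T_i = t, S_j = k)
    rel:         (N, J)     observed relation index for each (window, region) pair
    """
    rng = np.random.default_rng(seed)
    n, j_count = rel.shape
    k_count = p_s.shape[0]
    t = (rng.random(n) < p_t_given_w).astype(int)
    s = rng.integers(k_count, size=j_count)
    t_sum = np.zeros(n)
    for it in range(n_iter):
        # Sample each T_i given the current labels of all regions.
        for i in range(n):
            log_p = np.log(np.array([1.0 - p_t_given_w[i], p_t_given_w[i]]))
            for t_val in (0, 1):
                log_p[t_val] += np.log(p_r[t_val, s, rel[i]]).sum()
            prob = np.exp(log_p - log_p.max())
            prob /= prob.sum()
            t[i] = int(rng.random() < prob[1])
        # Sample each S_j given the current values of all detections.
        for j in range(j_count):
            log_p = np.log(p_s) + np.log(lik_f[j])
            for k in range(k_count):
                log_p[k] += np.log(p_r[t, k, rel[:, j]]).sum()
            prob = np.exp(log_p - log_p.max())
            prob /= prob.sum()
            s[j] = rng.choice(k_count, p=prob)
        if it >= burn_in:
            t_sum += t
    return t_sum / (n_iter - burn_in)
```

Each conditional is cheap because, given the region labels, the detections are independent of one another, and vice versa. That is presumably what the slide means by "easy to sample".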

  23. Results - Satellite • Posterior: TAS Model • Prior: Detector Only • Region Labels

  24. Results - Satellite

  25. PASCAL VOC Challenge • 2005 Challenge • 2232 images split into {train, val, test} • Cars, Bikes, People, and Motorbikes • 2006 Challenge • 5304 images split into {train, test} • 12 classes, we use Cows and Sheep • Results reported for challenge with state-of-the-art approaches • Caveat: They didn’t get to see the test set before the challenge, but I did!

  26. Results – PASCAL Cows

  27. Results – PASCAL Bicycles Cluster #3

  28. Results – PASCAL • Good examples • Discover “true positives” • Remove “false positives”

  29. Results – VOC 2005 • Car • Motorbike • Bicycle • People

  30. Results – VOC 2006 • Sheep • Cow

  31. Conclusions • Detectors can benefit from context • The TAS model captures an important type of context • We can improve any sliding window detector using TAS • The TAS model can be interpreted and matches our intuitions • Geremy is smart

  32. Detections in Context • Task: Identify all cars in the satellite image • Idea: The surrounding context adds info to the local window detector • Prior: Detector Only + Region Labels (Houses, Road) = Posterior: TAS Model

  33. Equations
