
Learning Spatial Context: Using stuff to find things

Learning Spatial Context: Using stuff to find things. Geremy Heitz, Daphne Koller, Stanford University. October 13, 2008, ECCV 2008. Things vs. Stuff. From: Forsyth et al., Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.


Presentation Transcript


  1. Learning Spatial Context: Using stuff to find things. Geremy Heitz, Daphne Koller, Stanford University. October 13, 2008. ECCV 2008

  2. Things vs. Stuff From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996. Thing (n): An object with a specific size and shape. Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.

  3. Finding Things Context is key!

  4. Outline • What is Context? • The Things and Stuff (TAS) model • Results

  5. Satellite Detection Example [figure: two candidate windows, each labeled D(W) = 0.8]

  6. Error Analysis • Typically… True Positives are IN CONTEXT, False Positives are OUT OF CONTEXT • We need to look outside the bounding box!

  7. Types of Context • Thing-Thing • Scene-Thing (gist: car “likely”, keyboard “unlikely”) • Stuff-Stuff • Cited: [Torralba et al., LNCS 2005], [Gould et al., IJCV 2008], [Rabinovich et al., ICCV 2007]

  8. Types of Context • Stuff-Thing: • Based on spatial relationships • Intuition: “Cars drive on roads”, “Cows graze on grass”, “Boats sail on water” • Road = cars here • Trees = no cars • Houses = cars nearby • Goal: Unsupervised

  9. Outline • What is Context? • The Things and Stuff (TAS) model • Results

  10. Things • Detection “candidates” • Low detector threshold -> “over-detect” • Each candidate has a detector score

  11. Things • Candidate detections • Image Window Wi + Score • Boolean R.V. Ti • Ti = 1: Candidate is a positive detection • Thing model [diagram: Image Window Wi, Ti]

  12. Stuff • Coherent image regions • Coarse “superpixels” • Feature vector Fj in R^n • Cluster label Sj in {1…C} • Stuff model • Naïve Bayes [diagram: Sj, Fj]

  13. Relationships • Descriptive Relations • “Near”, “Above”, “In front of”, etc. • Choose set R = {r1…rK} • Rijk = 1: Detection i and region j have relation k • Relationship model [diagram: candidate T1; regions S72 = Trees, S10 = Road, S4 = Houses; R1,10,in = 1; nodes Ti, Sj, Rijk]

  14. The TAS Model • Wi: Window • Ti: Object Presence • Sj: Region Label • Fj: Region Features • Rijk: Relationship [plate diagram: nodes Image Window Wi, Ti, Rijk, Sj, Fj; plates N, J, K; legend: Always Observed, Always Hidden, Supervised in Training Set]
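The variables listed on this slide suggest a factorization of the joint distribution. The following is a sketch only, assuming directed edges from each window Wi to Ti, from each region label Sj to its features Fj (the Naïve Bayes stuff model), and from both Ti and Sj to every relationship variable Rijk. These edge directions are read off the variable list and are an assumption, not a formula recovered from the slides.

```latex
P(T, S, F, R \mid W)
  = \prod_{i=1}^{N} P(T_i \mid W_i)
    \prod_{j=1}^{J} P(S_j)\, P(F_j \mid S_j)
    \prod_{i=1}^{N} \prod_{j=1}^{J} \prod_{k=1}^{K} P(R_{ijk} \mid T_i, S_j)
```

Under this reading, Ti and Sj interact only through the relationship terms, so a candidate's posterior can change only via the regions it is related to.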

  15. Unrolled Model [diagram: candidate windows T1, T2, T3; image regions S1, S2, S3, S4, S5; relationships R1,1,left = 1, R2,1,above = 0, R3,1,left = 1, R1,3,near = 0, R3,3,in = 1]

  16. Learning the Parameters • Assume we know R • Sj is hidden • Everything else observed • Expectation-Maximization • “Contextual clustering” • Parameters are readily interpretable [plate diagram: nodes Image Window Wi, Ti, Rijk, Sj, Fj; plates N, J, K; legend: Always Observed, Always Hidden, Supervised in Training Set]

  17. Learned Satellite Clusters

  18. Which Relationships to Use? • Rijk = spatial relationship between candidate i and region j • Rij1 = candidate in region • Rij2 = candidate closer than 2 bounding boxes (BBs) to region • Rij3 = candidate closer than 4 BBs to region • Rij4 = candidate farther than 8 BBs from region • Rij5 = candidate 2 BBs left of region • Rij6 = candidate 2 BBs right of region • Rij7 = candidate 2 BBs below region • Rij8 = candidate more than 2 and less than 4 BBs from region • … • RijK = candidate near region boundary • How do we avoid overfitting?
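A few of these relationships can be sketched in code. This is a minimal illustration, not the authors' implementation: the slides do not define how distance in "bounding boxes" is measured, so the sketch assumes the distance from the candidate's center to the nearest region pixel, expressed in units of the box's longer side. The names `relation_vector` and `box_center_and_size` are hypothetical.

```python
import numpy as np

def box_center_and_size(box):
    """box = (x_min, y_min, x_max, y_max). Size is the longer side (an assumption)."""
    x0, y0, x1, y1 = box
    center = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
    return center, max(x1 - x0, y1 - y0)

def relation_vector(box, region_mask):
    """Boolean relations between one candidate box and one non-empty region.

    region_mask is a 2-D Boolean array, True where the region covers a pixel.
    """
    center, size = box_center_and_size(box)
    ys, xs = np.nonzero(region_mask)
    pixels = np.stack([xs, ys], axis=1).astype(float)
    # Distance from the box center to the nearest pixel of the region.
    dist = np.min(np.linalg.norm(pixels - center, axis=1))
    cx, cy = int(round(center[0])), int(round(center[1]))
    h, w = region_mask.shape
    inside = 0 <= cy < h and 0 <= cx < w and bool(region_mask[cy, cx])
    return {
        "in": inside,                       # analogous to Rij1
        "closer_than_2bb": dist < 2 * size,  # analogous to Rij2
        "closer_than_4bb": dist < 4 * size,  # analogous to Rij3
        "farther_than_8bb": dist > 8 * size, # analogous to Rij4
    }
```

Directional relations such as "left of" or "below" would need the sign of the offset as well as its magnitude; they are left out to keep the sketch short.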

  19. Learning the Relationships • Intuition • “Detached” Rijk = inactive relationship • Structural EM iterates: • Learn parameters • Decide which edge to toggle • Evaluate with l(T|F,W,R) • Requires inference • Better results than using standard E[l(T,S,F,W,R)] [diagram: Rij1, Rij2, …, RijK with nodes Ti, Sj, Fj]

  20. Inference • Goal: • Block Gibbs Sampling • Easy to sample Ti’s given Sj’s and vice versa
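The alternation described here can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: it assumes each Ti depends on its window only through a detector probability `p_det[i]`, each region has a precomputed per-cluster term `region_unary[j, c]` (prior times feature likelihood), and `theta[k, t, c]` holds P(Rijk = 1 | Ti = t, Sj = c). All function and parameter names are hypothetical.

```python
import numpy as np

def log_rel(R, theta):
    """Sum over k of log P(R_ijk | T_i = t, S_j = c); result has shape (I, J, 2, C)."""
    R = R[:, :, :, None, None].astype(float)               # (I, J, K, 1, 1)
    ll = R * np.log(theta) + (1.0 - R) * np.log1p(-theta)  # (I, J, K, 2, C)
    return ll.sum(axis=2)

def sample_rows(logp, rng):
    """Draw one categorical sample per row from unnormalized log-probabilities."""
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    cdf = np.cumsum(p, axis=1)
    u = rng.random((logp.shape[0], 1))
    return np.minimum((u > cdf).sum(axis=1), logp.shape[1] - 1)

def block_gibbs(p_det, region_unary, R, theta, n_iter=200, burn_in=50, seed=0):
    """Estimate P(T_i = 1 | everything observed) by alternating two blocks."""
    rng = np.random.default_rng(seed)
    I, J = R.shape[:2]
    C = region_unary.shape[1]
    ll = log_rel(R, theta)
    log_t = np.log(np.stack([1.0 - p_det, p_det], axis=1))  # (I, 2)
    log_s = np.log(region_unary)                            # (J, C)
    S = rng.integers(C, size=J)
    t_sum = np.zeros(I)
    for it in range(n_iter):
        # Block 1: given S, the T_i are independent, so sample them all at once.
        T = sample_rows(log_t + np.einsum("ijtc,jc->it", ll, np.eye(C)[S]), rng)
        # Block 2: given T, the S_j are independent, so sample them all at once.
        S = sample_rows(log_s + np.einsum("ijtc,it->jc", ll, np.eye(2)[T]), rng)
        if it >= burn_in:
            t_sum += T
    return t_sum / (n_iter - burn_in)
```

Each block is cheap because, with the other block fixed, its variables do not interact; averaging the sampled T values after burn-in gives a Monte Carlo estimate of each candidate's posterior.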

  21. Outline • What is Context? • The Things and Stuff (TAS) model • Results

  22. Base Detector - HOG • HOG Detector: [Dalal & Triggs, CVPR 2005] • Feature Vector X, SVM Classifier

  23. Results - Satellite [figure panels: Prior: Detector Only; Posterior: Detections; Posterior: Region Labels]

  24. Results - Satellite • ~10% improvement in recall at 40 false positives per image (fppi) [plot: Recall Rate vs. False Positives Per Image; curves: TAS Model, Base Detector]

  25. PASCAL VOC Challenge • 2005 Challenge • 2232 images split into {train, val, test} • Cars, Bikes, People, and Motorbikes • 2006 Challenge • 5304 images split into {train, test} • 12 classes, we use Cows and Sheep

  26. Base Detector Error Analysis [figure: Cows]

  27. Discovered Context - Bicycles [figure: Bicycles, Cluster #3]

  28. TAS Results – Bicycles • Examples • Discover “true positives” • Remove “false positives” [figure labels: ?, BIKE, ?, ?]

  29. Results – VOC 2005

  30. Results – VOC 2006

  31. Conclusions • Detectors can benefit from context • The TAS model captures an important type of context • We can improve any sliding window detector using TAS • The TAS model can be interpreted and matches our intuitions • We can learn which relationships to use

  32. Thank you!

  33. Object Detection • Task: Find the things • Example: Find all the cars in this image • Return a “bounding box” for each • Evaluation: • Maximize true positives • Minimize false positives

  34. Sliding Window Detection • Consider every bounding box • All shifts • All scales • Possibly all rotations • Each such window gets a score: D(W) • Detections: Local peaks in D(W) • Pros: • Covers the entire image • Flexible to allow variety of D(W)’s • Cons: • Brute force: can be slow • Only considers features in box [figure: example windows with D = 1.5 and D = -0.3]
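The enumeration over shifts and scales can be sketched as below. This is an illustration under assumptions: the base window size, scale set, and stride are made-up defaults, rotations are ignored, and the scoring function `score_fn` stands in for whatever D(W) a real detector would provide. The sketch keeps every window above a threshold rather than finding local peaks in D(W).

```python
import numpy as np

def sliding_windows(image_shape, base_size=(32, 32), scales=(1.0, 1.5, 2.0),
                    stride_frac=0.25):
    """Yield (x0, y0, x1, y1) boxes over all shifts and the given scales."""
    height, width = image_shape[:2]
    for s in scales:
        w = int(round(base_size[0] * s))
        h = int(round(base_size[1] * s))
        # The stride grows with the window so larger windows take larger steps.
        step = max(1, int(round(min(w, h) * stride_frac)))
        for y0 in range(0, height - h + 1, step):
            for x0 in range(0, width - w + 1, step):
                yield (x0, y0, x0 + w, y0 + h)

def detect(image, score_fn, threshold, **window_kwargs):
    """Score every window with D(W) = score_fn(patch); keep those above threshold."""
    detections = []
    for (x0, y0, x1, y1) in sliding_windows(image.shape, **window_kwargs):
        d = score_fn(image[y0:y1, x0:x1])
        if d > threshold:
            detections.append(((x0, y0, x1, y1), d))
    # Highest-scoring windows first.
    return sorted(detections, key=lambda item: -item[1])
```

Lowering `threshold` produces the over-detected candidate set that a context model could then re-score. Note that `score_fn` sees only the pixels inside the box, which is the limitation listed under "Cons".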

  35. Sliding Window Results • PASCAL Visual Object Classes Challenge, Cows 2006 • Detection: D(W) > T • Recall(T) = TP / (TP + FN) • Precision(T) = TP / (TP + FP) • Overlap of boxes A and B: score(A,B) = |A∩B| / |A∪B| • score(A,B) > 0.5: TRUE POSITIVE • score(A,B) ≤ 0.5: FALSE POSITIVE

  36. Quantitative Evaluation [plot: Recall Rate vs. False Positives Per Image]

  37. Detections in Context • Task: Identify all cars in the satellite image • Idea: The surrounding context adds info to the local window detector [figure: Prior: Detector Only + Region Labels (Houses, Road) = Posterior: TAS Model]
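The overlap score and the precision and recall formulas can be made concrete with a short sketch. One detail is an assumption not stated on the slide: each ground-truth box may be matched by at most one detection, with detections processed in order of decreasing score, so duplicate detections of the same object count as false positives. Function names are hypothetical.

```python
def iou(a, b):
    """score(A, B) = |A ∩ B| / |A ∪ B| for boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, threshold, min_overlap=0.5):
    """detections: list of (box, score); ground_truth: list of boxes."""
    kept = sorted((d for d in detections if d[1] > threshold),
                  key=lambda d: -d[1])
    matched = set()
    tp = fp = 0
    for box, _ in kept:
        best, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # assumption: one detection per ground-truth box
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        if best > min_overlap:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(ground_truth) - tp
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping `threshold` from high to low traces out the precision-recall curve; counting `fp` per image instead of computing precision gives the false-positives-per-image axis used elsewhere in the deck.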

  38. Equations

  39. Features: Haar wavelets • Haar filters and integral image • Viola and Jones, ICCV 2001 • The average intensity in the block is computed with four sums independently of the block size. • BOOSTING!

  40. Features: Edge fragments • Opelt, Pinz, Zisserman, ECCV 2006 • Weak detector = Match of edge chain(s) from training image to edge map of test image • BOOSTING!

  41. Histograms of oriented gradients • SIFT, D. Lowe, ICCV 1999 • Dalal & Triggs, CVPR 2005 • SVM!
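The constant-time block sum can be shown with a small integral-image sketch. It is an illustration only: the zero-padded layout and the function names are choices made here, and the two-rectangle feature is just one example of a Haar-like filter.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x], with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def block_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1, x0:x1] from four lookups, whatever the block size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def block_mean(ii, x0, y0, x1, y1):
    """Average intensity of the block."""
    return block_sum(ii, x0, y0, x1, y1) / ((x1 - x0) * (y1 - y0))

def haar_two_rect_horizontal(ii, x0, y0, x1, y1):
    """Two-rectangle Haar-like feature: left half minus right half."""
    xm = (x0 + x1) // 2
    return block_sum(ii, x0, y0, xm, y1) - block_sum(ii, xm, y0, x1, y1)
```

The integral image is built once per image; after that, every block sum costs the same four array reads, which is what makes evaluating many Haar-like features per window affordable.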
