Semantic Contours from Inverse Detectors. Bharath Hariharan et al. (ICCV 2011)
Problem
Localizing and classifying category-specific object contours in real-world images.
Figure: low-level contours (not class-specific) vs. class-specific contours.
Naive Solution
• Using the detector outputs directly would also pick up contours from the surrounding context, not just from the object
• To avoid this problem, the authors propose the inverse detector
The Inverse Detector
Given the bottom-up contours of an image and the activations of an object detector, the inverse detector produces the object contour image.
• I – image
• G – output of the bottom-up contour detector
• $G_{ij}$ – score for the likelihood of pixel (i, j) lying on a contour
• $R_1, \dots, R_l$ – the l activation windows of the detector
• $s_k$ – score of activation window $R_k$
• $x_{ij}$ – feature vector for pixel (i, j)
Feature Vector
• Each detector window is divided into S spatial bins
• Contour orientations are quantized into O orientation bins
• For an activation window $R_k$, a pixel (i, j) is assigned to one of the SO joint (spatial, orientation) bins; let $b_k(i,j)$ be the index of that bin
• $e_n$: an SO-dimensional vector with 1 in the nth position and 0 elsewhere
• Feature vector at location (i, j) for window $R_k$: $x^k_{ij} = G_{ij}\, e_{b_k(i,j)}$ (zero when (i, j) lies outside $R_k$)
• Feature vector $x_{ij}$ for pixel (i, j): the score-weighted sum across all activation windows, $x_{ij} = \sum_{k=1}^{l} s_k\, x^k_{ij}$
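The feature construction above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: each window is a hypothetical (top, left, height, width) tuple, the S spatial bins are taken to be a uniform grid of cells, contour orientations (given as a hypothetical per-pixel map `theta`) in [0, π) are split into O equal bins, and the per-window feature is assumed to be $G_{ij} e_{b_k(i,j)}$, zero outside the window. The helper names `bin_index` and `pixel_feature` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical window representation: (top, left, height, width) in pixels.
Window = Tuple[int, int, int, int]


def bin_index(i: int, j: int, theta: float, window: Window,
              grid: Tuple[int, int], n_orient: int) -> int:
    """Joint (spatial, orientation) bin of pixel (i, j) for one window,
    in [0, S * O), or -1 if the pixel lies outside the window.
    Assumes S = grid[0] * grid[1] uniform cells and O = n_orient equal
    orientation bins over [0, pi)."""
    top, left, height, width = window
    if not (top <= i < top + height and left <= j < left + width):
        return -1
    rows, cols = grid
    spatial = ((i - top) * rows // height) * cols + (j - left) * cols // width
    orient = min(int((theta % math.pi) / math.pi * n_orient), n_orient - 1)
    return spatial * n_orient + orient


def pixel_feature(i: int, j: int,
                  G: Sequence[Sequence[float]],
                  theta: Sequence[Sequence[float]],
                  windows: Sequence[Window], scores: Sequence[float],
                  grid: Tuple[int, int], n_orient: int) -> List[float]:
    """x_ij = sum_k s_k * G_ij * e_{b_k(i,j)}, an (S * O)-dimensional vector."""
    x = [0.0] * (grid[0] * grid[1] * n_orient)
    for window, s in zip(windows, scores):
        n = bin_index(i, j, theta[i][j], window, grid, n_orient)
        if n >= 0:  # windows not containing (i, j) contribute nothing
            x[n] += s * G[i][j]
    return x
```

For example, with a 4×4 image whose only contour pixel is (1, 2) with $G_{12} = 0.5$, windows (0, 0, 4, 4) and (1, 1, 2, 2) with scores 2.0 and 1.0, a 2×2 grid and O = 2, `pixel_feature` returns an 8-dimensional vector whose only nonzero entry is 1.5 at index 2: both windows place the pixel in the same joint bin, and their weighted contributions add.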
The inverse detector is a linear function of these features, $w^\top x_{ij}$, where $x_{ij}$ is the feature vector for pixel (i, j) and the weight vector $w$ is learned with a linear SVM on these features.
Complete system: inverse detectors are used for localizing semantic contours
• Object detector: poselet-based detectors [1]
• Bottom-up contour detector [2]
[1] L. Bourdev et al., "Detecting people using mutually consistent poselet activations," ECCV 2010.
[2] P. Arbeláez et al., "Contour detection and hierarchical image segmentation," PAMI 2011.
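Training the linear inverse detector can be sketched with plain full-batch subgradient descent on the regularized hinge loss. This solver is a stand-in chosen so the example needs no libraries; the slides only say a linear SVM is used, not how it is solved. The function names, the explicit bias `b` (it could equally be folded into $w$), and the hyperparameters `lam`, `lr`, `epochs` are hypothetical. Labels are +1 for pixels on the object contour and -1 otherwise.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 500) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean_n max(0, 1 - y_n (w . x_n + b))
    by full-batch subgradient descent. y_n is +1 (contour) or -1."""
    dim, m = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for xn, yn in zip(X, y):
            if yn * (dot(w, xn) + b) < 1.0:  # margin violated: hinge is active
                for d in range(dim):
                    grad_w[d] -= yn * xn[d] / m
                grad_b -= yn / m
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def inverse_detector_score(w: Sequence[float], b: float,
                           x_ij: Sequence[float]) -> float:
    """Linear inverse-detector output for one pixel feature vector x_ij."""
    return dot(w, x_ij) + b
```

On a toy set where contour pixels have large values in the first feature and background pixels in the second, e.g. `X = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.2, 1.5]]` with `y = [1, 1, -1, -1]`, the learned `w` has a positive first and a negative second component, and every training pixel is scored on the correct side of zero.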
Localizing semantic contours using inverse detectors
The system has two stages:
• Stage 1: train an inverse detector (as described previously) for each poselet type; let $p_1, \dots, p_P$ be the P poselet types corresponding to category C
• Stage 2: combine the outputs of these inverse detectors to produce category-specific contours
• Train a linear SVM that classifies each pixel as lying on an object contour or not
• Features: the concatenation of the outputs of the inverse detectors for each of the P poselet types
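Stage 2 can be sketched as follows: at each pixel, the P inverse-detector outputs for category C are concatenated into one feature vector and scored by a linear classifier. This is an illustrative sketch assuming each inverse detector has already been run to give an H×W score map; the names `stage2_features` and `category_contour_map` and the weights `v`, `c` are hypothetical (in the described system they would come from the stage-2 linear SVM).

```python
from typing import List, Sequence

# One H x W map of inverse-detector outputs (hypothetical representation).
ScoreMap = Sequence[Sequence[float]]


def stage2_features(inverse_maps: Sequence[ScoreMap], i: int, j: int) -> List[float]:
    """Concatenate the outputs of the P inverse detectors at pixel (i, j)."""
    return [m[i][j] for m in inverse_maps]


def category_contour_map(inverse_maps: Sequence[ScoreMap],
                         v: Sequence[float], c: float) -> List[List[float]]:
    """Apply the stage-2 linear classifier v . f + c at every pixel."""
    height, width = len(inverse_maps[0]), len(inverse_maps[0][0])
    return [[sum(vp * f for vp, f in zip(v, stage2_features(inverse_maps, i, j))) + c
             for j in range(width)]
            for i in range(height)]
```

With two poselet types and hand-set weights, `category_contour_map([[[1, 0], [0, 0]], [[0, 1], [0, 0.5]]], [0.5, 2.0], -1.0)` gives `[[-0.5, 1.0], [-1.0, 0.0]]`. Keeping each poselet type as a separate feature lets the SVM learn how much to trust each one.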
Combining information across categories
• The previous model considers each category independently
• This model combines information across categories
• Two methods are proposed
Method 1
• First level: train a contour detector for each category separately
• Second level: train on the outputs of these contour detectors
• Feature vector at the second level: the outputs of all the category-specific contour detectors at pixel (i, j)
Method 2
• Only one level: train on features given by the outputs of the inverse detectors for the poselets of all categories
• Feature vector at this level: the concatenation of the outputs of every inverse detector, across all categories, at pixel (i, j)
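The two ways of combining information across categories differ mainly in which per-pixel outputs are stacked into the feature vector. A minimal sketch, assuming the outputs are stored per category in dictionaries keyed by category name; the function names and the storage layout are hypothetical, and in each case a linear SVM would then be trained on these vectors.

```python
from typing import Dict, List, Sequence

ScoreMap = Sequence[Sequence[float]]  # H x W map of detector outputs


def method1_second_level_features(category_maps: Dict[str, ScoreMap],
                                  i: int, j: int) -> List[float]:
    """Method 1, second level: one entry per category-specific contour detector."""
    return [category_maps[cat][i][j] for cat in sorted(category_maps)]


def method2_features(inverse_maps: Dict[str, Sequence[ScoreMap]],
                     i: int, j: int) -> List[float]:
    """Method 2, single level: one entry per inverse detector (poselet type),
    across all categories."""
    return [m[i][j] for cat in sorted(inverse_maps) for m in inverse_maps[cat]]
```

Method 1 yields one feature per category, while Method 2 yields one feature per poselet type summed over all categories, so the single-level classifier sees finer-grained evidence from the other categories.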
Semantic Boundaries Dataset (SBD)
• 8498 training images and 2820 test images
• Both instance-specific and category-specific contour annotations
Benchmark
• For a detector producing soft output, plot the precision-recall curve, parameterized by a threshold on the detection score
• Precision: fraction of detections that are true contours
• Recall: fraction of ground-truth contours that are detected
• Report two summary statistics:
• Average precision (AP)
• Maximal F-measure (MF): the maximum over the curve of F = 2PR/(P + R)
Precision and recall are practically zero
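The two summary statistics can be computed from a ranked list of scored pixels. This is a simplified sketch: it assumes every candidate pixel has already been labeled as a true contour (1) or not (0) by exact pixel matching and that every ground-truth contour pixel appears among the candidates, whereas contour benchmarks typically match detections to ground truth with a small distance tolerance. AP is approximated here as the step-wise area under the precision-recall curve. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(precision, recall) after each detection, sweeping the threshold
    from the highest score down. labels[n] is 1 for a true contour pixel."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one ground-truth contour pixel")
    order = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    curve, tp = [], 0
    for rank, n in enumerate(order, start=1):
        tp += labels[n]
        curve.append((tp / rank, tp / n_pos))  # (precision, recall)
    return curve


def max_f_measure(curve: Sequence[Tuple[float, float]]) -> float:
    """MF: best F = 2PR / (P + R) over all thresholds."""
    return max((2 * p * r / (p + r)) if p + r > 0 else 0.0 for p, r in curve)


def average_precision(curve: Sequence[Tuple[float, float]]) -> float:
    """AP as the sum of precision times each increase in recall."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in curve:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [1, 0, 1, 0], the best F-measure is 0.8 (reached at the third threshold, where P = 2/3 and R = 1) and AP is 5/6 under this approximation.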
Experiments
• 8498 training images and 2820 test images
• Baseline: low-level contours generated by the bottom-up contour detector [1]
• Both MF and AP improve by a factor of 5 with respect to the bottom-up contour detector
• The single-stage contour detector that combines the outputs of all inverse detectors across all categories outperforms the two-stage detector
• Best performance: means of transportation (aeroplane, bicycle, bus, car, motorbike, train), people, bottles, TV monitors
• Worst performance: chairs, dining tables, potted plants, boats and birds (hard to detect)
[1] P. Arbeláez et al., "Contour detection and hierarchical image segmentation," PAMI 2011.