Training Discriminative Computer Vision Models with Weak Supervision. Boris Babenko PhD Defense University of California, San Diego. Outline. Overview Supervised Learning Weakly Supervised Learning Weakly Labeled Location Object Localization and Recognition Object Detection with Parts
Training Discriminative Computer Vision Models with Weak Supervision Boris Babenko PhD Defense University of California, San Diego
Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work
Computer Vision Problems • Want to detect, recognize/classify, track objects in images and videos • Examples: • Face detection for point-and-shoot cameras • Pedestrian detection for cars • Animal tracking for behavioral science • Landmark/place recognition for search-by-image
Old School • Hand tuned models per application • Example: face detection [Yang et al. ‘94]
New School • Adopt methods from machine learning • Train a generic* system by providing labeled examples (supervised learning) • Labeling examples is intuitive • Adapt to new domains/applications • Learn subtle cues that would be impossible to model by hand * Hand tuning/design still often required :-/
Supervised Learning • Training data: pairs of inputs and labels • Train classifier to predict label for novel input [Figure: TRAINING: (image, face) and (image, non-face) pairs; RUN TIME: classifier predicts the label of a novel image]
Supervised Learning • Training data: • Most common case: • Want to train a classifier: • Typically a classifier also outputs a confidence score, in addition to label Inputs/instances: Labels:
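The formulas on this slide did not survive extraction. A plausible reading in standard notation is sketched below; the symbols $x_i$, $y_i$, $h$, and $d$ are assumptions, not recovered from the slide.

```latex
% Assumed standard notation, not recovered from the original slide.
\begin{align*}
\text{Training data:}\quad & \{(x_1, y_1), \dots, (x_n, y_n)\} \\
\text{Most common case:}\quad & x_i \in \mathbb{R}^d \ \text{(inputs/instances)}, \qquad y_i \in \{0, 1\} \ \text{(labels)} \\
\text{Classifier:}\quad & h : \mathbb{R}^d \to \mathbb{R}, \qquad \hat{y}(x) = \mathbb{1}[\,h(x) \ge 0\,]
\end{align*}
```

Here the real-valued output $h(x)$ plays the role of the confidence score and its sign gives the predicted label.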
Discriminative vs Generative • Generative: model the distribution of the data • Discriminative: directly minimize classification error, model the boundary • E.g. SVM, AdaBoost, Perceptron • Tends to outperform generative models
Training Discriminative Model • Objective (minimize training error) • Loss function is typically a convex upper bound on 0/1 loss • Regularization term can help avoid over-fitting
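To make the "convex upper bound" bullet concrete, the sketch below compares the 0/1 loss with two common convex surrogates (hinge, as in SVMs, and exponential, as in AdaBoost) as functions of the margin m = y·h(x), with labels recoded as y ∈ {-1, +1}. The function names and the L2 regularizer are illustrative assumptions, not taken from the slides.

```python
import math

def zero_one_loss(margin):
    """1 if the example is misclassified (margin <= 0), else 0."""
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    """Hinge loss used by SVMs: max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)

def exp_loss(margin):
    """Exponential loss minimized by AdaBoost: exp(-margin)."""
    return math.exp(-margin)

def regularized_objective(margins, weights, lam):
    """Average hinge loss plus an L2 penalty on the classifier weights."""
    data_term = sum(hinge_loss(m) for m in margins) / len(margins)
    reg_term = lam * sum(w * w for w in weights)
    return data_term + reg_term
```

Both surrogates sit on or above the 0/1 loss at every margin, so driving them down also drives down the training error.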
Weak Supervision • Slightly overloaded term… • Any form of learning where the training data is missing some labels (i.e. latent variables)
Object Detection w/ Weak Supervision • Goal: train object detector • Strong: labeled patches, e.g. (patch, face), (patch, non-face) • Weak: whole images labeled + or -; only presence of object is known, not location (the location is latent)
Weak Supervision: Advantages • Reduce labor cost • Deal with inherent ambiguity & human error • Automatically discover latent information
Training w/ Latent Variables • Classifier now takes in input AND latent input • To predict label: • Objective: • Not convex!
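The prediction rule and objective on this slide were lost in extraction. One common way to write them is sketched below; $x$ is the input, $z$ the latent input, $h(x, z)$ the classifier score, $\ell$ the loss, and $R$ the regularizer, all of which are assumed symbols.

```latex
% Assumed notation; the original slide formulas are not recoverable.
\begin{align*}
\text{Prediction:}\quad & \hat{y}(x) = \mathbb{1}\big[\max_{z} h(x, z) \ge 0\big] \\
\text{Objective:}\quad & \min_{h} \; \sum_{i=1}^{n} \ell\big(\max_{z} h(x_i, z),\; y_i\big) + \lambda R(h)
\end{align*}
```

Even when $\ell$ is convex, the loss of a max over $z$ is in general not convex in the classifier parameters for positive examples, which is one way to see the "Not convex!" remark.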
Training w/ Latent Variables • Two ways of solving • Method 1: Alternate between finding latent variables and training classifier • Finding latent variables given a fixed classifier may require domain knowledge • E.g. EM (Dempster et al.), Latent Structural SVM (Yu & Joachims) – based on CCCP (Yuille & Rangarajan), Latent SVM (Felzenszwalb et al.), MI-SVM (Andrews et al.)
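Method 1 can be sketched on a toy 1-D MIL problem: alternate between (a) training a classifier on the currently selected positive instances and (b) re-selecting, in each positive bag, the instance the classifier scores highest. The logistic-regression learner, the initialization (all instances in positive bags start as positives), and every name below are illustrative assumptions; none of the cited methods is reproduced exactly.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_logistic(xs, ys, w=0.0, b=0.0, lr=0.5, steps=200):
    """Batch gradient descent on 1-D logistic regression, warm-started."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

def alternate(pos_bags, neg_bags, rounds=5):
    """Alternate between fitting the classifier and picking latent instances."""
    neg_xs = [x for bag in neg_bags for x in bag]
    # Initialization: every instance of a positive bag counts as positive.
    chosen = [x for bag in pos_bags for x in bag]
    w, b = 0.0, 0.0
    for _ in range(rounds):
        xs = chosen + neg_xs
        ys = [1] * len(chosen) + [0] * len(neg_xs)
        w, b = train_logistic(xs, ys, w, b)
        # Latent step: keep the highest-scoring instance of each positive bag.
        chosen = [max(bag, key=lambda x: w * x + b) for bag in pos_bags]
    return w, b, chosen
```

In this toy setting the latent step is a simple argmax; in detection it would be a search over locations, which is where domain knowledge tends to enter.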
Training w/ Latent Variables • Method 2: Replace the hard max with “soft” approximation, and then do gradient descent • E.g. MILBoost (Viola et al.), MIL-Logistic Regression (Ray et al.)
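Two commonly used soft approximations of the max are sketched below: a scaled log-sum-exp over scores and the noisy-OR over probabilities. Which approximation each cited method uses is not stated on the slide, so treat the pairing as an assumption; the function names are illustrative.

```python
import math

def log_sum_exp_max(scores, r=10.0):
    """(1/r) * log(mean(exp(r * s))); approaches max(scores) as r grows."""
    m = max(scores)  # subtract the max for numerical stability
    total = sum(math.exp(r * (s - m)) for s in scores)
    return m + math.log(total / len(scores)) / r

def noisy_or(probs):
    """1 - prod(1 - p): soft version of 'at least one instance is positive'."""
    prod = 1.0
    for p in probs:
        prod *= 1.0 - p
    return 1.0 - prod
```

Both are differentiable in every input, so the gradient reaches all instances rather than only the current argmax.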
Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Detection, Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work
Object Detection w/ Weak Supervision • Goal: train object detector • Only presence of object is known, not location • Can’t “just throw these into a learning algorithm”: very difficult to design invariant features [Figure: whole images labeled + or -]
Multiple Instance Learning (MIL) • (set of inputs, label) pairs provided • MIL lingo: set of inputs = bag of instances • Learner does not see instance labels • Bag labeled positive if at least one instance in bag is positive [Keeler et al. ‘90, Dietterich et al. ‘97]
Object Detection w/ MIL • Instance: image patch • Instance Label: is face? • Bag: whole image • Bag Label: contains face? [Figure: bags of image patches, labeled + or - per bag] [Andrews et al. ’02, Viola et al. ’05, Dollar et al. ’08, Galleguillos et al. ’08]
MIL Notation • Training input: Bags: Bag Labels: Instance Labels: (unknown during training)
MIL • Positive bag contains at least one positive instance • Goal: learning instance classifier • Corresponding bag classifier
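The notation on the last two slides was lost in extraction. A standard way to write the MIL setup is sketched below; the symbols $X_i$, $x_{ij}$, $y_i$, $y_{ij}$, $h$, and $H$ are assumptions.

```latex
% Assumed standard MIL notation, not recovered from the slides.
\begin{align*}
\text{Bags:}\quad & X_i = \{x_{i1}, \dots, x_{im}\}, \quad i = 1, \dots, n \\
\text{Bag labels:}\quad & y_i \in \{0, 1\} \\
\text{Instance labels (unknown during training):}\quad & y_{ij} \in \{0, 1\}, \qquad y_i = \max_{j} y_{ij} \\
\text{Instance classifier:}\quad & h(x_{ij}) \in \mathbb{R} \\
\text{Bag classifier:}\quad & H(X_i) = \max_{j} h(x_{ij})
\end{align*}
```

The relation $y_i = \max_j y_{ij}$ encodes "a positive bag contains at least one positive instance", and the bag classifier mirrors it at the score level.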
MIL Algorithms • Many “standard” learning algorithms have been adapted to the MIL scenario: • SVM (Andrews et al. ‘02), Boosting (Viola et al. ‘05), Logistic Regression (Ray et al. ‘05) • Some specialized algorithms also exist • DD (Maron et al. ’98), EM-DD (Zhang et al. ‘02)
MIL Algorithms • Objective: minimize bag error on training data, where the bag label is predicted from the instance classifier via a max over the bag's instances • MILBoost (Viola et al. ‘05) • Replace max with differentiable approximation • Use functional gradient descent (Mason et al. ’00, Friedman ’01)
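As a rough sketch of the MILBoost idea, the code below uses the noisy-OR as the differentiable stand-in for the max, computes the bag log-likelihood, and derives per-instance weights as the gradient of that likelihood with respect to each instance score. In functional gradient descent these weights would guide the choice of the next weak learner; that step is omitted here, and all function names are illustrative.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def bag_probability(scores):
    """Noisy-OR over instance probabilities p_ij = sigmoid(h(x_ij))."""
    prod = 1.0
    for s in scores:
        prod *= 1.0 - sigmoid(s)
    return 1.0 - prod

def bag_log_likelihood(bags, labels):
    """Sum over bags of y*log(p) + (1 - y)*log(1 - p)."""
    total = 0.0
    for scores, y in zip(bags, labels):
        p = bag_probability(scores)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

def instance_weights(scores, label):
    """d(log-likelihood)/d(score_j) = (y - p_bag) / p_bag * p_ij."""
    p_bag = bag_probability(scores)
    return [(label - p_bag) / p_bag * sigmoid(s) for s in scores]
```

In a positive bag, instances that already score high receive the largest weights; in a negative bag every instance is pushed down in proportion to its own probability.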
Object Detection • Have a learning framework (MIL), and an algorithm to train classifier (MILBoost) • Question: how exactly do we form a bag? • Option 1: segmentation (instances are segments) • Option 2: sliding window (instances are windows)
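The sliding-window option is simple to sketch: every fixed-size window of the image becomes one instance, and the set of windows is the bag. The image representation (a 2-D list), the single window scale, and the function name are illustrative assumptions.

```python
def sliding_window_bag(image, win, stride):
    """Return a bag of (row, col, patch) instances from a 2-D list image."""
    height, width = len(image), len(image[0])
    bag = []
    for r in range(0, height - win + 1, stride):
        for c in range(0, width - win + 1, stride):
            patch = [row[c:c + win] for row in image[r:r + win]]
            bag.append((r, c, patch))
    return bag
```

A smaller stride gives denser coverage at the cost of a larger bag; a real detector would typically also sweep over several window sizes.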
Forming a bag via segmentation • Pro: get more precise localization • Con: segmentation algorithms often fail; require prior knowledge (e.g. number of segments) • If segmentation fails, we might not see “the” positive instance in a positive bag • Only way to prevent this is to use ALL possible segments… not practical
Multiple Stable Segmentations (MSS) • Solution: Multiple Stable Segmentations (Rabinovich et al. ‘06) • A heuristic for picking out a few “good” segments from the huge set of all possible segments • End up with more segments, but higher chance of getting the “right” segment
Multiple Instance Learning with Stable Segmentation (MILSS) • Localization and Recognition • Bag: segments from Multiple Stable Segmentation, one BOF descriptor per segment • Features: BOF on SIFT • Classifier: MILBoost one-vs-all (for multiclass) [Work with Carolina Galleguillos, Andrew Rabinovich & Serge Belongie, ECCV ‘08]
More segments = better results [Figure: results compared for Our System, NCuts w/ k=6, and NCuts w/ k=4]
Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work
Object Detection with Parts • Pedestrians are non-rigid • Difficult to design features that are invariant • Decision boundary very complex • Object parts are rigid
Object Detection with Parts • Naïve sol’n: label parts and train detectors • Labor intensive • Sub-optimal (e.g. “space between the legs”) • Better sol’n: • Use rough location of objects • Treat part locations as latent variables [Mohan et al. ’01, Mikolajczyk et al. ‘04]
Multiple Component Learning (MCL) • How to train a part detector from weakly labeled data? • How to train many, diverse part detectors? • How to combine part detectors and incorporate spatial information? [Work with Piotr Dollar, Pietro Perona, Zhuowen Tu & Serge Belongie, ECCV ‘08]
MCL: One Part Detector • Fits perfectly into MIL • Which part does it learn? [Figure: positive bags of image patches]
MCL: Diverse Parts • Pedestrian images are “roughly aligned” • Choose random sections of the images to feed into MIL
MCL: Combining Part Detectors • Run part detectors, get response map • Compute Haar features on top, plug into Boosting [Figure: confidence maps from each part detector]
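Haar features are typically computed with an integral image, so a rectangle sum costs four lookups regardless of size. The sketch below applies that idea to a single response map stored as a 2-D list; the particular two-rectangle feature (left half minus right half) and the function names are illustrative assumptions, not the feature set used in the talk.

```python
def integral_image(m):
    """ii[r][c] holds the sum of m over rows < r and columns < c."""
    height, width = len(m), len(m[0])
    ii = [[0.0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        for c in range(width):
            ii[r + 1][c + 1] = m[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of m over rows r0..r1-1 and columns c0..c1-1."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]

def haar_two_rect(ii, r0, c0, r1, c1):
    """Two-rectangle Haar feature: left half minus right half."""
    mid = (c0 + c1) // 2
    return rect_sum(ii, r0, c0, r1, mid) - rect_sum(ii, r0, mid, r1, c1)
```

Computed over the part detectors' confidence maps, features of this kind capture where parts fire relative to one another, which is one way spatial information can reach the final boosted classifier.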
MCL: Results • INRIA Pedestrian dataset
MCL: Related Work • P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. "Object Detection with Discriminatively Trained Part-Based Models" IEEE PAMI. Sept 2009. • Very similar model, uses SVM instead of Boosting, and an explicit shape model • L. Bourdev, S. Maji, T. Brox, J. Malik. “Detecting people using mutually consistent poselet activations” ECCV 2010.
Outline • Overview • Supervised Learning • Weakly Supervised Learning • Weakly Labeled Location • Object Localization and Recognition • Object Detection with Parts • Object Tracking • Weakly Labeled Categories • Object Detection with Sub-categories • Object Recognition with Super-categories • Theoretical Analysis of Multiple Instance Learning • Conclusions & Future Work
Object Tracking • Problem: given location of object in first frame, track object through video • Tracking by Detection: alternate training detector and running it on each frame
Tracking by Detection • First frame is labeled • Train an online classifier (e.g. Online AdaBoost) on the labeled frame
Tracking by Detection • Grab one positive patch and some negative patches, and train/update the model [Figure: negative and positive patches fed to the classifier]
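The tracking-by-detection loop can be sketched on a toy 1-D "video", where each frame is a list of intensities and a patch is a single value. The running-mean classifier below is a deliberately simple stand-in for an online learner and is not Online AdaBoost; the search radius and all names are illustrative assumptions.

```python
class OnlineMeanClassifier:
    """Toy online classifier: running means of positive and negative values."""

    def __init__(self):
        self.pos_mean, self.neg_mean = 0.0, 0.0
        self.n_pos, self.n_neg = 0, 0

    def update(self, value, is_positive):
        if is_positive:
            self.n_pos += 1
            self.pos_mean += (value - self.pos_mean) / self.n_pos
        else:
            self.n_neg += 1
            self.neg_mean += (value - self.neg_mean) / self.n_neg

    def score(self, value):
        # Higher when the value is closer to the positive mean than the negative.
        return abs(value - self.neg_mean) - abs(value - self.pos_mean)

def track(frames, start, radius=2):
    """Alternate between updating the classifier and detecting in the next frame."""
    clf = OnlineMeanClassifier()

    def update_at(frame, loc):
        # One positive patch at the current location, negatives around it.
        clf.update(frame[loc], True)
        for i in range(max(0, loc - radius), min(len(frame), loc + radius + 1)):
            if i != loc:
                clf.update(frame[i], False)

    loc = start
    path = [loc]
    update_at(frames[0], loc)  # the first frame is labeled
    for frame in frames[1:]:
        lo, hi = max(0, loc - radius), min(len(frame), loc + radius + 1)
        loc = max(range(lo, hi), key=lambda i: clf.score(frame[i]))
        path.append(loc)
        update_at(frame, loc)
    return path
```

Because the single positive patch is taken from the tracker's own estimate, a slightly wrong location would feed a slightly wrong positive into the next update.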