Pascal Grand Challenge Felix Vilensky 19/6/2011
Outline • Pascal VOC challenge framework. • Successful detection methods • Object Detection with Discriminatively Trained Part Based Models (P. F. Felzenszwalb et al.) - "UoC/TTI" Method. • Multiple Kernels for Object Detection (A. Vedaldi et al.) - "Oxford/MSR India" Method. • A successful classification method • Image Classification using Super-Vector Coding of Local Image Descriptors (Xi Zhou et al.) - NEC/UIUC Method. • Discussion about bias in datasets. • 2010 Winners Overview.
Pascal VOC Challenge Framework The PASCAL Visual Object Classes (VOC) Challenge Mark Everingham · Luc Van Gool · Christopher K. I. Williams · John Winn · Andrew Zisserman
Pascal VOC Challenge • Classification Task. • Detection Task. • Pixel-level segmentation. • “Person Layout” detection. • Action Classification in still images.
Classification Task • Predict whether the image contains at least one instance of the class (e.g., "at least one bus").
Detection Task • Predict a bounding box for each object instance. • The predicted bounding box must overlap the ground-truth box by at least 50%.
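For reference, the overlap measure from the VOC protocol, with $B_p$ the predicted box and $B_{gt}$ the ground-truth box; a detection counts as correct when

$$ a_o = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})} > 0.5 $$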
Detection "near misses" • Detections that did not fulfill the bounding-box overlap criterion.
Pascal VOC Challenge - The Object Classes • Images retrieved from the Flickr website.
Pixel-Level Segmentation • Figure: an example image with its class segmentation and object segmentation.
Action Classification • Classification among 9 action classes (e.g., "speaking on the phone", "playing the guitar").
Annotation • Class. • Bounding Box. • Viewpoint. • Truncation. • Difficult (for classification/detection).
Evaluation • A way to compare different methods. • Precision/Recall curves. • Interpolated precision. • AP (Average Precision).
Evaluation - Precision/Recall Curves (1) • Practical tradeoff between precision and recall. • Interpolated precision: at each recall level r, take the maximum precision measured at any recall r' ≥ r.
Evaluation - Average Precision (AP) • AP summarizes the precision/recall curve into a single number used to rank methods. • In the VOC protocol it is the mean of the interpolated precision at eleven equally spaced recall levels {0, 0.1, …, 1}.
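A minimal sketch of the 11-point interpolated AP computation described above; the function and variable names are illustrative and not taken from the VOC development kit, and matching detections to ground truth (the 50% overlap rule) is assumed to have been done already.

```python
import numpy as np

def eleven_point_ap(scores, labels, n_gt):
    """Sketch of VOC-style 11-point interpolated average precision.

    scores: detection confidences for one class over the whole test set.
    labels: 1 if the detection matched a ground-truth object, else 0.
    n_gt:   total number of ground-truth objects of this class (missed objects count too).
    """
    order = np.argsort(-np.asarray(scores, dtype=float))   # rank detections by confidence
    tp = np.asarray(labels, dtype=float)[order]
    fp = 1.0 - tp
    recall = np.cumsum(tp) / max(n_gt, 1)
    precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))

    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):                    # recall levels 0, 0.1, ..., 1
        mask = recall >= r
        p_interp = precision[mask].max() if mask.any() else 0.0  # interpolated precision
        ap += p_interp / 11.0
    return ap
```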
UoC/TTI Method Overview (P. Felzenszwalb et al.) • Joint winner in the 2009 Pascal VOC challenge with the Oxford Method. • "Lifetime achievement" award in 2010. • Mixture of deformable part models. • Each component has a global template + deformable parts. • HOG feature templates. • Fully trained from bounding boxes alone.
UoC/TTI Method – HOG Features (1) • Gradients are computed with the filter [-1 0 1] and its transpose. • Gradient orientation is discretized into one of p values. • Pixel-level features are aggregated into cells of size k (8-pixel cells, k = 8). • 18 contrast-sensitive bins + 9 contrast-insensitive bins = 27 bins in total! • Soft binning.
UoC/TTI Method – HOG Features (3) • Normalization. • Truncation. • 27 bins × 4 normalization factors = a 4×27 matrix. • Dimensionality reduction to 31 features.
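A minimal sketch of the reduction from the 4×27 matrix to 31 features, following the description in the Felzenszwalb et al. paper (sums over the 4 normalizations per orientation bin, plus sums over the 27 bins per normalization, up to the constant scaling factors used in the paper). The array names are illustrative.

```python
import numpy as np

def reduce_hog_cell(cell_4x27):
    """Project a 4x27 normalized-histogram matrix for one cell down to 31 features."""
    cell = np.asarray(cell_4x27, dtype=float).reshape(4, 27)
    orientation_feats = cell.sum(axis=0)   # 27 features: sum over the 4 normalizations
    energy_feats = cell.sum(axis=1)        # 4 features: sum over the 27 orientation bins
    return np.concatenate([orientation_feats, energy_feats])  # 31-dimensional
```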
UoC/TTI Method – Deformable Part Models • Coarse root filter. • High-resolution deformable parts. • Part = (anchor position, deformation cost, resolution level).
UoC/TTI Method – Mixture Models(1) • Diversity of a rich object category. • Different views of the same object. • A mixture of deformable part models for each class. • Each deformable part model in the mixture is called a component.
UoC/TTI Method – Object Hypothesis Slide taken from the method's presentation
UoC/TTI Method –Models(1) 6 component person model
UoC/TTI Method –Models(2) 6 component bicycle model
UoC/TTI Method – Score of a Hypothesis Slide taken from method's presentation
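For reference, the hypothesis score as defined in the Felzenszwalb et al. paper:

$$ \mathrm{score}(p_0, \ldots, p_n) = \sum_{i=0}^{n} F_i' \cdot \phi(H, p_i) \;-\; \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) \;+\; b $$

where $F_i'$ are the root and part filters, $\phi(H, p_i)$ is the HOG feature vector at placement $p_i$, $(dx_i, dy_i)$ is the displacement of part $i$ from its anchor, $\phi_d(dx, dy) = (dx, dy, dx^2, dy^2)$, $d_i$ are the deformation parameters, and $b$ is a bias term.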
UoC/TTI Method – Matching (1) • "Sliding window" approach. • Matching is done for each component separately. • High-scoring root locations define detections; each part is placed at its best location relative to the root (see the sketch below).
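A simplified brute-force sketch of placing one part around its anchor for a single root location. The actual method computes the best placements for all locations at once using generalized distance transforms; the names and the search radius here are illustrative assumptions.

```python
import numpy as np

def best_part_placement(part_response, anchor, deform, search_radius=8):
    """Find the best placement of one part around its anchor.

    part_response: 2D array of part-filter responses at the part's resolution level.
    anchor:        (ax, ay) anchor position in that response map.
    deform:        (d1, d2, d3, d4) deformation cost weights.
    Returns (best score, best position).
    """
    ax, ay = anchor
    d1, d2, d3, d4 = deform
    h, w = part_response.shape
    best_score, best_pos = -np.inf, (ax, ay)
    for y in range(max(0, ay - search_radius), min(h, ay + search_radius + 1)):
        for x in range(max(0, ax - search_radius), min(w, ax + search_radius + 1)):
            dx, dy = x - ax, y - ay
            # filter response minus the quadratic deformation cost
            score = part_response[y, x] - (d1 * dx + d2 * dy + d3 * dx * dx + d4 * dy * dy)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_score, best_pos
```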
UoC/TTI Method – Post Processing & Context Rescoring Slide taken from method's presentation
UoC/TTI Method – Training & Data Mining • Weakly labeled data in the training set: bounding boxes only, no part annotations. • Latent SVM (LSVM) training with the part placements z treated as latent values. • Training and data mining alternate over 4 stages: optimize z; add hard negative examples; optimize β; remove easy negative examples.
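For reference, the latent SVM scoring function and training objective from the Felzenszwalb et al. paper, which the alternation above optimizes:

$$ f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z) $$

$$ L_D(\beta) = \tfrac{1}{2}\lVert\beta\rVert^2 + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, f_\beta(x_i)\bigr) $$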
Oxford Method Overview (A. Vedaldi et al.) • Regions with different scales and aspect ratios → 6 feature channels → 3-level spatial pyramid → Cascade: 3 SVM classifiers with 3 different kernels → Post-processing.
Oxford Method – Feature Channels • Bag of visual words: SIFT descriptors are extracted and quantized into a vocabulary of 64 words. • Dense words (PhowGray, PhowColor): another set of SIFT descriptors, quantized into 300 visual words. • Histogram of oriented edges (Phog180, Phog360): similar to the HOG descriptor used by the UoC/TTI Method, with 8 orientation bins. • Self-similarity features (SSIM).
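A minimal sketch of the bag-of-visual-words quantization step (assign each descriptor to its nearest vocabulary word and build a normalized histogram). The function name, the nearest-neighbour assignment, and the L1 normalization are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary and build a histogram.

    descriptors: (n, d) array of e.g. SIFT descriptors from one image region.
    vocabulary:  (k, d) array of cluster centers (e.g. k = 64 for the BoW channel).
    """
    # Squared Euclidean distance from every descriptor to every vocabulary word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                                  # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)                         # L1-normalized histogram
```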
Oxford Method – Feature Vector Chart taken from the method's presentation
Oxford Method – Discriminant Function (2) • The kernel of the discriminant function is a linear combination of per-channel, per-pyramid-level histogram kernels. • The kernel parameters and the combination weights (18 in total) are learned using MKL (Multiple Kernel Learning). • The discriminant function is used to rank candidate regions R by the likelihood of containing an instance of the object of interest.
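A standard way to write such a combined kernel in MKL notation (the notation is a reconstruction, not quoted from the paper): with $K_k$ the individual histogram kernels and $d_k$ the learned, non-negative weights,

$$ K(h, h') = \sum_{k=1}^{18} d_k\, K_k(h_k, h'_k), \qquad d_k \ge 0 $$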
Oxford Method – Cascade Solution (1) • An exhaustive search over the candidate regions R requires a number of operations that is O(MBN): • N – the number of regions. • M – the number of support vectors in the discriminant function. • B – the dimensionality of the histograms. • To reduce this complexity, a cascade solution is applied. • The first stage uses a "cheap" linear kernel. • The second uses a more expensive and more powerful quasi-linear kernel. • The third uses the most powerful, non-linear kernel. • Each stage evaluates the discriminant function on a smaller number of candidate regions.
Oxford Method – Cascade Solution (2) • Stage 1: linear kernel. • Stage 2: quasi-linear kernel. • Stage 3: non-linear kernel.
Oxford Method – Cascade Solution (3) Chart taken from the method's presentation
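A schematic sketch of the cascade idea described above: each stage re-scores only the candidates kept by the previous, cheaper stage. The scoring functions and the kept-candidate counts are illustrative placeholders, not the paper's values.

```python
def run_cascade(regions, stages):
    """Run a detection cascade over candidate regions.

    regions: list of candidate regions.
    stages:  list of (score_fn, n_keep) pairs, ordered from cheapest to most expensive
             classifier; score_fn maps a region to a real-valued score.
    Returns the candidates surviving the final stage, best first.
    """
    candidates = list(regions)
    for score_fn, n_keep in stages:
        scored = sorted(candidates, key=score_fn, reverse=True)
        candidates = scored[:n_keep]        # only the best regions reach the next stage
    return candidates
```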
Oxford Method – The Kernels • All the aforementioned kernels share a common form built from two functions f and g (see the note below). • For linear kernels both f and g are linear; for quasi-linear kernels only f is linear.
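One plausible reconstruction of this common form, consistent with the f/g description above (an assumption, not a quotation from the paper), together with standard histogram kernels that fit the linear / quasi-linear / non-linear categories:

$$ K(h, h') = f\Bigl(\sum_b g(h_b, h'_b)\Bigr) $$

$$ K_{\mathrm{lin}}(h, h') = \sum_b h_b h'_b, \qquad K_{\chi^2}(h, h') = \sum_b \frac{2\, h_b h'_b}{h_b + h'_b}, \qquad K_{\mathrm{RBF}\text{-}\chi^2}(h, h') = \exp\bigl(-\gamma\, \chi^2(h, h')\bigr) $$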
Oxford Method – Post-Processing • The output of the last stage is a ranked list of 100 candidate regions per image. • Many of these regions correspond to multiple detections of the same object. • Non-maxima suppression is used. • At most 10 regions per image remain.
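A minimal sketch of greedy non-maxima suppression over ranked boxes (keep the best-scoring box, drop boxes that overlap it too much, repeat). The overlap threshold, the box format, and the cap of 10 kept regions are illustrative assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5, max_keep=10):
    """Greedy non-maxima suppression.

    boxes:  (n, 4) array of [x1, y1, x2, y2]; scores: (n,) confidences.
    Returns indices of the boxes kept, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # candidates ranked by confidence
    keep = []
    while order.size > 0 and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with the remaining boxes
        xx1, yy1 = np.maximum(x1[i], x1[rest]), np.maximum(y1[i], y1[rest])
        xx2, yy2 = np.minimum(x2[i], x2[rest]), np.minimum(y2[i], y2[rest])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop overlapping, lower-scoring boxes
    return keep
```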
Oxford Method – Training/Retraining (1) • Jittered/flipped instances are used as positive samples. • Training images are partitioned into two subsets. • The classifiers are tested on each subset in turn, adding new hard negative samples for retraining.
Oxford Method – Results (3) Results are reported for four settings: • Training and testing on VOC2009. • Training and testing on VOC2007. • Training and testing on VOC2008. • Training on VOC2008 and testing on VOC2007.