Principled Asymmetric Boosting Approaches to Rapid Training and Classification in Face Detection presented by Minh-Tri PhamPh.D. Candidate and Research AssociateNanyang Technological University, Singapore
Outline • Motivation • Contributions • Automatic Selection of Asymmetric Goal • Fast Weak Classifier Learning • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Application Face recognition
Application 3D face reconstruction
Application Camera auto-focusing
Application • Windows face logon • Lenovo Veriface Technology
Appearance-based Approach • Scan the image with a probe window (x, y, s) at different positions and scales • Binary-classify each patch as face or non-face • Desired output: the states (x, y, s) containing a face • Most popular approach: Viola-Jones '01-'04, Li et al. '02, Wu et al. '04, Brubaker et al. '04, Liu et al. '04, Xiao et al. '04, Bourdev-Brandt '05, Mita et al. '05, Huang et al. '05-'07, Wu et al. '05, Grabner et al. '05-'07, and many more
Appearance-based Approach • Statistics: 6,950,440 patches in a 320×240 image; P(face) < 10⁻⁵ • Key requirement: a very fast classifier
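A minimal sketch of the sliding-window scan described above. The base window size, scale factor, and shift step are illustrative assumptions; the slide's exact figure (6,950,440 patches) depends on the scanner's actual parameters.

```python
def count_patches(img_w=320, img_h=240, base=24, scale_step=1.25, shift=1):
    """Count probe windows (x, y, s) over all positions and scales."""
    count = 0
    s = float(base)
    while s <= min(img_w, img_h):
        step = max(1, int(shift * s / base))  # shift grows with window size
        w = int(s)
        count += len(range(0, img_h - w + 1, step)) * len(range(0, img_w - w + 1, step))
        s *= scale_step
    return count

print(count_patches())  # millions of patches per image, while P(face) is tiny
```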
A Very Fast Classifier • Cascade of non-face rejectors: a patch passes through stages F1, F2, …, FN; each stage either passes it on or rejects it as non-face, and only patches passing all N stages are declared faces • F1, F2, …, FN are asymmetric classifiers: FRR(Fk) ≈ 0, FAR(Fk) as small as possible (e.g. 0.5 – 0.8)
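A minimal sketch of this attentional cascade, assuming `stages` is a list of (score_fn, theta) pairs, one per rejector F_k. Because each rejector has FRR ≈ 0 and moderate FAR, most non-face patches are discarded after only a few cheap stages; the per-stage score function is sketched after the next slide.

```python
def cascade_classify(patch, stages):
    for score_fn, theta in stages:
        if score_fn(patch) <= theta:
            return False  # rejected as non-face by this stage
    return True  # passed all N rejectors: declared a face
```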
Non-face Rejector • A strong combination of weak classifiers: F1 passes a patch x if f1,1(x) + f1,2(x) + … + f1,K(x) > θ, and rejects it otherwise • f1,1, f1,2, …, f1,K: weak classifiers • θ: threshold
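A sketch of one rejector F1 as a thresholded sum of weak scores, directly mirroring the formula above (`weak_fns` and `theta` are assumed inputs):

```python
def stage_score(patch, weak_fns):
    return sum(f(patch) for f in weak_fns)  # f_{1,1}(x) + ... + f_{1,K}(x)

def stage_pass(patch, weak_fns, theta):
    return stage_score(patch, weak_fns) > theta  # True = pass, False = reject
```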
Boosting • Stage 1: Weak Classifier Learner 1 separates correctly from wrongly classified examples • Stage 2: Weak Classifier Learner 2 is trained with the wrongly classified examples re-weighted more heavily [diagram: positive and negative training examples re-weighted between stages]
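A minimal sketch of the re-weighting step the diagram illustrates, using the standard (discrete) AdaBoost update as an assumed concrete instance; labels are in {-1, +1}:

```python
import numpy as np

def boost_round(w, y_true, y_pred):
    """Up-weight wrongly classified examples for the next weak learner."""
    err = np.clip(np.sum(w[y_true != y_pred]), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)     # vote of this weak classifier
    w = w * np.exp(-alpha * y_true * y_pred)  # mistakes grow, hits shrink
    return w / w.sum(), alpha
```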
Asymmetric Boosting • Weight positives γ times more than negatives • Stage 1: Weak Classifier Learner 1 • Stage 2: Weak Classifier Learner 2 [diagram: positive examples carry γ times the weight of negative examples at each stage]
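A sketch of the asymmetric weighting: each positive example starts with γ times the weight of a negative one. The symbol γ is a reconstruction (the original asymmetry parameter's symbol did not survive extraction), and its choice is exactly the open question raised later in the talk.

```python
import numpy as np

def init_asymmetric_weights(y, gamma):
    """y in {-1, +1}; positives weigh gamma times more than negatives."""
    w = np.where(y == +1, gamma, 1.0)  # gamma: reconstructed asymmetry parameter
    return w / w.sum()
```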
Weak Classifier • Classify a Haar-like feature value: the input patch is reduced to a single feature value v, and v is mapped to a score [diagram: input patch → feature value v → classify → score]
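A sketch of such a weak classifier, assuming a 24×24 patch and one illustrative two-rectangle Haar-like feature evaluated via an integral image (the specific rectangles and the stump form are assumptions, not the paper's exact learner):

```python
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Pixel sum over rectangle [x, x+w) x [y, y+h) via the integral image."""
    A = ii[y - 1, x - 1] if x > 0 and y > 0 else 0
    B = ii[y - 1, x + w - 1] if y > 0 else 0
    C = ii[y + h - 1, x - 1] if x > 0 else 0
    return ii[y + h - 1, x + w - 1] - B - C + A

def haar_stump(patch, thresh, polarity=1):
    """Score a 24x24 patch by thresholding one two-rectangle feature."""
    ii = integral_image(patch.astype(np.int64))
    v = rect_sum(ii, 0, 0, 12, 24) - rect_sum(ii, 12, 0, 12, 24)  # left minus right half
    return polarity if polarity * (v - thresh) > 0 else -polarity
```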
Main issues • Requires too much intervention from experts
A Very Fast Classifier • Cascade of non-face rejectors F1, F2, …, FN: asymmetric classifiers with FRR(Fk) ≈ 0 and FAR(Fk) as small as possible (e.g. 0.5 – 0.8) • How to choose bounds for FRR(Fk) and FAR(Fk)?
Asymmetric Boosting • Weight positives γ times more than negatives • How to choose γ?
Non-face Rejector • A strong combination of weak classifiers f1,1, f1,2, …, f1,K with threshold θ • How to choose θ?
Main issues • Requires too much intervention from experts • Very long learning time
Weak Classifier • Classify a Haar-like feature value v computed from the input patch • 10 minutes to learn a weak classifier
Main issues • Requires too much intervention from experts • Very long learning time: to learn a face detector (≈ 4,000 weak classifiers), 4,000 × 10 minutes ≈ 1 month • Only suitable for objects with small shape variance
Outline • Motivation • Contributions • Automatic Selection of Asymmetric Goal • Fast Weak Classifier Learning • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Detection with Multi-exit Asymmetric Boosting • CVPR’08 poster paper: • Minh-Tri Pham and Viet-Dung D. Hoang and Tat-Jen Cham. Detection with Multi-exit Asymmetric Boosting. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, 2008. • Won Travel Grant Award
Problem overview • Common appearance-based approach: a cascade of boosted classifiers F1, F2, …, FN; each stage passes a patch on or rejects it as non-object • f1,1, f1,2, …, f1,K: weak classifiers of F1, combined as f1,1(x) + f1,2(x) + … + f1,K(x) > θ • θ: threshold
Objective • Find f1,1, f1,2, …, f1,K and θ such that: • FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀ • K is minimized (K is proportional to F1's evaluation time)
Existing trends (1) Idea • For k from 1 until convergence: • Learn a new weak classifier f1,k(x) • Let F1(x) = [f1,1(x) + … + f1,k(x) > θ] • Adjust θ to see if we can achieve FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀ • Break the loop if such a θ exists Issues • Weak classifiers are sub-optimal w.r.t. the training goal • Too many weak classifiers are required in practice
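A minimal sketch of the threshold-adjustment step in this conventional approach (names are hypothetical; `scores_pos`/`scores_neg` are the boosted scores of positive/negative training examples):

```python
import numpy as np

def find_threshold(scores_pos, scores_neg, alpha0, beta0):
    """Return a theta with FAR <= alpha0 and FRR <= beta0, or None."""
    for theta in np.unique(np.concatenate([scores_pos, scores_neg])):
        far = np.mean(scores_neg > theta)   # negatives wrongly passed
        frr = np.mean(scores_pos <= theta)  # positives wrongly rejected
        if far <= alpha0 and frr <= beta0:
            return theta
    return None  # need another weak classifier before both bounds hold
```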
Existing trends (2) Idea • For k from 1 until convergence: • Learn a new weak classifier f1,k(x) with an asymmetric goal γ • Break the loop if FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀ Pros • Reduces FRR at the cost of increasing FAR, acceptable for cascades • Fewer weak classifiers Cons • How to choose γ? • Much longer training time Solution to con • Trial and error: choose γ such that K is minimized
Our solution • Learn every weak classifier using the same asymmetric goal: minimize γ · FRR + FAR, where γ = α₀ / β₀ • Why?
Because… • Consider two desired bounds (or targets) for learning a boosted classifier: • (1) Exact bound: FAR ≤ α₀ and FRR ≤ β₀ • (2) Conservative bound: γ · FRR + FAR ≤ α₀, with γ = α₀ / β₀ • (2) is more conservative than (1) because (2) ⇒ (1) • [ROC-space figure, not fully recovered: the exact bound is the rectangle 0 ≤ FRR ≤ β₀, 0 ≤ FAR ≤ α₀; the conservative bound is the triangle under the line from (FRR, FAR) = (0, α₀) to (β₀, 0); operating-point trajectories H1, H2, … and Q1, Q2, … trace the classifier's progress as weak classifiers are added] • At γ = α₀/β₀, for every new weak classifier learned, the ROC operating point moves the fastest toward the conservative bound
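Spelling out the implication (2) ⇒ (1), under the reconstructed goal γ = α₀/β₀ (the slide's original symbols were garbled in extraction): since FAR and FRR are both non-negative, dropping either term from the conservative bound yields one of the exact bounds.

```latex
% (2) => (1): the conservative bound forces both exact bounds.
\[
\mathrm{FAR} + \tfrac{\alpha_0}{\beta_0}\,\mathrm{FRR} \le \alpha_0
\;\Longrightarrow\;
\mathrm{FAR} \le \alpha_0
\quad\text{and}\quad
\mathrm{FRR} \le \beta_0 .
\]
```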
Implication • When the ROC operating point lies inside the conservative bound, the conditions FAR(F1) ≤ α₀ and FRR(F1) ≤ β₀ are met, therefore θ = 0: the rejector can decide with f1,1(x) + … + f1,K(x) > 0, with no threshold adjustment needed
Multi-exit Boosting • A method to train a single boosted classifier with multiple exit nodes [diagram: weak classifiers f1 … f8 in sequence; an exit node is a weak classifier followed by a decision to continue or reject; exits F1, F2, F3 each pass on or reject as non-object] • Features: • Weak classifiers are trained with the same goal: minimize γ · FRR + FAR • Every pass/reject decision is guaranteed with FAR ≤ α₀ and FRR ≤ β₀ • The classifier is a cascade • The score is propagated from one node to the next • Main advantages: • Weak classifiers are learned (approximately) optimally • No training of multiple boosted classifiers • Far fewer weak classifiers are needed than in traditional cascades
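A minimal sketch of how such a multi-exit classifier might be evaluated, with one running score shared across all exits rather than reset per stage (the names `weak_fns` and `exits` are illustrative, not from the paper):

```python
def multi_exit_classify(patch, weak_fns, exits):
    """weak_fns: list of weak classifiers; exits: {index: threshold} for
    exit nodes placed right after the indexed weak classifier."""
    score = 0.0
    for i, f in enumerate(weak_fns):
        score += f(patch)  # score propagates from one node to the next
        if i in exits and score <= exits[i]:
            return False   # rejected at this exit node
    return True            # passed every exit: object
```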
Results: Goal (γ) vs. Number of weak classifiers (K) • Toy problem: learn a (single-exit) boosted classifier F for classifying face/non-face patches such that FAR(F) < 0.8 and FRR(F) < 0.01 • Empirically best goal: shown in a plot of γ versus K (not recovered here) • Our method chooses: γ = α₀/β₀ = 0.8/0.01 = 80 • Similar results were obtained for tests on other desired error rates
Ours vs. Others (in Face Detection) • Uses Fast StatBoost as the base method for fast-training a weak classifier
Ours vs. Others (in Face Detection) • MIT+CMU Frontal Face Test set: [detection-results figure not recovered]
Conclusion • Multi-exit Asymmetric Boosting trains every weak classifier approximately optimally: • Better accuracy • Far fewer weak classifiers • Significantly reduced training time • No more trial and error for training a boosted classifier
Outline • Motivation • Contributions • Automatic Selection of Asymmetric Goal • Fast Weak Classifier Learning • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Fast Training and Selection of Haar-like Features using Statistics • ICCV'07 oral paper: • Minh-Tri Pham and Tat-Jen Cham. Fast Training and Selection of Haar Features using Statistics in Boosting-based Face Detection. In Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007. • Won Travel Grant Award • Won Second Prize, Best Student Paper in Year 2007 Award, Pattern Recognition and Machine Intelligence Association (PREMIA), Singapore
Motivation • Face detectors today achieve real-time detection speed… • …but take weeks of training time
Why is Training so Slow? • Time complexity: O(MNT log N) • 15ms to train a feature classifier • 10 minutes to train a weak classifier • 27 days to train a face detector
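A back-of-the-envelope check of the timings above. The feature count per weak-classifier round is an assumption implied by the slide's own numbers (10 min ÷ 15 ms ≈ 40,000 feature classifiers):

```python
ms_per_feature = 15        # 15 ms to train a feature classifier
features_per_weak = 40_000 # assumed, implied by 10 min per weak classifier
weak_per_detector = 4_000  # weak classifiers in a full face detector

sec_per_weak = ms_per_feature * features_per_weak / 1000   # 600 s
print(sec_per_weak / 60, "minutes per weak classifier")    # 10.0
print(sec_per_weak * weak_per_detector / 86_400, "days")   # ~27.8 days
```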
Why Should the Training Time be Improved? • Tradeoff between time and generalization: e.g., training becomes 100 times slower if we increase both N and T by 10 times • Trial and error is needed to find key training parameters, requiring much longer training time • Online-learning face detectors have the same problem