Principled Asymmetric Boosting Approaches to Rapid Training and Classification in Face Detection • Presented by Pham Minh Tri, Ph.D. Candidate and Research Associate, Nanyang Technological University, Singapore
Outline • Motivation • Contributions • Fast Weak Classifier Learning • Automatic Selection of Asymmetric Goal • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Application Face recognition
Application 3D face reconstruction
Application Camera auto-focusing
Application • Windows face logon • Lenovo Veriface Technology
Appearance-based Approach • Scan the image with a probe window patch (x,y,s) at different positions and scales • Classify each patch as face (1) or non-face (0) • Desired output state: (x,y,s) containing a face • Most popular approach: Viola-Jones '01-'04, Li et al. '02, Wu et al. '04, Brubaker et al. '04, Liu et al. '04, Xiao et al. '04, Bourdev-Brandt '05, Mita et al. '05, Huang et al. '05-'07, Wu et al. '05, Grabner et al. '05-'07, and many more
Appearance-based Approach • Statistics: • 6,950,440 patches in a 320x240 image • P(face) < 10^-5 • Key requirement: • A very fast classifier
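To make the scale of the scan concrete, here is a minimal sketch of counting probe windows (x,y,s) over positions and scales. The base window size, scale step, and stride below are illustrative assumptions, not the settings behind the 6,950,440 figure, so the count it prints will differ from the slide.

```python
def count_patches(width, height, base=24, scale_step=1.25, stride=1):
    """Count square probe windows (x, y, s) over all positions and scales.

    base, scale_step and stride are hypothetical settings for illustration.
    """
    total = 0
    size = float(base)
    while size <= min(width, height):
        s = int(size)
        # Number of valid top-left corners for a window of side s.
        nx = (width - s) // stride + 1
        ny = (height - s) // stride + 1
        total += nx * ny
        size *= scale_step
    return total


print(count_patches(320, 240))
```

Even with a coarse scale step the count runs into the hundreds of thousands, which is why the per-patch cost dominates the design.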
A very fast classifier • Cascade of non-face rejectors: a patch must pass F1, F2, …, FN in turn to be labelled face; a reject at any stage labels it non-face • F1, F2, …, FN : asymmetric classifiers • FRR(Fk) close to 0 • FAR(Fk) as small as possible (e.g. 0.5 – 0.8)
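The cascade logic can be sketched in a few lines. This is an illustrative sketch only: the stages here are toy threshold functions on a single number, standing in for the trained rejectors F1, …, FN.

```python
def cascade_classify(patch, stages):
    """Return True (face) only if every stage passes the patch."""
    for stage in stages:
        if not stage(patch):
            return False  # rejected early: labelled non-face
    return True


# Toy stages (hypothetical): each one is stricter than the previous.
stages = [lambda p: p > 0.2, lambda p: p > 0.5, lambda p: p > 0.8]

print(cascade_classify(0.9, stages))  # passes all three stages
print(cascade_classify(0.6, stages))  # rejected by the third stage
print(cascade_classify(0.1, stages))  # rejected by the first stage
```

Because almost all patches are non-faces, most of them exit at the first one or two stages, which is where the speed comes from.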
Non-face Rejector • A strong combination of weak classifiers: F1 sums f1,1 + f1,2 + … + f1,K and compares the sum with a threshold; if the sum exceeds the threshold the patch passes, otherwise it is rejected • f1,1, f1,2, …, f1,K : weak classifiers
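A single rejector can be sketched as a thresholded sum of weak classifier scores. The names `make_rejector`, `weak`, and `f1` and the toy +1/-1 scores are assumptions made for illustration.

```python
def make_rejector(weak_classifiers, threshold):
    """Build a rejector that passes a patch when the summed weak scores exceed threshold."""
    def rejector(patch):
        score = sum(f(patch) for f in weak_classifiers)
        return score > threshold
    return rejector


# Toy weak classifiers (hypothetical): each scores +1 or -1 from one entry of the patch.
weak = [lambda p: 1.0 if p[0] > 0.5 else -1.0,
        lambda p: 1.0 if p[1] > 0.5 else -1.0,
        lambda p: 1.0 if p[2] > 0.5 else -1.0]
f1 = make_rejector(weak, threshold=0.0)

print(f1([0.9, 0.8, 0.1]))  # two of three score +1, sum = 1.0, passes
print(f1([0.1, 0.2, 0.9]))  # sum = -1.0, rejected
```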
Boosting • Stage 1: Weak Classifier Learner 1 • Stage 2: Weak Classifier Learner 2 • [Figure: positive and negative examples, each marked as correctly or wrongly classified]
Asymmetric Boosting • Weight positives a chosen number of times more than negatives • Stage 1: Weak Classifier Learner 1 • Stage 2: Weak Classifier Learner 2 • [Figure: positive and negative examples at each stage]
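One simple way to realize this weighting is to scale the initial weight of every positive example by the chosen factor and then normalize. This is a sketch under that assumption; the slide does not say how the factor is applied across boosting rounds, and `asymmetric_weights` and `factor` are hypothetical names.

```python
def asymmetric_weights(labels, factor):
    """Initial example weights with positives (label 1) weighted `factor` times negatives."""
    raw = [factor if y == 1 else 1.0 for y in labels]
    total = sum(raw)
    return [w / total for w in raw]  # normalize so the weights sum to 1


weights = asymmetric_weights([1, 0, 0], factor=2.0)
print(weights)  # the positive carries twice the weight of each negative
```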
Weak classifier • Classify a Haar-like feature value: input patch → feature value v → score
Main issues • Learning is time-consuming
Weak classifier • Classify a Haar-like feature value: input patch → feature value v → score • 10 minutes to learn a weak classifier
A very fast classifier • Cascade of non-face rejectors F1, F2, …, FN • To learn a face detector (about 4,000 weak classifiers): • 4,000 × 10 minutes ≈ 1 month
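The estimate is straightforward arithmetic on the two figures quoted on the slide:

```python
weak_classifiers = 4000      # from the slide
minutes_per_weak = 10        # from the slide

total_minutes = weak_classifiers * minutes_per_weak
days = total_minutes / 60 / 24
print(f"{days:.1f} days")    # a little under 28 days, i.e. roughly a month
```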
Main issues • Learning is time-consuming • Learning requires too much intervention from experts
A very fast classifier • Cascade of non-face rejectors: • F1, F2, …, FN : asymmetric classifiers • FRR(Fk) close to 0 • FAR(Fk) as small as possible (e.g. 0.5 – 0.8) • How to choose bounds for FRR(Fk) and FAR(Fk)?
Asymmetric Boosting • Weight positives a chosen number of times more than negatives • How to choose this weighting factor?
Non-face Rejector • A strong combination of weak classifiers: F1 sums f1,1 + f1,2 + … + f1,K and compares the sum with a threshold; if the sum exceeds the threshold the patch passes, otherwise it is rejected • f1,1, f1,2, …, f1,K : weak classifiers • How to choose the threshold?
Main issues • Requires too much intervention from experts • Very long learning time
Outline • Motivation • Contributions • Fast Weak Classifier Learning • Automatic Selection of Asymmetric Goal • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Motivation • Face detectors today • Real-time detection speed …but… • Weeks of training time
Why is Training so Slow? • Time complexity: O(MNT log N) • 15 ms to train a feature classifier • 10 min to train a weak classifier • 27 days to train a face detector • Bottleneck: • At least O(NT) to train a weak classifier • Can we avoid O(NT)?
Our Proposal • Fast StatBoost: train feature classifiers using statistics rather than the input data • Con: • Less accurate … but not critical for a feature classifier • Pro: • Much faster training: • Constant time instead of linear time
Fast StatBoost • Training feature classifiers using statistics: • Assumption: the feature value v(t) is normally distributed given the class c (face or non-face) • Closed-form solution for the optimal threshold • Fast linear projections of the statistics of a window's integral image into 1D statistics of a feature value • Quantities involved: the mean and variance of the feature value v(t); a random vector representing a window's integral image, with its mean vector and covariance matrix; a Haar-like feature, a sparse vector with fewer than 20 non-zero elements • Result: constant time to train a feature classifier • [Figure: face and non-face distributions of the feature value, with the optimal threshold marked]
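The linear projection can be written out explicitly. The symbols below are assumed names, since the slide's own symbols did not survive: x is the integral-image vector of a window, g^(t) the sparse Haar-like feature vector, and μ_c, Σ_c the mean vector and covariance matrix of x for class c.

```latex
\begin{align*}
v^{(t)} &= g^{(t)\top} x, \\
\mathbb{E}\big[v^{(t)} \mid c\big] &= g^{(t)\top} \mu_c, \\
\operatorname{Var}\big[v^{(t)} \mid c\big] &= g^{(t)\top} \Sigma_c \, g^{(t)}.
\end{align*}
```

With fewer than 20 non-zero entries in g^(t), the mean needs fewer than 20 multiplications and the variance fewer than 400 terms, regardless of the number of training examples, which is consistent with the constant-time claim.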
Fast StatBoost • Integral image statistics are obtained directly from the weighted input data • Input: N training integral images and their current weights w(m) • We compute: • Sample total weight • Sample mean vector • Sample covariance matrix
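The three statistics can be sketched in plain Python as follows. The formulas are the standard weighted estimates (covariance normalized by the total weight); the slide's exact normalization is not recoverable, so treat that choice and the function name `weighted_stats` as assumptions. In the boosting setting this would be run once per class.

```python
def weighted_stats(samples, weights):
    """Return (total weight, mean vector, covariance matrix) of weighted samples.

    samples: list of equal-length vectors (e.g. flattened integral images)
    weights: one non-negative weight per sample
    """
    d = len(samples[0])
    z = sum(weights)
    mean = [sum(w * x[i] for x, w in zip(samples, weights)) / z
            for i in range(d)]
    cov = [[sum(w * (x[i] - mean[i]) * (x[j] - mean[j])
                for x, w in zip(samples, weights)) / z
            for j in range(d)]
           for i in range(d)]
    return z, mean, cov


z, mean, cov = weighted_stats([[0.0, 0.0], [2.0, 2.0]], [1.0, 1.0])
print(z, mean, cov)
```

The double loop over dimensions is the O(Nd²) step; a real implementation would hand it to an optimized matrix routine.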
Fast StatBoost • To train a weak classifier: • Extract the class-conditional integral image statistics • Time complexity: O(Nd^2) • Factor d^2 negligible because fast algorithms exist, hence in practice: O(N) • Train T feature classifiers by projecting the statistics into 1D: • Time complexity: O(T) • Select the best feature classifier • Time complexity: O(T) • Overall time complexity: O(N+T)
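The projection and selection steps can be sketched as below, taking the class-conditional statistics as given. This is an illustration, not the authors' method: it uses the midpoint between the projected class means as a stand-in threshold (optimal only for equal variances and equal class weights) instead of the closed-form solution mentioned in the slides, and it weights the two error types equally. All function names are hypothetical.

```python
import math


def project(feature, mean, cov):
    """Project statistics onto a sparse feature given as {index: coefficient}."""
    m = sum(c * mean[i] for i, c in feature.items())
    var = sum(ci * cj * cov[i][j]
              for i, ci in feature.items()
              for j, cj in feature.items())
    return m, var


def gaussian_cdf(x, m, var):
    """CDF of a normal distribution with mean m and variance var (var > 0)."""
    return 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * var)))


def estimated_error(feature, face_stats, nonface_stats):
    """Estimated error of a 1D threshold classifier under the Gaussian assumption."""
    mf, vf = project(feature, *face_stats)
    mn, vn = project(feature, *nonface_stats)
    theta = 0.5 * (mf + mn)  # assumed stand-in for the closed-form threshold
    if mf >= mn:
        miss = gaussian_cdf(theta, mf, vf)                # faces below theta
        false_alarm = 1.0 - gaussian_cdf(theta, mn, vn)   # non-faces above theta
    else:
        miss = 1.0 - gaussian_cdf(theta, mf, vf)
        false_alarm = gaussian_cdf(theta, mn, vn)
    return 0.5 * (miss + false_alarm), theta


def best_feature(features, face_stats, nonface_stats):
    """Pick the feature with the lowest estimated error: one O(1) projection each."""
    return min(features,
               key=lambda g: estimated_error(g, face_stats, nonface_stats)[0])


identity = [[1.0, 0.0], [0.0, 1.0]]
face_stats = ([2.0, 0.0], identity)      # toy (mean, covariance) for faces
nonface_stats = ([0.0, 0.0], identity)   # toy (mean, covariance) for non-faces
features = [{0: 1.0}, {1: 1.0}]
chosen = best_feature(features, face_stats, nonface_stats)
err, theta = estimated_error(chosen, face_stats, nonface_stats)
print(chosen, round(err, 4), theta)
```

No training example is touched inside `best_feature`; the loop over T features only reads the precomputed statistics, which is the source of the O(T) term.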
Experimental Results • Setup • Intel Pentium IV 2.8GHz • 19 types, 295,920 Haar-like features • Time for extracting the statistics: • Main factor: covariance matrices • GotoBLAS: 0.49 seconds per matrix • Time for training T features: • 2.1 seconds • Total training time: 3.1 seconds per weak classifier with 300K features • Existing methods: up to 10 minutes with 40K features or fewer • [Figure: nineteen feature types used in our experiments, numbered (1) to (19): edge, corner, diagonal line, line, and center-surround features]
Experimental Results • Comparison with Fast AdaBoost (J. Wu et al. '07), the fastest known implementation of the Viola-Jones framework:
Experimental Results • Performance of a cascade: ROC curves of the final cascades for face detection
Conclusions • Fast StatBoost: • Uses statistics instead of input data to train feature classifiers • Learning time: • A month → 3 hours • Better detection accuracy: • Due to many more Haar-like features being explored
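The quoted timings are mutually consistent, as a quick check shows. Two things here are assumptions: that two covariance matrices are extracted per weak classifier (one per class), and that the detector has about 4,000 weak classifiers as in the earlier estimate.

```python
# Per weak classifier: two covariance matrices (assumed: one per class) plus feature training.
per_weak_check = 2 * 0.49 + 2.1
print(f"{per_weak_check:.2f} s")   # close to the 3.1 s quoted in the slides

# Whole detector: about 4,000 weak classifiers at 3.1 s each.
hours = 4000 * 3.1 / 3600
print(f"{hours:.1f} hours")        # on the order of 3 hours
```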
Outline • Motivation • Contributions • Fast Weak Classifier Learning • Automatic Selection of Asymmetric Goal • Online Asymmetric Boosting • Generalization Bounds on the Asymmetric Error • Future Work • Summary
Problem overview • Common appearance-based approach: a cascade F1, F2, …, FN in which a patch that passes every stage is labelled object and a reject at any stage labels it non-object • F1, F2, …, FN : boosted classifiers • F1 sums f1,1 + f1,2 + … + f1,K and compares the sum with a threshold: pass if the sum exceeds it, reject otherwise • f1,1, f1,2, …, f1,K : weak classifiers
Objective • Find f1,1, f1,2, …, f1,K and the threshold such that: • K is minimized (K is proportional to F1's evaluation time)