Robust real-time face detection
Paul A. Viola and Michael J. Jones
Intl. J. Computer Vision 57(2), 137–154, 2004 (originally in CVPR’2001)
(slides adapted from Bill Freeman, MIT 6.869, April 2005)
Robust Real-Time Face Detection
Scan classifier over locs. & scales
“Learn” classifier from data
• Training data
  • 5000 faces (frontal)
  • 10^8 non-faces
  • Faces are normalized for scale and translation
• Many variations
  • Across individuals
  • Illumination
  • Pose (rotation both in plane and out of plane)
Characteristics of Algorithm
• Feature set (…is huge: about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection
• Fastest known frontal face detector for gray scale images
Integral Image
• Allows for fast feature evaluation
• Features do not work directly on image intensities
• The integral image is computed using a few operations per pixel (the features are similar to Haar basis functions)
Simple and Efficient Classifier
• Select a small number of important features from a huge library of potential features using AdaBoost [Freund and Schapire, 1995]
AdaBoost, Adaptive Boosting
• Formulated by Yoav Freund and Robert Schapire
• A meta-algorithm: it can be used in conjunction with many other learning algorithms to improve their performance
• AdaBoost is adaptive
  • Subsequent classifiers are tweaked in favor of instances misclassified by previous classifiers
• Sensitive to noisy data and outliers
• In some problems, less susceptible to overfitting than most learning algorithms
• Calls a weak classifier repeatedly in a series of rounds $t = 1, \dots, T$
• For each call
  • A distribution of weights $D_t$ is updated that indicates the importance of each example in the data set
• On each round
  • The weights of each incorrectly classified example are increased (or, alternatively, the weights of each correctly classified example are decreased)
  • The new classifier therefore focuses more on those examples
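AdaBoost
• Given $(x_1, y_1), \dots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in \{-1, +1\}$
• Initialize $D_1(i) = 1/m$
• For $t = 1, \dots, T$:
  • Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to the distribution $D_t$: $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \ne h_j(x_i)]$
  • $\epsilon_t$ is the weighted error rate of classifier $h_t$
  • If $\epsilon_t \ge 1/2$, then stop
  • Choose $\alpha_t \in \mathbb{R}$, typically $\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$
  • Update $D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$, where $Z_t$ is a normalization factor (chosen so that $D_{t+1}$ sums to 1)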
AdaBoost
• Output the final classifier $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
• The equation to update the distribution $D_t$ is constructed so that, after selecting an optimal classifier $h_t$ for the distribution $D_t$:
  • Examples that $h_t$ classified correctly are weighted less
  • Examples that $h_t$ classified incorrectly are weighted more
• When the algorithm then tests classifiers on the distribution $D_{t+1}$, it selects a classifier that better identifies the examples the previous classifier missed
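A minimal sketch of the AdaBoost loop above, assuming $\pm 1$ labels, scalar inputs, and a brute-force threshold stump as the weak learner. The names `stump_predict`, `train_stump`, `adaboost` and `predict` are illustrative, not taken from the paper.

```python
import math


def stump_predict(x, theta, polarity):
    # +1 when polarity * x >= polarity * theta, otherwise -1
    return 1 if polarity * x >= polarity * theta else -1


def train_stump(xs, ys, dist):
    """Threshold/polarity pair with the lowest weighted error under distribution dist."""
    best = None
    for theta in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(d for x, y, d in zip(xs, ys, dist)
                      if stump_predict(x, theta, polarity) != y)
            if best is None or err < best[0]:
                best = (err, theta, polarity)
    return best


def adaboost(xs, ys, rounds):
    m = len(xs)
    dist = [1.0 / m] * m                       # D_1(i) = 1/m
    ensemble = []                              # (alpha_t, theta_t, polarity_t) per round
    for _ in range(rounds):
        err, theta, pol = train_stump(xs, ys, dist)
        if err >= 0.5:                         # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)                  # avoid log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, theta, pol))
        # D_{t+1}(i) = D_t(i) * exp(-alpha_t * y_i * h_t(x_i)) / Z_t
        unnorm = [d * math.exp(-alpha * y * stump_predict(x, theta, pol))
                  for x, y, d in zip(xs, ys, dist)]
        z = sum(unnorm)                        # Z_t, the normalization factor
        dist = [w / z for w in unnorm]
    return ensemble


def predict(ensemble, x):
    # H(x) = sign(sum_t alpha_t * h_t(x)); a zero score is mapped to +1
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1
```

For instance, `adaboost([1, 2, 3, 4, 5, 6], [1, 1, -1, -1, 1, 1], rounds=3)` returns three stumps whose weighted vote labels all six points correctly, even though no single threshold can.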
Cascaded Classifier
• Combine successively more complex classifiers in a cascade structure
• Dramatically increases the speed of the detector by focusing attention on promising regions of the image
• Focus-of-attention approaches
  • It is often possible to rapidly determine where in an image a face might occur (Tsotsos et al., 1995; Itti et al., 1998; Amit and Geman, 1999; Fleuret and Geman, 2001)
  • More complex processing is reserved only for these promising regions
  • The key measure of such an approach is the “false negative” rate of the attentional process
Cascaded Classifier
• Training process
  • An extremely simple and efficient classifier is used as a “supervised” focus-of-attention operator
• A face detection attentional operator can be learned that
  • Filters out over 50% of the image
  • Preserves 99% of the faces (over a large dataset)
• This filter is exceedingly efficient: it can be evaluated in 20 simple operations per location/scale
Overview
• Features: form and computation
• Combining features to form a classifier: AdaBoost
• Constructing a cascade of classifiers
• Experimental results
• Discussion
Features
• Use features rather than raw image pixels
• Features can encode ad-hoc domain knowledge that is difficult to learn from a finite quantity of training data
• A feature-based system operates much faster than a pixel-based system
Image features
• “Rectangle filters” [Papageorgiou et al. 1998]
• Similar to Haar wavelets
• Differences between sums of pixels in adjacent rectangles
• About 160,000 rectangle features for a 24 × 24 detector window
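Integral Image
• Partial sum: the integral image at $(x, y)$ holds the sum of all pixels above and to the left, $ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y')$
• The sum over any rectangle D needs only the integral-image values at its four corners (1 = top-left, 2 = top-right, 3 = bottom-left, 4 = bottom-right): $D = 1 + 4 - (2 + 3)$
• Also known as:
  • summed area tables [Crow84]
  • boxlets [Simard98]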
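A minimal Python sketch of the integral image and the four-lookup rectangle sum, assuming a grayscale image stored as a list of rows. It pads the table with an extra zero row and column, so `ii[y][x]` holds the sum of pixels strictly above and to the left of $(x, y)$ (the slide's $ii(x-1, y-1)$), and rectangles touching the border need no special case. The function names are illustrative.

```python
def integral_image(img):
    """Zero-padded summed-area table: ii[y][x] = sum of img[y'][x'] for y' < y, x' < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0                              # running sum along the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four lookups."""
    p1 = ii[y][x]              # corner 1: top-left
    p2 = ii[y][x + w]          # corner 2: top-right
    p3 = ii[y + h][x]          # corner 3: bottom-left
    p4 = ii[y + h][x + w]      # corner 4: bottom-right
    return p1 + p4 - (p2 + p3)
```

Building the table costs a constant number of additions per pixel; afterwards every rectangle sum costs the same four lookups regardless of the rectangle's size.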
Huge library of filters
Feature Discussion
• Rectangle features are primitive when compared with steerable filters, etc.
• Steerable filters are excellent for the detailed analysis of boundaries, image compression, and texture analysis
• Rectangle features are sensitive to the presence of edges, bars, and other simple image structure
• Quite coarse: only three orientations (|, X, --)
• Overcomplete (about 400 times), through varying aspect ratio and location
Computational Advantage
• The face detector scans the input at many scales
  • Starting at the base scale, faces are detected at a size of 24 × 24 pixels
  • A 384 × 288 pixel image is scanned at 12 scales, each 1.25 times larger than the last
• The conventional approach:
  • Compute a pyramid of 12 images (each smaller than the last)
  • Scan a fixed-scale detector across each image
• Computing the pyramid directly requires significant time
  • Even when implemented efficiently on conventional hardware (using bilinear interpolation to scale each level of the pyramid), it takes around 0.05 seconds to compute a 12-level pyramid of this size on an Intel PIII 700 MHz processor
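The scale schedule is simple arithmetic. This sketch assumes the 12 scales include the 24 × 24 base scale; under that reading the largest window is about 279 pixels, which still fits inside the 288-pixel image height. The constant and function names are illustrative.

```python
BASE_SIZE = 24      # base detector window, in pixels
SCALE_STEP = 1.25   # each scale is 1.25 times larger than the last
NUM_SCALES = 12     # assumption: this count includes the base scale


def detector_sizes(base=BASE_SIZE, step=SCALE_STEP, n=NUM_SCALES):
    """Side length (pixels) of the square detector window at each scale."""
    return [base * step ** k for k in range(n)]


sizes = detector_sizes()
print([round(s, 1) for s in sizes])
```

With rectangle features, scaling the detector means scaling rectangle coordinates rather than resampling the image, so no pyramid has to be built.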
Computational Advantage
• Define a meaningful set of rectangle features
  • A single feature can be evaluated at any scale and location in a few operations
  • Effective detectors can be constructed with as few as two rectangle features
• Computational efficiency of features
  • The face detection process can be completed for an entire image at every scale at 15 frames per second
  • This is about the same time required to evaluate the 12-level image pyramid alone
Learning Classification Functions
• Given the feature set and training set, any machine learning method could be used
  • Mixture of Gaussians model (Sung and Poggio, 1998)
  • Simple image features and a neural network (Rowley et al., 1998)
  • Support Vector Machine (Osuna et al., 1997b)
  • Winnow learning procedure (Roth et al., 2000)
• There are 160,000 features: even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive
AdaBoost
• A very small number of features can be combined to form an effective classifier
• Boosts the classification performance by combining a collection of weak classification functions into a stronger classifier
• Weak learner
  • We do not expect even the best classification function to classify the training data well
• After the first round of learning
  • Examples are re-weighted in order to emphasize those that were incorrectly classified by the previous weak classifier
• The final strong classifier
  • Takes the form of a perceptron: a weighted combination of weak classifiers followed by a threshold
• The training error of the strong classifier approaches zero exponentially in the number of rounds
AdaBoost
• Select a small set of good classification functions (features) which nevertheless have significant variety
• Restrict the weak learner to classification functions that each depend on a single feature
• The weak learner selects the single rectangle feature that best separates the positive and negative examples
• A weak classifier $h_j(x)$ consists of a feature $f_j$, a threshold $\theta_j$, and a polarity $p_j$ indicating the direction of the inequality: $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$, and $0$ otherwise, where $x$ is a 24 × 24 pixel sub-window
AdaBoost
• No single feature can perform the classification task with low error
  • Features selected early: error rates of 0.1–0.3
  • Features selected later: error rates of 0.4–0.5
• Thresholded single features are single-node decision trees, also called decision stumps
Constructing the classifier
• A perceptron yields a sufficiently powerful classifier
• Use AdaBoost to efficiently choose the best features
  • Add a new $h_i(x)$ at each round
  • Each $h_i(x_k)$ is a “decision stump”: $h_i(x) = a$ for $x < \theta$ and $h_i(x) = b$ for $x > \theta$, where $a = E_w(y\,[x < \theta])$ and $b = E_w(y\,[x > \theta])$
Constructing the Classifier
• For each round of boosting:
  • Evaluate each rectangle filter on each example
  • Sort examples by filter values
  • Select the best threshold for each filter (minimum error), using the sorted order to quickly scan for the optimal threshold
  • Select the best filter/threshold combination
  • The weight is a simple function of the error rate
  • Reweight the examples
• (There are many tricks to make this more efficient.)
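AdaBoost using single rectangular feature
• Given example images $(x_1, y_1), \dots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively
• Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the numbers of negatives and positives
• For $t = 1, \dots, T$:
  • Normalize the weights: $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$
  • Select the best weak classifier with respect to the weighted error: $\epsilon_t = \min_{f, p, \theta} \sum_i w_i \, \lvert h(x_i, f, p, \theta) - y_i \rvert$
  • Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, $\theta_t$ are the minimizers of $\epsilon_t$
  • Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$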
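AdaBoost using single rectangular feature
• The final strong classifier is $C(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, where $\alpha_t = \log \frac{1}{\beta_t}$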
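A minimal sketch of this boosting procedure, assuming the rectangle-feature values have already been computed into a matrix `feats[i][j]` (value of feature $j$ on example $i$), labels are 0/1, and both classes are present. For clarity it finds the best weak classifier by brute force over midpoint thresholds and both polarities rather than with the sorted single-pass scan; all function names are hypothetical.

```python
import math


def weak_predict(value, theta, parity):
    # h(x, f, p, theta) = 1 if p * f(x) < p * theta, else 0
    return 1 if parity * value < parity * theta else 0


def best_weak_classifier(feats, labels, weights):
    """Brute-force search over features, thresholds and parities (slow but simple)."""
    best = None
    for j in range(len(feats[0])):
        values = sorted({row[j] for row in feats})
        # Candidate thresholds: one below all values, midpoints, one above all values.
        cands = ([values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
                 + [values[-1] + 1])
        for theta in cands:
            for parity in (1, -1):
                err = sum(w * abs(weak_predict(row[j], theta, parity) - y)
                          for row, y, w in zip(feats, labels, weights))
                if best is None or err < best[0]:
                    best = (err, j, theta, parity)
    return best


def train_viola_jones_adaboost(feats, labels, rounds):
    m, l = labels.count(0), labels.count(1)          # negatives, positives
    weights = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in labels]
    strong = []                                      # (alpha_t, feature j, theta, parity)
    for _ in range(rounds):
        total = sum(weights)
        weights = [w / total for w in weights]       # 1. normalize
        err, j, theta, parity = best_weak_classifier(feats, labels, weights)  # 2.-3.
        err = max(err, 1e-10)                        # guard against a perfect classifier
        beta = err / (1 - err)
        for i, (row, y) in enumerate(zip(feats, labels)):
            e_i = 0 if weak_predict(row[j], theta, parity) == y else 1
            weights[i] *= beta ** (1 - e_i)          # 4. shrink correctly classified
        strong.append((math.log(1 / beta), j, theta, parity))
    return strong


def strong_classify(strong, row):
    score = sum(a * weak_predict(row[j], t, p) for a, j, t, p in strong)
    return 1 if score >= 0.5 * sum(a for a, _, _, _ in strong) else 0
```

Each round selects one feature, so after $T$ rounds the strong classifier depends on at most $T$ features; this is how boosting doubles as feature selection.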
Good Reference on Boosting
• Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. http://www-stat.stanford.edu/~hastie/Papers/boost.ps
• “We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”
Learning Discussion
• The set of weak classifiers is extraordinarily large
  • There is one weak classifier for each distinct feature/threshold combination, so there are effectively $KN$ weak classifiers
  • $K$: the number of features
  • $N$: the number of examples
• Others have larger classifier sets
• Wrapper method: learning a perceptron with $M$ weak classifiers requires $O(MNKN)$ operations, about $10^{16}$
• AdaBoost: $O(MKN)$ operations, about $10^{11}$
Learning Discussion
• Dependency on $N$?
  • Suppose that the examples are sorted by a given feature value
  • Any two thresholds that lie between the same pair of sorted examples are equivalent
  • Therefore the total number of distinct thresholds is $N$
• For each feature, sort the examples based on feature value
• Compute the optimal threshold for that feature in a single pass over this sorted list
• For each element in the list, compute:
  • $T^+$: the total sum of positive example weights
  • $T^-$: the total sum of negative example weights
  • $S^+$: the sum of positive weights below the current example
  • $S^-$: the sum of negative weights below the current example
Learning Discussion
• The error of a threshold that splits the sorted list between the current and previous example is $e = \min\left(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\right)$
• The final application demanded a very aggressive process that would discard the vast majority of features
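The single-pass threshold search can be sketched as follows, assuming 0/1 labels (1 = face) and precomputed weights. `side="above"` means "face if value >= threshold" and `side="below"` means "face if value < threshold"; the two branches correspond to the two terms of the $\min$. Thresholds are placed midway between distinct sorted values, so tied values are never split. The function name and return format are illustrative.

```python
def best_threshold(values, labels, weights):
    """Return (error, threshold, side) for one feature in a single sorted pass."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    t_pos = sum(w for w, y in zip(weights, labels) if y == 1)   # T+
    t_neg = sum(w for w, y in zip(weights, labels) if y == 0)   # T-
    s_pos = s_neg = 0.0                                         # S+, S- (weights below)
    best = (float("inf"), None, None)
    prev = None
    for i in order:
        v = values[i]
        if prev is None or v != prev:           # only split between distinct values
            theta = v if prev is None else (prev + v) / 2
            # Faces above: errors = positives below + negatives above.
            err_above = s_pos + (t_neg - s_neg)
            # Faces below: errors = negatives below + positives above.
            err_below = s_neg + (t_pos - s_pos)
            if err_above < best[0]:
                best = (err_above, theta, "above")
            if err_below < best[0]:
                best = (err_below, theta, "below")
        if labels[i] == 1:
            s_pos += weights[i]
        else:
            s_neg += weights[i]
        prev = v
    return best
```

After the $O(N \log N)$ sort, each feature's optimal threshold costs one $O(N)$ pass, which is what brings the total to $O(MKN)$ rather than $O(MNKN)$.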
Other feature selection
• Papageorgiou et al. 1998
  • Feature selection based on feature variance
  • 37 features selected out of 1734; the number evaluated for every image sub-window is still large
• Roth et al. 2000
  • Feature selection process based on the Winnow exponential perceptron learning rule
  • A very large and unusual feature set: each pixel is mapped into a binary vector of $d$ dimensions
  • The vectors for all $n$ pixels are concatenated into an $nd$-dimensional vector
  • Perceptron: assigns a weight to each dimension
  • Winnow learning process:
    • Converges to a solution where many of the weights are zero
    • A very large number of features is still retained (perhaps a few hundred or thousand)
Learning Results
• A classifier constructed from 200 features yields reasonable results: a false positive rate of 1 in 14084
• For a face detector to be practical for real applications, the false positive rate must be closer to 1 in 1,000,000
Learning Results
• Features selected by AdaBoost are meaningful and easily interpreted
• In terms of detection: results are compelling but not sufficient for many real-world tasks
• In terms of computation: very fast, requiring 0.7 seconds to scan a 384 by 288 pixel image
Attentional Cascade
• Achieves increased detection performance while radically reducing computation time
• Construct boosted classifiers that
  • Reject many of the negative sub-windows
  • Detect almost all positive instances
• Adjust the strong classifier threshold to minimize false negatives: a lower threshold yields higher detection rates (and higher false positive rates)
Attentional Cascade
• Sub-windows accepted by a stage are passed on for further processing; evaluating a simple boosted stage requires only:
  • Evaluating the rectangle features (between 6 and 9 array references per feature)
  • Computing the weak classifier for each feature (one threshold operation per feature)
  • Combining the weak classifiers (one multiply per feature, an addition, and finally a threshold)
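A sketch of why a two-rectangle feature costs only 6 array references, assuming a zero-padded integral image and a horizontal pair of equal-sized rectangles (left sum minus right sum). The sign convention and function names are illustrative assumptions. The two rectangles share their middle edge, so their eight corners collapse to six distinct lookups.

```python
from itertools import accumulate


def integral_image(img):
    # Zero-padded summed-area table: ii[y][x] = sum of img rows < y, columns < x.
    ii = [[0] * (len(img[0]) + 1)]
    for row in img:
        prefix = [0] + list(accumulate(row))
        ii.append([above + p for above, p in zip(ii[-1], prefix)])
    return ii


def two_rect_horizontal(ii, x, y, w, h):
    """Left w-by-h rectangle minus the adjacent right one, using 6 lookups."""
    a = ii[y][x]                # top-left of the left rectangle
    b = ii[y][x + w]            # shared top corner
    c = ii[y][x + 2 * w]        # top-right of the right rectangle
    d = ii[y + h][x]            # bottom-left of the left rectangle
    e = ii[y + h][x + w]        # shared bottom corner
    f = ii[y + h][x + 2 * w]    # bottom-right of the right rectangle
    left = e + a - b - d
    right = f + b - c - e
    return left - right
```

Three-rectangle and four-rectangle features share corners in the same way, which is where the 8 and 9 reference counts come from.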
Attentional Cascade
• Subsequent classifiers
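The early-rejection logic of the cascade can be sketched as follows, assuming each stage is represented by a hypothetical `(score_fn, threshold)` pair, where `score_fn` returns that stage's boosted score for a sub-window. A sub-window is rejected at the first stage whose score falls below its threshold, so most non-faces pay only for the first one or two stages.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stage representation: (stage score function, stage threshold)
Stage = Tuple[Callable[[object], float], float]


def cascade_classify(window, stages: Sequence[Stage]):
    """Return (is_face, number_of_stages_evaluated) for one sub-window."""
    for k, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, k          # rejected: the later, costlier stages never run
    return True, len(stages)         # passed every stage: report a detection
```

Returning the number of stages evaluated makes it easy to measure the average cost per sub-window, the quantity the cascade is designed to minimize.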
Trading speed for accuracy
• Given a nested set of classifier hypothesis classes
• Computational Risk Minimization
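Training a Cascade of Classifiers
• Detection goals
  • Good detection rates (85%–95%) and
  • Extremely low false positive rates (on the order of $10^{-5}$ or $10^{-6}$)
• False positive rate of a $K$-stage cascade: $F = \prod_{i=1}^{K} f_i$, where $f_i$ is the false positive rate of the $i$th classifier
• Detection rate: $D = \prod_{i=1}^{K} d_i$, where $d_i$ is the detection rate of the $i$th classifier
• To achieve a detection rate of 0.9 with a 10-stage classifier
  • Each stage needs a detection rate of about 0.99 ($0.99^{10} \approx 0.9$)
  • A per-stage false positive rate of 30% then gives an overall rate of $0.30^{10} \approx 6 \times 10^{-6}$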
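The arithmetic behind the 10-stage example, as a small sketch (it uses `math.prod`, available in Python 3.8+; the function name is illustrative):

```python
from math import prod


def cascade_rates(stage_detection, stage_false_positive):
    """Overall (D, F) of a cascade: the products of the per-stage rates."""
    return prod(stage_detection), prod(stage_false_positive)


D, F = cascade_rates([0.99] * 10, [0.30] * 10)
print(f"D = {D:.3f}, F = {F:.1e}")   # prints: D = 0.904, F = 5.9e-06
```

Because the rates multiply, each stage can afford a high false positive rate as long as its detection rate stays very close to 1.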
Training a Cascade of Classifiers
• The expected number of features evaluated: $N = n_0 + \sum_{i=1}^{K} \left( n_i \prod_{j<i} p_j \right)$, where $n_i$ is the number of features in the $i$th classifier and $p_i$ is the positive rate of the $i$th classifier
• A scheme for trading off these errors is to adjust the threshold of the perceptron produced by AdaBoost
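The expected-cost formula can be sketched in a few lines, with stage 0 always evaluated and each later stage reached only by the fraction of sub-windows that every earlier stage passed. The function name and the example rates below are illustrative, not measured values.

```python
def expected_features(n, p):
    """n[i]: features in stage i; p[i]: positive (pass) rate of stage i."""
    total, reach = 0.0, 1.0        # reach = fraction of sub-windows arriving at stage i
    for n_i, p_i in zip(n, p):
        total += n_i * reach
        reach *= p_i
    return total


# Illustrative: stages of 2, 10 and 25 features passing 50%, 20% and 10% of inputs
print(expected_features([2, 10, 25], [0.5, 0.2, 0.1]))   # about 9.5 features on average
```

Even though this toy cascade holds 37 features, an average sub-window evaluates fewer than 10 of them, because the cheap early stages filter out most inputs.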
Tradeoffs in Training
• Classifiers with more features
  • Achieve higher detection rates and lower false positive rates
  • Require more time to compute
• An optimization framework trades off the following to minimize the expected number of features $N$, given targets for $F$ and $D$:
  • The number of classifier stages
  • The number of features, $n_i$, of each stage
  • The threshold of each stage
• Finding this optimum is a tremendously difficult problem
Training Cascaded Detector
• A simple framework produces an effective and efficient classifier
• The user selects the maximum acceptable rate for $f_i$ and the minimum acceptable rate for $d_i$
• Each layer of the cascade is trained by AdaBoost, with the number of features increased until the target detection and false positive rates are met for this level
  • The rates are determined by testing the current detector on a validation set
• If the overall target false positive rate is not yet met, another layer is added to the cascade
• The negative set for training subsequent layers is obtained by collecting all false detections found by running the current detector on a set of images that contain no faces
Training Cascaded Detector
• The user selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer
• The user selects the target overall false positive rate, F_target
• P = set of positive examples, N = set of negative examples
• F_0 = 1.0; D_0 = 1.0; i = 0
• while F_i > F_target
  – i ← i + 1
  – n_i = 0; F_i = F_{i−1}
  – while F_i > f × F_{i−1}
    ∗ n_i ← n_i + 1
    ∗ Use P and N to train a classifier with n_i features using AdaBoost
    ∗ Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
    ∗ Decrease the threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i−1} (this also affects F_i)
  – N ← ∅
  – If F_i > F_target, then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N
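A sketch of the training loop above in Python. The real work is delegated to hypothetical caller-supplied callbacks (`train_stage`, `evaluate`, `false_positives`), and the fixed threshold `step` is an assumption; the toy stand-ins at the bottom only exercise the control flow and are not a real detector.

```python
def train_cascade(P, N, f, d, F_target, train_stage, evaluate, false_positives,
                  step=0.05, max_layers=100):
    """Layer-by-layer cascade training, following the pseudocode above.

    Hypothetical callbacks supplied by the caller:
      train_stage(P, N, n)     -> stage with n features and a mutable .threshold
      evaluate(cascade)        -> (F, D) of the cascade on a validation set
      false_positives(cascade) -> false detections mined from face-free images
    """
    cascade = []
    F_prev, D_prev = 1.0, 1.0                    # F_0 = D_0 = 1.0
    while F_prev > F_target and len(cascade) < max_layers:
        n, F_i = 0, F_prev                       # since f < 1, the loop below runs at least once
        while F_i > f * F_prev:
            n += 1
            stage = train_stage(P, N, n)         # AdaBoost with n features
            F_i, D_i = evaluate(cascade + [stage])
            # Lower this stage's threshold until detection is at least d x D_{i-1}
            # (assumes lowering the threshold eventually raises the detection rate).
            while D_i < d * D_prev:
                stage.threshold -= step
                F_i, D_i = evaluate(cascade + [stage])
        cascade.append(stage)
        F_prev, D_prev = F_i, D_i
        if F_prev > F_target:
            N = false_positives(cascade)         # N <- current false detections
    return cascade


# Toy stand-ins, purely illustrative: each feature halves a stage's false positive
# rate, and lowering the threshold buys detection rate at the cost of false positives.
class ToyStage:
    def __init__(self, n):
        self.n = n
        self.threshold = 0.0


def toy_evaluate(cascade):
    F = D = 1.0
    for s in cascade:
        F *= 0.5 ** s.n * (1 - s.threshold)
        D *= min(1.0, 0.95 - s.threshold)
    return F, D


demo = train_cascade(P=[], N=[], f=0.3, d=0.99, F_target=1e-3,
                     train_stage=lambda P, N, n: ToyStage(n),
                     evaluate=toy_evaluate,
                     false_positives=lambda cascade: [])
print(len(demo), [s.n for s in demo])   # prints: 6 [2, 2, 2, 2, 2, 2]
```

In this toy setting each layer settles at two features with a per-layer false positive rate of about 0.26, so six layers are needed to push the overall rate below $10^{-3}$.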
Simple Experiment
• A monolithic 200-feature classifier vs. a cascade of ten 20-feature classifiers
• Both trained using 5000 faces + 10000 non-face sub-windows
• Little difference between them in terms of accuracy
• But the cascaded classifier is nearly 10 times faster, since its first stage throws out most non-faces so that they are never evaluated by subsequent stages
Detector Cascade Discussion
• Similar to Rowley et al. (1998) (fast)
  • Trained two neural networks
  • The first was moderately complex, focused on a small region of the image, and detected faces with a low false positive rate
  • The second neural network was much faster, focused on larger regions of the image, and detected faces with a higher false positive rate
• Rowley et al.'s method is a two-stage cascade; this method includes 38 stages
Training Dataset
• 4916 hand-labeled faces, scaled and aligned to a base resolution of 24 by 24 pixels
Structure of the Detector Cascade
• The 38-layer cascade of classifiers includes a total of 6060 features
• The first classifier is constructed using two features
  • Rejects about 50% of non-faces while correctly detecting close to 100% of faces
• The next classifier has ten features
  • Rejects 80% of non-faces while detecting almost 100% of faces
• The next two layers are 25-feature classifiers
• Then three 50-feature classifiers
• Then classifiers with a variety of different numbers of features chosen according