Robust Real-Time Object Detection Paul Viola & Michael Jones
Introduction • Frontal face detection is achieved • Comparatively satisfactory detection rates • Efficient reduction of the false positive rate • Extremely rapid operation • A 384×288 pixel image is processed at 15 frames per second
Contributions of the Paper • Integral image • A new image representation • AdaBoost • Effective feature and classifier selection • Cascade of increasingly complex classifiers • Dramatic decrease in detection time
Simple Rectangle Features • Why not use pixels directly? • Features encode domain knowledge that is hard to learn from a finite quantity of training data • Feature-based systems operate much faster than pixel-based systems
Integral Image • Analogous to a double integral of the original image: each point holds the sum of all pixels above and to the left of it • A new image representation for fast calculation of rectangle features
Integral Image • The sum of the pixels in rectangle D of the original image can be computed from the integral image as: P(4) − P(3) − P(2) + P(1)
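The two slides above can be sketched in a few lines. This is an illustrative implementation, not the paper's code: `integral_image` builds the representation in one pass, and `rect_sum` recovers any rectangle sum with the four corner lookups P(4) − P(3) − P(2) + P(1).

```python
import numpy as np

def integral_image(img):
    """Cumulative row-then-column sums; one pass over the image."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] via four lookups:
    P(4) - P(3) - P(2) + P(1)."""
    total = ii[bottom, right]              # P(4): bottom-right of D
    if top > 0:
        total -= ii[top - 1, right]        # P(2): region above D
    if left > 0:
        total -= ii[bottom, left - 1]      # P(3): region left of D
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]     # P(1): added back (subtracted twice)
    return total
```

Because every rectangle sum costs four array reads, a rectangle feature (a difference of such sums) is constant-time at any scale or location.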
Advantages of Integral Image • Pyramid approach • Requires a pyramid of scaled images • A fixed-scale detector is run on each of those images • Forming the pyramid is computationally expensive • Integral image • A single feature can be evaluated at any scale and location in a few operations • The integral image is computed in one pass over the original image
Learning Classification Functions • 45,394 features are associated with each sub-window • A very small number of these features can be combined to form an effective classifier • A variant of AdaBoost is used to • Select features • Train the classifier
How does AdaBoost work? • Combines a mixture of weak classifiers to form a strong one • In each round, the weak learner returns the single-feature classifier with the minimum weighted classification error • The examples are then re-weighted according to the accuracy of that classifier: misclassified examples gain weight • The final strong classifier is a weighted combination of the weak classifiers
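The boosting loop above can be sketched with simple threshold stumps standing in for the paper's rectangle features. This is a generic discrete-AdaBoost sketch, not the paper's exact variant; `adaboost` and `strong_classify` are illustrative names.

```python
import numpy as np

def adaboost(X, y, n_rounds):
    """Discrete AdaBoost with threshold stumps as weak classifiers.
    X: (n_samples, n_features), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # start with uniform example weights
    classifiers = []
    for _ in range(n_rounds):
        best = None
        # weak learner: pick the (feature, threshold, polarity) stump
        # with minimum weighted classification error
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)             # mistakes gain weight
        w /= w.sum()
        classifiers.append((alpha, j, thr, pol))
    return classifiers

def strong_classify(classifiers, x):
    """Weighted vote of the selected weak classifiers."""
    s = sum(a * (1 if pol * (x[j] - thr) >= 0 else -1)
            for a, j, thr, pol in classifiers)
    return 1 if s >= 0 else -1
```

In the paper each stump is a thresholded rectangle feature, so feature selection and classifier training happen in the same loop.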
How does AdaBoost work? • First and second features selected by AdaBoost
Attentional Cascade • Increases detection performance & reduces computation time • Simpler classifiers are applied before complex ones • A simple two-feature classifier example: • 100% detection rate • 40% false positive rate • About 60 microprocessor instructions (very efficient)
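The cascade's control flow is simple: run cheap stages first and stop at the first rejection, so most non-face windows never reach the expensive stages. A minimal sketch (stage functions are hypothetical stand-ins for trained classifiers):

```python
def cascade_classify(stages, window):
    """Run stages in order; any stage that rejects stops evaluation.
    Each stage is a callable window -> bool ("could be a face?")."""
    for stage in stages:
        if not stage(window):
            return False      # rejected early; later stages never run
    return True               # survived every stage: report a detection
```

With the two-feature first stage above, roughly 60% of windows exit after only a handful of instructions.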
Training of Cascade of Classifiers • The deeper classifiers are trained on harder examples • Simple classifiers form the first stages; complex ones sit deeper in the cascade • Complex classifiers take more time to compute
Training of Cascade of Classifiers • A typical single-stage detector achieves an 85–95% detection rate at a false positive rate of roughly 10^-5 to 10^-6 • The cascade system works as follows: • With a 10-stage classifier • Each stage achieving a 99% detection rate and a 30% false positive rate • The overall system runs at • 0.99^10 ≈ 90% detection rate • 0.30^10 ≈ 6 × 10^-6 false positive rate
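The per-stage rates compound multiplicatively across the cascade, which is all the arithmetic on this slide:

```python
# Overall cascade rates are the products of the per-stage rates.
stages = 10
d_i, f_i = 0.99, 0.30       # per-stage detection / false positive rate
D = d_i ** stages           # overall detection rate, about 0.90
F = f_i ** stages           # overall false positive rate, about 6e-6
```

So a modest 30% per-stage false positive rate compounds to roughly six false positives per million windows, while detection only degrades from 99% to about 90%.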
Requirements • To be determined: • Number of stages • Number of features for each stage • Threshold for each stage
Practical Implementation • The user selects acceptable per-layer rates f_i (false positive) and d_i (detection) • Each layer is trained with AdaBoost • The number of features is increased until the target f_i and d_i are met for that layer • If the overall targets F and D are not met, a new layer is added to the cascade
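The steps above can be sketched as a training loop. This is a schematic outline, not the paper's code: `train_stage` is a hypothetical helper that grows one AdaBoost layer until it meets the per-layer targets and reports its achieved rates.

```python
def train_cascade(f_target, d_target, F_goal, pos, neg, train_stage):
    """Add stages until the overall false positive rate F drops below
    F_goal. train_stage(pos, neg, f_target, d_target) is assumed to
    return (classifier, achieved_f, achieved_d)."""
    stages, F, D = [], 1.0, 1.0
    while F > F_goal:
        clf, f_i, d_i = train_stage(pos, neg, f_target, d_target)
        stages.append(clf)
        F *= f_i                      # overall rates compound per stage
        D *= d_i
        # the next stage trains on the current cascade's false positives,
        # which is why deeper stages see harder examples
        neg = [n for n in neg if all(s(n) for s in stages)]
        if not neg:
            break                     # no false positives left to learn from
    return stages, F, D
```

Filtering the negatives through the cascade so far is what makes the deeper classifiers specialize on the hard cases.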
Results – Structure of Cascade • 32 layers – 4,297 features • Training the cascade took weeks
Results – Algorithm Details • All sub-windows (training and testing) are variance-normalized to compensate for lighting conditions • Scaling is achieved by scaling the detector rather than the image • A step size of one pixel is used
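Variance normalization fits the same constant-time machinery: with a second integral image over squared pixels, the mean and variance of any sub-window come from a handful of lookups. A sketch under that assumption (`window_variance` is an illustrative helper, not from the paper):

```python
import numpy as np

def window_variance(ii, ii_sq, top, left, h, w):
    """Mean and variance of an h x w sub-window in O(1), using two
    integral images: one over pixels, one over squared pixels."""
    def rsum(I):
        b, r = top + h - 1, left + w - 1
        s = I[b, r]
        if top > 0:  s -= I[top - 1, r]
        if left > 0: s -= I[b, left - 1]
        if top > 0 and left > 0: s += I[top - 1, left - 1]
        return s
    n = h * w
    mean = rsum(ii) / n
    var = rsum(ii_sq) / n - mean ** 2    # Var[x] = E[x^2] - E[x]^2
    return mean, var
```

The window's pixel values can then be divided by the standard deviation (equivalently, feature values rescaled), so lighting changes do not shift the feature thresholds.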
Results • Most windows are rejected by the first two cascade stages • Face detection on a 384×288 image runs in about 0.067 seconds • 15 times faster than Rowley–Baluja–Kanade • 600 times faster than Schneiderman–Kanade
Results • Evaluated on the MIT+CMU test set: 130 images containing 507 labeled frontal faces