Rapid Object Detection using a Boosted Cascade of Simple Features
(rapid: moving or acting with great speed; boost: to increase the strength or value of something)
Original authors: Paul Viola & Michael Jones. In: Proc. Conf. Computer Vision and Pattern Recognition, Volume 1, Kauai, HI, USA (2001), pp. 511–518.
Speaker: Jing Ming Chiuan (井民全)
Outline
• Introduction
• The Boost algorithm for classifier learning
• Feature selection
• Weak learner constructor
• The strong classifier
• A tremendously difficult problem
• Results
• Conclusion
What have we done?
• A machine learning approach for visual object detection
• Capable of processing images extremely rapidly
• Achieving high detection rates
• Three key contributions:
  • A new image representation, the Integral Image, which speeds up feature evaluation
  • A learning algorithm based on AdaBoost [5], which selects a small number of visual features from a larger set to yield efficient classifiers
  • A method for combining classifiers in a cascade, which quickly discards the background regions of the image
A demonstration on face detection
• A frontal face detection system
• The detector runs at 15 frames per second on 384 x 288 images on a 700 MHz Pentium III
• It works only with a single grey-scale image, without resorting to image differencing (between frames of a video sequence) or skin color detection
Broad practical applications for an extremely fast face detector
• User interfaces, image databases, teleconferencing
• The system can be implemented on small, low-power devices (about 2 frames/sec on a Compaq iPaq)
Training process for the classifier
• The attentional operator is trained to detect examples of a particular class (a supervised training process)
• In the domain of face detection, a face classifier can be constructed with < 1% false negatives and < 40% false positives
Cascaded detection process
• The sub-windows are processed by a sequence of classifiers, each slightly more complex than the last
• If any classifier rejects the sub-window, no further processing is performed
• The process is essentially that of a degenerate decision tree
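The early-rejection logic of the cascade can be sketched in a few lines. This is a minimal illustration, assuming each stage is a hypothetical callable that returns True to pass a sub-window; the toy stages, their thresholds and the `evaluated` log exist only to show that later stages never run once a window is rejected.

```python
from typing import Callable, List, Sequence

Stage = Callable[[float], bool]  # hypothetical: a stage returns True to pass a window

def cascade_classify(window: float, stages: Sequence[Stage]) -> bool:
    """Run a window through the cascade; the first rejection stops processing."""
    for stage in stages:
        if not stage(window):
            return False  # rejected: later, more complex stages are never evaluated
    return True  # passed every stage: reported as a detection

# Toy usage: a "window" is just a number and each stage compares it with a
# hypothetical threshold; `evaluated` records which stages actually ran.
evaluated: List[str] = []

def toy_stage(name: str, threshold: float) -> Stage:
    def stage(window: float) -> bool:
        evaluated.append(name)
        return window > threshold
    return stage

toy_stages = [toy_stage("stage1", 0), toy_stage("stage2", 5), toy_stage("stage3", 10)]
print(cascade_classify(3, toy_stages), evaluated)  # False ['stage1', 'stage2']
```

Because most sub-windows in an image are background, most of them exit after the first one or two cheap stages, which is where the speed of the cascade comes from.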
Our object detection framework
• Feature evaluation: the original image is converted into an integral image in order to compute Haar-basis-function features rapidly at many scales (a large number of features)
• Feature selection: a modified AdaBoost procedure selects a small set of critical features
• Cascaded classifier structure
Feature Selection
The detection process is based on features rather than directly on the pixels.
• Simple features are used, for two reasons:
  • Features can encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data
  • A feature-based system operates much faster than a pixel-based one
• The features are reminiscent of the Haar basis functions used by Papageorgiou et al. [9]
Feature Selection: three kinds of features
• Two-rectangle feature: the difference between the sums of the pixels within two rectangular regions; the regions have the same size and shape and are horizontally or vertically adjacent
• Three-rectangle feature: the sum within the two outside rectangles subtracted from the sum in a center rectangle
• Four-rectangle feature: the difference between diagonal pairs of rectangles
• The base resolution of the detector is 24x24, so the exhaustive set of rectangle features is large: over 180,000
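As a reference point before introducing the integral image, the three feature kinds can be computed directly from pixel sums. This is a sketch assuming a NumPy array indexed [row, column], a feature anchored at (top, left), and sub-rectangles of size h x w; only the horizontal two- and three-rectangle variants are shown (the vertical ones are analogous), and the sign convention of which region is subtracted is our own choice.

```python
import numpy as np

def rect_sum(img, top, left, h, w):
    """Sum of the pixels in the h x w rectangle whose top-left corner is (top, left)."""
    return float(img[top:top + h, left:left + w].sum())

def two_rect_horizontal(img, top, left, h, w):
    """Two horizontally adjacent h x w regions: left sum minus right sum."""
    return rect_sum(img, top, left, h, w) - rect_sum(img, top, left + w, h, w)

def three_rect_horizontal(img, top, left, h, w):
    """Three h x w regions side by side: center sum minus the two outside sums."""
    outside = rect_sum(img, top, left, h, w) + rect_sum(img, top, left + 2 * w, h, w)
    return rect_sum(img, top, left + w, h, w) - outside

def four_rect(img, top, left, h, w):
    """2 x 2 grid of h x w regions: one diagonal pair minus the other."""
    diagonal = rect_sum(img, top, left, h, w) + rect_sum(img, top + h, left + w, h, w)
    anti_diagonal = rect_sum(img, top, left + w, h, w) + rect_sum(img, top + h, left, h, w)
    return diagonal - anti_diagonal
```

Each direct sum costs time proportional to the rectangle area, which is what the integral image removes.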
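Integral Image
An intermediate representation for rapidly computing the rectangle features. The integral image at location (x, y) contains the sum of the pixels of the original image $i$ above and to the left of (x, y), inclusive:
$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$
It can be computed in one pass over the original image using the pair of recurrences
$s(x, y) = s(x, y - 1) + i(x, y)$
$ii(x, y) = ii(x - 1, y) + s(x, y)$
where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$ and $ii(-1, y) = 0$.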
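The recurrence pair can be transcribed directly into a one-pass loop. This sketch assumes a NumPy array indexed [y, x] and treats out-of-range terms as 0; the function name is illustrative. In practice the same result is obtained with `img.cumsum(axis=0).cumsum(axis=1)`.

```python
import numpy as np

def integral_image(img):
    """One-pass integral image using the recurrence pair.

    s[y, x]  implements s(x, y)  = s(x, y - 1)  + i(x, y)   (the cumulative row sum)
    ii[y, x] implements ii(x, y) = ii(x - 1, y) + s(x, y)
    Arrays are indexed [y, x]; terms with a -1 index are taken as 0.
    """
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    s = np.zeros((h, w), dtype=np.int64)
    ii = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            s[y, x] = (s[y - 1, x] if y > 0 else 0) + img[y, x]
            ii[y, x] = (ii[y, x - 1] if x > 0 else 0) + s[y, x]
    return ii

print(integral_image([[1, 2], [3, 4]]).tolist())  # [[1, 3], [4, 10]]
```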
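Calculating any rectangle sum with the integral image
• The value of the integral image at location 1 is the sum of the pixels in rectangle A; at location 2 it is A + B; at location 3 it is A + C; at location 4 it is A + B + C + D
• The sum within rectangle D can therefore be computed with four array references: D = 4 + 1 − (2 + 3)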
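The four-reference rule translates directly into code. This sketch assumes an integral image padded with a leading row and column of zeros (our own convention, so that rectangles touching the image border need no special cases); the function names are illustrative.

```python
import numpy as np

def padded_integral_image(img):
    """Integral image with an extra leading zero row and column: ii[y, x] = img[:y, :x].sum()."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, h, w):
    """Sum of the h x w rectangle D whose top-left pixel is (top, left)."""
    p1 = ii[top, left]          # location 1: A
    p2 = ii[top, left + w]      # location 2: A + B
    p3 = ii[top + h, left]      # location 3: A + C
    p4 = ii[top + h, left + w]  # location 4: A + B + C + D
    return int(p4 + p1 - p2 - p3)  # D = 4 + 1 - (2 + 3)

img = np.arange(20).reshape(4, 5)
ii = padded_integral_image(img)
print(rect_sum(ii, 1, 2, 2, 3), int(img[1:3, 2:5].sum()))  # both sums agree
```

The cost is four lookups regardless of rectangle size, so a two-rectangle feature needs six references, a three-rectangle feature eight, and a four-rectangle feature nine (adjacent rectangles share corners).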
Learning Classification Functions
• Training set: positive (face) and negative (non-face) examples
• Over 180,000 rectangle features are associated with each 24x24 sub-image
• A variant of the AdaBoost learning algorithm is used for the feature selection task: weak learners 1, 2, … are trained and combined into the final strong classifier
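The Boost algorithm for classifier learning
Step 1: Given example images $(x_1, y_1), \dots, (x_n, y_n)$, where $y_i = 1$ for positive and $y_i = 0$ for negative examples
Step 2: Initialize the weights $w_{1,i} = \frac{1}{2m}$ for negatives and $\frac{1}{2l}$ for positives, where $m$ and $l$ are the numbers of negatives and positives
Weak learner constructor: for t = 1, …, T
1. Normalize the weights, $w_{t,i} \leftarrow w_{t,i} / \sum_{j} w_{t,j}$
2. For each feature j, train a classifier $h_j$ that is restricted to using a single feature; its error is evaluated with respect to the weights, $\varepsilon_j = \sum_i w_{t,i} \, |h_j(x_i) - y_i|$
3. Choose the classifier $h_t$ with the lowest error $\varepsilon_t$
4. Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i}$, where $e_i = 0$ if $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$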
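The loop above can be sketched end to end in NumPy. This is an illustrative sketch, not the authors' implementation: it assumes the feature values are precomputed in a matrix `F` (features x examples), and the weak learner is found by brute force over the observed feature values, which is far slower than a real trainer. All function names are hypothetical.

```python
import numpy as np

def best_stump(f, y, w):
    """Brute-force single-feature classifier h(x) = 1 if p*f(x) < p*theta else 0.

    Tries every observed feature value (plus +inf) as theta and both parities p,
    returning (weighted error, theta, p) with the lowest error.
    """
    best = (np.inf, 0.0, 1)
    for theta in np.append(np.unique(f), np.inf):
        for p in (1, -1):
            pred = (p * f < p * theta).astype(int)
            err = float(np.sum(w * np.abs(pred - y)))
            if err < best[0]:
                best = (err, float(theta), p)
    return best

def train_adaboost(F, y, T):
    """Boosting loop; F[j, i] is the value of feature j on example i."""
    y = np.asarray(y, dtype=int)
    m, l = np.sum(y == 0), np.sum(y == 1)
    # Initial weights: 1/(2m) for each negative, 1/(2l) for each positive.
    w = np.where(y == 1, 1.0 / (2 * l), 1.0 / (2 * m))
    chosen = []
    for _ in range(T):
        w = w / w.sum()  # normalize so the weights form a distribution
        # Train one single-feature classifier per feature and keep the best.
        fits = [best_stump(F[j], y, w) for j in range(F.shape[0])]
        j = int(np.argmin([fit[0] for fit in fits]))
        err, theta, p = fits[j]
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        beta = err / (1.0 - err)
        e = ((p * F[j] < p * theta).astype(int) != y).astype(int)  # 0 = correct, 1 = miss
        w = w * beta ** (1 - e)  # correctly classified examples lose weight
        chosen.append((j, theta, p, np.log(1.0 / beta)))  # alpha_t = log(1/beta_t)
    return chosen

def strong_classify(chosen, f_x):
    """1 (face) if sum_t alpha_t h_t(x) >= 0.5 * sum_t alpha_t, else 0."""
    score = sum(a * int(p * f_x[j] < p * theta) for j, theta, p, a in chosen)
    return int(score >= 0.5 * sum(c[3] for c in chosen))
```

Since $\beta_t < 1$ whenever the weak classifier beats chance, multiplying by $\beta_t$ only for correctly classified examples shifts relative weight toward the examples that were missed.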
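Weak learner constructor (illustrated)
• The weights over the training set are normalized
• A weak classifier is trained for each of the over 180,000 features of each sub-image
• The weights are updated: examples the selected weak classifier gets correct have their weights reduced, so the missed examples carry relatively more weight in the next round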
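Training the weak learner (illustrated)
• For the face and non-face examples of the training set, the feature value $f_j(x)$ is computed and a threshold $\theta_j$ is chosen along the feature axis
• The weak classifier is $h_j(x) = 1$ (face) if $p_j f_j(x) < p_j \theta_j$ and 0 otherwise, where the parity $p_j$ gives the direction of the inequality; in the example shown, x is classified as a face if $f_j(x) > \theta_j$
• Non-face examples on the face side of the threshold are false positives; face examples on the other side are false negatives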
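The optimal threshold for one feature does not require trying every candidate separately: after sorting the examples by feature value, a single pass suffices. This sketch follows the scan described in the authors' later journal version as we understand it: with $T^+$, $T^-$ the total positive and negative weight and $S^+$, $S^-$ the positive and negative weight below the current split, the error is $\min(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+))$. Splitting only between distinct values and using midpoints as thresholds are our own choices.

```python
import numpy as np

def train_stump_sorted(f, y, w):
    """Optimal threshold/parity for one feature in a single pass over sorted values.

    Returns (error, theta, parity) for h(x) = 1 if p * f(x) < p * theta else 0.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    order = np.argsort(f, kind="stable")
    f_s, y_s, w_s = f[order], y[order], w[order]
    t_pos = w_s[y_s == 1].sum()  # T+: total weight of faces
    t_neg = w_s[y_s == 0].sum()  # T-: total weight of non-faces
    s_pos = s_neg = 0.0          # S+, S-: weight of faces / non-faces below the split
    n = len(f_s)
    best = (np.inf, 0.0, 1)
    for k in range(n + 1):
        # Candidate split between sorted positions k-1 and k (ties are never split).
        if k == 0 or k == n or f_s[k - 1] < f_s[k]:
            if k == 0:
                theta = f_s[0] - 1.0
            elif k == n:
                theta = f_s[-1] + 1.0
            else:
                theta = 0.5 * (f_s[k - 1] + f_s[k])
            err_faces_below = s_neg + (t_pos - s_pos)  # parity  1: face if f < theta
            err_faces_above = s_pos + (t_neg - s_neg)  # parity -1: face if f > theta
            if err_faces_below < best[0]:
                best = (float(err_faces_below), float(theta), 1)
            if err_faces_above < best[0]:
                best = (float(err_faces_above), float(theta), -1)
        if k < n:  # example k now lies below the next candidate split
            if y_s[k] == 1:
                s_pos += w_s[k]
            else:
                s_neg += w_s[k]
    return best

print(train_stump_sorted([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], [1] * 6))  # (0.0, 6.5, -1)
```

Sorting dominates the cost, so each feature is trained in $O(n \log n)$ per boosting round, and the sort order can be reused across rounds because only the weights change.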
AdaBoosting
• Places the most weight on the examples most often misclassified by the preceding weak rules
• Forces the base learner to focus its attention on the “hardest” examples
The Boost algorithm for classifier learning (summary)
Step 1: Given example images
Step 2: Initialize the weights
Weak learner constructor: for t = 1, …, T
1. Normalize the weights
2. For each feature j, train a classifier $h_j$ that is restricted to using a single feature
3. Update the weights
The selected weak classifiers are combined into the final strong classifier.
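The formula for the final strong classifier was shown only as an image on the slide. As given in the boosting procedure of the original paper, with $\varepsilon_t$ the weighted error of the weak classifier $h_t$ selected in round t, each weak classifier votes with weight $\alpha_t$ and the strong classifier thresholds the weighted vote at half the total weight:

```latex
\begin{aligned}
\beta_t &= \frac{\varepsilon_t}{1-\varepsilon_t}, \qquad \alpha_t = \log\frac{1}{\beta_t},\\[4pt]
h(x) &= \begin{cases}
  1 & \text{if } \displaystyle\sum_{t=1}^{T} \alpha_t h_t(x) \ \ge\ \frac{1}{2}\sum_{t=1}^{T} \alpha_t,\\
  0 & \text{otherwise.}
\end{cases}
\end{aligned}
```

A weak classifier with a lower error has a smaller $\beta_t$ and therefore a larger vote $\alpha_t$.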
The big picture of the testing process
• Stage 1 (AdaBoost learner: feature selection & classifier) → Pass → Stage 2 (AdaBoost learner) → Pass → Stage 3 (AdaBoost learner) → …
• At every stage, a sub-window classified as False is rejected immediately
• Each stage is tuned to a 100% detection rate with 50% false positives
• Goal: reject as many negatives as possible while minimizing false negatives
A tremendously difficult problem
• How to determine:
  • The number of classifier stages
  • The number of features in each stage
  • The threshold of each stage
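One way to reason about these choices, which is not spelled out on the slide, is that a window is accepted only if every stage accepts it. If each stage's detection rate $d_i$ and false positive rate $f_i$ are measured on the windows that reach that stage, the overall rates are the products $D = \prod_i d_i$ and $F = \prod_i f_i$. The sketch below evaluates this for illustrative numbers (10 stages, $d_i = 0.99$, $f_i = 0.30$); the function name is hypothetical.

```python
from math import prod

def cascade_rates(stage_detection_rates, stage_false_positive_rates):
    """Overall detection rate D and false positive rate F of a cascade.

    Each stage rate is assumed to be measured on the windows that reach that
    stage, so the overall rates are products over the stages.
    """
    return prod(stage_detection_rates), prod(stage_false_positive_rates)

D, F = cascade_rates([0.99] * 10, [0.30] * 10)
print(f"D = {D:.3f}, F = {F:.2e}")  # D = 0.904, F = 5.90e-06
```

The asymmetry explains the design: each stage can afford a high false positive rate, but its detection rate must stay very close to 1, because every lost face is lost for good.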
Training example
• Stage 1 (AdaBoost learner: feature selection & classifier) → Pass → Stage 2 (AdaBoost learner) → …; at every stage, sub-windows classified as False are rejected
• Each stage is trained on face and non-face examples to reach a 100% detection rate with 50% false positives
Result
• A 38-layer cascaded classifier was trained to detect frontal upright faces
• Training set:
  • Faces: 4916 hand-labeled faces with a resolution of 24x24
  • Non-faces: 9544 images that contain no faces (about 350 million sub-windows within these images)
• Features:
  • The first five layers of the detector have 1, 10, 25, 25, and 50 features
  • The total number of features in all layers is 6061
Result
• Each classifier in the cascade was trained with:
  • Faces: the 4916 faces plus their vertical mirror images, 9832 images in total
  • Non-face sub-windows: 10,000 (size 24x24)
Outline of Results
• Speed of the final detector
• Image processing
• Scanning the detector
• Integration of multiple detections
• Experiments on a real-world test set
Result: Speed of the Final Detector
• The speed is directly related to the number of features evaluated per scanned sub-window
• On the MIT+CMU test set, an average of 10 features out of a total of 6061 are evaluated per sub-window
• On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5)
Result: Image Processing
• Sub-windows are variance normalized to minimize the effect of different lighting conditions
• Reference on variance normalization: http://www.ic.sunysb.edu/Stu/sewang/papers/Fingerprint%20Classification%20by%20Directional%20Fields.pdf
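The variance of any sub-window can be computed in constant time from a pair of integral images, one of the image and one of the squared image, using $\sigma^2 = \frac{1}{N}\sum x^2 - m^2$ for a window of $N$ pixels with mean $m$. This is a sketch assuming NumPy arrays and a zero-padded integral image; the function names are hypothetical.

```python
import numpy as np

def padded_integral(a):
    """Integral image with a leading zero row/column: ii[y, x] = a[:y, :x].sum()."""
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, top, left, h, w):
    """Sum over an h x w box using four integral-image references."""
    return ii[top + h, left + w] - ii[top, left + w] - ii[top + h, left] + ii[top, left]

def window_mean_std(ii, ii_sq, top, left, h, w):
    """Mean and standard deviation of a sub-window from two integral images.

    Uses sigma^2 = (1/N) * sum(x^2) - m^2 with N = h * w pixels.
    """
    n = h * w
    mean = box_sum(ii, top, left, h, w) / n
    var = box_sum(ii_sq, top, left, h, w) / n - mean ** 2
    return mean, np.sqrt(max(var, 0.0))  # clamp tiny negative rounding error

img = np.arange(16, dtype=float).reshape(4, 4)
ii, ii_sq = padded_integral(img), padded_integral(img ** 2)
print(window_mean_std(ii, ii_sq, 1, 1, 2, 2))
```

Because the rectangle features are linear in the pixel values, the paper notes that during scanning the normalization can be applied by post-multiplying the feature values instead of rescaling every pixel.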
Result: Scanning the Detector
• The final detector is scanned across the image at multiple scales and locations
• Scaling is achieved by scaling the detector itself rather than the image
• Good results are obtained using a set of scales a factor of 1.25 apart
• Locations are obtained by shifting the window some number of pixels Δ: if the current scale is s, the window is shifted by [sΔ], where [·] is the rounding operation
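The scanning schedule can be sketched as a generator of sub-windows. This assumes a 24x24 base detector, scales growing by a factor of 1.25 from a chosen starting scale, and a shift of round(sΔ) pixels (at least 1); the function and parameter names are hypothetical. Python's `round` rounds halves to even, whereas the slide only says "rounding".

```python
def scan_windows(img_w, img_h, base=24, start_scale=1.0, scale_factor=1.25, delta=1.0):
    """Yield (x, y, size) sub-windows over all scales and locations.

    The detector (not the image) is scaled: at scale s the window is
    round(base * s) pixels on a side and is shifted by round(s * delta) pixels.
    """
    s = start_scale
    while round(base * s) <= min(img_w, img_h):
        size = round(base * s)
        step = max(1, round(s * delta))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        s *= scale_factor

# Settings reported for the timing result: starting scale 1.25, step size 1.5.
n_windows = sum(1 for _ in scan_windows(384, 288, start_scale=1.25, delta=1.5))
```

Scaling the detector instead of building an image pyramid is cheap here because a scaled rectangle feature still costs the same few integral-image lookups.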
Result: Integration of Multiple Detections
• Multiple detections usually occur around each face, and also around some types of false positives
• The detected sub-windows are post-processed to combine overlapping detections into a single detection
• The detections are partitioned into disjoint subsets; two detections are in the same subset if their bounding regions overlap
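A minimal sketch of this post-processing, assuming boxes given as (x1, y1, x2, y2) corner tuples: overlapping detections are grouped transitively with a small union-find, and each group is replaced by one box whose corners are the average of its members' corners, which is how the original paper describes forming the final detection. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if boxes (x1, y1, x2, y2) share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_detections(boxes):
    """Partition detections into overlapping subsets and average each subset."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Union every pair of overlapping boxes; grouping is therefore transitive.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # One output box per subset: the average of each corner coordinate.
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

print(merge_detections([(0, 0, 10, 10), (2, 2, 12, 12), (50, 50, 60, 60)]))
```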
Result: Experiments on a Real-World Test Set
• The MIT+CMU frontal face test set consists of 130 images with 507 labeled frontal faces
Result: Experiments on a Real-World Test Set
[Table: detection rates of our detector for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.]
[Figure: ROC curve for the face detector on the MIT+CMU test set; x-axis: false positives, y-axis: correct detection rate.]
The detector was run using a step size of 1.0 and a starting scale of 1.0 (75,081,800 sub-windows scanned).
Result: A Simple Voting Scheme to Further Improve Results
• Run three detectors: the 38-layer one described above plus two similarly trained detectors
• Output the majority vote of the three detectors
• The improvement would be greater if the detectors were more independent
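Per sub-window, the voting scheme reduces to counting accepting detectors. This is a sketch assuming each detector is a hypothetical callable that returns True when it accepts a window; with three detectors, at least two must agree.

```python
from typing import Callable, Sequence

def majority_vote(detectors: Sequence[Callable[[object], bool]], window) -> bool:
    """Report a face only if more than half of the detectors accept the window."""
    votes = sum(1 for detect in detectors if detect(window))
    return 2 * votes > len(detectors)

# Toy usage with hypothetical threshold "detectors" on a numeric window score.
toy_detectors = [lambda w: w > 0, lambda w: w > 5, lambda w: w > 10]
print(majority_vote(toy_detectors, 7))  # True: two of the three detectors accept
```

The vote helps only where the detectors make different mistakes, which is why more independent detectors would give a larger improvement.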
Conclusion
• An object detection approach that minimizes computation time while achieving high detection rates
• The detector is approximately 15 times faster than previous approaches
• This paper brings together new algorithms, representations, and insights which are quite generic
Conclusion
• The test data set includes faces under a very wide range of conditions, including illumination, scale, pose, and camera variation