Train a Classifier Based on the Huge Face Database Jie Chen, Ruiping Wang, Shengye Yan, Shiguang Shan, Xilin Chen, Wen Gao Presented by: Jie Chen
Motivation • Data collection is tedious but essential for learning-based algorithms. • In Viola and Jones (CVPR 2001), the negative set is bootstrapped; • Ours: resample the positive (face) set in addition to bootstrapping the negative set. • Why? • Face samples are collected at random; • this biases the trained detector. • How? • Fill in the face example space with a genetic algorithm (GA); • subsample it using the manifold; • mend it with an SVM. • [Figure: the collected face and non-face sets, and the resulting distribution.]
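The negative-set bootstrapping mentioned above can be sketched as one mining round: scan a pool of non-face patches and add the ones the current detector wrongly accepts as faces to the negative set. This is a minimal sketch, not the authors' code; the function name `bootstrap_negatives`, the `classify` callback and the `max_new` cap are hypothetical, and patches are assumed to be plain feature vectors.

```python
import random
from typing import Callable, List, Sequence


def bootstrap_negatives(
    classify: Callable[[Sequence[float]], bool],
    nonface_pool: List[Sequence[float]],
    negatives: List[Sequence[float]],
    max_new: int,
    seed: int = 0,
) -> List[Sequence[float]]:
    """Return the negative set extended with false positives mined from the pool.

    A pool patch is a false positive if the current detector calls it a face.
    """
    rng = random.Random(seed)
    candidates = list(nonface_pool)
    rng.shuffle(candidates)  # visit the pool in random order
    mined = [x for x in candidates if classify(x)][:max_new]
    return negatives + mined
```

In a full training loop the detector would be retrained on the extended negative set and the mining round repeated until few false positives remain.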
Contribution of this Paper • Subsample a small but efficient and representative subset based on the manifold: • Discuss the effects of outliers; • A detector trained on a randomly subsampled set performs unstably, whereas a detector trained on the face set subsampled by the manifold is not only stable but also performs better; • When preparing the training set, we should collect more samples along the dimensions with larger variance to obtain a nearly uniform distribution on the manifold; for example, more left-right pose variation than up-down pose variation.
Manifold • A typical manifold: the Swiss roll (J. B. Tenenbaum, V. de Silva, and J. C. Langford), from http://www.cs.toronto.edu/~roweis/lle/
Face Sample Manifold • [Figure: an individual with varying pose and expression; some regions of the manifold are annotated "Too sparse!" and others "Too dense!". From http://www.cs.toronto.edu/~roweis/lle/]
Dimensionalities of Isomap • [Figure: residual variance of the Isomap embedding on the 698-image face database; embedding axes labelled left-right pose, up-down pose, and lighting direction.]
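Residual variance, as defined by Tenenbaum et al. for Isomap, is 1 - R^2(D_G, D_Y), where D_G holds the estimated geodesic distances, D_Y the Euclidean distances between points in the low-dimensional embedding, and R the linear correlation coefficient taken over all matrix entries. A minimal pure-Python sketch follows; the function names are hypothetical, and computing the embedding itself is assumed to happen elsewhere.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pairwise_euclidean(points: Sequence[Sequence[float]]) -> Matrix:
    """Euclidean distance matrix D_Y for embedding coordinates."""
    return [[math.dist(p, q) for q in points] for p in points]


def residual_variance(d_geo: Matrix, d_emb: Matrix) -> float:
    """Return 1 - R^2, with R the linear correlation over all matrix entries."""
    a = [x for row in d_geo for x in row]
    b = [y for row in d_emb for y in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    return 1.0 - r * r
```

Evaluating this for embeddings of dimension 1, 2, 3, ... and looking for where the curve flattens is how the intrinsic dimensionality is read off a residual-variance plot.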
Dimensionalities • Each coordinate axis of the embedding correlates highly with one degree of freedom underlying the original data: • left-right pose corresponds to the first degree of freedom; • up-down pose to the second; • lighting direction to the third. • That is, among these three factors the scatter of face images is largest in left-right pose and smallest in lighting direction. • We conclude that, to select a representative example set, we should pay more attention to left-right pose variations than to up-down pose variations.
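The rule of thumb above can be made concrete under one assumption: if training samples should be roughly uniformly spaced in the embedding, the number of samples needed along an axis grows with that axis's extent. The sketch below ranks embedding axes by spread and counts the grid cells each axis needs; the function names are hypothetical, and `spacing` is an assumed target distance between neighbouring samples.

```python
import math
import statistics
from typing import List, Sequence


def axis_spreads(embedding: Sequence[Sequence[float]]) -> List[float]:
    """Population standard deviation of each embedding coordinate."""
    dims = len(embedding[0])
    return [statistics.pstdev([p[k] for p in embedding]) for k in range(dims)]


def cells_per_axis(embedding: Sequence[Sequence[float]], spacing: float) -> List[int]:
    """Grid cells needed along each axis so that cells are `spacing` wide."""
    counts = []
    for k in range(len(embedding[0])):
        values = [p[k] for p in embedding]
        counts.append(max(1, math.ceil((max(values) - min(values)) / spacing)))
    return counts
```

With the axes ordered as on this slide, the first (left-right pose) axis would need the most cells and hence the most samples.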
Subsampling by manifold • [Figure: (a) illustration of subsampling based on the estimated geodesic distance; (b) manifold of 698 faces; (c) subsampled results.]
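Subsampling based on estimated geodesic distance can be sketched as follows: build a K-nearest-neighbour graph (as in Isomap), estimate geodesic distances with Dijkstra's shortest paths, then greedily pick the point geodesically farthest from those already chosen. The slides do not state the exact selection rule, so farthest-point sampling is an assumed stand-in; the function names are hypothetical, and outliers are assumed to have been removed beforehand, since farthest-point selection would otherwise favour them.

```python
import heapq
import math
from typing import Dict, List, Sequence

Graph = List[Dict[int, float]]


def knn_graph(points: Sequence[Sequence[float]], k: int) -> Graph:
    """Undirected K-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph: Graph = [{} for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrise the neighbour relation
    return graph


def geodesic_from(graph: Graph, src: int) -> List[float]:
    """Dijkstra shortest-path lengths from src; unreachable nodes stay at inf."""
    dist = [math.inf] * len(graph)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def farthest_point_subsample(
    points: Sequence[Sequence[float]], k: int, m: int, start: int = 0
) -> List[int]:
    """Pick up to m indices, each maximising geodesic distance to the chosen set."""
    graph = knn_graph(points, k)
    chosen = [start]
    to_chosen = geodesic_from(graph, start)  # distance from each point to the chosen set
    while len(chosen) < m:
        candidates = [i for i, d in enumerate(to_chosen) if 0.0 < d < math.inf]
        if not candidates:
            break  # every reachable point is already covered
        nxt = max(candidates, key=lambda i: to_chosen[i])
        chosen.append(nxt)
        from_nxt = geodesic_from(graph, nxt)
        to_chosen = [min(a, b) for a, b in zip(to_chosen, from_nxt)]
    return chosen
```

The brute-force neighbour search is quadratic in the number of points, which is fine for a few thousand images; a set the size of 1,200,000 faces would need an approximate nearest-neighbour index instead.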
Experiments: Subsampling by manifold • Training set: 6,977 images (2,429 faces and 4,548 non-faces); • Test set: 24,045 images (472 faces and 23,573 non-faces). • All images are grayscale and are available on the CBCL webpage. • K = 6 for the manifold learning. • Classifier: AdaBoost-based.
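The slides only say the classifier is AdaBoost-based. For reference, here is a minimal sketch of discrete AdaBoost with single-feature threshold stumps, the kind of weak learner Viola and Jones apply to Haar-like features. It is not the authors' detector: it works on plain numeric feature vectors, has no attentional cascade, and its function names are hypothetical; labels are +1 (face) and -1 (non-face).

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, thr, pol = stump
    return 1 if pol * (x[f] - thr) > 0 else -1


def train_adaboost(X: List[Sequence[float]], y: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n  # uniform initial sample weights
    model = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best, best_err = (0, 0.0, 1), math.inf
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    s = (f, thr, pol)
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict(s, xi) != yi)
                    if err < best_err:
                        best, best_err = s, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, best))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1
```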
Subsampling based on manifold • Some possible reasons: • examples subsampled based on the manifold are reasonably distributed in the example space and, unlike those in the whole set, do not congregate; • outliers in the whole set deteriorate its performance.
Subsampling based on the manifold vs. random subsampling • Results based on random subsampling are much less stable.
Outlier effects • Outliers deteriorate the detector's performance.
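One way to make "outliers are discarded during manifold learning" concrete is to build a mutual K-nearest-neighbour graph (an edge only when each point is among the other's K nearest) and keep its largest connected component, so isolated points and small far-away clumps drop out. This criterion is an assumption for illustration, not necessarily the authors' rule, and the function names are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Set


def mutual_knn_adjacency(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Edge i-j only if j is among i's k nearest neighbours and vice versa."""
    n = len(points)
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(others[:k]))
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def inlier_indices(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the largest connected component of the mutual k-NN graph."""
    adj = mutual_knn_adjacency(points, k)
    seen = [False] * len(points)
    best: List[int] = []
    for s in range(len(points)):
        if seen[s]:
            continue
        seen[s] = True
        component, queue = [], deque([s])
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            component.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return sorted(best)
```

A plain (non-mutual) K-nearest-neighbour graph would not work here: every isolated point still links to its own K nearest neighbours, so it would stay connected to the main cluster.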
Large-scale database • The face-image database consists of 100,000 faces (collected from the web, video, and digital cameras); • the faces are randomly rotated, translated, and scaled; • after this preprocessing, the whole set contains 1,200,000 face images; • the first group consists of 15,000 face images subsampled by the manifold (ISO15000); • the second and third groups each consist of 15,000 randomly subsampled face images (Rand1-15000 and Rand2-15000).
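The slide does not give the perturbation ranges, so the sketch below uses hypothetical ones (up to ±10 degrees of rotation, ±2 pixels of shift, scale between 0.9 and 1.1). It draws a random similarity transform and applies it to 2-D coordinates (for example landmark or pixel positions) about the face centre; resampling the image itself is left out. It also shows how reproducible random subsets in the spirit of Rand1-15000 and Rand2-15000 might be drawn with different seeds. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Params = Tuple[float, float, float, float]  # (theta in radians, dx, dy, scale)


def random_similarity(rng: random.Random, max_deg: float = 10.0, max_shift: float = 2.0,
                      scale_range: Tuple[float, float] = (0.9, 1.1)) -> Params:
    """Draw a random rotation angle, translation and scale."""
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    return theta, dx, dy, rng.uniform(*scale_range)


def apply_similarity(points: Sequence[Point], center: Point, params: Params) -> List[Point]:
    """Rotate and scale about `center`, then translate by (dx, dy)."""
    theta, dx, dy, s = params
    cx, cy = center
    c, si = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        u, v = x - cx, y - cy
        out.append((cx + s * (c * u - si * v) + dx, cy + s * (si * u + c * v) + dy))
    return out


def random_subsample(n_total: int, n_keep: int, seed: int) -> List[int]:
    """Sorted indices of a seeded random subset, without replacement."""
    return sorted(random.Random(seed).sample(range(n_total), n_keep))
```

For example, random_subsample(1_200_000, 15_000, seed=1) and the same call with seed=2 give two independent random groups of the size used on the slide.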
Test on the MIT+CMU set • Training sets: the set subsampled by the manifold and the randomly subsampled sets • Classifier: AdaBoost-based
ROC curve comparison • Compared with other published algorithms on the MIT+CMU face test set
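ROC curves on the MIT+CMU set are commonly plotted as detection rate against the number of false positives. Given a score and a ground-truth label for each candidate window, the curve can be traced by sweeping a threshold down the sorted scores. This sketch assumes detections have already been matched to ground-truth faces (overlapping windows merged), and its function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (false positives, detection rate) pairs; label 1 = face, 0 = non-face."""
    n_pos = sum(1 for label in labels if label == 1)
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0, 0.0)]
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thr:  # tied scores share one threshold
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp, tp / n_pos))
    return points
```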
Conclusion • We present a manifold-based subsampling method. • Compared with a detector trained on a randomly subsampled set, the detector trained on the manifold-subsampled set is more stable and achieves better performance. • The improved performance results from: • reasonably distributed examples, subsampled based on the manifold; • no outliers, since they are discarded during manifold learning.
By the way… • 1. Demo outside • Face recognition against a large-scale face database from our lab. • 2. BJUT-3D face database available • 500 3D faces! Free! • Sign a release agreement • For research purposes only • Get it now outside, beside the demo desk.