Challenges in Learning the Appearance of Faces for Automated Image Analysis: part I alessandro verri DISI – università di genova verri@disi.unige.it
actually, i’m gonna talk about: • brief introduction (the whole thing) • what some people do for detecting faces • what we are doing
the problem(s) • geometry (position, rotation, pose, scale,…) • facial features (beards, glasses,…) • facial expressions • occlusions • imaging conditions (illumination, resolution, color contrast, camera quality,…)
where we are: face detection • we address face detection as a brute force classification problem (sometimes sophisticated, but still brute force) • the model is encoded in the training samples but not explicitly defined
face recognition and the like • explicit image models are derived from examples, separating identity and imaging parameters
motivation • we want to explore who should learn from whom… • we come back to this at the end!
some approaches • knowledge-based (Yang & Huang, 94) • feature invariant (Leung et al, 95; Yow & Cipolla, 97) • template matching (Lanitis et al, 95) • appearance based: eigenfaces (Turk & Pentland, 91), SVM (Osuna et al, 97), naive bayes (Schneiderman & Kanade, 98), AdaBoost (Viola & Jones, 01)
SVM: global detector (Poggio’s group) • some preprocessing essential (equalization and normalization) • polynomial SVM applied to pixels • training set: about 2,500 face images (58x58 pixels) and about 10,000 non-face images (extended to 13,000)
SVM: component-based detector (Poggio’s group) • some preprocessing essential (equalization and normalization) • two-level system (always linear SVMs): component classifiers (14: eyes, nose,…), then a geometrical configuration classifier based on their maximal outputs
global vs component-based • component-based performs better (more robust to pose variation and/or occlusion) • global a little faster (though they are both pretty slow, too many patches have to be stored!)
naive bayes (Kanade’s group) • multiple detectors for different views (size and orientation) • for each view: statistical modeling using predefined attribute histograms (17), about 2,000 face examples • independence is required… • very good for out-of-plane rotation, but the procedure for building the histograms is involved (bootstrap, AdaBoost…)
AdaBoost (Viola & Jones) • wavelet-like features (computed efficiently) • features selected through AdaBoost (each weak classifier depends on a single feature) • detection is obtained by training a cascade of classifiers • very fast and effective on frontal faces
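a minimal sketch of why such features are cheap: Viola & Jones evaluate rectangle sums in constant time from an integral image (summed-area table). the toy 4x4 image, the function names and the particular two-rectangle feature below are illustrative assumptions, not the detector's actual feature set.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical wavelet-like feature: left half minus right half of a rectangle."""
    half = width // 2
    left_part = rect_sum(ii, top, left, height, half)
    right_part = rect_sum(ii, top, left + half, height, half)
    return left_part - right_part


# toy image (assumption, for illustration only)
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # sum of the central 2x2 block
print(two_rect_feature(ii, 0, 0, 4, 4))  # left two columns minus right two columns
```

once the integral image is built, every rectangle costs four lookups regardless of its size, which is what makes scanning many windows with many features affordable.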
summing up • SVM: components based on prior knowledge, simple, very good results but rather slow (optimization approaches…) • naive bayes: robust against rotation, prior knowledge on feature selection, rather hand crafted statistical analysis, many models need to be learned (each with many examples) • AdaBoost: data-driven feature selection, fast, frontal face only
what we do • we assume we are given a fair number of positive examples only (no negatives) • we want to explore how far one can get by combining fully data driven techniques based on 1D data • we look at old-fashioned hypothesis testing (false negative rate under full control)
one possible approach to object detection • building models can be expensive (domain dependent) • learning from positive examples only is more difficult, but… • classical hypothesis testing controls the false negative rate naturally
testing hypotheses • hypothesis testing (HT) with only one observation • testing for independence with a rank test (seems appropriate for comparing different features)
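one way to read "HT with only one observation", sketched under assumptions: for a single feature the null hypothesis is "this value comes from a face", its distribution is estimated from the positive training values, and one observed value is rejected if it falls in the tails. the two-sided empirical-quantile interval and all names below are assumptions; the slides do not say how the rejection region is built.

```python
def acceptance_interval(positive_values, alpha):
    """Central interval holding about (1 - alpha) of the positive training values."""
    values = sorted(positive_values)
    n = len(values)
    k = int(n * alpha / 2)  # number of training values cut from each tail
    return values[k], values[n - 1 - k]


def passes_test(x, interval):
    """Accept the null hypothesis (face) if the single observation is inside the interval."""
    lo, hi = interval
    return lo <= x <= hi


# toy feature values measured on positive examples (assumption, for illustration)
train = list(range(100))
interval = acceptance_interval(train, alpha=0.1)
rejected = sum(not passes_test(v, interval) for v in train)
print(interval, rejected / len(train))
```

the point of the construction is that alpha, the fraction of true positives the test gives up on, is chosen before looking at any negative example.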
CBCL database • faces (19x19 pixels): training 2429, test 472 • nonfaces (19x19 pixels): training 4548, test 23573
training by hypothesis testing • we first compute a large number of features (for the moment about 16,000) on the training set images • a subset of good features (about 1,000) is then selected • of these, a subset of independent features is considered (ending up with about 100) • multiple statistical tests are then constructed using the training set (one test for each feature)
image measurements • grey value at fixed locations (19 x 19) • tomographies (19 vertical + 19 horizontal + 37 along the 45° diagonals) • ranklets (5184 vertical, 5184 horizontal, 5184 diagonal) • a total of about 16,000 features
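a small sketch to check the counts above. reading a tomography as the sum of the grey values along a line (a column, a row, or a 45° diagonal) is an assumption; the slide only gives the numbers. ranklets are not computed here, only counted.

```python
def tomographies(img):
    """Line sums of a square image: per column, per row, per 45-degree diagonal."""
    n = len(img)
    vertical = [sum(img[y][x] for y in range(n)) for x in range(n)]  # one per column
    horizontal = [sum(row) for row in img]                           # one per row
    diagonal = [0] * (2 * n - 1)                                     # one per diagonal
    for y in range(n):
        for x in range(n):
            diagonal[x - y + n - 1] += img[y][x]
    return vertical, horizontal, diagonal


# constant 19x19 toy image (assumption, for illustration)
img = [[1] * 19 for _ in range(19)]
v, h, d = tomographies(img)

n_grey = 19 * 19
n_tomo = len(v) + len(h) + len(d)
n_ranklets = 3 * 5184
print(n_grey, n_tomo, n_ranklets, n_grey + n_tomo + n_ranklets)
```

with these readings the total is 361 + 75 + 15552 = 15988, which matches the "about 16,000" on the slide.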
a salient and a non-salient feature • we discard all features for which the ratio falls below the threshold s0.15 (this leaves us with about 2000 features)
independent feature selection • we run independence tests on all possible pairs of salient features of the same category • we build a complete graph for each category, with as many nodes as features • an edge between two features is deleted if Spearman’s test rejects the independence hypothesis for that pair with probability t
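a minimal sketch of a Spearman rank test for one pair of features, in plain Python. the p-value uses the large-sample normal approximation (rho times sqrt(n - 1) is roughly standard normal under independence), which is an assumption of this sketch; how the threshold t on the slide maps onto a p-value is not stated, so the code only returns the p-value.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r


def spearman_rho(a, b):
    """Pearson correlation of the ranks (inputs must not be constant)."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def independence_p_value(a, b):
    """Two-sided p-value for the independence hypothesis (normal approximation)."""
    z = spearman_rho(a, b) * math.sqrt(len(a) - 1)
    return math.erfc(abs(z) / math.sqrt(2))


# toy feature values over 50 training images (assumption, for illustration)
f1 = list(range(50))
f2 = [x * x for x in f1]  # monotone in f1, so clearly dependent
print(spearman_rho(f1, f2), independence_p_value(f1, f2))
```

a rank test only looks at orderings, so it can compare features living on very different scales, which fits the remark that it seems appropriate for comparing different features.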
independent feature selection • we then search the graph for maximal complete subgraphs (cliques), which we regard as sets of independent features • for t = 0.5 we are left with 44 vertical, 64 horizontal, and 35 diagonal ranklets, and 38 tomographies
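a sketch of the clique search on a toy graph. nodes are features and an edge survives when independence was not rejected for that pair, so a clique is a set of pairwise independent features. the Bron-Kerbosch enumeration used here is a standard choice swapped in for illustration; the slides do not say which search was actually used, and the toy graph is an assumption.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques. adj maps each node to the set of its neighbours."""
    cliques = []

    def expand(current, candidates, excluded):
        # current: clique built so far; candidates: nodes that can still extend it;
        # excluded: nodes already tried, used to avoid reporting non-maximal cliques
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# toy graph: features 0, 1, 2 are pairwise independent; 3 and 4 hang off the side
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
found = maximal_cliques(adj)
largest = max(found, key=len)
print(sorted(map(sorted, found)), sorted(largest))
```

enumeration is exponential in the worst case, so on graphs with hundreds of nodes a greedy or pivoting variant may be needed.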
testing • for all image locations, all applicable scales and a fixed number t • compute the values of the good, independent features • perform multiple statistical tests at a certain confidence level • a positive example is located if t tests are passed
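a sketch of the scan at a single scale, under assumptions: each test is an acceptance interval for one feature, a window counts as a detection when at least t tests are passed (the slide says "if t tests are passed"), and `extract_features` is a hypothetical callback standing in for the real feature computation. other scales would be handled by resizing the image and scanning again.

```python
def count_passed(feature_values, intervals):
    """Number of features whose value lies inside its acceptance interval."""
    return sum(lo <= v <= hi for v, (lo, hi) in zip(feature_values, intervals))


def is_detection(feature_values, intervals, t):
    """Declare a positive when at least t of the tests are passed."""
    return count_passed(feature_values, intervals) >= t


def scan(image, window, extract_features, intervals, t):
    """Slide a window x window patch over the image; return top-left corners of detections."""
    hits = []
    for top in range(len(image) - window + 1):
        for left in range(len(image[0]) - window + 1):
            patch = [row[left:left + window] for row in image[top:top + window]]
            if is_detection(extract_features(patch), intervals, t):
                hits.append((top, left))
    return hits


# toy setup (assumption): one feature, the sum of the patch, accepted in [10, 12]
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
patch_sum = lambda patch: [sum(map(sum, patch))]
print(scan(image, 2, patch_sum, [(10, 12)], t=1))
```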
multiple tests • we run multiple tests living with the fact that we won’t detect a certain fraction of the objects we want to find • luckily we are in a position to decide the fraction beforehand • we gain power because each test looks at a new feature
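a sketch of how the missed fraction can be fixed beforehand, under the idealised assumption that the n tests are independent and that each one rejects a true positive with the same probability alpha. the number of passed tests is then binomial, and the miss rate is the probability that fewer than t of them pass. names and numbers are illustrative.

```python
from math import comb


def miss_rate(n_tests, t, alpha):
    """P(fewer than t of n_tests are passed) for a true positive.

    Assumes independent tests, each passed with probability 1 - alpha.
    """
    p_pass = 1 - alpha
    return sum(
        comb(n_tests, k) * p_pass ** k * alpha ** (n_tests - k)
        for k in range(t)
    )


# illustrative numbers only: 100 tests at alpha = 0.1, varying the required count t
for t in (80, 85, 90):
    print(t, miss_rate(100, t, 0.1))
```

lowering t reduces the miss rate but lets more negatives through, so t is the knob that trades the two error rates once the features and alpha are fixed.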
some results (Franceschi et al, 2004) • 472 positives vs 23,573 negatives • [figure legend: tomographies + ranklets; overlapping features; randomly chosen]
once you have detected a face… ask Thomas