Face Detection and Head Tracking Ying Wu yingwu@ece.northwestern.edu Electrical Engineering & Computer Science Northwestern University, Evanston, IL http://www.ece.northwestern.edu/~yingwu
Face Detection: The Problem • The Goal: Identify and locate faces in an image • The Challenges: • Position • Scale • Orientation • Illumination • Facial expression • Partial occlusion
Outline • The Basics • Visual Detection • A framework • Pattern classification • Handling scales • Viola & Jones’ method • Feature: Integral image • Classifier: AdaBoost • Speedup: Cascading classifiers • Putting things together • Other methods • Open Issues
The Basics: Detection Theory • Bayesian decision • Likelihood ratio detection
Bayesian Rule • $P(\omega_i|x) = \frac{p(x|\omega_i)\,P(\omega_i)}{p(x)}$, i.e. posterior = likelihood × prior / evidence
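Bayesian Decision • Classes $\{\omega_1, \omega_2, \dots, \omega_c\}$ • Actions $\{\alpha_1, \alpha_2, \dots, \alpha_a\}$ • Loss: $\lambda(\alpha_k|\omega_i)$ • Risk: $R(\alpha_k|x) = \sum_{i=1}^{c} \lambda(\alpha_k|\omega_i)\,P(\omega_i|x)$ • Overall risk: $R = \int R(\alpha(x)|x)\,p(x)\,dx$ • Bayesian decision: $\alpha^*(x) = \arg\min_k R(\alpha_k|x)$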
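A small numeric sketch of the rule: compute posteriors with Bayes' rule, then pick the action with minimum conditional risk. The two-class face/non-face setup, the loss matrix (a miss costs 5, a false positive costs 1), and the likelihood and prior values are illustrative assumptions, not numbers from the slides.

```python
# Minimum-risk Bayesian decision for c classes and a actions.
# loss[k][i] = lambda(alpha_k | omega_i); all numbers below are illustrative.

def posteriors(likelihoods, priors):
    """Bayes rule: P(w_i|x) = p(x|w_i) P(w_i) / p(x)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_i p(x|w_i) P(w_i)
    return [j / evidence for j in joint]

def conditional_risks(loss, post):
    """R(alpha_k|x) = sum_i loss[k][i] * P(w_i|x) for every action k."""
    return [sum(l_ki * p_i for l_ki, p_i in zip(row, post)) for row in loss]

def bayes_decision(loss, likelihoods, priors):
    """Index of the minimum-risk action, plus all conditional risks."""
    risks = conditional_risks(loss, posteriors(likelihoods, priors))
    return min(range(len(risks)), key=risks.__getitem__), risks

# Classes: 0 = face, 1 = non-face. Actions: 0 = "declare face", 1 = "declare non-face".
LOSS = [[0, 1],  # declaring "face": free if correct, cost 1 for a false positive
        [5, 0]]  # declaring "non-face": cost 5 for a miss, free if correct
action, risks = bayes_decision(LOSS, likelihoods=[0.6, 0.1], priors=[0.1, 0.9])
```

Here the posterior actually favors "non-face" (0.6 vs. 0.4), yet the risks are 0.6 for "declare face" and 2.0 for "declare non-face", so the minimum-risk action is "declare face": the asymmetric loss, not the posterior alone, drives the decision.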
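Likelihood Ratio Detection • x – the data • H – hypothesis • H0: the data does not contain the target • H1: the data contains the target • Detection: decide H1 if p(x|H1) > p(x|H0) • Likelihood ratio: $L(x) = \frac{p(x|H_1)}{p(x|H_0)}$; decide H1 if $L(x) > \tau$ (the rule above is the case $\tau = 1$)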
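A minimal sketch of the likelihood ratio test, assuming one-dimensional Gaussian class-conditional densities for H1 and H0. The Gaussian models and their (mean, std) parameters are assumptions for illustration only.

```python
import math

# Likelihood ratio test with 1-D Gaussian class-conditional densities.
# The (mean, std) values for H1 and H0 are illustrative assumptions.

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratio(x, h1=(2.0, 1.0), h0=(0.0, 1.0)):
    """L(x) = p(x|H1) / p(x|H0); h1 and h0 are (mean, std) pairs."""
    return gaussian_pdf(x, *h1) / gaussian_pdf(x, *h0)

def detect(x, tau=1.0, **models):
    """Decide H1 (target present) when L(x) exceeds the threshold tau."""
    return likelihood_ratio(x, **models) > tau
```

For these two unit-variance Gaussians, L(x) = exp(2x - 2), so tau = 1 places the decision boundary at x = 1; raising tau to e² moves it to x = 2, accepting more misses in exchange for fewer false positives.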
Detection vs. False Positive [Figure, two panels: the "+" and "-" classes separated by a threshold, with the false positive and miss detection regions marked]
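The threshold trade-off can be made concrete by counting, for a given threshold, how many "+" samples are detected and how many "-" samples slip through. The score values below are made up for illustration; in practice they would come from a detector's output.

```python
# Detection rate vs. false positive rate for a score threshold.
# The scores are hypothetical numbers chosen for illustration.

def rates(pos_scores, neg_scores, threshold):
    """Return (detection rate, false positive rate) for the rule 'score > threshold'.
    The miss rate is 1 - detection rate."""
    detection = sum(s > threshold for s in pos_scores) / len(pos_scores)
    false_pos = sum(s > threshold for s in neg_scores) / len(neg_scores)
    return detection, false_pos

def sweep(pos_scores, neg_scores, thresholds):
    """Trade-off curve: one (threshold, detection, false positive) row per threshold."""
    return [(t, *rates(pos_scores, neg_scores, t)) for t in thresholds]

faces = [0.9, 0.8, 0.7, 0.4]      # hypothetical scores of "+" (face) windows
non_faces = [0.6, 0.3, 0.2, 0.1]  # hypothetical scores of "-" (non-face) windows
for t, d, f in sweep(faces, non_faces, [0.35, 0.5, 0.75]):
    print(f"threshold={t:.2f}  detection={d:.2f}  false_positive={f:.2f}")
```

Lowering the threshold from 0.75 to 0.35 raises the detection rate from 0.50 to 1.00 but lets the false positive rate climb from 0.00 to 0.25, which is exactly the tension the figure depicts.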
Visual Detection • A framework • Three key issues: target representation, pattern classification, effective search
Visual Detection • Detecting an “object” in an image • output: location and size • Challenges • how to describe the “object”? • how likely is it that an image patch shows the target? • how to handle rotation? • how to handle scale? • how to handle illumination?
A Framework • Detection window • Scan all locations and scales
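"Scan all locations" at one window size is a pair of nested loops over window positions. In this sketch `classify` is a hypothetical callback standing in for whatever detector is used, and the image is assumed to be a list of pixel rows.

```python
# Scanning every location at a single detection-window size.
# `classify(image, x, y, win)` is a hypothetical detector callback.

def scan_windows(img_w, img_h, win, step=1):
    """Yield (x, y, win) for every win-by-win window fully inside the image."""
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            yield x, y, win

def detect_at_scale(image, win, classify, step=1):
    """Run the classifier on every window and keep the positive ones."""
    h, w = len(image), len(image[0])
    return [box for box in scan_windows(w, h, win, step) if classify(image, *box)]
```

Even at one scale the count grows quickly: a 320×240 image with a 24-pixel window and a 1-pixel step already yields 297 × 217 = 64,449 windows, before any other scales are considered.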
Three Key Issues • Target Representation • Pattern Classification • classifier • training • Effective Search
Target Representation • Rule-based • e.g. “the nose is underneath two eyes”, etc. • Shape Template-based • deformable shape • Image Appearance-based • vectorize the pixels of an image patch • Visual Feature-based • descriptive features
Pattern Classification • Linearly separable • Linearly non-separable
Effective Search • Location • scan pixel by pixel • Scale • solution I • keep the size of detection window the same • use multiple resolution images • solution II: • change the size of detection window • Efficiency???
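Solution I can be sketched as an image pyramid: the detection window stays fixed while the image shrinks. Nearest-neighbour downsampling, the 1.25 default factor, and the list-of-lists image type are assumptions chosen to keep the sketch dependency-free; real systems usually smooth the image before subsampling.

```python
# Solution I: fixed window, multiple image resolutions (an image pyramid).
# Nearest-neighbour sampling and the default factor are illustrative choices.

def downsample_nearest(image, factor):
    """Shrink a list-of-rows image by `factor` (>= 1) with nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    new_h, new_w = int(h / factor), int(w / factor)
    return [[image[int(r * factor)][int(c * factor)] for c in range(new_w)]
            for r in range(new_h)]

def image_pyramid(image, win, factor=1.25):
    """Return [(scale, resized image), ...] while the window still fits."""
    levels = []
    scale = 1.0
    current = image
    while len(current) >= win and len(current[0]) >= win:
        levels.append((scale, current))
        scale *= factor
        # Resample from the original each time to avoid compounding errors.
        current = downsample_nearest(image, scale)
    return levels
```

For a 48×48 image and a 24-pixel window this produces levels of height 48, 38, 30, and 24. Every level has to be resampled and then scanned, which is the efficiency cost the slide asks about.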
Viola & Jones’ detector • Feature: integral image • Classifier: AdaBoost • Speedup: cascading classifiers • Putting things together
An Overview • Feature-based face representation • AdaBoost as the classifier • A cascade of classifiers for speedup
Haar-like features • Q1: How many features can be computed within a detection window? • Q2: How can these features be computed rapidly?
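For Q2, the integral image turns any rectangle sum into four array lookups, so a Haar-like feature costs the same regardless of its size. This sketch uses a zero-padded integral image (an extra leading row and column) to avoid boundary checks, and the left-minus-right sign of the two-rectangle feature is an assumed convention. For Q1, `count_two_rect_features` (a hypothetical helper) counts placements of just one feature type.

```python
# Integral image, constant-time rectangle sums, and one Haar-like feature type.

def integral_image(img):
    """ii[y][x] = sum of img[r][c] for r < y and c < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def count_two_rect_features(win_w=24, win_h=24):
    """Placements of the horizontal two-rectangle feature: every even width,
    every height, every position inside the window."""
    total = 0
    for w in range(2, win_w + 1, 2):
        for h in range(1, win_h + 1):
            total += (win_w - w + 1) * (win_h - h + 1)
    return total
```

For a 24×24 window, `count_two_rect_features()` returns 43,200 for this single feature type; the other rectangle configurations add many more, which is why evaluating every feature cheaply matters.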
Training and Classification • Training • why? • An optimization problem • The most difficult part • Classification • basic: two-class (0/1) classification • classifier • online computation
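Weak Classifier • Weak? • uses only one feature for classification • classifier: thresholding • a weak classifier: $(f_j, \theta_j, p_j)$ (feature, threshold, parity), with $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$ and $0$ otherwise • Why not combine multiple weak classifiers? • How???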
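A sketch of the weak classifier h_j and of choosing its threshold θ_j and parity p_j for a single feature by minimising the weighted training error. The function names and the brute-force O(n²) search over candidate thresholds are illustrative assumptions; sorting the feature values allows a single-pass search instead.

```python
# Weak classifier (f_j, theta_j, p_j) and a brute-force threshold search.

def weak_classify(f_value, theta, parity):
    """h(x) = 1 if p * f(x) < p * theta else 0, with parity p in {+1, -1}."""
    return 1 if parity * f_value < parity * theta else 0

def train_stump(values, labels, weights):
    """Return (weighted error, theta, parity) minimising the weighted error
    of one feature. labels are 1 (face) / 0 (non-face)."""
    best = (float("inf"), None, None)
    candidates = sorted(set(values))
    # One extra threshold above every value so "all ones"/"all zeros" is reachable.
    candidates.append(candidates[-1] + 1)
    for theta in candidates:
        for parity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if weak_classify(v, theta, parity) != y)
            if err < best[0]:
                best = (err, theta, parity)
    return best
```

With feature values [1, 2, 3, 10, 11, 12] and labels [1, 1, 1, 0, 0, 0], the search finds θ = 10 with parity +1 and zero error; flipping the labels makes parity -1 (with θ = 3) the winner.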
Training: AdaBoost • Idea 1: combining weak classifiers • Idea 2: feature selection
Feature Selection • How many features do we have? • What is the best strategy?
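A compact sketch of the boosting loop in the style described by Viola & Jones: each round trains a one-feature threshold classifier for every feature and keeps only the one with the lowest weighted error, so boosting also performs feature selection (Idea 2). Feature values are assumed to be precomputed into `features[j][i]`; all names are hypothetical, and clamping the error away from 0 and 1 is an added safeguard that is not part of the slides.

```python
import math

# AdaBoost as feature selection; features[j][i] is feature j on sample i.

def _best_stump(values, labels, weights):
    """Lowest weighted error (err, theta, parity) for one feature,
    where h(x) = 1 if parity * f(x) < parity * theta else 0."""
    best = (float("inf"), 0.0, 1)
    candidates = sorted(set(values))
    for theta in candidates + [candidates[-1] + 1]:
        for parity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if (1 if parity * v < parity * theta else 0) != y)
            if err < best[0]:
                best = (err, theta, parity)
    return best

def adaboost_select(features, labels, rounds):
    """labels: 1 = face, 0 = non-face. Returns [(j, theta, parity, alpha), ...]."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Initial weights: 1/(2l) per positive, 1/(2m) per negative.
    w = [1.0 / (2 * n_pos) if y == 1 else 1.0 / (2 * n_neg) for y in labels]
    chosen = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalise to a distribution
        best = None
        for j, values in enumerate(features):  # feature selection step
            err, theta, parity = _best_stump(values, labels, w)
            if best is None or err < best[0]:
                best = (err, j, theta, parity)
        err, j, theta, parity = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # safeguard against log(0)
        beta = err / (1.0 - err)
        for i, (v, y) in enumerate(zip(features[j], labels)):
            if (1 if parity * v < parity * theta else 0) == y:
                w[i] *= beta  # shrink weights of correctly classified samples
        chosen.append((j, theta, parity, math.log(1.0 / beta)))
    return chosen

def strong_classify(sample, chosen):
    """sample[j] is feature j's value; face iff sum(alpha*h) >= 0.5 * sum(alpha)."""
    score = sum(alpha * (1 if parity * sample[j] < parity * theta else 0)
                for j, theta, parity, alpha in chosen)
    return 1 if score >= 0.5 * sum(c[3] for c in chosen) else 0
```

Because misclassified samples keep their weight while correct ones shrink, later rounds concentrate on the hard examples, and each round can pick a different feature.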
The Final Classifier • This is a linear combination of a selected set of weak classifiers
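Written out, Viola & Jones' final classifier thresholds the weighted vote of the T selected weak classifiers h_t at half the total weight, where ε_t is the weighted error of the weak classifier chosen in round t:

```latex
h(x) =
\begin{cases}
1, & \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\
0, & \text{otherwise}
\end{cases}
\qquad
\alpha_t = \log \frac{1}{\beta_t}, \qquad \beta_t = \frac{\epsilon_t}{1 - \epsilon_t}
```

Weak classifiers with lower error get smaller β_t and therefore larger votes α_t.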
Attentional Cascade • Motivation • most detection windows contain non-faces • thus, most computation is wasted • Idea? • can we save some computation on non-faces? • can we reject the majority of the non-faces very quickly? • using simple classifiers for screening!
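The screening idea amounts to an early-exit loop: a window is rejected at the first stage whose score falls below that stage's threshold, so most non-faces only ever touch the cheap early stages. The `(score_fn, threshold)` stage representation is a hypothetical interface chosen for this sketch.

```python
# Attentional cascade: evaluate stages in order, reject at the first failure.
# Each stage is a hypothetical (score_fn, threshold) pair.

def cascade_classify(window, stages):
    """Return (is_face, number of stages evaluated)."""
    for k, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, k  # rejected early: later stages are never run
    return True, len(stages)  # survived every stage
```

The second return value makes the saving visible: a window rejected by the first stage costs only that stage's features.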
Designing Cascade • Design parameters • # of cascade stages • # of features for each stage • parameters of each stage • Example: a 32-stage classifier • S1: 2 features, detects 100% of faces and rejects 60% of non-faces • S2: 5 features, detects 100% of faces and rejects 80% of non-faces • S3-S5: 20 features • S6-S7: 50 features • S8-S12: 100 features • S13-S32: 200 features
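Because each stage only sees the windows passed by the previous ones, the overall false positive rate F and detection rate D of a K-stage cascade are products of the per-stage rates f_i and d_i (each measured on the windows that reach stage i). Applying this to the first two stages of the example:

```latex
F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i
\\[4pt]
F_2 = (1 - 0.6)(1 - 0.8) = 0.4 \times 0.2 = 0.08, \qquad D_2 = 1 \times 1 = 1
```

So after at most 2 + 5 = 7 feature evaluations, 92% of non-face windows are already rejected while no faces are lost; the expensive later stages run on only the remaining 8%.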
Comments • It is quite difficult to train the cascading classifiers
Handling scales • Scale the detector itself rather than using multiple-resolution images • Why? • constant computation: with the integral image, a rectangle feature costs the same number of lookups at any scale • Practice • use a set of scales a factor of 1.25 apart
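A sketch of Solution II with scales 1.25 apart: list the detection-window sizes to try, growing the window instead of shrinking the image. The 24-pixel base window and the round-to-nearest rule are assumptions for illustration.

```python
# Solution II: grow the detection window by a fixed factor per scale.
# The 24-pixel base size and rounding rule are illustrative assumptions.

def detector_scales(img_w, img_h, base=24, factor=1.25):
    """Window sizes base, base*factor, base*factor**2, ... that fit in the image."""
    sizes = []
    scale = 1.0
    while True:
        size = int(round(base * scale))
        if size > min(img_w, img_h):
            break
        sizes.append(size)
        scale *= factor
    return sizes
```

For a 60×60 image this returns [24, 30, 38, 47, 59]. Each size is then scanned over all locations; no resampled copy of the image is needed, since the rectangle features are simply scaled.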
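Integrating multiple detections • Why multiple detections? • the detector is insensitive to small changes in translation and scale, so one face usually fires several overlapping windows • Post-processing • connected component labeling of overlapping detections • report one detection per component, at the component's center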
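One way to implement this post-processing: treat each detection as a node, join every overlapping pair, and replace each connected component with the average of its members. The `(x, y, size)` square-box format, the plain intersection test, and the union-find grouping are assumptions for this sketch.

```python
# Merge overlapping detections via connected components (union-find).
# Boxes are hypothetical (x, y, size) squares.

def overlaps(a, b):
    """True if square boxes a and b intersect."""
    ax, ay, asz = a
    bx, by, bsz = b
    return ax < bx + bsz and bx < ax + asz and ay < by + bsz and by < ay + asz

def merge_detections(boxes):
    """Return one averaged box per connected component of overlapping boxes."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    merged = []
    for members in groups.values():
        n = len(members)
        merged.append(tuple(sum(b[k] for b in members) / n for k in range(3)))
    return merged
```

Components are transitive: two boxes that do not overlap each other still end up merged if a third box overlaps both.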
Putting things together • Training: off-line • Data collection • positive data • negative data • Validation set • Cascaded AdaBoost • Detection: on-line • Scanning the image
Summary • Advantages • Simple: easy to implement • Rapid: real-time system • Disadvantages • Training is quite time-consuming (may take days) • May need enormous engineering effort for fine tuning
Other Methods • Rowley-Baluja-Kanade
Rowley-Baluja-Kanade • Train a set of multilayer perceptrons, arbitrate among their outputs to reach a decision, and search over different scales [Rowley, Baluja and Kanade, 1998]
RBK: Some Results Courtesy of Rowley et al., 1998
Open Issues • Out-of-plane rotation • Occlusion • Illumination
Tracking Heads? Courtesy of Y. Wu, 2001 • The task: • Localize faces and track them in image sequences • Challenges: • Lighting, occlusion, rotation, etc.
Outline • Motivation • What is tracking? • One solution (Birchfield, CVPR 1998) • Other methods and open issues
Motivation • Why tracking? • The complexity of face detection • it scans all pixel positions and several scales • The limitation of face detection • hard to handle out-of-plane rotation • Can we maintain the identity of the faces? • face recognition is the ultimate solution for this, but it may not be necessary • Objectives • fast (frame-rate) face/head localization • handle 360° out-of-plane rotation
Four Elements • Infer target states in video sequences • Target states vs. image observations • Visual cues and modalities • Four elements • Target representation X • Observation representation Z • Hypotheses measurement p(Zt|Xt) • Hypotheses generating p(Xt|Xt-1)
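The four elements map onto a generic generate-and-measure loop. The sketch below uses a sample-based (particle-filter-style) step on a 1-D head position. The 1-D state, the Gaussian random-walk motion model for p(X_t|X_{t-1}), the Gaussian observation likelihood for p(Z_t|X_t), and all parameter values are assumptions; this is not the Birchfield (CVPR 1998) tracker named in the outline.

```python
import math
import random

# One tracking cycle on a 1-D state X (e.g. horizontal head position).
# Motion and observation models are illustrative Gaussian assumptions.

def track_step(particles, observation, motion_std=2.0, obs_std=3.0, rng=random):
    """Return (resampled particles, state estimate) for one frame."""
    # Hypothesis generation: sample X_t ~ p(X_t | X_{t-1}) (random walk).
    predicted = [x + rng.gauss(0.0, motion_std) for x in particles]
    # Hypothesis measurement: weight each hypothesis by p(Z_t | X_t).
    weights = [math.exp(-0.5 * ((observation - x) / obs_std) ** 2) for x in predicted]
    total = sum(weights)
    if total == 0.0:  # every hypothesis implausible: fall back to the prediction
        return predicted, sum(predicted) / len(predicted)
    weights = [w / total for w in weights]
    # Estimation: weighted mean of the hypotheses.
    estimate = sum(w * x for w, x in zip(weights, predicted))
    # Resample so the next frame starts from the plausible hypotheses.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate
```

Fed observations of a target moving one unit per frame, the estimate follows it with a small lag; swapping in richer state (position, scale) and an image-based likelihood is where a real head tracker differs.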
Visual Tracking [Figure: ground truth, hypotheses, prediction, and estimation]