This study compares different approaches to gender classification from facial images using genetic feature subset selection. The goal is to improve performance by identifying the most relevant features for accurate classification.
Genetic Feature Subset Selection for Gender Classification: A Comparison Study Zehang Sun, George Bebis, Xiaojing Yuan, and Sushil Louis Computer Vision Laboratory Department of Computer Science University of Nevada, Reno bebis@cs.unr.edu http://www.cs.unr.edu/CVL
Gender Classification • Problem statement • Determine the gender of a subject from facial images. • Potential applications • Face Recognition • Human-Computer Interaction (HCI) • Challenges • Race, age, facial expression, hair style, etc.
Gender Classification by Humans • Humans are able to make fast and accurate gender classifications. • It takes 600 ms on average to classify faces according to their gender (Bruce et al., 1987). • 96% accuracy has been reported using photos of non-familiar faces without hair information (Bruce et al., 1993). • Empirical evidence indicates that gender decisions are always made much faster than identity decisions. • Computation of gender and identity might be two independent processes. • There is evidence that gender classification is carried out by a separate population of cells in the inferior temporal cortex (Damasio et al., 1990).
Designing a Gender Classifier • Pipeline: Pre-Processing → Feature Extraction → Classifier • The majority of gender classification schemes are based on supervised learning. • Definition • Feature extraction determines an appropriate subspace of dimensionality m in the original feature space of dimensionality d (m << d).
Previous Approaches • Geometry-based • Use distances, angles, and areas among facial features. • Point-to-point distances + discriminant analysis (Burton ‘93, Fellous ‘97) • Feature-to-feature distances + HyperBF NNs (Brunelli ‘92) • Wavelet features + elastic graph matching (Wiskott ‘95) • Appearance-based • Raw images + NNs (Cottrell ‘90, Golomb ‘91, Yen ‘94) • PCA + NNs (Abdi ‘95) • PCA + nearest neighbor (Valentin ‘97) • Raw images + SVMs (Moghaddam ‘02)
What Information is Useful for Gender Classification? • Geometry-based approaches • Representing faces as a set of features assumes a-priori knowledge about what are the features and/or what are the relationships between them. • There is no simple set of features that can predict the gender of faces accurately. • There is no simple algorithm for extracting the features automatically from images. • Appearance-based approaches • Certain features are nearly characteristic of one sex or the other (e.g., facial hair for men, makeup or certain hairstyles for women). • Easier to represent this kind of information using appearance-based feature extraction methods. • Appearance-based features, however, are more likely to suffer from redundant and irrelevant information.
Feature Extraction Using PCA • Feature extraction is performed by projecting the data in a lower-dimensional space using PCA. • PCA maps the data in a lower-dimensional space using a linear transformation. • The columns of the projection matrix are the “best” eigenvectors (i.e., eigenfaces) of the covariance matrix of the data.
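In symbols, the projection described on this slide can be written as below. The notation (n training images x_i, mean face \bar{x}, covariance C, projection matrix W) is assumed here for illustration and follows the usual eigenface formulation rather than anything stated on the slides.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
C = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{T}

C\,u_k = \lambda_k u_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d

y = W^{T}(x - \bar{x}), \qquad
W = [\,u_1 \;\; u_2 \;\; \cdots \;\; u_m\,] \in \mathbb{R}^{d \times m}, \quad m \ll d
```

Here "best" is read as "largest eigenvalue", so the columns of W are the m eigenvectors that capture the most variance.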
Which Eigenvectors Encode Mostly Gender-Related Information? • Sometimes, it is possible to determine what features are encoded by specific eigenvectors. • [Figure: eigenfaces EV#1, EV#2, EV#3, EV#4, EV#5, EV#6, EV#8, EV#10, EV#12, EV#14, EV#19, EV#20]
Which Eigenvectors Encode Mostly Gender-Related Information? (cont’d) • All eigenvectors contain information relevant to the gender of faces; however, only the information conveyed by eigenvectors with large eigenvalues can be generalized to new faces (Abdi et al., 1995). • Removing specific eigenvectors could in fact improve performance (Yambor et al., 2000).
Critique of Previous Approaches • No explicit feature selection is performed. • Same features used for face identification are also used for gender classification. • Some features might be redundant or irrelevant. • Rely heavily on the classifier. • Classification accuracy can suffer. • Time consuming training and classification.
Project Goal • Improve the performance of gender classification using feature subset selection. • Pipeline: Pre-Processing → Feature Extraction → Feature Selection (GA) → Feature Subset → Classifier
Feature Selection What constitutes a good set of features for classification? • Definition • Given a set of d features, select a subset of size m that leads to the smallest classification error. • Filter Methods • Preprocessing steps performed independent of the classification algorithm or its error criteria. • Wrapper Methods • Search through the space of feature subsets using the criterion of the classification algorithm to select the optimal feature subset. • Provide more accurate solutions than filter methods, but in general are more computationally expensive.
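A wrapper method can be sketched as follows. This is a minimal illustration, not the study's setup: the nearest-centroid classifier, the exhaustive search, and the names `nearest_centroid_error` and `wrapper_select` are all assumptions chosen to keep the example short. The point it shows is that the classifier's own validation error is the selection criterion.

```python
from itertools import combinations


def nearest_centroid_error(train, valid, subset):
    """Validation error of a nearest-centroid classifier that sees only `subset`.

    `train` and `valid` are lists of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(subset))
        for j, f in enumerate(subset):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in total] for y, total in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in the selected features.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(subset, centroids[y])),
        )

    wrong = sum(1 for x, y in valid if predict(x) != y)
    return wrong / len(valid)


def wrapper_select(train, valid, d, m):
    """Try every size-m subset of d features; feasible only for small d."""
    return min(
        combinations(range(d), m),
        key=lambda s: nearest_centroid_error(train, valid, s),
    )
```

The number of size-m subsets grows combinatorially with d, which is why a heuristic search is needed once d reaches the hundreds.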
What are the Benefits? • Eliminate redundant and irrelevant features. • Fewer training examples are required. • Faster and more accurate classification.
Project Objectives • Perform feature extraction by projecting the images in a lower-dimensional space using Principal Components Analysis (PCA). • Perform feature selection in PCA space using Genetic Algorithms. • Test four traditional classifiers (Bayesian, LDA, NNs, and SVMs). • Compare with traditional feature subset selection approaches (e.g., Sequential Backward Floating Search (SBFS)).
Genetic Algorithms (GAs) Review • What is a GA? • An optimization technique for searching very large spaces. • Inspired by the biological mechanisms of natural selection and reproduction. • What are the main characteristics of a GA? • Global optimization technique. • Uses objective function information, not derivatives. • Searches probabilistically using a population of structures (i.e., candidate solutions using some encoding). • Structures are modified at each iteration using selection, crossover, and mutation.
Structure of GA • Current Generation → Evaluation and Selection → Crossover → Mutation → Next Generation • [Figure: a population of bit-string chromosomes (e.g., 10010110…, 01100010…, 10100100…) passing through each stage]
Encoding and Fitness Evaluation • Encoding scheme • Transforms solutions in parameter space into finite length strings (chromosomes) over some finite set of symbols. • Fitness function • Evaluates the goodness of a solution.
Selection Operator • Probabilistically filters out solutions that perform poorly, choosing high performance solutions to exploit. • Chromosomes with high fitness are copied over to the next generation.
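One common way to realize probabilistic selection is fitness-proportional ("roulette wheel") sampling. The sketch below is illustrative only: the slide does not say which selection scheme it has in mind, the function name `roulette_select` is hypothetical, and fitness values are assumed to be non-negative with at least one positive.

```python
import random


def roulette_select(population, fitnesses, k, rng):
    """Draw k chromosomes, each with probability proportional to its fitness.

    Sampling is with replacement, so a fit chromosome can be copied several times.
    """
    return rng.choices(population, weights=fitnesses, k=k)


rng = random.Random(0)
population = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
fitnesses = [2.0, 2.0, 4.0]
survivors = roulette_select(population, fitnesses, k=3, rng=rng)
```

Chromosomes with low fitness are not excluded outright; they are simply less likely to be drawn.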
Crossover and Mutation Operators • Generate new solutions for exploration. • Crossover • Allows information exchange between points. • Mutation • Its role is to restore lost genetic material. • [Figure: chromosome with a mutated bit]
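The two operators can be sketched for bit-string chromosomes as follows. One-point crossover and independent bit-flip mutation are assumed here as the simplest variants; the slides do not specify which variants were used, and the function names are hypothetical.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randrange(1, len(a))  # cut strictly inside the string
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(1)
parent_a = [0] * 8
parent_b = [1] * 8
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip_mutation(child_a, rate=0.04, rng=rng)
```

Mutation matters because crossover alone can never reintroduce a bit value that has disappeared from every chromosome at a given position.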
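Genetic Feature Subset Selection • Binary encoding • One bit per eigenvector, EV#1 through EV#250 (search using the first 250 eigenvectors). • Fitness evaluation • fitness = 10^4 × accuracy + 0.4 × zeros • accuracy: accuracy from the validation set • zeros: number of zeros in the chromosome, which reflects the number of features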
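A short sketch of the encoding and fitness evaluation, reading the fitness expression on this slide as 10^4 × accuracy + 0.4 × zeros. Two assumptions are made and should be checked against the paper: accuracy is a fraction in [0, 1], and "zeros" counts the 0 bits in the chromosome (unselected eigenvectors). The function names are hypothetical.

```python
def selected_indices(chromosome):
    """Decode a binary chromosome: bit i set to 1 means eigenvector i is kept."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def fitness(chromosome, accuracy):
    """Fitness under the assumed reading: 10^4 * accuracy + 0.4 * zeros.

    `accuracy` is the validation-set accuracy as a fraction (assumption).
    """
    zeros = chromosome.count(0)  # more zeros means a smaller feature subset
    return 1e4 * accuracy + 0.4 * zeros


chromosome = [1, 0, 0, 1]
kept = selected_indices(chromosome)
score = fitness(chromosome, accuracy=0.9)
```

Under this reading, accuracy dominates: a gain of 0.01 in accuracy is worth 100 fitness units, the same as 250 additional zeros, so the second term mainly breaks ties in favor of smaller subsets.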
Genetic Feature Subset Selection (cont’d) • Cross-generational selection strategy • Assuming a population of size N, the offspring double the size of the population, and we select the best N individuals from the combined parent-offspring population. • GA parameters • Population size: 350 • Number of generations: 400 • Crossover rate: 0.66 • Mutation rate: 0.04
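The cross-generational strategy amounts to pooling parents and offspring and keeping the top N. A minimal sketch, with a hypothetical function name and a toy fitness (the number of 1 bits) standing in for the real fitness function:

```python
def cross_generational_select(parents, offspring, fitness_fn):
    """Keep the best N individuals from the combined parent-offspring pool,
    where N is the size of the parent population."""
    n = len(parents)
    combined = parents + offspring
    return sorted(combined, key=fitness_fn, reverse=True)[:n]


parents = [[0, 0], [1, 0]]
offspring = [[1, 1], [0, 0]]
next_generation = cross_generational_select(parents, offspring, fitness_fn=sum)
```

Because parents compete with their offspring, the best individual found so far is never lost between generations.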
Dataset • 400 frontal images from 400 different people • 200 male, 200 female • Different races • Different lighting conditions • Different facial expressions • Images were registered and normalized • No hair information • Account for different lighting conditions
Experiments • Gender classifiers: • Linear Discriminant Analysis (LDA) • Bayes classifier • Neural Network (NN) classifier • Support Vector Machine (SVM) classifier • Three-fold cross-validation • Training set: 75% of the data • Validation set: 12.5% of the data • Test set: 12.5% of the data
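With 400 images, the stated proportions work out to 300 training, 50 validation, and 50 test images per fold. The sketch below draws one such split at random; how the study actually formed its three folds is not stated on the slide, so the shuffling and the function name `split_indices` are assumptions.

```python
import random


def split_indices(n, rng, train_frac=0.75, valid_frac=0.125):
    """Shuffle indices 0..n-1 and cut them into train / validation / test parts."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains
    return train, valid, test


train_idx, valid_idx, test_idx = split_indices(400, random.Random(0))
```

The validation set is what the GA's fitness function sees; the test set is held back to measure the final error rates.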
Classification Error Rates • ERM: error rate using manually selected feature subsets • ERG: error rate using GA-selected feature subsets • [Chart; values shown: 22.4%, 17.7%, 14.2%, 13.3%, 11.3%, 8.9%, 9%, 6.7%, 4.7%]
Ratio of Features - Information Kept • RN: number of features in the feature subset, as a percentage • RI: percentage of information contained in the feature subset • [Chart; values shown: 69%, 61.2%, 42.8%, 38%, 36.4%, 32.4%, 31%, 17.6%, 13.3%, 8.4%]
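The two ratios could be computed as sketched below. The slide does not define "information"; this sketch assumes it is measured by eigenvalues, so that RI is the share of the total eigenvalue sum carried by the selected eigenvectors. That definition, and both function names, are assumptions.

```python
def ratio_features(mask):
    """RN: percentage of candidate features that the mask keeps."""
    return 100.0 * sum(mask) / len(mask)


def ratio_information(mask, eigenvalues):
    """RI (assumed definition): share of the eigenvalue sum kept by the mask."""
    kept = sum(ev for bit, ev in zip(mask, eigenvalues) if bit)
    return 100.0 * kept / sum(eigenvalues)


mask = [1, 0, 1, 0]
eigenvalues = [4.0, 3.0, 2.0, 1.0]  # toy values, not from the study
rn = ratio_features(mask)
ri = ratio_information(mask, eigenvalues)
```

Under this definition RI can exceed RN whenever the selected eigenvectors are skewed toward large eigenvalues, as in the toy example.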
Distribution of Selected Eigenvectors • [Figure panels: (a) LDA, (b) Bayes, (c) NN, (d) SVMs]
Reconstructed Images • [Figure rows: original images; using top 30 EVs; using EVs selected by LDA-PCA+GA; using EVs selected by B-PCA+GA]
Reconstructed Images (cont’d) • [Figure rows: original images; using top 30 EVs; using EVs selected by NN-PCA+GA; using EVs selected by SVM-PCA+GA] • Reconstructed faces using GA-selected EVs have lost information about identity but do disclose strong gender information. • Certain gender-irrelevant features do not appear in the reconstructed images using GA-selected EVs.
Comparison with SBFS • Sequential Backward Floating Search (SBFS) is a combination of two heuristic search schemes: • (1) Sequential Forward Selection (SFS) • - starts with an empty feature set and at each step selects the best single feature to be added to the feature subset. • (2) Sequential Backward Selection (SBS) • - starts with the entire feature set and at each step drops the feature whose absence least decreases the performance.
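The two building blocks can be sketched as greedy loops. Here `score` is a hypothetical callback that returns the performance of a classifier trained on a given list of feature indices (higher is better), and both functions stop at a fixed target size m, which is a simplification of how the search would normally terminate.

```python
def sfs(d, m, score):
    """Sequential Forward Selection: grow from the empty set to m features."""
    selected = []
    while len(selected) < m:
        candidates = [f for f in range(d) if f not in selected]
        # Add the single feature that yields the best score.
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected


def sbs(d, m, score):
    """Sequential Backward Selection: shrink from all d features down to m."""
    selected = list(range(d))
    while len(selected) > m:
        # Drop the feature whose absence hurts the score the least.
        drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
        selected.remove(drop)
    return selected


weights = [0.1, 0.9, 0.5, 0.3]  # toy per-feature usefulness, not from the study


def toy_score(subset):
    return sum(weights[f] for f in subset)


forward = sfs(4, 2, toy_score)
backward = sbs(4, 2, toy_score)
```

Neither loop ever revisits a decision: SFS cannot remove a feature once added, and SBS cannot restore one once dropped. Floating search is designed to relax exactly that restriction.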
Comparison with SBFS (cont’d) • SBFS is an advanced version of the "plus l, take away r" method, which first enlarges the feature subset by l features using forward selection and then removes r features using backward selection. • The number of forward and backward steps in SBFS is dynamically controlled and updated based on the classifier’s performance.
Comparison with SBFS (cont’d) • [Figure panels: (a) SVMs+SBFS, (b) SVMs+GA] • ERM: error rate using manually selected feature subsets • ERG: error rate using GA-selected feature subsets • ERSBFS: error rate using SBFS
Comparison with SBFS (cont’d) • [Figure rows: original images; using top 30 EVs; using EVs selected by SVM-PCA+GA; using EVs selected by SVM-PCA+SBFS]
Conclusions • We have considered the problem of gender classification from frontal facial images using genetic feature subset selection. • GAs provide a simple, general, and powerful framework for feature subset selection. • Very useful, especially when the number of training examples is small. • We have tested four well-known classifiers using PCA for feature extraction. • Genetic feature subset selection has led to lower error rates in all cases.
Future Work • Generalize feature encoding scheme. • Use weights instead of 0/1 encoding. • Consider more powerful fitness functions. • Use larger data sets. • FERET data set. • Apply feature selection using different features. • Various features (e.g., Wavelet or Gabor features) • Experiment with different data sets. • Different data sets (e.g., vehicle detection)