The University of British Columbia Department of Electrical & Computer Engineering Feature Selection and Weighting using Genetic Algorithm for Off-line Character Recognition Systems Presented by Faten Hussein
Outline • Introduction & Problem Definition • Motivation & Objectives • System Overview • Results • Conclusions
Introduction Off-line Character Recognition System • Typical pipeline: text document → Scanning → Pre-Processing → Feature Extraction → Classification → Post-Processing → classified text • Applications: address readers • bank cheque readers • reading data entered in forms (e.g. tax forms) • detecting forged signatures
Introduction For a typical handwritten recognition task: • Many variants of character (symbol) shape and size • Different writers have different writing styles • The same person may write in different styles at different times • Thus, an unlimited number of variations for a single character exists
Introduction [Figure: variations in handwritten digits extracted from zip codes, annotated with L = number of loops and E = number of end points, e.g. L=0, E=3; L=1, E=1; L=2, E=0] • To overcome this diversity, a large number of features must be added • Examples of the features we used: moment invariants, number of loops, number of end points, centroid, area, circularity, and so on
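As a minimal sketch of what such shape features look like in practice, the snippet below computes a few of them from a binarized digit image using OpenCV and NumPy. This is an illustration only, not the thesis implementation; loops and end points would additionally require skeletonization and topology analysis, which are omitted here.

```python
# Illustrative only: compute some of the listed shape features for a
# binarized digit image (foreground = 255, background = 0).
import cv2
import numpy as np

def extract_shape_features(binary_img):
    m = cv2.moments(binary_img, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()               # 7 Hu moment invariants
    area = m["m00"]                               # foreground pixel count
    cx, cy = m["m10"] / area, m["m01"] / area     # centroid
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    perimeter = cv2.arcLength(max(contours, key=cv2.contourArea), True)
    circularity = 4.0 * np.pi * area / perimeter ** 2   # 1.0 for a perfect circle
    # Number of loops and end points would need skeleton/topology analysis,
    # omitted from this sketch.
    return np.concatenate([hu, [area, cx, cy, circularity]])
```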
Problem Dilemma Adding more features to a character recognition system: • Why add them: to accommodate variations in symbols, in the hope of increasing classification accuracy • The cost: larger problem size, more run time and memory for classification • Feature design is an ad-hoc process that depends on experience and trial and error • Might add redundant/irrelevant features, which decrease accuracy
Feature Selection Solution: Feature Selection Definition: select a relevant subset of features from a larger set of features while maintaining or enhancing accuracy Advantages • Removes irrelevant and redundant features (e.g. a total of 40 features reduced to 16; of the 7 Hu moments only the first three kept; area removed as redundant with circularity) • Maintains/enhances classification accuracy (e.g. 70% recognition rate using 40 features -> 75% after FS using only 16 features) • Faster classification and lower memory requirements
Feature Selection/Weighting • The process of assigning weights (binary or real-valued) to features needs a search algorithm to find the set of weights that results in the best classification accuracy (an optimization problem) • A genetic algorithm is a good search method for such optimization problems
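As a concrete illustration, a candidate weight vector can be folded directly into the nearest-neighbour distance and scored by classification accuracy. The sketch below assumes a weighted Euclidean distance and leave-one-out accuracy as the quality measure; this matches the wrapper idea but is not necessarily the exact formulation used in the thesis.

```python
import numpy as np

def weighted_1nn_accuracy(X, y, weights):
    """Leave-one-out accuracy of a 1-NN classifier with per-feature weights.
    weights: array of length n_features; 0 drops a feature (selection),
    intermediate values rescale its influence (weighting)."""
    Xw = X * weights                                 # apply the candidate weights
    correct = 0
    for i in range(len(X)):
        d = np.linalg.norm(Xw - Xw[i], axis=1)       # distances in weighted space
        d[i] = np.inf                                # exclude the sample itself
        correct += int(y[np.argmin(d)] == y[i])      # label of nearest neighbour
    return correct / len(X)
```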
Genetic Feature Selection/Weighting Why use a GA for FS/FW • Has been proven to be a powerful search method for the FS problem • Does not require derivative information or any extra knowledge; only the objective function (the classifier’s error rate) is needed to evaluate the quality of a feature subset • Searches a population of solutions in parallel, so it can provide a number of potential solutions, not only one • GA is resistant to becoming trapped in local minima
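A minimal genetic search over binary feature masks in this wrapper setting might look like the sketch below. The population size, rates, and operators are illustrative choices rather than the thesis parameters, and the fitness function passed in can be the `weighted_1nn_accuracy` sketch above, so the classifier's accuracy directly drives selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_feature_selection(X, y, fitness_fn, pop_size=30, generations=50,
                              crossover_rate=0.8, mutation_rate=0.02):
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))     # binary chromosomes (0/1 weights)
    for _ in range(generations):
        fitness = np.array([fitness_fn(X, y, ind) for ind in pop])
        # Tournament selection: keep the fitter of two randomly drawn individuals.
        parents = np.array([pop[max(rng.integers(0, pop_size, 2),
                                    key=lambda i: fitness[i])]
                            for _ in range(pop_size)])
        children = parents.copy()
        for j in range(0, pop_size - 1, 2):          # single-point crossover
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n)
                children[j, cut:], children[j + 1, cut:] = \
                    parents[j + 1, cut:], parents[j, cut:]
        flip = rng.random(children.shape) < mutation_rate   # bit-flip mutation
        children[flip] = 1 - children[flip]
        pop = children
    fitness = np.array([fitness_fn(X, y, ind) for ind in pop])
    return pop[np.argmax(fitness)], fitness.max()    # best mask and its accuracy
```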
Objectives & Motivations Build a genetic feature selection/weighting system to be applied to the character recognition problem and investigate the following issues: • Study the effect of varying the number of weight values on the number of selected features (FS often eliminates more features than FW, but by how much?) • Compare the performance of genetic feature selection/weighting in the presence of irrelevant & redundant features (not studied before) • Compare the performance of genetic feature selection/weighting for regular cases (test the hypothesis that FW should give better, or at least the same, results as FS) • Evaluate the performance of the better method (GFS or GFW) in terms of optimality and time complexity (study the feasibility of genetic search with respect to optimality & time)
Methodology • The recognition problem is to classify isolated handwritten digits • Used the k-nearest-neighbor classifier (k = 1) • Used a genetic algorithm as the search method • Applied genetic feature selection and weighting in the wrapper approach (i.e. the fitness function is the classifier’s error rate) • Used two phases during the program run: a training/testing phase and a validation phase
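The two-phase protocol can be sketched with scikit-learn's 1-NN classifier. The split proportions and helper name below are illustrative assumptions; the point is only that the GA's fitness is computed on the training/testing data, while the best chromosome's final accuracy is reported on held-out validation samples it never saw.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def evaluate_protocol(X, y, best_mask, seed=0):
    # Hold out a validation set the GA never sees, then split the rest into
    # the training and testing parts used during the GA run.
    X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=seed, stratify=y)
    X_tr, X_te, y_tr, y_te = train_test_split(X_dev, y_dev, test_size=0.25,
                                              random_state=seed, stratify=y_dev)
    sel = best_mask.astype(bool)                     # keep only selected features
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr[:, sel], y_tr)
    test_acc = knn.score(X_te[:, sel], y_te)         # accuracy used as GA fitness
    val_acc = knn.score(X_val[:, sel], y_val)        # reported generalization accuracy
    return test_acc, val_acc
```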
System Overview • Input (isolated handwritten digit images) → Pre-Processing Module → clean images → Feature Extraction Module → all N extracted features • Feature Selection/Weighting Module (GA) ⇄ Evaluation Module (KNN classifier): the GA proposes a feature subset, the classifier returns an assessment of that subset • Output: best feature subset (M < N) • Evaluation is done in two stages: Training/Testing and Validation
Results (Comparison 1) Effect of varying weight values on the number of selected features • As the number of weight values increases, the probability of a feature having weight value = 0 (POZ) decreases, so the number of eliminated features decreases • GFS eliminates more features (thus selects fewer features) than GFW because of its smaller number of weight values (0/1), and it does so without compromising classification accuracy
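To see why, note that with W allowed weight values the chance that a given feature is assigned the value 0 (and hence eliminated) is roughly 1/W if weights are chosen uniformly. The quick computation below is a back-of-the-envelope illustration under that uniformity assumption, not a measurement from the thesis experiments.

```python
# Expected number of eliminated features out of 40, assuming each feature's
# weight is drawn uniformly from W allowed values (one of which is 0).
n_features = 40
for W in (2, 4, 8, 16):          # W = 2 is plain selection (weights 0/1)
    poz = 1.0 / W                # probability of a zero weight (POZ)
    print(f"W={W:2d}  POZ={poz:.3f}  expected eliminated ~ {poz * n_features:.1f}")
```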
Results (Comparison 2) Performance of genetic feature selection/weighting in the presence of irrelevant features • The performance of the 1-NN classifier degrades rapidly as the number of irrelevant features increases • As the number of irrelevant features increases, FS outperforms all FW settings in both classification accuracy and elimination of features
Results (Comparison 3) Performance of genetic feature selection/weighting in the presence of redundant features • The classification accuracy of 1-NN does not suffer much from the addition of redundant features, but they do increase the problem size • As the number of redundant features increases, FS has slightly better classification accuracy than all FW settings, and it significantly outperforms FW in the elimination of features
Results (Comparison 4) Performance of genetic feature selection/weighting for regular cases (not necessarily containing irrelevant/redundant features) • FW has better training accuracies than FS, but FS generalizes better (it has better accuracies on unseen validation samples) • FW over-fits the training samples
Results (Evaluation 1) Convergence of GFS to an Optimal or Near-Optimal Set of Features • GFS was able to return optimal or near-optimal values (as reached by exhaustive search) • The worst average value obtained by GFS was less than 1% away from the optimal value
Results (Evaluation 2) Convergence of GFS to an Optimal or Near-Optimal Set of Features within an Acceptable Number of Generations

Number of Features | Best Exh. (opt. & near-opt.) | Exhaustive Run Time | Best GA | Average GA (5 runs) | Number of Generations | GA Run Time (single run)
 8 | 74, 73.8   | 2 minutes  | 74   | 73.68 |  5 | 2 minutes
10 | 75.2, 75   | 13 minutes | 75.2 | 74.96 |  5 | 3 minutes
12 | 77.2, 77   | 47 minutes | 77   | 76.92 | 10 | 5 minutes
14 | 79, 78.8   | 3 hours    | 79   | 78.2  | 10 | 5.5 minutes
16 | 79.2, 79   | 6 hours    | 79.2 | 78.48 | 15 | 8 minutes
18 | 79.4, 79.2 | 1.5 days   | 79.4 | 78.92 | 20 | 11 minutes

• The time needed for GFS is bounded by a (lower) linear-fit curve and an (upper) exponential-fit curve • The use of GFS for highly dimensional problems needs parallel processing
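The gap in run time follows from a simple count of classifier evaluations: exhaustive search must score all 2^N feature subsets, while the GA needs roughly pop_size × generations evaluations. The snippet below illustrates this using the generation counts from the table and an assumed population size of 20 (chosen purely for illustration).

```python
# Rough count of fitness (classifier) evaluations: exhaustive search vs GA.
pop_size = 20                      # assumed population size, for illustration
for n_features, generations in [(8, 5), (10, 5), (12, 10),
                                (14, 10), (16, 15), (18, 20)]:
    exhaustive_evals = 2 ** n_features     # every possible feature subset
    ga_evals = pop_size * generations      # one fitness call per individual
    print(f"N={n_features:2d}: exhaustive={exhaustive_evals:6d} evals, "
          f"GA ~ {ga_evals:4d} evals")
```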
Conclusions • GFS is superior to GFW in feature reduction, without compromising classification accuracy • In the presence of irrelevant features, GFS is better than GFW in both feature reduction and classification accuracy • In the presence of redundant features, GFS is also preferred over GFW due to its greater ability to reduce features • For regular databases, it is advisable to use at most 2 or 3 weight values to avoid over-fitting • GFS is a reliable method for finding optimal or near-optimal solutions, but it needs parallel processing for large problem sizes