Feature Selection Jamshid Shanbehzadeh, Samaneh Yazdani Department of Computer Engineering, Faculty Of Engineering, Khorazmi University (Tarbiat Moallem University of Teheran)
Outline • Part 1: Dimension Reduction • Dimension • Feature Space • Definition & Goals • Curse of dimensionality • Research and Application • Grouping of dimension reduction methods • Part 2: Feature Selection • Parts of the feature set • Feature selection approaches • Part 3: Application of Feature Selection and Software
Part 1: Dimension Reduction
Dimension Reduction • Dimension • Dimension (Feature or Variable): • A measurement of a certain aspect of an object • Two features of a person: • weight • height
Dimension Reduction • Feature Space • Feature Space: • An abstract space where each pattern sample is represented as a point
Dimension Reduction • Introduction • Large, high-dimensional data • Web documents, etc. • Large amounts of resources are needed for • Information retrieval • Classification tasks • Data preservation, etc. ⇒ Dimension Reduction
Dimension Reduction • Definition & Goals • Dimensionality reduction: • The study of methods for reducing the number of dimensions describing the object • General objectives of dimensionality reduction: • Reduce the computational cost • Improve the quality of data for efficient data-intensive processing tasks
Dimension Reduction • Definition & Goals • Example: two classes (Class 1: overweight, Class 2: underweight) plotted in the weight (kg) vs. height (cm) plane • Dimension Reduction • preserves information about the overweight/underweight classification as much as possible • makes classification easier • reduces the data size (2 features → 1 feature)
Dimension Reduction • Curse of dimensionality • As the number of dimensions increases, a fixed number of data samples becomes exponentially sparse • Example: observe that the data become more and more sparse in higher dimensions (see the sketch below) • An effective solution to the problem of the "curse of dimensionality" is: • Dimensionality reduction
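The sparsity claim can be made concrete with a few lines of code. The following is a minimal sketch (not taken from the slides; NumPy is an assumed, illustrative choice): for a fixed sample of 100 points in the unit hypercube, the mean nearest-neighbour distance grows as the dimension increases, i.e. the same data become sparser in higher-dimensional spaces.

```python
import numpy as np

# Minimal sketch (illustrative, not from the slides): the same 100 points
# drawn in [0, 1]^d become sparser as d grows, as measured by the mean
# distance from each point to its nearest neighbour.
rng = np.random.default_rng(0)
n_samples = 100

for d in (1, 2, 5, 10, 50):
    X = rng.random((n_samples, d))               # 100 points in the unit hypercube
    diff = X[:, None, :] - X[None, :, :]         # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # ignore distance to itself
    print(d, round(dist.min(axis=1).mean(), 3))  # mean nearest-neighbour distance
```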
Dimension Reduction • Research and Application • Why has dimension reduction been the subject of so much recent research? • Massive data of large dimensionality in: • Knowledge discovery • Text mining • Web mining • and more
Dimension Reduction • Grouping of dimension reduction methods • Dimensionality reduction approaches include • Feature Selection • Feature Extraction
Dimension Reduction • Grouping of dimension reduction methods: Feature Selection • Dimensionality reduction approaches include • Feature Selection: the problem of choosing a small subset of features that ideally is necessary and sufficient to describe the target concept • Example • Feature set = {X, Y} • Two classes; goal: classification • Keep feature X or feature Y? • Answer: feature X
Dimension Reduction • Grouping of dimension reduction methods: Feature Selection • Feature Selection (FS) • Selects a subset of the original features • e.g. preserves weight (and discards height)
Dimension Reduction • Grouping of dimension reduction methods • Dimensionality reduction approaches include • Feature Extraction: creates new features based on transformations or combinations of the original feature set • e.g. a new feature derived from the original features {X1, X2}
Dimension Reduction • Grouping of dimension reduction methods • Feature Extraction (FE) • Generates new features • e.g. preserves a combination of weight and height
Dimension Reduction • Grouping of dimension reduction methods • Dimensionality reduction approaches include • Feature Extraction: creates new features based on transformations or combinations of the original feature set • N: number of original features • M: number of extracted features • M < N (see the sketch below)
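As a concrete illustration of feature extraction, the sketch below uses a PCA-style linear projection. The slides do not prescribe a particular transform, so PCA here is only an assumed example of combining N original features into M < N new ones.

```python
import numpy as np

# Illustrative sketch of feature extraction (a PCA-style projection is an
# assumed example; the slides do not name a specific transform).
# N = 5 original features are combined into M = 2 new features.
rng = np.random.default_rng(0)
X = rng.random((100, 5))                 # 100 samples, N = 5 original features
M = 2                                    # number of extracted features

Xc = X - X.mean(axis=0)                  # centre the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
Z = Xc @ Vt[:M].T                        # new features = combinations of originals
print(Z.shape)                           # (100, 2): M extracted features, M < N
```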
Dimension Reduction • Question: Feature Selection or Feature Extraction? • It depends on the problem • Examples • Pattern recognition: the dimensionality reduction problem is to extract a small set of features that recovers most of the variability of the data • Text mining: the problem is defined as selecting a small subset of words or terms (not new features that are combinations of words or terms) • Image compression: the problem is finding the best extracted features to describe the image
Part 2: Feature selection
Feature selection • From thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier-to-understand learning machines • (Diagram: a data matrix with m samples and N original features, reduced to n selected features)
Feature selection • Parts of feature set • Irrelevant OR Relevant • Three disjoint categories of features: • Irrelevant • Weakly Relevant • Strongly Relevant
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Two classes: {Lion, Deer} • We use some features to classify a new instance • To which class does this animal belong?
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Two classes: {Lion, Deer} • We use some features to classify a new instance • Feature 1: number of legs • Q: How many legs? A: 4 • Both classes have four legs, so number of legs is an irrelevant feature
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Two classes: {Lion, Deer} • We use some features to classify a new instance • Feature 1: number of legs • Feature 2: color • Q: What is its color? A: a coat color that both classes can have • So, color is an irrelevant feature
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Two classes: {Lion, Deer} • We use some features to classify a new instance • Feature 1: number of legs • Feature 2: color • Feature 3: type of food • Q: What does it eat? A: grass • So, type of food is a relevant feature
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Three classes: {Lion, Deer, Leopard} • We use some features to classify a new instance • To which class does this animal belong?
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Three classes: {Lion, Deer, Leopard} • We use some features to classify a new instance • Feature 1: number of legs • Q: How many legs? A: 4 • All three classes have four legs, so number of legs is an irrelevant feature
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Three classes: {Lion, Deer, Leopard} • We use some features to classify a new instance • Feature 1: number of legs • Feature 2: color • Q: What is its color? A: with the leopard added, the coat color/pattern now distinguishes the classes • So, color is a relevant feature
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Three classes: {Lion, Deer, Leopard} • We use some features to classify a new instance • Feature 1: number of legs • Feature 2: color • Feature 3: type of food • Q: What does it eat? A: meat • So, type of food is a relevant feature (it separates the deer from the lion and leopard)
Feature selection • Parts of feature set • Irrelevant or Relevant • Goal: classification • Three classes: {Lion, Deer, Leopard} • We use some features to classify a new instance • Feature 1: number of legs • Feature 2: color • Feature 3: type of food • Add a new feature: Felidae (does it belong to the cat family?) • It is a weakly relevant feature • Optimal set: {Color, Type of food} or {Color, Felidae}
Feature selection • Parts of feature set • Irrelevant or Relevant • Traditionally, feature selection research has focused on searching for relevant features • (Diagram: the feature set divided into relevant and irrelevant features)
Feature selection • Parts of feature set • Irrelevant or Relevant: an example of the problem • Data set with five Boolean features • C = F1 ∨ F2 • F3 = ¬F2, F5 = ¬F4 • Optimal subset: {F1, F2} or {F1, F3} (see the sketch below)
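The Boolean example can be checked mechanically. The sketch below is an illustration, not part of the original slides: it enumerates every instance consistent with F3 = ¬F2 and F5 = ¬F4 and tests which small feature subsets are sufficient to determine C = F1 ∨ F2; only {F1, F2} and {F1, F3} are.

```python
from itertools import product, combinations

# Sketch of the five-Boolean-feature example: F3 = not F2, F5 = not F4,
# and the class is C = F1 or F2. Enumerate all consistent instances and
# check which feature subsets are enough to determine C.
features = ["F1", "F2", "F3", "F4", "F5"]
data = []
for f1, f2, f4 in product([0, 1], repeat=3):
    row = {"F1": f1, "F2": f2, "F3": 1 - f2, "F4": f4, "F5": 1 - f4}
    data.append((row, f1 or f2))              # (feature values, class C)

def determines_class(subset):
    # a subset determines C if identical projections never have different classes
    seen = {}
    for row, c in data:
        key = tuple(row[f] for f in subset)
        if seen.setdefault(key, c) != c:
            return False
    return True

for k in (1, 2):
    for subset in combinations(features, k):
        if determines_class(subset):
            print(subset)                     # prints ('F1', 'F2') and ('F1', 'F3')
```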
Feature selection • Parts of feature set • Irrelevant or Relevant • Formal Definition 1 (Irrelevance): • Irrelevance indicates that the feature is not necessary at all • In the previous example: • F4 and F5 are irrelevant
Feature selection • Parts of feature set • Irrelevant or Relevant • Let F be the full set of features, Fi a feature, and Si = F − {Fi} • Definition 1 (Irrelevance): a feature Fi is irrelevant iff P(C | Fi, S′i) = P(C | S′i) for every subset S′i ⊆ Si • Irrelevance indicates that the feature is not necessary at all
Feature selection • Parts of feature set • Irrelevant or Relevant • Categories of relevant features: • Strongly relevant • Weakly relevant • (Diagram: the feature set divided into strongly relevant, weakly relevant, and irrelevant features)
Feature selection • Parts of feature set • Irrelevant or Relevant: an example of the problem • Data set with five Boolean features • C = F1 ∨ F2 • F3 = ¬F2, F5 = ¬F4
Feature selection • Parts of feature set • Irrelevant or Relevant • Formal Definition 2 (Strong relevance): • Strong relevance indicates that the feature is always necessary for an optimal subset • It cannot be removed without affecting the original conditional class distribution • In the previous example: • Feature F1 is strongly relevant
Feature selection • Parts of feature set • Irrelevant or Relevant • Definition 2 (Strong relevance): a feature Fi is strongly relevant iff P(C | Fi, Si) ≠ P(C | Si) • A strongly relevant feature cannot be removed without affecting the original conditional class distribution
Feature selection • Parts of feature set • Irrelevant or Relevant • Formal Definition 3 (Weak relevance): • Weak relevance suggests that the feature is not always necessary but may become necessary for an optimal subset under certain conditions • In the previous example: • F2 and F3 are weakly relevant
Feature selection • Parts of feature set • Irrelevant or Relevant • Definition 3 (Weak relevance): a feature Fi is weakly relevant iff P(C | Fi, Si) = P(C | Si) and there exists a subset S′i ⊂ Si such that P(C | Fi, S′i) ≠ P(C | S′i) • Weak relevance suggests that the feature is not always necessary but may become necessary for an optimal subset under certain conditions
Feature selection • Parts of feature set • Optimal Feature Subset • Example: • In order to determine the target concept (C = g(F1, F2)): • F1 is indispensable • One of F2 and F3 can be disposed of • Both F4 and F5 can be discarded • Optimal subset: either {F1, F2} or {F1, F3} • The goal of feature selection is to find either of them
Feature selection • Parts of feature set • Optimal Feature Subset • Optimal subset: either {F1, F2} or {F1, F3} • Conclusion • An optimal subset should include all strongly relevant features, none of the irrelevant features, and a subset of the weakly relevant features • Which of the weakly relevant features should be selected, and which should be removed?
Feature selection • Parts of feature set • Redundancy • Solution • Defining Feature Redundancy
Feature selection • Parts of feature set • Redundancy • It is widely accepted that two features are redundant to each other if their values are completely correlated • In the previous example: • F2 and F3 (F3 = ¬F2), as illustrated in the sketch below
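A minimal sketch of this notion of pairwise redundancy (illustrative only; the slides give no code): F3 is the negation of F2, so their values are completely (negatively) correlated, and one of the two can be dropped without losing information.

```python
import numpy as np

# Minimal sketch of pairwise redundancy (illustrative, not from the slides):
# F3 = not F2, so the two features are completely (negatively) correlated
# and carry the same information about the class; one of them is redundant.
rng = np.random.default_rng(0)
F2 = rng.integers(0, 2, size=50)
F3 = 1 - F2                               # F3 = ¬F2, as in the example

corr = np.corrcoef(F2, F3)[0, 1]
print(corr)                               # ≈ -1.0: completely correlated
if np.isclose(abs(corr), 1.0):
    print("F2 and F3 are redundant; keep only one of them")
```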
Feature selection • Parts of feature set • Redundancy • Pairwise correlation is not enough when one feature is correlated with a set of features; the Markov blanket covers this case • Markov blanket: given a feature Fi, let Mi ⊂ F with Fi ∉ Mi; Mi is said to be a Markov blanket for Fi if P(F − Mi − {Fi}, C | Fi, Mi) = P(F − Mi − {Fi}, C | Mi) • The Markov blanket condition requires that Mi subsume not only the information that Fi has about C, but also about all of the other features
Feature selection • Parts of feature set • Redundancy • The redundancy definition further divides weakly relevant features into redundant and non-redundant ones • II: weakly relevant and redundant features • III: weakly relevant but non-redundant features • Optimal subset: strongly relevant features + weakly relevant but non-redundant features (a simple heuristic sketch follows below)
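One simple way to act on this partition is a greedy correlation filter. The following is only a rough sketch of that idea; the 0.99 threshold and the left-to-right greedy order are assumptions of mine, not something the slides specify.

```python
import numpy as np

# Rough sketch of greedy redundancy removal (illustrative only; the
# threshold of 0.99 and the greedy ordering are assumptions).
def remove_redundant(X, threshold=0.99):
    kept = []
    for j in range(X.shape[1]):
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= threshold for k in kept
        )
        if not redundant:
            kept.append(j)                # keep only non-redundant columns
    return kept

rng = np.random.default_rng(0)
F1 = rng.integers(0, 2, size=100)
F2 = rng.integers(0, 2, size=100)
X = np.column_stack([F1, F2, 1 - F2])     # third column is ¬F2: redundant
print(remove_redundant(X))                # [0, 1]: the redundant column is dropped
```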
Feature selection • Approaches
Feature selection • Approaches : Subset Evaluation (Feature Subset Selection ) • Framework of feature selection via subset evaluation
Feature selection • Approaches: Subset Evaluation (Feature Subset Selection) • Step 1, Subset generation: generates a subset of features for evaluation; can start with no features, all features, or a random subset of features • Step 2, Subset evaluation: measures the goodness of the subset • Step 3, Stopping criterion: if not met, return to subset generation • Step 4, Validation of the selected subset • (A generic sketch of this loop follows below)
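The four-step framework can be written as a short loop. The sketch below is generic and illustrative only: `toy_eval` is a placeholder goodness measure (in practice it could be a filter criterion or a classifier's cross-validated accuracy), and the stopping threshold is an assumption, not something the slides prescribe.

```python
from itertools import combinations

# Generic sketch of the subset-evaluation loop: (1) generate candidate
# subsets, (2) evaluate their goodness, (3) stop when a criterion is met,
# (4) validate the selected subset afterwards.
def select_features(features, evaluate, max_size):
    best_subset, best_score = (), float("-inf")
    for k in range(1, max_size + 1):              # 1. subset generation
        for subset in combinations(features, k):
            score = evaluate(subset)              # 2. subset evaluation
            if score > best_score:
                best_subset, best_score = subset, score
        if best_score >= 0.9:                     # 3. stopping criterion (illustrative)
            break
    return best_subset, best_score                # 4. validate on held-out data afterwards

def toy_eval(subset):
    # placeholder goodness: reward subsets containing "F1", penalise larger ones
    return ("F1" in subset) - 0.1 * len(subset)

print(select_features(["F1", "F2", "F3"], toy_eval, max_size=2))
```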
Feature selection • Approaches: Subset Evaluation (Feature Subset Selection) • Subset search method: Exhaustive Search • Examine all feature subsets • Example: {f1, f2, f3} => { {f1}, {f2}, {f3}, {f1,f2}, {f1,f3}, {f2,f3}, {f1,f2,f3} } • The size of the search space is O(2^d), where d is the number of features • The optimal subset is achievable • Too expensive if the feature space is large
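For completeness, the power-set enumeration behind exhaustive search can be written in a few lines. The sketch below simply generates the 2^d − 1 non-empty subsets, which is only feasible for small d.

```python
from itertools import chain, combinations

# Sketch of exhaustive subset generation: every non-empty subset of the
# feature set, i.e. 2^d - 1 candidates for d features.
def all_subsets(features):
    return chain.from_iterable(
        combinations(features, k) for k in range(1, len(features) + 1)
    )

features = ["f1", "f2", "f3"]
subsets = list(all_subsets(features))
print(len(subsets))   # 7 = 2^3 - 1 subsets, matching the example above
print(subsets)
```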