Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts
Outline
• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• FRFS-SAT
• Experimentation
• Conclusion
Feature selection
• Why dimensionality reduction/feature selection?
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning
[Diagram: processing high-dimensional data directly is intractable; dimensionality reduction produces low-dimensional data that the processing system can handle]
Rough set theory
[Diagram: a set A is approximated from below by its lower approximation and from above by its upper approximation, both built from equivalence classes Rx]
• Rx is the set of all points that are indiscernible from point x in terms of feature subset B
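As a concrete illustration, here is a minimal Python sketch of crisp lower and upper approximations; the toy dataset, the dictionary representation of objects, and the helper name `approximations` are illustrative assumptions, not from the slides.

```python
# A sketch of crisp rough set approximations: objects are grouped into
# equivalence classes by their values on a feature subset B, and a set A
# is approximated from below (classes fully inside A) and from above
# (classes that overlap A).

from collections import defaultdict

def approximations(objects, B, A):
    classes = defaultdict(set)
    for i, obj in enumerate(objects):
        classes[tuple(obj[b] for b in B)].add(i)   # equivalence class Rx
    lower, upper = set(), set()
    for c in classes.values():
        if c <= A:
            lower |= c      # class entirely contained in A
        if c & A:
            upper |= c      # class overlapping A
    return lower, upper

objects = [{"a": 0}, {"a": 0}, {"a": 1}, {"a": 1}]
print(approximations(objects, ["a"], {0, 1, 2}))   # ({0, 1}, {0, 1, 2, 3})
```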
Discernibility approach
• Decision-relative discernibility matrix
• Compare objects pairwise, examining their attribute values
• For attributes that differ:
  • If the decision values also differ, include those attributes in the matrix entry
  • Else leave the entry blank
• Construct the discernibility function: the conjunction, over all non-empty matrix entries, of the disjunction of the attributes in each entry (a sketch of the construction follows below)
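A minimal sketch of this construction, assuming a toy decision table in dictionary form (the data and attribute names are illustrative, not from the paper):

```python
# Build the decision-relative discernibility matrix: for each pair of
# objects with different decision values, record the conditional
# attributes on which they differ. Each non-empty entry becomes one
# clause of the discernibility function.

def discernibility_matrix(objects, attributes, decision):
    entries = []
    for i in range(len(objects)):
        for j in range(i):
            if objects[i][decision] == objects[j][decision]:
                continue                    # same decision: leave blank
            clause = {a for a in attributes
                      if objects[i][a] != objects[j][a]}
            if clause:
                entries.append(clause)      # one clause per object pair
    return entries

data = [
    {"a": 1, "b": 0, "c": 2, "d": 1, "dec": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": 0, "dec": "no"},
    {"a": 0, "b": 0, "c": 2, "d": 1, "dec": "no"},
]
# The conjunction of these clauses gives the discernibility function.
print(discernibility_matrix(data, ["a", "b", "c", "d"], "dec"))
# [{'b', 'c', 'd'}, {'a'}]  (order within sets may vary)
```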
Example
• Remove duplicates:
fC(a,b,c,d) = {a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀ {b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀ {a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d}
• Remove supersets (absorption):
fC(a,b,c,d) = {b ⋁ c} ⋀ {d}
• Converting this to DNF gives (b ⋀ d) ⋁ (c ⋀ d), so the minimal reducts are {b, d} and {c, d}
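The simplification step can be sketched in a few lines; `simplify` is a hypothetical helper mirroring the two steps above, not the paper's implementation:

```python
# Simplify a discernibility function: drop duplicate clauses, then drop
# any clause that is a superset of another clause (absorption law).

def simplify(clauses):
    unique = {frozenset(c) for c in clauses}           # remove duplicates
    return [c for c in unique
            if not any(o < c for o in unique)]         # remove supersets

f = [{"a","b","c","d"}, {"a","c","d"}, {"b","c"}, {"d"},
     {"a","b","c"}, {"a","b","d"}, {"b","c","d"}, {"a","d"}]
print(simplify(f))    # [frozenset({'b', 'c'}), frozenset({'d'})], order may vary
```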
Finding reducts
• Usually too expensive to search exhaustively for reducts of minimal cardinality
• Reducts are typically found by:
  • Converting from CNF to DNF (expensive)
  • Hill-climbing search using clauses (non-optimal)
  • Other search methods, e.g. GAs (non-optimal)
• RSAR-SAT: solve directly in the SAT formulation
  • The DPLL approach is both fast and guarantees minimal reducts (see the sketch below)
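Here is a minimal DPLL-style branch-and-bound sketch for the crisp case, assuming clauses are sets of positive attribute literals; `min_reduct` is a hypothetical helper written for illustration, not the RSAR-SAT implementation itself:

```python
# Find a smallest reduct from a crisp discernibility function. Because
# all literals are positive, satisfying the CNF amounts to choosing a
# hitting set of attributes: unit clauses force inclusions (unit
# propagation), and branching tries including/excluding an attribute.

def min_reduct(clauses, chosen=frozenset(), best=None):
    remaining = [c for c in clauses if not (c & chosen)]   # unsatisfied clauses
    if not remaining:                                      # all satisfied
        return chosen if best is None or len(chosen) < len(best) else best
    if best is not None and len(chosen) + 1 >= len(best):
        return best                                        # bound: cannot improve
    for c in remaining:                                    # unit propagation
        if len(c) == 1:
            return min_reduct(remaining, chosen | c, best)
    a = next(iter(min(remaining, key=len)))                # branch variable
    best = min_reduct(remaining, chosen | {a}, best)       # include a
    pruned = [c - {a} for c in remaining]                  # exclude a
    if all(pruned):                                        # no clause emptied
        best = min_reduct(pruned, chosen, best)
    return best

# Simplified function from the example above: {b ∨ c} ∧ {d}
print(min_reduct([{"b", "c"}, {"d"}]))   # frozenset({'b', 'd'}) or {'c', 'd'}
```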
Fuzzy discernibility matrices
• Extension of the crisp approach
• Previously, attributes had {0,1} membership to clauses; now they have membership in [0,1]
• Allows real-valued data as well as nominal data
• Fuzzy discernibility matrices can be used to find fuzzy-rough reducts (see the sketch below)
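A sketch of how such fuzzy clauses might be built, assuming a simple distance-based similarity relation on normalised attribute values; the similarity definition and the value ranges here are illustrative choices, not necessarily the paper's:

```python
# Fuzzy discernibility: instead of a {0,1} test "do objects x and y
# differ on attribute a?", each attribute gets a membership in [0,1] to
# the clause for the pair. Here the degree of discernibility is the
# complement of a simple similarity relation.

def fuzzy_clause(x, y, ranges):
    clause = {}
    for a, r in ranges.items():
        similarity = max(0.0, 1.0 - abs(x[a] - y[a]) / r)
        clause[a] = 1.0 - similarity     # degree to which a discerns x from y
    return clause

ranges = {"a": 10.0, "b": 4.0}           # attribute value ranges (assumed)
x = {"a": 3.0, "b": 1.0}
y = {"a": 8.0, "b": 1.5}
print(fuzzy_clause(x, y, ranges))        # {'a': 0.5, 'b': 0.125}
```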
Formulation
• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
• In the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true (see the sketch below)
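A sketch of this, assuming a max/min formulation: a clause is satisfied by a subset to the degree of the largest membership among its true-assigned variables, and the whole function to the minimum over its clauses. `clause_satisfaction` and `function_satisfaction` are hypothetical helpers:

```python
# Degree to which a clause (a dict of attribute -> membership) is
# satisfied by the attributes assigned true, and degree to which the
# whole function is satisfied (assumed max/min semantics).

def clause_satisfaction(clause, subset):
    return max((clause.get(a, 0.0) for a in subset), default=0.0)

def function_satisfaction(clauses, subset):
    return min(clause_satisfaction(c, subset) for c in clauses)

clauses = [{"a": 0.5, "b": 0.125}, {"a": 0.2, "c": 0.9}]
print(function_satisfaction(clauses, {"a"}))        # min(0.5, 0.2) = 0.2
print(function_satisfaction(clauses, {"a", "c"}))   # min(0.5, 0.9) = 0.5
```

Under this reading, a DPLL-style search can then look for the smallest subset for which every clause reaches its maximal attainable degree of satisfaction.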
Experimentation: setup
• 9 benchmark datasets
  • Features: 10 to 39
  • Objects: 120 to 690
• Methods used:
  • FRFS-SAT
  • Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and fuzzy discernibility
  • Evolutionary algorithms: genetic algorithms (GA) and particle swarm optimization (PSO) using fuzzy dependency
• 10×10-fold cross-validation: FS performed on the training folds only, test folds reduced using the discovered reducts (see the sketch below)
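The per-fold protocol might be sketched as follows; `select_reduct` stands in for any of the selectors above and is hypothetical, as is the choice of scikit-learn utilities for the cross-validation loop:

```python
# Feature selection is run on each training fold only, and the held-out
# fold is reduced to the discovered reduct before classification, so
# the test data never influences subset selection.

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

def evaluate(X, y, select_reduct, classifier, repeats=10, folds=10):
    cv = RepeatedStratifiedKFold(n_splits=folds, n_repeats=repeats)
    scores = []
    for train, test in cv.split(X, y):
        reduct = select_reduct(X[train], y[train])   # column indices, FS on training fold
        clf = classifier().fit(X[train][:, reduct], y[train])
        scores.append(clf.score(X[test][:, reduct], y[test]))
    return np.mean(scores)
```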
Conclusion
• Extended propositional satisfiability to enable the search for fuzzy-rough reducts:
  • A new framework for fuzzy satisfiability
  • A new DPLL algorithm
  • Fuzzy clause simplification
• Future work:
  • Non-chronological backtracking
  • Better heuristics
  • Unsupervised FS
  • Other extensions of propositional satisfiability
Feature selection
• Feature selection (FS) is a dimensionality reduction technique that preserves the semantics (meaning) of the data
• Subset generation: forwards, backwards, random…
• Evaluation function: determines the 'goodness' of subsets
• Stopping criterion: decides when to stop the subset search
[Diagram: the feature set feeds subset generation; each candidate subset is evaluated, its suitability is checked against the stopping criterion, and the search either continues or stops, with the final subset passed to validation. The loop is sketched below.]
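The generation/evaluation/stopping loop in the diagram can be instantiated, for example, as greedy forward selection; `evaluate_subset` is a placeholder for any evaluation function (e.g. fuzzy dependency), and the helper name is illustrative:

```python
# Greedy forward selection: repeatedly generate candidate subsets by
# adding one feature, evaluate them, and stop when no addition improves
# the evaluation score.

def forward_selection(features, evaluate_subset):
    subset, best = set(), 0.0
    remaining = set(features)
    while remaining:
        score, f = max((evaluate_subset(subset | {f}), f)   # generation +
                       for f in remaining)                  # evaluation
        if score <= best:                                   # stopping criterion
            break
        subset, best, remaining = subset | {f}, score, remaining - {f}
    return subset, best
```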
WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:
http://users.aber.ac.uk/rkj/book/weka.zip