Fuzzy-rough data mining
Richard Jensen
Advanced Reasoning Group, University of Aberystwyth
rkj@aber.ac.uk | http://users.aber.ac.uk/rkj
Outline
• Knowledge discovery process
• Fuzzy-rough methods:
  • Feature selection and extensions
  • Instance selection
  • Classification/prediction
  • Semi-supervised learning
Knowledge discovery
• The process
• The problem of too much data:
  • Requires storage
  • Intractable for data mining algorithms
  • Noisy or irrelevant data is misleading/confounding
Feature selection
Why dimensionality reduction/feature selection?
• Growth of information: the need to manage it effectively
• Curse of dimensionality: a problem for machine learning and data mining
• Data visualisation: graphing data
[Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction produces low-dimensional data the system can handle.]
Why do it?
• Case 1: we are interested in the features themselves
  • We want to know which are relevant
  • If we fit a model, it should be interpretable
• Case 2: we are interested in prediction
  • Features are not interesting in themselves
  • We just want to build a good classifier (or other kind of predictor)
Feature selection process
• Feature selection (FS) preserves data semantics by selecting rather than transforming
• Subset generation: forwards, backwards, random...
• Evaluation function: determines the 'goodness' of subsets
• Stopping criterion: decides when to stop the subset search
[Diagram: feature set → subset generation → subset evaluation (subset suitability) → stopping criterion (continue/stop) → validation.]
Fuzzy-rough set theory
Problems with the crisp approach:
• Rough set methods (usually) require data discretisation beforehand
• Extensions, e.g. tolerance rough sets, require thresholds
• No flexibility in the approximations: objects either belong fully to the lower (or upper) approximation, or not at all
Fuzzy-rough sets
Rough set (crisp):
  B↓X = {x | [x]_B ⊆ X},  B↑X = {x | [x]_B ∩ X ≠ ∅}
Fuzzy-rough set, defined with a t-norm T and an implicator I:
  μ_{R↓A}(x) = inf_{y∈U} I(μ_R(x,y), μ_A(y))
  μ_{R↑A}(x) = sup_{y∈U} T(μ_R(x,y), μ_A(y))
Fuzzy-rough feature selection
• Based on fuzzy similarity between objects
• Fuzzy lower/upper approximations are built from these similarity relations, e.g. per-attribute similarity combined with a t-norm (see the sketch below)
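A minimal sketch of this idea, assuming numeric attributes and one common similarity choice (one minus the range-normalised distance, combined across a subset with the min t-norm); other relations from the literature can be substituted:

```python
import numpy as np

def attr_similarity(col):
    """Pairwise fuzzy similarity for one numeric attribute:
    sim(x, y) = 1 - |a(x) - a(y)| / range(a), one common choice among several."""
    rng = col.max() - col.min()
    rng = rng if rng > 0 else 1.0
    return np.clip(1.0 - np.abs(col[:, None] - col[None, :]) / rng, 0.0, 1.0)

def subset_similarity(data, subset):
    """Similarity w.r.t. a feature subset: combine per-attribute similarities
    with a t-norm (here, min)."""
    return np.min([attr_similarity(data[:, a]) for a in subset], axis=0)
```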
FRFS: evaluation function
• Fuzzy positive region: μ_POS(x) = sup over decision classes X of μ_{P↓X}(x)
• Fuzzy positive region #2 (weak)
• Dependency function: γ'_P(Q) = (Σ_{x∈U} μ_POS(x)) / |U|
(A sketch of these measures follows below.)
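Continuing the sketch above, a hedged implementation of the lower approximation and dependency degree, assuming crisp decision classes and the Kleene-Dienes implicator (both illustrative choices):

```python
import numpy as np

def lower_approx(sim, label_mask):
    """Fuzzy lower approximation of a crisp decision class X under `sim`:
    mu(x) = inf_y I(sim(x, y), X(y)), with the Kleene-Dienes implicator
    I(a, b) = max(1 - a, b)."""
    X = label_mask.astype(float)
    return np.maximum(1.0 - sim, X[None, :]).min(axis=1)

def dependency(data, labels, subset):
    """Fuzzy-rough dependency degree: gamma'_P(Q) = sum_x mu_POS(x) / |U|,
    where the positive region takes the max lower-approximation membership
    over all decision classes."""
    sim = subset_similarity(data, subset)
    pos = np.max([lower_approx(sim, labels == c) for c in np.unique(labels)], axis=0)
    return pos.sum() / len(labels)
```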
FRFS: finding reducts
Fuzzy-rough QuickReduct:
• Evaluation: the dependency function (or another fuzzy-rough measure)
• Generation: greedy hill-climbing
• Stopping criterion: when the maximal evaluation value is reached (or to degree α)
(A sketch follows below.)
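A sketch of the greedy search, built on the `dependency` function above; the tolerance `tol` stands in for the degree-α stopping criterion:

```python
def quickreduct(data, labels, tol=1e-6):
    """Greedy hill-climbing search for a fuzzy-rough reduct (QuickReduct sketch):
    repeatedly add the attribute that most increases the dependency degree,
    until the dependency of the full attribute set is (nearly) reached."""
    n = data.shape[1]
    target = dependency(data, labels, list(range(n)))   # dependency of full set
    subset, best = [], 0.0
    while best < target - tol:
        gains = {a: dependency(data, labels, subset + [a])
                 for a in range(n) if a not in subset}
        a_star = max(gains, key=gains.get)
        if gains[a_star] <= best:   # no attribute improves the measure: stop
            break
        subset, best = subset + [a_star], gains[a_star]
    return subset
```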
FRFS
• Other search methods:
  • GAs, PSO, EDAs, harmony search, etc.
  • Backward elimination, plus-L minus-R, floating search, SAT, etc.
• Other subset evaluations:
  • Fuzzy boundary region
  • Fuzzy entropy
  • Fuzzy discernibility function
Boundary region
[Diagram: a set X with its lower approximation (equivalence classes [x]_B fully inside X) and its upper approximation (classes overlapping X); the boundary region lies between the two.]
FRFS: boundary region
• The fuzzy lower and upper approximations define a fuzzy boundary region
• For each concept, minimise the size of its boundary region (also applicable to crisp RSFS)
• Results suggest this is a more informed heuristic, but it is more computationally complex (see the sketch below)
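A sketch of the boundary-region measure, reusing the earlier helpers and adding the upper approximation with the min t-norm (again an illustrative choice):

```python
import numpy as np

def upper_approx(sim, label_mask):
    """Fuzzy upper approximation: mu(x) = sup_y T(sim(x, y), X(y)), min t-norm."""
    X = label_mask.astype(float)
    return np.minimum(sim, X[None, :]).max(axis=1)

def boundary_measure(data, labels, subset):
    """Total size of the fuzzy boundary regions over all decision concepts;
    a smaller value indicates a more discerning feature subset."""
    sim = subset_similarity(data, subset)
    return sum((upper_approx(sim, labels == c) - lower_approx(sim, labels == c)).sum()
               for c in np.unique(labels))
```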
Finding smallest reducts
• Usually too expensive to search exhaustively for reducts of minimal cardinality
• Reducts can be found via discernibility matrices through, e.g.:
  • Converting from CNF to DNF (expensive)
  • Hill-climbing search using clauses (non-optimal)
  • Other search methods, e.g. GAs (non-optimal)
• SAT approach: solve directly in a SAT formulation; a DPLL approach ensures optimal reducts (illustrated below)
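For intuition, a brute-force sketch of finding minimal reducts from discernibility clauses; this is exponential and only viable for toy inputs, which is exactly why the DPLL/SAT route matters:

```python
from itertools import combinations

def minimal_reducts(clauses, n_features):
    """Brute-force sketch: the smallest feature subsets that intersect every
    discernibility clause. Only feasible for tiny n; DPLL/SAT solvers are
    needed for realistic sizes."""
    for k in range(1, n_features + 1):
        hits = [set(s) for s in combinations(range(n_features), k)
                if all(set(s) & c for c in clauses)]
        if hits:
            return hits   # all minimal reducts of cardinality k
    return []

# toy discernibility function (a or b) and (b or c) and (a or c),
# with features a=0, b=1, c=2
print(minimal_reducts([{0, 1}, {1, 2}, {0, 2}], 3))   # -> [{0,1}, {0,2}, {1,2}]
```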
Fuzzy discernibility matrices
• An extension of the crisp approach
• Previously, attributes had {0,1} membership to clauses; now membership lies in [0,1]
• Fuzzy DMs can be used to find fuzzy-rough reducts
Formulation
Fuzzy satisfiability:
• In crisp SAT, a clause is fully satisfied if at least one variable in the clause is set to true
• In the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true (see below)
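A small illustration of this degree-of-satisfaction idea, with clauses represented as attribute-to-membership maps (a simplification of the full fuzzy discernibility formulation):

```python
def clause_satisfaction(clause, assignment):
    """Degree to which a fuzzy clause is satisfied by the variables set to true.
    `clause` maps attribute -> membership in [0, 1]; crisp SAT is the special
    case where every membership is 0 or 1."""
    return max((clause[a] for a in assignment if a in clause), default=0.0)

# the clause {'a': 0.3, 'b': 0.8} is satisfied to degree 0.8 by {'b'},
# and only to degree 0.3 by {'a'} alone
print(clause_satisfaction({'a': 0.3, 'b': 0.8}, {'b'}))   # 0.8
```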
FRFS: issues
• Problem: noise tolerance! Because the lower approximation is an infimum over all objects, a single noisy object can sharply reduce its membership values.
Vaguely quantified rough sets
Pawlak rough set:
• y belongs to the lower approximation of A iff all elements of Ry belong to A
• y belongs to the upper approximation of A iff at least one element of Ry belongs to A
VQRS:
• y belongs to the lower approximation of A iff most elements of Ry belong to A
• y belongs to the upper approximation of A iff at least some elements of Ry belong to A
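A sketch of the VQRS lower approximation using a smooth S-shaped fuzzy quantifier; the (α, β) thresholds are illustrative, and `sim` is a similarity matrix as in the earlier sketches:

```python
import numpy as np

def quantifier(p, alpha, beta):
    """A smooth fuzzy quantifier Q_(alpha, beta) ('most'/'some'): 0 below
    alpha, 1 above beta, with a quadratic S-shaped transition in between."""
    p = np.asarray(p, dtype=float)
    mid = (alpha + beta) / 2.0
    return np.where(p <= alpha, 0.0,
           np.where(p <= mid, 2 * ((p - alpha) / (beta - alpha)) ** 2,
           np.where(p <= beta, 1 - 2 * ((p - beta) / (beta - alpha)) ** 2, 1.0)))

def vqrs_lower(sim, label_mask, alpha=0.3, beta=0.9):
    """VQRS lower approximation: mu(y) = Q_most(|Ry ∩ A| / |Ry|), taking min
    as fuzzy intersection and the sigma-count (sum) as fuzzy cardinality."""
    X = label_mask.astype(float)
    frac = np.minimum(sim, X[None, :]).sum(axis=1) / sim.sum(axis=1)
    return quantifier(frac, alpha, beta)
```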
VQRS-based feature selection
• Uses the quantified lower approximation, positive region and dependency degree
• Evaluation: the quantified dependency (can be crisp or fuzzy)
• Generation: greedy hill-climbing
• Stopping criterion: when the quantified positive region is maximal (or to degree α)
• Should be more noise-tolerant, but the measure is non-monotonic
Progress
[Diagram: qualitative data → rough set theory; quantitative data → fuzzy-rough set theory; noisy data → VQRS and fuzzy VPRS; a monotonic noise-tolerant alternative → OWA-FRFS.]
More issues...
• Problem #1: how to choose the fuzzy similarity relation?
• Problem #2: how to handle missing values?
Interval-valued FRFS
• Answer #1: model uncertainty in the choice of fuzzy similarity by using interval-valued (IV) fuzzy similarity, yielding an IV fuzzy-rough set
Interval-valued FRFS
• When comparing two object values for a given attribute, what should be done if at least one is missing?
• Answer #2: model missing values via the unit interval (sketch below)
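A minimal sketch of answer #2: a missing value yields the whole unit interval as its similarity, while known values give a degenerate interval (the similarity form is again one common, illustrative choice):

```python
def iv_similarity(x, y, rng):
    """Interval-valued similarity of two attribute values. If either value is
    missing (None), nothing is known about their similarity, so the whole
    unit interval [0, 1] is returned; otherwise the interval is degenerate."""
    if x is None or y is None:
        return (0.0, 1.0)
    s = max(0.0, 1.0 - abs(x - y) / rng)
    return (s, s)

print(iv_similarity(0.2, 0.5, 1.0))   # (0.7, 0.7)
print(iv_similarity(0.2, None, 1.0))  # (0.0, 1.0): maximal uncertainty
```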
Other measures
• Boundary region
• Discernibility function
Initial experimentation
[Diagram: the original dataset is split into cross-validation folds; one branch reduces the folds with type-1 FRFS, the other corrupts the data and reduces it with the IV-FRFS methods; JRip classifiers are then trained on each set of reduced folds.]
Instance selection: basic ideas
• Remove objects that are not needed, keeping the underlying approximations unchanged
Instance selection: basic ideas
• Remove noisy objects: those whose positive region membership is < 1 (see the sketch below)
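A sketch in the spirit of fuzzy-rough instance selection, reusing the earlier helpers; objects whose positive-region membership falls below a threshold are discarded (the published FRIS variants differ in details):

```python
import numpy as np

def fris(data, labels, tau=1.0):
    """Instance selection sketch: keep only objects whose fuzzy positive-region
    membership reaches the threshold tau (tau = 1 discards objects that the
    lower approximations cannot fully cover, i.e. likely noise)."""
    sim = subset_similarity(data, list(range(data.shape[1])))
    pos = np.max([lower_approx(sim, labels == c) for c in np.unique(labels)], axis=0)
    keep = pos >= tau
    return data[keep], labels[keep]
```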
Fuzzy-rough instance selection
• Time complexity is a problem for FRIS-II and FRIS-III
• A less complex alternative: fuzzy-rough prototype selection
• More on this later...
Further developments
• FRNN and VQNN have limitations (for classification problems):
  • FRNN only uses one neighbour
  • VQNN is equivalent to FNN if the same similarity relation is used
• POSNN uses the positive region to also consider the quality of neighbours
  • E.g. instances in overlapping class regions are less interesting
• More on this later... (an FRNN-style sketch follows below)
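A hedged sketch of FRNN-style classification (Kleene-Dienes implicator and min t-norm again assumed; the published algorithm differs in details such as restricting to a neighbourhood):

```python
import numpy as np

def frnn_classify(train, labels, test_obj):
    """Assign the class whose fuzzy lower and upper approximation memberships
    (averaged) are largest for the test object."""
    rng = train.max(axis=0) - train.min(axis=0)
    rng[rng == 0] = 1.0
    # similarity of the test object to each training object (min t-norm over attributes)
    sim = np.clip(1.0 - np.abs(train - test_obj) / rng, 0.0, 1.0).min(axis=1)
    scores = {}
    for c in np.unique(labels):
        X = (labels == c).astype(float)
        lower = np.maximum(1.0 - sim, X).min()   # inf_y I(sim(y), X(y))
        upper = np.minimum(sim, X).max()         # sup_y T(sim(y), X(y))
        scores[c] = (lower + upper) / 2.0
    return max(scores, key=scores.get)
```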
Discovering rules via RST
• Equivalence classes form the antecedent part of a rule
• The lower approximation tells us whether this antecedent is predictive of a given concept (certain rules)
• Typically done in one of two ways:
  • Overlaying reducts
  • Building rules by considering individual equivalence classes (e.g. LEM2)
QuickRules framework
• The fuzzy tolerance classes used during the search can be used to create fuzzy rules
• When a reduct is found, the resulting rules cover all instances
[Diagram: feature set → subset evaluation and generation, with rule induction running alongside → subset suitability → stopping criterion (continue/stop) → validation.]
Harmony search approach
• R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction. Proceedings of the 21st International Conference on Fuzzy Systems, 2012.
Harmony search approach
[Diagram: musicians improvise notes; candidate harmonies are stored in the harmony memory and scored by a fitness function.]
Toy objective: minimise (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3
(A sketch follows below.)
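A compact harmony search sketch applied to the slide's toy objective; all parameter values (memory size, HMCR, PAR, bandwidth) are illustrative:

```python
import random

def harmony_search(f, dim=3, bounds=(-10.0, 10.0), hms=20, hmcr=0.9, par=0.3,
                   bw=0.5, iters=2000):
    """Minimal harmony search: each 'musician' improvises one variable, drawn
    from the harmony memory with probability hmcr (optionally pitch-adjusted),
    otherwise chosen at random; better harmonies replace the worst in memory."""
    lo, hi = bounds
    memory = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    memory.sort(key=f)
    for _ in range(iters):
        new = []
        for d in range(dim):
            if random.random() < hmcr:                   # recall from memory
                v = random.choice(memory)[d]
                if random.random() < par:                # pitch adjustment
                    v += random.uniform(-bw, bw)
            else:                                        # random improvisation
                v = random.uniform(lo, hi)
            new.append(min(hi, max(lo, v)))
        if f(new) < f(memory[-1]):                       # replace worst harmony
            memory[-1] = new
            memory.sort(key=f)
    return memory[0]

# the toy objective from the slide; the optimum is (2, 3, 1) with value 3
best = harmony_search(lambda v: (v[0]-2)**2 + (v[1]-3)**4 + (v[2]-1)**2 + 3)
print(best)
```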
Key notion mapping

  Harmony search | Hybrid rule induction | Numerical optimisation
  ---------------|-----------------------|-----------------------
  Musician       | Fuzzy rule r_x        | Variable
  Note           | Feature subset        | Value
  Harmony        | Rule set              | Solution
  Fitness        | Combined evaluation   | Evaluation