Discriminative Frequent Pattern Analysis for Effective Classification Hong Cheng, Xifeng Yan, Jiawei Han and Chih-Wei Hsu ICDE 2007
Outline • Introduction • The framework of Frequent Pattern-based Classification • Experimental Results • Conclusion
Introduction • Using frequent patterns as features without feature selection results in a huge feature space. • This slows down the model learning process. • Classification accuracy also deteriorates. • An effective and efficient feature selection algorithm is proposed to select a set of frequent and discriminative patterns for classification.
Frequent Pattern vs. Single Feature • The discriminative power of some frequent patterns is higher than that of single features. [Fig. 1. Information Gain vs. Pattern Length, on the (a) Austral, (b) Cleve and (c) Sonar datasets]
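The effect behind Fig. 1 can be reproduced on toy data: a conjunction of two single features can have higher information gain than either feature alone. A minimal sketch (the helper names `entropy` and `info_gain` are illustrative, not from the paper):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_gain(labels, feature):
    """IG(C|X) = H(C) - H(C|X) for a binary feature over labeled rows."""
    n = len(labels)
    h_c = entropy([labels.count(c) / n for c in set(labels)])
    h_c_given_x = 0.0
    for v in (0, 1):
        rows = [c for c, x in zip(labels, feature) if x == v]
        if rows:
            h_c_given_x += len(rows) / n * entropy(
                [rows.count(c) / len(rows) for c in set(rows)])
    return h_c - h_c_given_x

# The combined pattern (x1 AND x2) separates the classes better
# than either single feature x1 or x2 does on its own.
labels    = [1, 1, 0, 0, 1, 0, 0, 1]
x1        = [1, 1, 1, 0, 1, 0, 1, 1]
x2        = [1, 1, 0, 1, 1, 1, 0, 1]
x1_and_x2 = [a & b for a, b in zip(x1, x2)]
print(info_gain(labels, x1), info_gain(labels, x2), info_gain(labels, x1_and_x2))
```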
The Framework of Frequent Pattern-based Classification • It includes three steps: • Feature generation • Feature selection • Model learning
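The three steps can be sketched end to end. The brute-force miner below stands in for a real frequent-itemset algorithm such as Apriori or FP-growth, and the function names are hypothetical:

```python
from itertools import combinations
from collections import Counter

def mine_frequent(transactions, min_sup, max_len=2):
    """Step 1 (feature generation): enumerate itemsets meeting min_sup.
    Brute-force sketch; a real system would use Apriori or FP-growth."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_len + 1):
            for items in combinations(sorted(t), k):
                counts[items] += 1
    n = len(transactions)
    return [p for p, c in counts.items() if c / n >= min_sup]

transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}]
patterns = mine_frequent(transactions, min_sup=0.5)

# Step 2 (feature selection) would rank `patterns` by information gain;
# Step 3 (model learning) trains a classifier, e.g. SVM or C4.5,
# on the resulting binary pattern features.
features = [[int(set(p) <= t) for p in patterns] for t in transactions]
```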
Discriminative Power vs. Pattern Frequency • This paper demonstrates that the discriminative power of low-support features is limited. • Low-support features can also harm classification accuracy by causing overfitting.
Cont. • The discriminative power of a pattern is closely related to its support. • For a pattern represented by a random variable X, IG(C|X) = H(C) − H(C|X). • Given a DB with a fixed class distribution, H(C) is a constant, so the upper bound IGub(C|X) is determined by the lower bound of H(C|X). • With support θ = P(x = 1) and q = P(c|x = 1), H(C|X) reaches its lower bound when q = 0 or 1. • Therefore, the discriminative power of low-frequency (small θ) patterns is bounded by a small value.
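The bound can be evaluated numerically. A minimal sketch for binary classes (the name `ig_upper_bound` is illustrative; a balanced class prior p = 0.5 is assumed by default), showing that the information-gain upper bound shrinks as the support θ shrinks:

```python
import math

def h(p):
    """Binary entropy, with the endpoints clamped to 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def ig_upper_bound(theta, p=0.5):
    """Upper bound on IG(C|X) for a pattern with support theta and class
    prior p: H(C|X) is minimized at q = P(c|x=1) in {0, 1}, clipped to
    values that keep P(c|x=0) inside [0, 1]."""
    lo = max(0.0, (p - (1 - theta)) / theta)
    hi = min(1.0, p / theta)
    best = float("inf")
    for q in (lo, hi):
        q0 = (p - theta * q) / (1 - theta)   # P(c | x = 0)
        best = min(best, theta * h(q) + (1 - theta) * h(q0))
    return h(p) - best

# Low-support patterns have a small information-gain ceiling.
for theta in (0.01, 0.05, 0.2, 0.5):
    print(theta, round(ig_upper_bound(theta), 4))
```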
Empirical Results [Fig. 2. Information Gain vs. Pattern Frequency, on the (a) Austral, (b) Breast and (c) Sonar datasets]
Set min_sup • A subset of high-quality features is selected for classification, with information gain at least a threshold IG0. • Because IGub increases with support θ in the low-support range, features with support θ < θ0 can be skipped, where IGub(θ0) = IG0. • The major steps: • Compute the information gain of single features • Choose the threshold IG0 • Find θ0 such that IGub(θ0) ≥ IG0 • Mine frequent patterns with min_sup = θ0
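The steps above can be sketched as a grid search for the cut-off support. A simplified illustration for binary, balanced classes (function names `ig_ub` and `min_sup_for` are hypothetical, not from the paper):

```python
import math

def h(p):
    """Binary entropy, with the endpoints clamped to 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def ig_ub(theta, p=0.5):
    """Upper bound on IG(C|X) at support theta. Checks only q = P(c|x=1)
    at its feasible extreme, which suffices for a balanced prior p = 0.5."""
    q = min(1.0, p / theta)
    q0 = (p - theta * q) / (1 - theta)   # P(c | x = 0)
    return h(p) - (theta * h(q) + (1 - theta) * h(q0))

def min_sup_for(ig0, p=0.5, grid=1000):
    """Smallest support theta whose information-gain upper bound still
    reaches ig0: patterns below it can be skipped during mining."""
    for i in range(1, grid):
        theta = i / grid
        if ig_ub(theta, p) >= ig0:
            return theta
    return 1.0

print(min_sup_for(0.1))
```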
Feature Selection • Given a set of frequent patterns, both non-discriminative and redundant patterns exist. • We want to single out the discriminative patterns and remove the redundant ones. • The notion of Maximal Marginal Relevance (MMR), borrowed from information retrieval, is used to trade off relevance against redundancy.
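A minimal MMR-style greedy selection sketch, assuming a relevance score per pattern (e.g. information gain) and a pairwise redundancy measure in [0, 1]; the names and the toy redundancy function are illustrative, not the paper's exact formulation:

```python
def mmr_select(patterns, relevance, redundancy, k, lam=0.7):
    """Maximal Marginal Relevance: greedily pick patterns that are
    discriminative (high relevance) yet not redundant with those
    already selected. relevance: dict pattern -> score;
    redundancy: f(p, q) -> [0, 1]; lam trades the two off."""
    selected = []
    candidates = set(patterns)
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda p: lam * relevance[p]
                   - (1 - lam) * max((redundancy(p, s) for s in selected),
                                     default=0.0))
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: "ab" is nearly as relevant as "a" but fully redundant
# with it, so the second pick falls to the less relevant "c".
rel = {"a": 0.9, "ab": 0.85, "c": 0.5}
red = lambda p, q: 1.0 if set(p) & set(q) else 0.0
print(mmr_select(["a", "ab", "c"], rel, red, k=2))
```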
Conclusion • An effective and efficient feature selection algorithm is proposed to select a set of frequent and discriminative patterns for classification. • Scalability issue: it is computationally infeasible to generate all feature combinations and filter them with an information gain threshold. • An efficient method (DDPMine: FP-tree pruning) addresses this: H. Cheng, X. Yan, J. Han, and P. S. Yu, "Direct Discriminative Pattern Mining for Effective Classification", ICDE'08.