Consensus Group Stable Feature Selection. Steven Loscalzo Dept. of Computer Science Binghamton University. Lei Yu Dept. of Computer Science Binghamton University. Chris Ding Dept. of Computer Science and Engineering University of Texas at Arlington.
Consensus Group Stable Feature Selection
Steven Loscalzo, Dept. of Computer Science, Binghamton University
Lei Yu, Dept. of Computer Science, Binghamton University
Chris Ding, Dept. of Computer Science and Engineering, University of Texas at Arlington
The 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Overview
• Background and motivation
• Proposed Consensus Feature Group Framework
  • Finding Consensus Groups
  • Feature Selection from Consensus Groups
• Experimental Study
• Conclusion
Loscalzo, Yu, Ding Consensus Group Stable Feature Selection
Feature Selection Stability
Sampling (from All Training Data) | Feature Selection | Model Building (Acc %)
Sample 1 | F = {f2, f5} | 92%
Sample 2 | F' = {f4, f10} | 91%
… | … | …
Sample k | F'' = {f5, f11} | 93%
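The instability illustrated above can be quantified by comparing the subsets selected from different samples. The sketch below uses average pairwise Jaccard similarity, which is one common choice and an assumption here; the slides do not state which stability measure the authors use. The function names are illustrative only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Subsets selected from the three samples shown on the slide.
selected = [{"f2", "f5"}, {"f4", "f10"}, {"f5", "f11"}]
stability = average_pairwise_stability(selected)
print(round(stability, 3))  # 0.111
```

Only one of the three pairs shares a feature (f5), so the score is low even though all three models reach similar accuracy.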
Motivation
• Need for stable feature selection
  • Give confidence to lab tests
  • Uncover “truly” relevant information
• Utility of feature groups
  • Model feature interaction
  • When information about a single feature is lacking, another feature in the same group may be well studied
Dense Feature Group Framework
• Dense feature groups can provide stability and accuracy [Yu, Ding, Loscalzo, KDD-08]
• Dense Group Stable Feature Selection Framework
  • Map features as points in sample space
  • Apply kernel density estimation to locate dense feature groups
  • Select top relevant groups from dense groups
• Limitations of this framework
  • Unreliable density estimation in high-dimensional spaces
  • Restricts selection of relevant groups to dense groups
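As a rough illustration of the first two framework steps, the sketch below treats each feature as a point in sample space (its vector of values across all samples) and evaluates a Gaussian kernel density estimate at each feature, up to a constant factor. The kernel, the bandwidth value, and the toy data are assumptions; the slides do not give these details, and this sketch only scores density rather than reproducing the dense group finder itself.

```python
import math


def feature_densities(columns, bandwidth):
    """Unnormalized Gaussian kernel density at each feature point.

    columns: list of feature vectors, each holding one feature's values
    across all samples, so each feature is a point in sample space.
    """
    n = len(columns)
    densities = []
    for xi in columns:
        total = 0.0
        for xj in columns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / n)
    return densities


# Hypothetical data: 4 features measured on 3 samples.
# f1, f2, f3 behave alike across samples; f4 does not.
columns = [
    [1.0, 2.0, 3.0],   # f1
    [1.1, 2.0, 2.9],   # f2
    [0.9, 2.1, 3.0],   # f3
    [5.0, 0.0, -1.0],  # f4
]
dens = feature_densities(columns, bandwidth=0.5)
```

Features f1 to f3 sit close together and receive higher density than the isolated f4. With only a handful of samples per feature this works; the limitation named on the slide is that such estimates become unreliable as the number of samples (dimensions) grows.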
Consensus Feature Group Framework
• Consensus feature groups are an ensemble of feature grouping results
• Select relevant groups from the whole spectrum of consensus groups
• Challenges
  • Base algorithm for ensemble: dense group finder [Yu, Ding, Loscalzo, KDD-08]
  • Aggregate feature grouping results
Group Aggregation
• 3 aggregation ideas:
  • Heuristics (reference set)
  • Cluster based [Fern, Brodley, ICML-03]
  • Instance based [Fern, Brodley, ICML-03]
[Figure: feature group results over features f1 to f5 from data sub-samples 1, 2 and 3, aggregated into consensus feature groups]
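The instance-based idea can be sketched as a co-association matrix: for every pair of features, record the fraction of grouping results in which the two fall in the same group. The three groupings below are hypothetical (the groupings in the slide's figure are not recoverable), and the function name is illustrative.

```python
def co_association(groupings, features):
    """Fraction of groupings in which each pair of features shares a group."""
    index = {f: i for i, f in enumerate(features)}
    n = len(features)
    counts = [[0.0] * n for _ in range(n)]
    for grouping in groupings:
        for group in grouping:
            for a in group:
                for b in group:
                    counts[index[a]][index[b]] += 1.0
    t = len(groupings)
    return [[c / t for c in row] for row in counts]


features = ["f1", "f2", "f3", "f4", "f5"]
# Hypothetical grouping results from three data sub-samples.
groupings = [
    [{"f1", "f2"}, {"f3", "f4", "f5"}],
    [{"f1", "f2", "f3"}, {"f4", "f5"}],
    [{"f1", "f2"}, {"f3"}, {"f4", "f5"}],
]
W = co_association(groupings, features)
```

Here f1 and f2 always appear together (entry 1.0), as do f4 and f5, while f3 joins each side only once (entry 1/3), so it does not clearly belong to either group.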
The CGS Algorithm
CGS: The Consensus Group Stable Feature Selection Algorithm
[Figure: D is sub-sampled into D1 … Dt, giving Result Grouping 1 … Result Grouping t; instance co-occurrence is measured across the groupings, and hierarchical clustering yields the consensus feature groups]

for i = 1 to t do
    Construct training partition Di from D
    Run DGF (dense group finder) on Di
for every pair of features Xi and Xj in D
    Update Wi,j := frequency with which Xi and Xj appear together in the results
Create consensus groups CG1, CG2, …, CGL via hierarchical clustering of all features based on Wi,j
for i = 1 to L do
    Obtain a representative feature Xi from CGi
    Measure relevance of Xi and set it as the relevance of CGi
Rank CG1, CG2, …, CGL and return the top k
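The last steps of the algorithm (clustering on W, then ranking groups by a representative feature) can be sketched as follows. Several choices are assumptions, since the slide does not specify them: average linkage with a similarity cutoff, picking the representative as the feature with the highest total co-occurrence within its group, and the relevance scores, which are made-up numbers. The matrix W is a hypothetical co-occurrence matrix for five features.

```python
def consensus_groups(W, threshold):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Repeatedly merges the two clusters with the highest average pairwise
    similarity and stops once that similarity drops below `threshold`.
    Returns a sorted list of clusters, each a sorted list of feature indices.
    """
    clusters = [[i] for i in range(len(W))]

    def avg_sim(c1, c2):
        return sum(W[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = avg_sim(clusters[a], clusters[b])
                if s > best:
                    best, best_pair = s, (a, b)
        if best < threshold:
            break
        a, b = best_pair
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def rank_groups(groups, W, relevance, k):
    """Score each group by its representative feature; return the top k.

    Representative = feature with the highest total co-occurrence inside
    its group (an assumption of this sketch; ties go to the lowest index).
    """
    scored = []
    for group in groups:
        rep = max(group, key=lambda i: sum(W[i][j] for j in group))
        scored.append((relevance[rep], rep, group))
    scored.sort(reverse=True)
    return scored[:k]


t = 1.0 / 3.0
# Hypothetical co-occurrence frequencies for features with indices 0..4.
W = [
    [1.0, 1.0, t, 0.0, 0.0],
    [1.0, 1.0, t, 0.0, 0.0],
    [t, t, 1.0, t, t],
    [0.0, 0.0, t, 1.0, 1.0],
    [0.0, 0.0, t, 1.0, 1.0],
]
relevance = [0.9, 0.7, 0.2, 0.4, 0.8]  # made-up per-feature relevance scores

groups = consensus_groups(W, threshold=0.5)
top = rank_groups(groups, W, relevance, k=2)
```

Because the groups are cut from a full hierarchy over all features, weakly attached features such as index 2 still form a (singleton) group and remain candidates for selection, rather than being discarded for not lying in a dense region.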
Experimental Setup
Setting
• Used 10 random shuffles of data
• 10-fold cross validation: 9/10 folds training, 1/10 folds testing
• Results shown are averages across 10 folds x 10 shuffles
Algorithms
• CGS – sub-samples t = 10
• DRAGS [Yu, Ding, Loscalzo, KDD-08] – top dense group based feature selection
• SVM-RFE [Guyon et al, ML-02] – recursively eliminates features based on weights found after training an SVM
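The evaluation protocol above (10 shuffles, each followed by 10-fold cross validation) can be sketched as a generator of train/test index splits. The sample count, the seed, and the function name are assumptions for illustration; the slides do not describe how the folds were actually built.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_shuffles=10, seed=0):
    """Yield (train_idx, test_idx) for each fold of each random shuffle."""
    rng = random.Random(seed)
    for _ in range(n_shuffles):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds roughly equal folds.
        folds = [order[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


# Hypothetical data set of 50 samples: 10 shuffles x 10 folds = 100 splits.
splits = list(repeated_kfold(n_samples=50))
```

Each of the 100 splits would yield one selected feature set and one accuracy figure, and the reported numbers are averages over all of them.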
Stability
[Figure panels: Selected Features; Selected Groups]
Accuracy Results
Conclusion
• Proposed consensus group stable feature selection framework
  • Stable
  • Accurate
• Future directions
  • Apply different ensemble techniques
  • Incorporate new group finding algorithms
References
Fern, X. Z., and Brodley, C. Random projection for high-dimensional data clustering: a cluster ensemble approach. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), 186-192, 2003.
Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning (ML-02), 46:389-422, 2002.
Yu, L., Ding, C., and Loscalzo, S. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD-08), 803-811, 2008.