Forward Semi-Supervised Feature Selection Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu
Feature Selection • Challenges of high-dimensional data • Curse of dimensionality • Noise • Objectives of feature selection • Improving the performance of the predictors • Providing more cost-effective predictors • Better understanding of the underlying process that generated the data
Supervised / unsupervised learning • Supervised learning • Uses labeled data only • Unsupervised learning • Uses unlabeled data only
Challenges of traditional feature selection methods • Most are supervised learning methods • Lack of labeled data • Class labels are obtained manually • Class labels are expensive to obtain • Data bias • Challenges: • The training dataset may not reflect the distribution of the real data • The model constructed on the training set may not be suitable for unseen data
Abundance of unlabeled data • Easy to obtain • No manual labeling required • Can reflect the distribution of the real data
Then… how can we use unlabeled data effectively?
Forward Semi-Supervised Feature Selection • Basic idea • Randomly select unlabeled data with predicted labels • Form a new training set • Perform feature selection on the new training set • Repeat for several iterations • Add the most frequently selected feature to the result feature subset
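The basic idea above can be sketched in code. This is a minimal illustration only: it assumes a nearest-centroid base classifier and training accuracy as the feature score, whereas the paper uses classifiers such as NaiveBayes, NNge, and k-NN and its own selection criterion; the function names (`forward_sssfs`, `feature_score`) are hypothetical.

```python
import random
from collections import Counter

def centroid_classifier(X, y, feats):
    """Train a nearest-centroid classifier on the given feature subset.
    Stand-in base learner; returns a predict(x) function."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        v = [xi[f] for f in feats]
        if yi not in sums:
            sums[yi], counts[yi] = [0.0] * len(feats), 0
        sums[yi] = [a + b for a, b in zip(sums[yi], v)]
        counts[yi] += 1
    cents = {c: [a / counts[c] for a in sums[c]] for c in sums}
    def predict(x):
        v = [x[f] for f in feats]
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(cents[c], v)))
    return predict

def feature_score(X, y, feats, f):
    """Score candidate feature f by training accuracy on feats + [f]."""
    predict = centroid_classifier(X, y, feats + [f])
    return sum(predict(x) == t for x, t in zip(X, y)) / len(X)

def forward_sssfs(X_l, y_l, X_u, n_features, k, iters=5, frac=0.5, seed=0):
    """Forward semi-supervised feature selection (sketch).
    For each forward step, run several iterations, each of which
    (1) predicts labels for the unlabeled data, (2) randomly samples
    unlabeled examples with their predicted labels to form a joint
    training set, and (3) picks the best next feature on that set;
    the most frequently chosen feature is then added to the subset."""
    rng = random.Random(seed)
    selected = []
    for _ in range(k):
        votes = Counter()
        for _ in range(iters):
            feats = selected or list(range(n_features))
            predict = centroid_classifier(X_l, y_l, feats)
            idx = rng.sample(range(len(X_u)), max(1, int(frac * len(X_u))))
            X_new = X_l + [X_u[i] for i in idx]
            y_new = y_l + [predict(X_u[i]) for i in idx]
            cands = [f for f in range(n_features) if f not in selected]
            best = max(cands,
                       key=lambda f: feature_score(X_new, y_new, selected, f))
            votes[best] += 1
        selected.append(votes.most_common(1)[0][0])
    return selected
```

On a toy dataset where feature 0 separates the classes and feature 1 is noise, `forward_sssfs` votes for feature 0 in every iteration and adds it first.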
Forward Semi-Supervised Feature Selection (workflow) • Train the classifier on the labeled data and predict labels for the unlabeled data • Randomly select unlabeled data with predicted labels • Form the new training set • Run SFFS to select the best features • Repeat for several iterations • Select the most frequently chosen feature and add it to the feature subset
Experiment • Datasets • UCI • Classifiers • NaiveBayes, NNge, and k-NN • Comparison • FULL, SFFS, and SLS (Z. Zhao and H. Liu, "Semi-supervised Feature Selection via Spectral Analysis", SIAM International Conference on Data Mining (SDM-07), April 26-28, 2007, Minneapolis, Minnesota)
Conclusion • The proposed algorithm works in an iterative procedure • Unlabeled examples receive labels from the classifier constructed on the currently selected feature subset • A joint dataset is formed from the labeled data and randomly selected unlabeled data with predicted labels • Experimental results show that the proposed approach can obtain higher accuracy than other supervised and semi-supervised feature selection algorithms in some cases