Forward Semi-Supervised Feature Selection Jiangtao Ren, Zhengyuan Qiu, Wei Fan, Hong Cheng, and Philip S. Yu
Feature Selection • Challenges of high-dimensional data • Curse of dimensionality • Noise • Objectives of feature selection • Improving the performance of the predictors • Providing more cost-effective predictors • Better understanding of the underlying process that generated the data
Supervised / unsupervised learning • Supervised learning • Uses labeled data only • Unsupervised learning • Uses unlabeled data only
Challenges of traditional feature selection methods • Most are supervised learning methods • Lack of labeled data • Class labels are obtained manually • Class labels are expensive to obtain • Data bias • Challenges: • The training dataset may not reflect the distribution of the real data • The model constructed on the training set may not be suitable for unseen data
Abundance of unlabeled data • Easy to obtain • No manual labeling required • Can reflect the distribution of the real data
Then… how can we use unlabeled data effectively?
Forward Semi-Supervised Feature Selection • Basic idea • Randomly select unlabeled data with predicted labels • Form a new training set • Perform feature selection on the new training set • Repeat for several iterations • Add the most frequently selected feature to the result feature subset
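The basic idea above can be sketched in code. This is a minimal illustration only: it assumes a nearest-centroid base classifier and training accuracy as the feature score, whereas the paper uses classifiers such as NaiveBayes, NNge, and k-NN and its own selection criterion; the function names (`forward_sssfs`, `feature_score`) are hypothetical.

```python
import random
from collections import Counter

def centroid_classifier(X, y, feats):
    """Train a nearest-centroid classifier on the given feature subset.
    Stand-in base learner; returns a predict(x) function."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        v = [xi[f] for f in feats]
        if yi not in sums:
            sums[yi], counts[yi] = [0.0] * len(feats), 0
        sums[yi] = [a + b for a, b in zip(sums[yi], v)]
        counts[yi] += 1
    cents = {c: [a / counts[c] for a in sums[c]] for c in sums}
    def predict(x):
        v = [x[f] for f in feats]
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(cents[c], v)))
    return predict

def feature_score(X, y, feats, f):
    """Score candidate feature f by training accuracy on feats + [f]."""
    predict = centroid_classifier(X, y, feats + [f])
    return sum(predict(x) == t for x, t in zip(X, y)) / len(X)

def forward_sssfs(X_l, y_l, X_u, n_features, k, iters=5, frac=0.5, seed=0):
    """Forward semi-supervised feature selection (sketch).
    For each forward step, run several iterations, each of which
    (1) predicts labels for the unlabeled data, (2) randomly samples
    unlabeled examples with their predicted labels to form a joint
    training set, and (3) picks the best next feature on that set;
    the most frequently chosen feature is then added to the subset."""
    rng = random.Random(seed)
    selected = []
    for _ in range(k):
        votes = Counter()
        for _ in range(iters):
            feats = selected or list(range(n_features))
            predict = centroid_classifier(X_l, y_l, feats)
            idx = rng.sample(range(len(X_u)), max(1, int(frac * len(X_u))))
            X_new = X_l + [X_u[i] for i in idx]
            y_new = y_l + [predict(X_u[i]) for i in idx]
            cands = [f for f in range(n_features) if f not in selected]
            best = max(cands,
                       key=lambda f: feature_score(X_new, y_new, selected, f))
            votes[best] += 1
        selected.append(votes.most_common(1)[0][0])
    return selected
```

On a toy dataset where feature 0 separates the classes and feature 1 is noise, `forward_sssfs` votes for feature 0 in every iteration and adds it first.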
Forward Semi-Supervised Feature Selection (workflow) • Train the classifier on the labeled data and predict labels for the unlabeled data • Randomly select unlabeled data with predicted labels • Form the new training set • Run SFFS to select the best features • Repeat for several iterations • Select the most frequently chosen feature and add it to the feature subset
Experiment • Datasets • UCI • Classifiers • NaiveBayes, NNge, and k-NN • Comparison • FULL, SFFS, and SLS (Z. Zhao and H. Liu, "Semi-supervised Feature Selection via Spectral Analysis", SIAM International Conference on Data Mining (SDM-07), April 26-28, 2007, Minneapolis, Minnesota)
Conclusion • The proposed algorithm works in an iterative procedure • Unlabeled examples receive labels from the classifier constructed on the currently selected feature subset • A joint dataset is formed from the labeled data and randomly selected unlabeled data with predicted labels • Experimental results show that the proposed approach can obtain higher accuracy than other supervised and semi-supervised feature selection algorithms in some cases