Semi-Supervised Learning
• Can we improve the quality of our learning by combining labeled and unlabeled data?
• Usually a lot more unlabeled data is available than labeled data
• Assume a set L of labeled data and a set U of unlabeled data (drawn from the same distribution)
• Focus here is on semi-supervised classification, though there are many other variations:
  • Aiding clustering with some labeled data
  • Regression
  • Model selection with unlabeled data (COD)
  • Transduction vs. induction
How Semi-Supervised Learning Works
• Approaches make strong model assumptions (guesses); if these are wrong, the unlabeled data can make things worse
• Some commonly used assumptions:
  • Points in the same cluster of data are from the same class
  • Data can be represented as a mixture of parameterized distributions
  • Decision boundaries should go through non-dense areas of the data
  • The model should be as simple as possible (Occam's razor)
Unsupervised Learning of Domain Features
• PCA, SVD
• NLDR – non-linear dimensionality reduction
• Deep learning
  • Deep belief nets
  • Sparse auto-encoders
• Self-taught learning (see the sketch below)
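As a hedged illustration of this feature-learning idea (closest in spirit to self-taught learning): fit an unsupervised projection such as PCA on the plentiful unlabeled data, then train a supervised classifier on the labeled data projected into that learned space. The dataset construction, dimensions, and model choices below are illustrative assumptions, not taken from the slides.

```python
# Sketch: learn features from unlabeled data (PCA here), then train a
# supervised model on the small labeled set in that learned space.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=50, random_state=0)
X_l, y_l = X[:50], y[:50]          # small labeled set L
X_u = X[50:]                       # large unlabeled set U (labels unused)

pca = PCA(n_components=10).fit(X_u)            # unsupervised step on U
clf = LogisticRegression(max_iter=1000).fit(
    pca.transform(X_l), y_l)                   # supervised step on L
print(clf.score(pca.transform(X_u), y[50:]))   # sanity check only
```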
Self-Training (Bootstrapping)
• Self-training
  • Train a supervised model on the labeled data L
  • Classify the unlabeled data U
  • Add the most confidently classified members of U (with their predicted labels) to L
  • Repeat (a minimal sketch follows this list)
• Multi-model
  • Use an ensemble of trained models for self-training
• Co-training
  • Train two models on different, independent feature sets
  • Add the most confident instances from U of one model into L of the other
• Multi-view training
  • Find an ensemble of multiple diverse models trained on L which also tend to all agree well on U
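A minimal self-training sketch in Python, assuming scikit-learn is available; the confidence threshold, base model, and function name are illustrative choices, not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Iteratively move confidently classified points from U into L."""
    model = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    for _ in range(max_rounds):
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)
        sure = proba.max(axis=1) >= threshold       # most confident in U
        if not sure.any():
            break                                   # nothing confident left
        pseudo = model.classes_[proba[sure].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[sure]])           # grow labeled set L
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~sure]                            # shrink unlabeled set U
        model = LogisticRegression(max_iter=1000).fit(X_l, y_l)
    return model
```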
More Models
• Generative – assume the data can be represented by a mixture of parameterized models (e.g., Gaussians) and use EM to learn the parameters (à la Baum-Welch); a toy sketch follows
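A toy sketch of that generative approach, assuming numpy and scipy: one Gaussian component per class, labeled points held at fixed one-hot responsibilities, and unlabeled points filled in by the E-step. The initialization from L and the covariance smoothing constant are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_gmm(X_l, y_l, X_u, n_classes, n_iter=50):
    """EM over L and U; assumes >= 2 labeled points per class."""
    X = np.vstack([X_l, X_u])
    n, d = X.shape
    R_l = np.eye(n_classes)[y_l]          # fixed one-hot responsibilities
    # Initialize mixture parameters from the labeled data alone
    pi = R_l.mean(axis=0)
    mu = np.array([X_l[y_l == k].mean(axis=0) for k in range(n_classes)])
    cov = np.array([np.cov(X_l[y_l == k].T) + 1e-6 * np.eye(d)
                    for k in range(n_classes)])
    for _ in range(n_iter):
        # E-step: soft responsibilities for the unlabeled points only
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X_u, mu[k], cov[k])
            for k in range(n_classes)])
        R_u = dens / dens.sum(axis=1, keepdims=True)
        R = np.vstack([R_l, R_u])
        # M-step: update mixture weights, means, covariances
        Nk = R.sum(axis=0)
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        for k in range(n_classes):
            diff = X - mu[k]
            cov[k] = (R[:, k, None] * diff).T @ diff / Nk[k] \
                     + 1e-6 * np.eye(d)
    return pi, mu, cov
```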
Graph Models
• Build a graph in which neighboring nodes are assumed to be similar; larger edge weights indicate greater similarity
• Force members of the same class in L to be close, while maintaining smoothness with respect to the graph for U
• Add members of U as neighbors based on some similarity measure
• Iteratively label U (breadth-first); a runnable example follows
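This graph intuition is what scikit-learn's LabelPropagation implements, so a small runnable example can stand in for the idea. The two-moons data and kernel settings are illustrative choices; in the sklearn API, -1 marks unlabeled points.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
mask = rng.random(len(y)) < 0.95        # hide ~95% of the labels
y_partial[mask] = -1                    # -1 marks unlabeled points

model = LabelPropagation(kernel='rbf', gamma=20).fit(X, y_partial)
print((model.transduction_[mask] == y[mask]).mean())  # accuracy on U
```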
TSVM
• Transductive SVM (TSVM), also called Semi-Supervised SVM (S3VM)
• Maximize the margin over both L and U, so the decision surface is placed in non-dense regions of the data
• Assumes the classes are "well separated"
• Can also try to simultaneously keep the class proportions on both sides of the boundary similar to the labeled proportions
• A rough sketch of the idea is given below
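A rough sketch of the pseudo-labeling flavor of this idea, assuming scikit-learn: train an SVM on L, pseudo-label U, and retrain while annealing the weight on the pseudo-labeled points upward. This is a drastic simplification of a Joachims-style TSVM; the names and weight schedule are illustrative, and the class-proportion constraint is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def tsvm_sketch(X_l, y_l, X_u, weights=(0.01, 0.1, 0.5, 1.0)):
    """Anneal the influence of pseudo-labeled U points upward."""
    svm = SVC(kernel='linear').fit(X_l, y_l)
    for w in weights:
        y_u = svm.predict(X_u)           # current pseudo-labels for U
        X = np.vstack([X_l, X_u])
        y = np.concatenate([y_l, y_u])
        # True labels get full weight; pseudo-labels get weight w
        sw = np.concatenate([np.ones(len(y_l)), np.full(len(X_u), w)])
        svm = SVC(kernel='linear').fit(X, y, sample_weight=sw)
    return svm
```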