Semi-Supervised and Active Learning

Active Learning: An example
From Xu et al., “Training SpamAssassin with Active Semi-Supervised Learning”

Semi-Supervised and Active Learning
• Semi-supervised learning: Using a combination of labeled and unlabeled examples, or using partially labeled examples.
• Active learning: Having the learning system decide which examples to ask an oracle to label.

SpamAssassin
• SpamAssassin:
  • Asks users to label e-mail, but they often don’t.
  • When they do, they may not label the “most informative” examples.
• SpamAssassin “self-training”:
  • Train a classifier on a small number of labeled examples.
  • Run the classifier on the unlabeled examples. Add the ones classified with high confidence to the original training set. (Problem: the ones classified with high confidence are not necessarily the most informative ones.)
  • Retrain the classifier on the new, larger training set.
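The self-training loop above can be sketched in a few lines. This is only an illustration, not SpamAssassin's implementation: the toy multinomial Naive Bayes, the function names (train_nb, predict_proba, self_train) and the threshold and max_rounds parameters are all assumptions made for the example.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a toy multinomial Naive Bayes model (add-one smoothing)."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    return vocab, class_counts, word_counts

def predict_proba(model, doc):
    """Return {class: posterior probability} for one tokenized e-mail."""
    vocab, class_counts, word_counts = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, cc in class_counts.items():
        total = sum(word_counts[c].values()) + len(vocab)
        s = math.log(cc / n)
        for w in doc:
            if w in vocab:  # words never seen in training are ignored
                s += math.log((word_counts[c][w] + 1) / total)
        log_scores[c] = s
    m = max(log_scores.values())
    exp = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def self_train(docs, labels, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly add confidently classified e-mails and retrain."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    model = train_nb(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for d in pool:
            probs = predict_proba(model, d)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:   # high confidence: trust the label
                docs.append(d)
                labels.append(best)
                added += 1
            else:                          # low confidence: leave unlabeled
                remaining.append(d)
        pool = remaining
        if added == 0:
            break
        model = train_nb(docs, labels)     # retrain on the larger set
    return model, docs, labels, pool
```

Note that the e-mails left in pool are exactly the ones the classifier is unsure about, which illustrates the problem named above: self-training adds the easy examples and skips the ones that might be most informative.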

Xu et al. paper: Method
• Supervised learning: Train a Naive Bayes classifier on a small subset of (labeled) e-mails.
• Semi-supervised learning: Then run SpamAssassin’s self-training method, selecting a large number of new examples to add to the training set. Retrain the classifier.
• Active learning:
  • Cluster the remaining unlabeled e-mails using k-means (on term-frequency feature vectors) with Euclidean distance.
  • Select q representative unlabeled e-mails, first from “pure” clusters, then from “impure” clusters, making sure that many clusters are sampled from. The e-mails selected from each cluster are the ones closest to the cluster centroid.
  • Ask the user to label these q examples.
  • For each of these q examples, if the corresponding cluster is “pure”, propagate its label to a fraction p of that cluster.
  • Add the newly labeled examples to the training set, and retrain the classifier.
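The active-learning step can be sketched as follows. This is a rough illustration, not the paper's code. Several details are assumptions because the slide does not give them: a cluster is treated as “pure” when the current classifier predicts the same label for all of its members, one e-mail is queried per cluster, labels are propagated to the members nearest the centroid, and k-means is seeded with the first k points so the run is deterministic. All function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        assign = nearest(points, centroids)
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return nearest(points, centroids), centroids

def select_queries(points, assign, centroids, predicted, q):
    """Pick up to q e-mails to query, one per cluster, pure clusters first."""
    clusters = {}
    for i, a in enumerate(assign):
        clusters.setdefault(a, []).append(i)

    def is_pure(members):
        # Assumed definition: the classifier agrees on every member.
        return len({predicted[i] for i in members}) == 1

    ordered = sorted(clusters, key=lambda c: (not is_pure(clusters[c]), c))
    queries = []
    for c in ordered[:q]:
        rep = min(clusters[c], key=lambda i: dist(points[i], centroids[c]))
        queries.append((rep, c, is_pure(clusters[c])))
    return queries, clusters

def propagate(points, clusters, centroids, queries, oracle, p):
    """Ask the oracle about each query; spread the answer in pure clusters."""
    new_labels = {}
    for rep, c, pure in queries:
        y = oracle(rep)                    # the user labels this e-mail
        new_labels[rep] = y
        if pure:
            ranked = sorted(clusters[c],
                            key=lambda i: dist(points[i], centroids[c]))
            for i in ranked[: int(p * len(ranked))]:
                new_labels[i] = y          # propagated, not user-verified
    return new_labels
```

In use, points would be the term-frequency vectors of the unlabeled e-mails, predicted the current classifier's labels for them, and oracle a function that asks the user for the label of e-mail i. The returned labels are then added to the training set before retraining.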

Xu et al. paper: Results
• Ran on a large corpus (75K) of e-mails.
