A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples

Dell Zhang (BBK) and Wee Sun Lee (NUS)

Problem • Supervised Learning

Problem • Semi-Supervised Learning

Problem • PU Learning

Problem • Unlabeled Examples Help

Problem • PU Learning
• To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-)
• by learning a classifier from a set of positive examples P and a set of unlabeled examples U
• There is no labeled negative example!
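The PU setting can be made concrete with a small sketch. The slides do not say how P and U were built, so the following is only one common way to simulate the setting from a fully labeled collection: a fraction of the true positives becomes P, and everything else (the remaining positives plus all negatives) is pooled into U with its labels hidden. The function name `make_pu_split` and the parameter `labeled_fraction` are hypothetical, introduced here for illustration.

```python
import random


def make_pu_split(examples, labels, labeled_fraction, seed=0):
    """Simulate a PU dataset from fully labeled data (illustrative only).

    labels: 1 marks the positive class C+, 0 marks the negative class C-.
    labeled_fraction: hypothetical knob, the share of true positives
    that end up in the labeled positive set P.
    """
    rng = random.Random(seed)
    positives = [x for x, y in zip(examples, labels) if y == 1]
    negatives = [x for x, y in zip(examples, labels) if y == 0]
    rng.shuffle(positives)

    n_labeled = int(round(labeled_fraction * len(positives)))
    P = positives[:n_labeled]
    # U mixes the hidden positives with all negatives; no label is kept.
    U = positives[n_labeled:] + negatives
    rng.shuffle(U)
    return P, U


if __name__ == "__main__":
    docs = list(range(10))
    y = [1] * 6 + [0] * 4
    P, U = make_pu_split(docs, y, labeled_fraction=0.5)
    print(len(P), len(U))
```

A learner in this setting sees only P and U; the negatives inside U are never identified to it.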

Applications
• To automatically filter web pages according to a user's preference
  • the browsed or bookmarked pages can be used as positive examples
  • while unlabeled examples can be easily collected from the web
• To automatically find machine learning literature
  • the ICML papers can be used as positive examples
  • while unlabeled examples can be easily collected from the ACM or IEEE digital library
• To automatically identify cancer patients
  • the patients known to have cancer can be used as positive examples
  • while unlabeled examples can be easily collected from the patient database
• To automatically discover future customers for direct marketing
  • the current customers of the company can be used as positive examples
  • while unlabeled examples can be purchased at a low cost compared with obtaining negative examples
• ……

Approaches
• Existing Approaches
  • PNB (Denis et al. 2002); PNCT (Denis et al. 2003)
  • S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)
  • PEBL (Yu et al. 2004); SVMC (Yu 2005)
  • PN-SVM (Fung et al. 2005)
  • W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)
• Our Proposed Approach
  • B-Pr

Our Approach • A Probabilistic Model

Our Approach

Our Approach • Biased PrTFIDF (B-Pr) • Estimate • PrTFIDF (Joachims 1997) • Estimate • Maximize • On a held-out validation set • (Lee & Liu 2003) • Linear Time Complexity!
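The formulas on this slide did not survive, so the quantity being maximized cannot be read off directly. The criterion usually attributed to Lee & Liu (2003) is r² / Pr[f(X)=1], where r is the recall estimated on held-out positive examples and Pr[f(X)=1] is the fraction of held-out examples the classifier labels positive; it can be computed without any negative labels. The sketch below assumes that criterion, and it uses a hypothetical decision threshold as a stand-in for whatever parameter B-Pr actually tunes. The names `pu_score` and `select_threshold` are illustrative, not from the slides.

```python
def pu_score(pred_on_positives, pred_on_validation):
    """Estimate r^2 / Pr[f(X)=1] from 0/1 predictions on held-out data.

    pred_on_positives: predictions on the held-out positive examples.
    pred_on_validation: predictions on the whole held-out set.
    """
    if not pred_on_positives or not pred_on_validation:
        raise ValueError("both prediction lists must be non-empty")
    recall = sum(pred_on_positives) / len(pred_on_positives)
    frac_predicted_pos = sum(pred_on_validation) / len(pred_on_validation)
    if frac_predicted_pos == 0:
        return 0.0  # nothing predicted positive, so recall is 0 as well
    return recall ** 2 / frac_predicted_pos


def select_threshold(scores_pos, scores_all, candidates):
    """Pick the candidate threshold with the highest pu_score.

    The threshold is an assumed stand-in for the tuned parameter.
    """
    def score_at(t):
        return pu_score([int(s >= t) for s in scores_pos],
                        [int(s >= t) for s in scores_all])
    return max(candidates, key=score_at)


if __name__ == "__main__":
    scores_pos = [0.9, 0.8, 0.4]
    scores_all = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
    print(select_threshold(scores_pos, scores_all, [0.25, 0.5, 0.85]))
```

Each candidate is evaluated with a single pass over the held-out scores, so the selection step adds only linear work per candidate.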

Experiments • Reuters-21578 • B-Pr > RC-SVM > PEBL (p=0.55) • RC-SVM > B-Pr > PEBL (p=0.85)

Experiments • 20NewsGroups • B-Pr > W-LR > S-EM (p=0.3) • B-Pr > W-LR > S-EM (p=0.7)

Conclusion • A New Approach to Learning from Positive and Unlabeled Examples • As effective as the state-of-the-art approaches • Yet simpler and faster

Thank you • Questions? • Comments? • Suggestions? • ……
