A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples • Dell Zhang (BBK) and Wee Sun Lee (NUS)
Problem • Supervised Learning
Problem • Semi-Supervised Learning
Problem • PU Learning
Problem • Unlabeled Examples Help
Problem • PU Learning • To distinguish • the interesting instances (the positive class C+) from • other instances (the negative class C-) • by learning a classifier from • a set of positive examples P and • a set of unlabeled examples U • There is no labeled negative example!
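The PU setting can be made concrete with a small Python sketch. It is an illustration only, not the authors' code: the function name and the toy document identifiers are hypothetical, and it assumes the common "biased" setup in which every unlabeled example is provisionally treated as a noisy negative.

```python
def make_pu_training_set(positives, unlabeled):
    """Build (example, label) pairs for PU learning.

    Label 1 means "known positive (from P)".
    Label 0 means "unlabeled (from U)", NOT "known negative":
    U may still contain hidden positives.
    """
    return [(x, 1) for x in positives] + [(x, 0) for x in unlabeled]


# Hypothetical toy data: two bookmarked pages and three pages crawled from the web.
P = ["bookmarked_page_1", "bookmarked_page_2"]
U = ["crawled_page_1", "crawled_page_2", "crawled_page_3"]
training_set = make_pu_training_set(P, U)
```

A classifier trained on such pairs learns to separate labeled from unlabeled examples, so its decision threshold usually has to be adjusted before it separates C+ from C-.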
Applications • To automatically filter web pages according to a user's preference • the browsed or bookmarked pages can be used as positive examples • while unlabeled examples can be easily collected from the web • To automatically find machine learning literature • the ICML papers can be used as positive examples • while unlabeled examples can be easily collected from the ACM or IEEE digital library • To automatically identify cancer patients • the patients known to have cancer can be used as positive examples • while unlabeled examples can be easily collected from the patient database • To automatically discover future customers for direct marketing • the current customers of the company can be used as positive examples • while unlabeled examples can be purchased at a low cost compared with obtaining negative examples • ……
Approaches • Existing Approaches • PNB (Denis et al. 2002); PNCT (Denis et al. 2003) • S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003) • PEBL (Yu et al. 2004); SVMC (Yu 2005) • PN-SVM (Fung et al. 2005) • W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003) • Our Proposed Approach • B-Pr
Our Approach • A Probabilistic Model
Our Approach • Biased PrTFIDF (B-Pr) • Estimate • PrTFIDF (Joachims 1997) • Estimate • Maximize • On a held-out validation set • (Lee & Liu 2003) • Linear Time Complexity!
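The validation step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the quantity being maximized is the criterion r²/Pr[f(x)=1] from Lee & Liu (2003), where r is the recall on held-out positive examples and Pr[f(x)=1] is the fraction of all held-out examples predicted positive. The scores are hypothetical numbers standing in for any classifier's output (they are not PrTFIDF outputs), and the function names are illustrative.

```python
def pu_criterion(pos_scores, all_scores, threshold):
    """Estimate r^2 / Pr[f(x)=1] on a held-out validation set.

    pos_scores: classifier scores of the held-out positive examples.
    all_scores: scores of all held-out examples (positive + unlabeled).
    An example is predicted positive when its score >= threshold.
    """
    recall = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    frac_predicted_pos = sum(s >= threshold for s in all_scores) / len(all_scores)
    if frac_predicted_pos == 0:
        return 0.0
    return recall * recall / frac_predicted_pos


def select_threshold(pos_scores, unl_scores):
    """Return the candidate threshold that maximizes the criterion.

    Every observed score is tried as a threshold. This simple scan is
    quadratic in the validation-set size and is kept that way for clarity.
    """
    all_scores = list(pos_scores) + list(unl_scores)
    candidates = sorted(set(all_scores))
    return max(candidates, key=lambda t: pu_criterion(pos_scores, all_scores, t))


# Hypothetical validation scores.
val_pos = [0.9, 0.8, 0.7, 0.4]
val_unl = [0.85, 0.3, 0.2, 0.1, 0.05, 0.6]
best_threshold = select_threshold(val_pos, val_unl)
```

The criterion needs no labeled negatives: recall is measured on known positives only, and the denominator is measured on the whole validation set, which is why it suits the PU setting.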
Experiments • Reuters-21578 • B-Pr > RC-SVM > PEBL (p=0.55) • RC-SVM > B-Pr > PEBL (p=0.85)
Experiments • 20NewsGroups • B-Pr > W-LR > S-EM (p=0.3) • B-Pr > W-LR > S-EM (p=0.7)
Conclusion • A New Approach to Learning from Positive and Unlabeled Examples • As effective as the state-of-the-art approaches • Yet simpler and faster
Thank you • Questions? • Comments? • Suggestions? • ……