
A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples

Dell Zhang (BBK) and Wee Sun Lee (NUS)


Presentation Transcript


  1. A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples Dell Zhang (BBK) and Wee Sun Lee (NUS)

  2. Problem • Supervised Learning

  3. Problem • Semi-Supervised Learning

  4. Problem • PU Learning

  5. Problem • Unlabeled Examples Help

  6. Problem • PU Learning • To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-) by learning a classifier from • a set of positive examples P and • a set of unlabeled examples U • There is no labeled negative example!
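The PU setting on this slide can be illustrated by hiding labels in an ordinary labeled dataset. The sketch below is only an illustration, not the authors' experimental protocol: the function name `make_pu_split` and the parameter `labeled_fraction` are hypothetical. It keeps a fraction of the true positives as P and puts every remaining example, positive or negative, into U without its label.

```python
import random


def make_pu_split(examples, labels, labeled_fraction, seed=0):
    """Simulate a PU learning dataset from a fully labeled one.

    Hypothetical helper for illustration only.
    examples: list of instances
    labels: list of 0/1 labels (1 = positive class C+)
    labeled_fraction: share of true positives that keep their label
    Returns (P, U): P holds known positives, U holds unlabeled examples.
    """
    rng = random.Random(seed)
    positives = [x for x, y in zip(examples, labels) if y == 1]
    negatives = [x for x, y in zip(examples, labels) if y != 1]

    # Only some of the positives are revealed to the learner.
    rng.shuffle(positives)
    n_labeled = int(round(labeled_fraction * len(positives)))
    P = positives[:n_labeled]

    # U mixes the hidden positives with all negatives; no labels are kept.
    U = positives[n_labeled:] + negatives
    rng.shuffle(U)
    return P, U


if __name__ == "__main__":
    docs = ["doc%d" % i for i in range(10)]
    true_labels = [1] * 6 + [0] * 4
    P, U = make_pu_split(docs, true_labels, labeled_fraction=0.5)
    print(len(P), "positive examples,", len(U), "unlabeled examples")
```

A learner in this setting sees only P and U, so U cannot be treated as a clean negative class.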

  7. Applications • To automatically filter web pages according to a user's preference • the browsed or bookmarked pages can be used as positive examples • while unlabeled examples can be easily collected from the web • To automatically find machine learning literature • the ICML papers can be used as positive examples • while unlabeled examples can be easily collected from the ACM or IEEE digital library • To automatically identify cancer patients • the patients known to have cancer can be used as positive examples • while unlabeled examples can be easily collected from the patient database • To automatically discover future customers for direct marketing • the current customers of the company can be used as positive examples • while unlabeled examples can be purchased at a low cost compared with obtaining negative examples • ……

  8. Approaches • Existing Approaches • PNB (Denis et al. 2002); PNCT (Denis et al. 2003) • S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003) • PEBL (Yu et al. 2004); SVMC (Yu 2005) • PN-SVM (Fung et al. 2005) • W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003) • Our Proposed Approach • B-Pr

  9. Our Approach • A Probabilistic Model

  10. Our Approach

  11. Our Approach • Biased PrTFIDF (B-Pr) • Estimate • PrTFIDF (Joachims 1997) • Estimate • Maximize • On a held-out validation set • (Lee & Liu 2003) • Linear Time Complexity!
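The formulas on this slide did not survive in the transcript. As a hedged sketch of the "Maximize ... on a held-out validation set" step, the code below assumes that the quantity being maximized is the PU criterion r² / Pr[f(X)=1] proposed in Lee & Liu (2003), where r is the recall on held-out positive examples and Pr[f(X)=1] is the fraction of held-out examples predicted positive. The function names and the idea of tuning a score threshold are hypothetical stand-ins for whatever parameter B-Pr actually tunes.

```python
def pu_criterion(pred_on_positives, pred_on_all):
    """Return r^2 / Pr[f(X)=1], computable without negative labels.

    pred_on_positives: booleans, predictions on held-out positive examples
    pred_on_all: booleans, predictions on the whole held-out set (P and U)
    """
    if not pred_on_positives or not pred_on_all:
        raise ValueError("validation sets must be non-empty")
    recall = sum(pred_on_positives) / len(pred_on_positives)
    frac_predicted_positive = sum(pred_on_all) / len(pred_on_all)
    if frac_predicted_positive == 0:
        return 0.0  # nothing predicted positive, so recall is 0 as well
    return recall ** 2 / frac_predicted_positive


def select_threshold(scores_pos, scores_all, candidates):
    """Pick the candidate threshold with the highest criterion value.

    Hypothetical tuning loop: an example is predicted positive when its
    score is at least the threshold.
    """
    def value(t):
        return pu_criterion([s >= t for s in scores_pos],
                            [s >= t for s in scores_all])
    return max(candidates, key=value)


if __name__ == "__main__":
    scores_pos = [0.9, 0.8, 0.4]                  # held-out positives
    scores_all = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]   # held-out P and U together
    print(select_threshold(scores_pos, scores_all, [0.0, 0.5, 0.85]))
```

The criterion rewards high recall while penalizing classifiers that label too many examples positive, which is why it can stand in for an F-score when no negative labels are available.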

  12. Experiments • Reuters-21578 • B-Pr > RC-SVM > PEBL (p=0.55) • RC-SVM > B-Pr > PEBL (p=0.85)

  13. Experiments • 20NewsGroups • B-Pr > W-LR > S-EM (p=0.3) • B-Pr > W-LR > S-EM (p=0.7)

  14. Conclusion • A New Approach to Learning from Positive and Unlabeled Examples • As effective as the state-of-the-art approaches • Yet simpler and faster

  15. Thank you • Questions? • Comments? • Suggestions? • ……
