PEBL: Web Page Classification without Negative Examples
Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan
IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004
Presented by Chirayu Wongchokprasitti
Introduction
• Web page classification is one of the main techniques for Web mining
• Constructing a classifier requires both positive and negative training examples
• Collecting negative training examples is laborious, and they must be sampled with care to avoid bias
Positive Example Based Learning (PEBL) Framework
• Learns from positive data and unlabeled data
• The unlabeled data are assumed to be random samples of the universal set
• Applies the Mapping-Convergence (M-C) algorithm
Mapping-Convergence (M-C) Algorithm
• Consists of two stages
• Mapping stage
  • Can use any classifier that does not generate false negatives (no positive example may be labeled negative)
  • The authors chose 1-DNF (monotone disjunctive normal form)
• Convergence stage
  • Maximizes the margin of the class boundary
  • The authors chose SVM (Support Vector Machine)
Mapping Stage
• Use a weak classifier to draw an initial approximation of “strong” negative data
• First, identify strong positive features from the positive and unlabeled data by comparing feature frequencies
  • A feature is a strong positive feature if its frequency in the positive data is higher than its frequency in the unlabeled (universal) data
• Then filter out every unlabeled example that contains a strong positive feature, leaving only strong negatives
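A minimal Python sketch of the mapping stage, assuming each document is represented as a set of features (e.g. words) and that "frequency" means the fraction of documents that contain a feature; the paper's exact frequency measure may differ. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Set


def strong_positive_features(positive_docs: List[Set[str]],
                             unlabeled_docs: List[Set[str]]) -> Set[str]:
    """Features found in a larger fraction of positive docs than of unlabeled docs."""
    if not positive_docs or not unlabeled_docs:
        raise ValueError("need both positive and unlabeled documents")
    # Document frequency: number of documents that contain each feature.
    pos_df = Counter(f for doc in positive_docs for f in set(doc))
    unl_df = Counter(f for doc in unlabeled_docs for f in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    # Counter returns 0 for features that never occur in the unlabeled set.
    return {f for f in pos_df if pos_df[f] / n_pos > unl_df[f] / n_unl}


def strong_negatives(positive_docs: List[Set[str]],
                     unlabeled_docs: List[Set[str]]) -> List[Set[str]]:
    """1-DNF mapping: keep unlabeled docs that contain no strong positive feature."""
    pos_features = strong_positive_features(positive_docs, unlabeled_docs)
    return [doc for doc in unlabeled_docs if not (set(doc) & pos_features)]


positives = [{"course", "homework"}, {"course", "syllabus"}]
unlabeled = [{"course", "exam"}, {"sports", "score"},
             {"news", "weather"}, {"homework", "lab"}]
negatives = strong_negatives(positives, unlabeled)
```

Here "course", "homework" and "syllabus" all come out as strong positive features, so only the sports page and the news page survive as strong negatives. The disjunction is deliberately permissive: a single positive feature is enough to keep a page out of the negative set, which is how the mapping stage avoids false negatives.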
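Convergence Stage
• Use an SVM to progressively narrow the class boundary
• Train the SVM on the positives and the negatives found so far, then add the unlabeled examples it classifies as negative to the negative set; repeat until no new negatives are extracted
• The boundary converges toward the true class boundary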
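The convergence loop can be sketched in Python as follows. To keep the example short and dependency-free, a nearest-centroid rule stands in for the SVM; it is not an SVM, and a real SVM trainer would be passed through the `train` parameter. Points are assumed to be numeric tuples, and `converge`, `train_nearest_centroid` and `max_iter` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], bool]  # True means "positive"


def train_nearest_centroid(pos: Sequence[Point], neg: Sequence[Point]) -> Classifier:
    """Stand-in for SVM training (NOT an SVM): positive if closer to the positive centroid."""
    def centroid(points: Sequence[Point]) -> Point:
        dim = len(points[0])
        return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

    cp, cn = centroid(pos), centroid(neg)

    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return lambda x: sq_dist(x, cp) < sq_dist(x, cn)


def converge(positives: Sequence[Point],
             strong_negs: Sequence[Point],
             unlabeled_rest: Sequence[Point],
             train: Callable[[Sequence[Point], Sequence[Point]], Classifier] = train_nearest_centroid,
             max_iter: int = 100) -> Tuple[Classifier, List[Point]]:
    """Grow the negative set until no remaining unlabeled point is classified negative."""
    negatives = list(strong_negs)     # start from the mapping-stage output
    remaining = list(unlabeled_rest)  # unlabeled points not yet labeled negative
    clf = train(positives, negatives)
    for _ in range(max_iter):
        new_negs = [x for x in remaining if not clf(x)]
        if not new_negs:              # no new negatives: the boundary has converged
            break
        negatives.extend(new_negs)
        remaining = [x for x in remaining if clf(x)]
        clf = train(positives, negatives)  # retrain on the enlarged negative set
    return clf, negatives


clf, negs = converge([(0, 0), (1, 0)], [(10, 0)], [(2, 0), (6, 0), (8, 0)])
```

In this toy run the first classifier labels (6, 0) and (8, 0) negative; after retraining, (2, 0) is still classified positive, no new negatives appear, and the loop stops. Each round can only move more unlabeled points into the negative set, which is why the boundary shrinks step by step toward the positive data.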
Support Vector Machines
[Figure: Visualization of a Support Vector Machine]
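Because this slide is only a figure, the following minimal Python sketch shows what a linear soft-margin SVM optimizes: the regularized hinge loss λ/2·‖w‖² + mean(max(0, 1 − y(w·x + b))), minimized here with plain full-batch subgradient descent. The solver and the values of `lam`, `lr` and `epochs` are illustrative assumptions; practical systems use a dedicated SVM library.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Soft-margin linear SVM via full-batch subgradient descent (labels are +1/-1)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:           # inside the margin or misclassified
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
```

Only points with margin below 1 (the support vectors and any violators) contribute to the hinge term's gradient, which is why the learned boundary depends on the examples closest to it.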
Experimental Results
• Results are reported as the precision-recall (P-R) breakeven point
• Experiment 1: the Internet
  • DMOZ is used as the universal set
• Experiment 2: university CS department pages
  • The WebKB data set is used
• Mixture Models
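One common way to compute the P-R breakeven point, sketched in Python under the assumption of 0/1 labels and no tied scores: rank the test examples by classifier score and cut the ranking at k = number of true positives. At that cutoff precision = TP/k and recall = TP/(number of positives) share the same denominator, so they are equal. The function name is hypothetical, and other definitions interpolate between cutoffs instead.

```python
from typing import Sequence


def pr_breakeven(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision (= recall) when the top-k scored examples are called positive,
    with k equal to the number of actual positives."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    # Precision TP/k and recall TP/n_pos coincide because k == n_pos.
    return true_pos / n_pos


bep = pr_breakeven([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0])
```

With two positives, the top two scores contain one positive, so precision and recall are both 1/2.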
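Summary and Conclusions
• The PEBL framework eliminates the need to collect negative training examples manually
• The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of a traditional SVM trained with labeled negative examples
• PEBL needs faster training: because the convergence stage retrains the SVM in each iteration, training takes longer than a single SVM run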