Multi-class SVM with Negative Data Selection for Web Page Classification
Chih-Ming Chen, Hahn-Ming Lee, and Ming-Tyan Kao
International Joint Conference on Neural Networks, 2004
Motivation
• Several new websites are launched every day
• Users need to search them quickly and efficiently
• Search engines organize websites under a topic hierarchy (taxonomy)
• This requires a classifier: one-against-all SVM
• Catch: the large volume of negative data increases training time
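To make the "catch" concrete, the sketch below counts how many positive and negative examples each one-against-all binary problem receives. The class names and the helper `one_against_all_splits` are hypothetical and are not taken from the paper; the sketch only illustrates why the negative side tends to dominate.

```python
from collections import Counter

def one_against_all_splits(labels):
    """For each class, return (positives, negatives) of its binary problem.

    In a one-against-all setup, every example that does not belong to the
    class is treated as negative data for that class's classifier.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: (pos, total - pos) for cls, pos in counts.items()}

# Hypothetical toy taxonomy with three topics.
labels = ["sports"] * 2 + ["finance"] * 3 + ["travel"] * 5
splits = one_against_all_splits(labels)

for cls, (pos, neg) in splits.items():
    print(f"{cls}: {pos} positive, {neg} negative")
```

With many topics, each classifier sees a small positive set and nearly the whole collection as negatives, which is the training cost the slides point to.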
Negative Data Selection
Support vectors in the negative data are much more similar to the positive data than the other negative data are
Negative Data Selection
• Feature selection: take the top n keywords from the positive data
• All websites are represented as vectors over these top n keywords
• Cosine similarity: cos(x, y) = (x · y) / (||x|| ||y||)
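The representation step on this slide can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: ranking keywords by raw term frequency, whitespace tokenization, and the helper names `top_keywords`, `to_vector`, and `cosine_similarity` are all assumptions, since the slides do not say how the top n keywords are chosen.

```python
import math
from collections import Counter

def top_keywords(positive_docs, n):
    """Return the n most frequent terms in the positive documents.

    Assumption: "top" means highest raw term frequency.
    """
    counts = Counter(
        term for doc in positive_docs for term in doc.lower().split()
    )
    return [term for term, _ in counts.most_common(n)]

def to_vector(doc, keywords):
    """Represent a document as term counts over the selected keywords."""
    counts = Counter(doc.lower().split())
    return [counts[k] for k in keywords]

def cosine_similarity(x, y):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical toy documents.
positive_docs = ["svm kernel margin", "svm margin classifier"]
keywords = top_keywords(positive_docs, 3)
pos_vec = to_vector(positive_docs[0], keywords)
neg_vec = to_vector("travel hotel booking", keywords)
print(cosine_similarity(pos_vec, pos_vec), cosine_similarity(pos_vec, neg_vec))
```

A negative document that shares none of the selected keywords maps to the zero vector, so its similarity is defined here as 0.0 rather than raising a division error.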
Negative Data Selection
• Plot the similarity scores of the negative documents to the positive documents in descending order
[Figure: plot with axes "Similarity Scores in Descending Order" and "Negative Documents"; the "Convergence Point" is marked on the curve]
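The ranking step can be sketched as below. The slides do not define how the convergence point is located, so the rule used here (the first position where the drop between consecutive scores falls below a tolerance `tol`) is purely a hypothetical stand-in. The sketch also assumes that the negatives ranked before that point are the ones retained, which fits the observation that negative support vectors resemble the positive data.

```python
def rank_negatives(scores):
    """Return (document index, score) pairs sorted by score, highest first."""
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

def convergence_point(sorted_scores, tol=0.01):
    """Hypothetical rule: first position where the curve flattens.

    Returns the index i at which the drop from score i-1 to score i is
    smaller than tol, or len(sorted_scores) if the curve never flattens.
    """
    for i in range(1, len(sorted_scores)):
        if sorted_scores[i - 1] - sorted_scores[i] < tol:
            return i
    return len(sorted_scores)

def select_negatives(scores, tol=0.01):
    """Keep the indices of negatives ranked before the convergence point."""
    ranked = rank_negatives(scores)
    cut = convergence_point([score for _, score in ranked], tol)
    return [idx for idx, _ in ranked[:cut]]

# Hypothetical similarity scores of five negative documents.
scores = [0.10, 0.90, 0.11, 0.60, 0.105]
selected = select_negatives(scores)
print(selected)
```

Only the selected subset would then be passed to SVM training as negative data, which is where the reduction in training time would come from.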
Experiments
• Reuters dataset (10802 training, 565 test)