
Using Support Vector Machine for Integrating Catalogs

Ping-Tsun Chang, Intelligent Systems Laboratory, NTU/CSIE


Presentation Transcript


  1. Using Support Vector Machine for Integrating Catalogs • Ping-Tsun Chang • Intelligent Systems Laboratory, NTU/CSIE

  2. Integrating Catalogs • Key insight: many data sources have their own categorization, and classification accuracy can be improved by factoring in the implicit information in these source categorizations. • A straightforward approach: formulate the catalog integration problem as a classification problem

  3. Related Research • Why Naïve Bayes? • Naïve Bayes classifiers are competitive with other techniques in accuracy • Fast: training takes a single pass and new documents are classified quickly • ATHENA: EDBT 2000 • On Integrating Catalogs (WWW10, May 2001) • Classification • Mail Agent • SONIA (ACM Digital Library '98)

  4. Problem Statement • [Figure: documents d1 … dk in source categories S1 … Sm, to be mapped to master categories C1 … Cn] • A master catalog M with categories C1 … Cn and a set of documents in each category • A source catalog N with a set of categories S1 … Sm and another set of documents • We need to find the category in M for each document in N

  5. An Overview of Integrating Catalogs • [Diagram; box labels: Documents, Text Representation, Feature Extraction, Implicit Information From Source Catalogs, Document Classification, Rule, Prediction Model]

  6. Naïve Bayes Classification: Basic Algorithm • A document may be assigned to more than one category when P(Ci|d) and P(Cj|d) are both high • A document d for which every P(Ci|d) is low is kept aside for manual classification • If a large fraction of the documents in some source category S of N satisfy the previous condition, S may be a new category for M
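The decision rules on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the thresholds `high`, `low`, and `fraction`, the function names, and the fallback to the single best category when no posterior exceeds `high` are all assumptions, since the slide does not give numeric values.

```python
def assign_categories(posteriors, high=0.4, low=0.1):
    """Apply the slide's rules to one document.

    posteriors: dict mapping master category -> P(C|d).
    Returns (assigned_categories, needs_manual_review).
    The thresholds are hypothetical; the slide gives no values.
    """
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] < low:
        # Every posterior is low: keep the document aside.
        return [], True
    assigned = [c for c, p in posteriors.items() if p >= high]
    # Assumption: fall back to the single best category otherwise.
    return (assigned or [best]), False


def may_be_new_category(posteriors_per_doc, low=0.1, fraction=0.5):
    """True if a large share of a source category's documents were set aside."""
    set_aside = sum(
        1 for posteriors in posteriors_per_doc
        if assign_categories(posteriors, low=low)[1]
    )
    return set_aside / len(posteriors_per_doc) > fraction


# Two high posteriors: the document goes to both categories.
print(assign_categories({"C1": 0.45, "C2": 0.42, "C3": 0.13}))
```

A source category whose documents are mostly set aside is only flagged as a candidate; deciding whether to add it to M remains a manual step.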

  7. Google vs. Yahoo!: Classification Accuracy

  8. Support Vector Machine • Minimize [formula not preserved in the transcript] • Subject to [formula not preserved in the transcript]
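The objective and constraints of slide 8 are missing from the transcript. For orientation, the standard soft-margin primal problem is given below. Whether the slide showed this form or the hard-margin variant is not known, and the symbols w, b, ξ, C, φ, and l follow common textbook notation rather than the slide.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i
\qquad \text{subject to} \qquad
y_i \bigl( w^{\top} \phi(x_i) + b \bigr) \ge 1 - \xi_i, \quad
\xi_i \ge 0, \quad i = 1, \dots, l
```

Here y_i ∈ {−1, +1} is the label of training example x_i, and C trades margin width against training errors.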

  9. Multi-Class Classification • SVM is a binary classification technique • Ongoing research • One-against-one performs better than the other approaches in experiments (cjlin, 2001)
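One-against-one trains one binary classifier per pair of classes and predicts by majority vote. The sketch below shows only the voting scheme. To stay self-contained it uses toy nearest-centroid rules on one-dimensional inputs in place of binary SVMs, and all names and centroid values are illustrative.

```python
from collections import Counter
from itertools import combinations


def train_pairwise(centroids):
    """Build one toy binary classifier per unordered pair of classes.

    Each classifier is a nearest-centroid rule standing in for a
    binary SVM trained on the examples of just those two classes.
    """
    def make(a, b):
        return lambda x: a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

    return {(a, b): make(a, b) for a, b in combinations(sorted(centroids), 2)}


def one_vs_one_predict(x, pairwise):
    """Let every pairwise classifier vote; return the class with most votes."""
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]


pairwise = train_pairwise({"A": 0.0, "B": 5.0, "C": 10.0})
print(len(pairwise))                      # n*(n-1)/2 classifiers for n classes
print(one_vs_one_predict(4.2, pairwise))
```

With n classes this needs n(n−1)/2 classifiers, but each one is trained on only two classes' examples.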

  10. Using SVM for Text Classification • TF·IDF (Term Frequency × Inverse Document Frequency) • Here k is a normalization constant ensuring that each document vector has unit norm. The function is clearly a valid kernel, since it is the inner product in an explicitly constructed feature space.
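A minimal sketch of a TF·IDF kernel, under stated assumptions: the slide's exact weighting formula is not preserved, so the code uses the common form tf × log(N/df) and takes k to be the L2 norm of the raw weight vector. The function names and the toy documents are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector.

    docs: list of token lists.
    Weight of term t in a document is tf(t) * log(N / df(t)) / k,
    where k is the L2 norm of the unnormalized weights (assumed form).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: tf[t] * math.log(n / df[t]) for t in tf}
        k = math.sqrt(sum(w * w for w in raw.values()))
        # A document made only of terms found everywhere has k == 0.
        vectors.append({t: w / k for t, w in raw.items()} if k > 0 else raw)
    return vectors


def text_kernel(u, v):
    """Inner product of two sparse vectors in the explicit feature space."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


docs = [
    ["svm", "kernel", "margin"],
    ["svm", "catalog", "catalog"],
    ["bayes", "catalog"],
]
vecs = tfidf_vectors(docs)
print(text_kernel(vecs[0], vecs[1]))  # positive: the documents share "svm"
```

Because the kernel is an explicit inner product, any Gram matrix built from it is symmetric and positive semidefinite.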

  11. Using SVM for Integrating Catalogs • Increasing β gives the source categorization S more influence on the classification into M. • Better-separated examples make it easier for the SVM to find the maximum-margin hyperplane. • New kernel function

  12. New Kernel Function for Integrating Catalogs • Orthogonal: we can treat the two catalogs as orthogonal. In that case, when β = 0 the kernel function is the same as that of a standard classifier that has no source-catalog information.
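The kernel formula itself is not preserved in the transcript. One construction consistent with the two stated properties (source information sits in dimensions orthogonal to the text features, and β = 0 recovers the plain text kernel) adds a β-weighted indicator that two documents share a source category. This is an assumed form for illustration, not necessarily the presenter's kernel, and every name below is hypothetical.

```python
def text_kernel(u, v):
    """Inner product of two sparse text vectors (dict term -> weight)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def catalog_kernel(doc_a, doc_b, beta):
    """Text kernel plus a beta-weighted source-category match (assumed form).

    Each document is a (text_vector, source_category) pair. A source
    category of None means "no source information" and never matches.
    Equivalent to appending a one-hot source indicator scaled by
    sqrt(beta) to the text vector, so the result is a valid kernel.
    """
    (u, s_a), (v, s_b) = doc_a, doc_b
    same_source = 1.0 if s_a is not None and s_a == s_b else 0.0
    return text_kernel(u, v) + beta * same_source


a = ({"svm": 1.0}, "S1")
b = ({"svm": 0.6, "catalog": 0.8}, "S1")
c = ({"svm": 0.6, "catalog": 0.8}, "S2")

# With beta = 0 the source category has no effect.
print(catalog_kernel(a, b, 0.0) == catalog_kernel(a, c, 0.0))
# With beta > 0, documents from the same source category look more similar.
print(catalog_kernel(a, b, 0.5) > catalog_kernel(a, c, 0.5))
```

Under this assumed form, a larger β pulls documents from the same source category closer together in feature space, matching the slide's remark that increasing β gives S more influence.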

  13. Experimental Results (Train: Books.com.tw, Test: Commonwealth)

      | Dataset | Naïve Bayes accuracy (%) | SVM accuracy (%) | Ours accuracy (%) | Improvement over SVM (%) |
      |---|---|---|---|---|
      | Finance&Business | 53.45 | 56.12 | 64.11 | 14.24 |
      | Computers | 57.68 | 57.80 | 65.50 | 13.32 |
      | Science | 51.12 | 57.78 | 62.00 | 7.30 |
      | Literature | 37.39 | 40.13 | 53.78 | 34.01 |
      | Psychology | 47.66 | 54.44 | 59.96 | 10.14 |
      | Average | 49.46 | 53.25 | 61.07 | 15.80 |
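The last column of this table is not defined on the slide. The figures are consistent with the relative gain of the proposed method over the plain SVM baseline, (ours − SVM) / SVM, with the "Average" entry being the mean of the five per-dataset gains. That reading is inferred from the numbers, and the short check below reproduces it. The function name is illustrative.

```python
# (SVM accuracy, accuracy of the proposed method) from slide 13, in percent.
results = {
    "Finance&Business": (56.12, 64.11),
    "Computers": (57.80, 65.50),
    "Science": (57.78, 62.00),
    "Literature": (40.13, 53.78),
    "Psychology": (54.44, 59.96),
}


def relative_improvement(baseline, ours):
    """Relative gain over the baseline, in percent."""
    return 100.0 * (ours - baseline) / baseline


gains = {name: relative_improvement(svm, ours) for name, (svm, ours) in results.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f} %")
print(f"Mean of per-dataset gains: {sum(gains.values()) / len(gains):.2f} %")
```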

  14. Experimental Results (Train: Commonwealth, Test: Books.com.tw)

      | Dataset | Naïve Bayes accuracy (%) | SVM accuracy (%) | Ours accuracy (%) | Improvement over SVM (%) |
      |---|---|---|---|---|
      | Finance&Business | 42.77 | 49.74 | 53.12 | 6.80 |
      | Computers | 45.60 | 48.25 | 55.54 | 15.11 |
      | Science | 41.14 | 44.60 | 47.72 | 6.99 |
      | Literature | 30.99 | 37.21 | 42.21 | 13.44 |
      | Psychology | 40.05 | 40.11 | 43.35 | 8.08 |
      | Average | 40.11 | 43.98 | 48.39 | 10.08 |

  15. Conclusions • SVM is very useful for the problem of integrating catalogs of text documents. • Traditionally, SVM is a classification tool. In this paper, we use SVM with a novel kernel function to suit this problem. • The experiment here serves as a promising start for the use of SVM on this problem. • Future work: we can also improve performance by incorporating another kernel function and proving it valid, or by combining structural information from the text documents
