
Combining Unsupervised Feature Selection Strategy for Automatic Text Categorization

Learn about a novel unsupervised feature selection strategy for automatic text categorization, exploring machine learning techniques and innovative algorithms developed to enhance categorization accuracy. This research addresses the challenges of handling vast amounts of text data efficiently and effectively.



Presentation Transcript


  1. Combining Unsupervised Feature Selection Strategy for Automatic Text Categorization. Ping-Tsun Chang, Intelligent Systems Laboratory, Computer Science and Information Engineering, National Taiwan University

  2. Introduction • Recent research shows the limits of purely statistical and computational approaches to natural language understanding • The development of machine learning techniques has nearly reached its bound • Natural language is infinite and nonlinear! • Unsupervised Feature Selection

  3. Text Categorization Background Knowledge • Pattern-recognition pipeline: Sensing → Segmentation → Feature Extraction → Classification → Post-Processing → Decision • Problem Definition: Text categorization is the problem of assigning labels to a large number of unknown documents, given a large amount of text data.

  4. Background Knowledge: Machine Learning • Instance-Based Learning • K-Nearest Neighbors • Neural Networks • Support Vector Machine • Using computers to help us induce from complex and large amounts of pattern data • Bayesian Learning

  5. Background Knowledge: Feature Selection • Information Gain • Mutual Information • Chi-Square (χ²)
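
The chi-square measure listed above can be sketched in a few lines: it scores how strongly a term's presence is associated with a category from a 2×2 contingency table. The function and the toy counts below are illustrative, not taken from the paper.

```python
# Sketch: chi-square feature-selection score for one term against one
# category, computed from a 2x2 term/category contingency table.
# The example counts are invented for illustration.

def chi_square(n11, n10, n01, n00):
    """Chi-square score from a 2x2 contingency table.

    n11: docs in the category that contain the term
    n10: docs outside the category that contain the term
    n01: docs in the category that lack the term
    n00: docs outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

# Example: the term appears in 40 of 50 in-category docs
# but only 10 of 150 other docs -> a strongly associated feature.
score = chi_square(40, 10, 10, 140)
print(round(score, 2))  # -> 107.56
```

Terms are then ranked by this score per category, and only the top-scoring ones are kept as features.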

  6. Bayesian Classifier • Recent research • Naïve Bayes classifiers are competitive in accuracy with other techniques • Fast: a single training pass, and new documents are classified quickly • ATHENA: EDBT 2000
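
The single-pass training the slide alludes to can be sketched as a small multinomial Naive Bayes classifier with Laplace smoothing. The training documents below are toy data, not from the paper.

```python
import math
from collections import Counter, defaultdict

# Sketch: multinomial Naive Bayes with Laplace smoothing.
# Training is one pass over the corpus; classification compares
# log posterior scores across categories.

def train_nb(docs):
    """docs: list of (tokens, label). Returns the model tuple."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:          # single pass over the corpus
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, len(docs)

def classify_nb(model, tokens):
    class_counts, word_counts, vocab, n_docs = model
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / n_docs)   # log prior
        total = sum(word_counts[c].values())
        for t in tokens:
            # Laplace-smoothed log likelihood of each token
            lp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

docs = [(["ball", "game", "score"], "sports"),
        (["vote", "election", "party"], "politics"),
        (["game", "team", "win"], "sports")]
model = train_nb(docs)
print(classify_nb(model, ["game", "score"]))  # -> sports
```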

  7. Machine Learning Approaches: kNN Classifier (diagram: an unknown document d among its nearest neighbors)
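
The idea in the diagram, assigning d to the majority class among its k nearest neighbors, can be sketched with cosine similarity over term-frequency vectors. The training vectors are toy data, not the paper's corpus.

```python
import math
from collections import Counter

# Sketch: kNN text classification. The unknown document d is compared
# to every training document by cosine similarity, and the majority
# label among the k most similar neighbours wins.

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(train, d, k=3):
    """train: list of (tf_dict, label); d: tf_dict of the unknown doc."""
    neighbours = sorted(train, key=lambda ex: cosine(ex[0], d),
                        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [({"goal": 2, "match": 1}, "sports"),
         ({"match": 1, "team": 2}, "sports"),
         ({"budget": 2, "tax": 1}, "finance"),
         ({"tax": 2, "market": 1}, "finance")]
print(knn_classify(train, {"goal": 1, "team": 1}, k=3))  # -> sports
```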

  8. Machine Learning Approaches: Support Vector Machine • Basic hypothesis: consistent hypotheses of the version space • Project the original training data in space X to a higher-dimensional feature space F via a Mercer operator K
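
A full SVM solver is beyond a slide, so as a sketch of the same kernel trick the slide describes, here is a kernel perceptron: the data are touched only through a Mercer kernel K(x, z), which implicitly maps X into the higher-dimensional space F. The XOR data set is a standard toy example, not from the paper.

```python
# Sketch: the "Mercer operator" idea via a kernel perceptron (a simpler
# stand-in for an SVM). XOR is not linearly separable in the input
# space X, but becomes separable in the feature space F induced by a
# degree-2 polynomial kernel.

def poly_kernel(x, z, degree=2):
    """Polynomial Mercer kernel K(x, z) = (x . z + 1) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** degree

def kernel_perceptron(data, kernel, epochs=20):
    """data: list of (vector, label in {-1, +1}). Returns dual weights."""
    alpha = [0] * len(data)
    for _ in range(epochs):
        for i, (x, y) in enumerate(data):
            s = sum(a * yj * kernel(xj, x)
                    for a, (xj, yj) in zip(alpha, data))
            if y * s <= 0:            # misclassified: raise dual weight
                alpha[i] += 1
    return alpha

def predict(data, alpha, kernel, x):
    s = sum(a * yj * kernel(xj, x) for a, (xj, yj) in zip(alpha, data))
    return 1 if s > 0 else -1

xor = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
alpha = kernel_perceptron(xor, poly_kernel)
print([predict(xor, alpha, poly_kernel, x) for x, _ in xor])
# -> [-1, 1, 1, -1]
```

An SVM uses the same dual, kernel-only formulation but chooses the maximum-margin consistent hypothesis rather than any consistent one.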

  9. What is Certainty? • Rule for kNN • Rule for SVM
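
The slide's actual rules were given as formulas that did not survive this transcript. As an assumption about their common form, sketched here rather than taken from the paper: a kNN decision is uncertain when the top two vote counts are close, and an SVM decision is uncertain when the example lies near the separating hyperplane.

```python
# Hypothetical sketch of uncertainty rules (the slide's real formulas
# are lost); the margin and threshold values below are arbitrary.

def knn_uncertain(votes, margin=1):
    """votes: {category: neighbour count}. Uncertain if the two
    highest vote counts differ by at most `margin`."""
    counts = sorted(votes.values(), reverse=True) + [0]
    return counts[0] - counts[1] <= margin

def svm_uncertain(decision_value, threshold=0.5):
    """Uncertain if the point is within `threshold` of the hyperplane,
    i.e. the SVM decision value is small in magnitude."""
    return abs(decision_value) < threshold

print(knn_uncertain({"sports": 3, "finance": 2}))  # -> True
print(svm_uncertain(1.2))                          # -> False
```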

  10. Algorithm for Two-Stage Automatic Text Categorization

  ALGORITHM Two-Stage-Text-Categorization(input: document d) returns category C
    Given:
      Trained classifier: Traditional-Classifier
      The feature set: F
      The feature sets built from user feedback: Ui for each related category Ci
    For a new document d:
      C ← Traditional-Classifier(d)
      If C does NOT satisfy the rule of uncertainty
        Return C
      Else
        For all categories Ci
          If d has a feature in F
            C ← Ci
            Return C
          End If
        Cj ← User-Input
        Uj ← Uj + User-Selected
        C ← Cj
      End If
    Return C
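
The two-stage flow above can be sketched in Python. The names `traditional_classifier`, `is_uncertain`, `feedback_features`, and `ask_user` are illustrative stand-ins for the paper's components, not its actual implementation.

```python
# Sketch of the two-stage categorization algorithm: stage 1 trusts the
# trained classifier when its decision is certain; stage 2 falls back
# to feature sets accumulated from user feedback, asking the user (and
# remembering the answer) only when both stages fail.

def two_stage_categorize(d, traditional_classifier, is_uncertain,
                         feedback_features, ask_user):
    """d: set of document features.
    feedback_features: {category: feature set gathered from feedback}.
    ask_user: fallback returning (category, selected_features)."""
    # Stage 1: the trained (e.g. Naive Bayes, kNN, or SVM) classifier.
    c = traditional_classifier(d)
    if not is_uncertain(d, c):
        return c
    # Stage 2: match against user-feedback feature sets per category.
    for category, features in feedback_features.items():
        if d & features:
            return category
    # Still unresolved: ask the user and keep the selected features.
    category, selected = ask_user(d)
    feedback_features.setdefault(category, set()).update(selected)
    return category

# Toy run: the base classifier is always uncertain, so stage 2 decides.
feedback = {"sports": {"goal", "match"}}
result = two_stage_categorize(
    {"goal", "season"},
    traditional_classifier=lambda d: "unknown",
    is_uncertain=lambda d, c: True,
    feedback_features=feedback,
    ask_user=lambda d: ("misc", d))
print(result)  # -> sports
```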

  11. Determining the Threshold of the Rule

  12. Experiments

  13. References
  [1] Dunja Mladenic, J. Stefan Institute. Text-Learning and Related Intelligent Agents: A Survey. IEEE Intelligent Systems, pp. 44-54, 1999.
  [2] Yiming Yang. Improving Text Categorization Methods for Event Tracking. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), 2000.
  [3] Yiming Yang. Combining Multiple Learning Strategies for Effective Cross Validation. In Proceedings of the 17th International Conference on Machine Learning (ICML '00), 2000.
  [4] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
  [5] Thorsten Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In European Conference on Machine Learning (ECML '98), pages 137-142, Berlin, 1998. Springer.
  [6] Yiming Yang. A Re-examination of Text Categorization Methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), 1999.
  [7] Lee-Feng Chien. PAT-Tree-Based Keyword Extraction for Chinese Information Retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '97), pages 50-58, 1997.
  [8] Jyh-Jong Tsay and Jing-Doo Wang. Improving Automatic Chinese Text Categorization by Error Correction. In Proceedings of Information Retrieval with Asian Languages (IRAL '00), 2000.
  [9] James Tin-Yau Kwok. Automated Text Classification Using Support Vector Machine. In Proceedings of the International Conference on Neural Information Processing (ICONIP '98), 1998.
  [10] Daphne Koller and Simon Tong. Support Vector Machine Active Learning with Applications to Text Classification. In Proceedings of the International Conference on Machine Learning (ICML '00), 2000.
  [11] Central News Agency. URL: http://www.cna.com.tw
  [12] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
  [13] D. E. Appelt and D. J. Israel. Introduction to Information Extraction Technology. Tutorial at the International Joint Conference on Artificial Intelligence (IJCAI '99), Stockholm, August 1999.
