Fuzzy Final Homework: System Implementation
Selected paper: Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004) 43–60
Lecturer: Prof. Hahn-Ming Lee
Student: Ching-Hao Mao, D9415004@mail.ntust.edu.tw
Outline
• Introduction
• Proposed method in selected paper
• Implementation
• Conclusion
• References
Introduction
• This report implements the method of Kim and Cho's paper, published in Fuzzy Sets and Systems in 2004 [1]
• A user profile represents different aspects of a user's characteristics
• The authors propose an ensemble of classifiers that estimates a user's preference from web content the user has labeled as "like" or "dislike"
Feature Selection Method Properties
• Feature selection methods such as information gain, TFIDF, and odds ratio have different properties
• TFIDF does not consider the class labels of documents when calculating the relevance of a feature, whereas information gain does
• Odds ratio also uses the class labels of documents, but it finds features that are useful for recognizing only one specific class
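The difference between the three scores is easier to see in code. The following is a minimal Java sketch, not the code written for this report: the class name FeatureScores, the method signatures, the tf * ln(N/df) variant of TFIDF, and the 0.5 smoothing constant in the odds ratio are all assumptions made for illustration.

```java
/** Hypothetical helper (not from the report): scores a single term. */
class FeatureScores {

    /** -p * log2(p), taken as 0 when p is 0. */
    static double plogp(double p) {
        return p <= 0.0 ? 0.0 : -p * Math.log(p) / Math.log(2.0);
    }

    /** Entropy (in bits) of a two-class split with the given counts. */
    static double entropy(double pos, double neg) {
        double n = pos + neg;
        return n == 0.0 ? 0.0 : plogp(pos / n) + plogp(neg / n);
    }

    /** TFIDF: uses term and document counts only, never the class labels. */
    static double tfidf(int termFreq, int docsWithTerm, int totalDocs) {
        return termFreq * Math.log((double) totalDocs / docsWithTerm);
    }

    /**
     * Information gain: uses the labels of both classes.
     * a/b = "like"/"dislike" documents containing the term,
     * c/d = "like"/"dislike" documents that do not contain it.
     */
    static double informationGain(int a, int b, int c, int d) {
        double n = a + b + c + d;
        return entropy(a + c, b + d)
                - ((a + b) / n) * entropy(a, b)
                - ((c + d) / n) * entropy(c, d);
    }

    /**
     * Log odds ratio for the "like" class: large positive values mark terms
     * typical of "like" pages only. The 0.5 smoothing is an assumption that
     * avoids division by zero.
     */
    static double oddsRatio(int a, int b, int likeDocs, int dislikeDocs) {
        double pLike = (a + 0.5) / (likeDocs + 1.0);
        double pDislike = (b + 0.5) / (dislikeDocs + 1.0);
        return Math.log((pLike * (1.0 - pDislike)) / ((1.0 - pLike) * pDislike));
    }

    public static void main(String[] args) {
        // A term that occurs in every "like" page and in no "dislike" page.
        System.out.println("IG   = " + informationGain(5, 0, 0, 5));
        System.out.println("Odds = " + oddsRatio(5, 0, 5, 5));
        System.out.println("TFIDF= " + tfidf(3, 5, 10));
    }
}
```

Note that tfidf takes no class counts at all, informationGain treats both classes symmetrically, and oddsRatio changes sign when the two classes are swapped, which is why it ranks features for one class at a time.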
Overview of the proposed method in [1]
[Figure: features are selected with TFIDF, information gain, and odds ratio; a SASOM is trained on each feature set; the fuzzy integral combines the SASOM outputs to classify a page as Hot or Cold]
Data Set Description
• UCI Syskill & Webert data (http://kdd.ics.uci.edu)
• Contains the HTML source of web pages plus the ratings of a single user on these web pages
• The web pages are on four separate subjects
  • Bands (recording artists; implemented in this report)
  • Goats (implemented in this report)
  • Sheep
  • BioMedical
Implementation
• A Java (J2SE 1.5) program handles preprocessing, feature selection (TFIDF and odds ratio), and the fuzzy integral mechanism
• Weka is used for feature selection (information gain) and classification
• SASOM was not successfully implemented in this report
Implementation - Preprocessing
• Input: UCI Syskill & Webert data
• ExtractHTMLContent.java extracts the pure text without anchor text (Bands.txt)
• After stopword removal: Bands_Stopword.txt
• After Porter stemming: Bands_Porter.txt
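The first two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the contents of ExtractHTMLContent.java: the class name PreprocessSketch, the regex-based tag stripping, and the tiny stopword list are all hypothetical, and Porter stemming is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of text extraction and stopword removal. */
class PreprocessSketch {

    // Illustrative stopword list only; a real list is much longer.
    static final Set<String> STOPWORDS = new HashSet<String>(Arrays.asList(
            "a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"));

    /** Returns the visible text of an HTML page, dropping anchor text. */
    static String extractText(String html) {
        // Drop script and style elements together with their contents.
        String s = html.replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ");
        // Drop anchors together with their text ("pure text without anchor text").
        s = s.replaceAll("(?is)<a\\b.*?</a\\s*>", " ");
        // Drop every remaining tag but keep the text between tags.
        s = s.replaceAll("(?s)<[^>]*>", " ");
        return s.replaceAll("\\s+", " ").trim();
    }

    /** Lowercases, splits on non-letters, and removes stopwords. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (t.length() > 0 && !STOPWORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>The Beatles are a band.</p>"
                + "<a href=\"x.html\">click here</a></body></html>";
        System.out.println(tokens(extractText(html)));
    }
}
```

Regular expressions are fragile on malformed HTML, so a real extractor would more likely use an HTML parser; the sketch only shows the order of the steps.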
Implementation - Feature Selection
• The Bands data contain 61 pages
• E.g., feature selection reduces the number of attributes from 5436 to 32
Implementation - Fuzzy Integral
• The fuzzy measures of the classifiers are determined subjectively [1]
• [Figure: Bayes classifier outputs b1, b2, b3 (e.g., b1=0, b2=1, b3=0) are combined by FuzzyIntegral.java; value shown: 0.99]
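One common way to realize this step is the Sugeno fuzzy integral over a λ-fuzzy measure. The slides do not say which form FuzzyIntegral.java uses, so the sketch below is an assumption throughout: the class name SugenoSketch, the bisection solver, and the sample values are hypothetical. It expects at least two classifiers with subjectively chosen densities g[i] strictly between 0 and 1.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of a Sugeno integral with a lambda-fuzzy measure. */
class SugenoSketch {

    /** f(lambda) = prod(1 + lambda * g_i) - (1 + lambda); lambda is its nonzero root. */
    static double f(double lambda, double[] g) {
        double prod = 1.0;
        for (double gi : g) prod *= 1.0 + lambda * gi;
        return prod - (1.0 + lambda);
    }

    /** Solves for lambda > -1 by bisection; returns 0 when the densities sum to 1. */
    static double solveLambda(double[] g) {
        double sum = 0.0;
        for (double gi : g) sum += gi;
        if (Math.abs(sum - 1.0) < 1e-9) return 0.0;
        double lo, hi;
        if (sum > 1.0) {            // root lies in (-1, 0)
            lo = -1.0 + 1e-12;
            hi = -1e-12;
        } else {                    // root lies in (0, infinity)
            lo = 1e-12;
            hi = 1.0;
            while (f(hi, g) < 0.0 && hi < 1e12) hi *= 2.0;
        }
        boolean loPositive = f(lo, g) > 0.0;
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if ((f(mid, g) > 0.0) == loPositive) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** h[i] = support of classifier i for one class, g[i] = its density. */
    static double integral(final double[] h, double[] g) {
        double lambda = solveLambda(g);
        Integer[] order = new Integer[h.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Visit classifiers from the most to the least supportive.
        Arrays.sort(order, new Comparator<Integer>() {
            public int compare(Integer x, Integer y) {
                return Double.compare(h[y], h[x]);
            }
        });
        double measure = 0.0;  // g(A_k), measure of the first k classifiers
        double best = 0.0;
        for (int k = 0; k < order.length; k++) {
            double gi = g[order[k]];
            measure = gi + measure + lambda * gi * measure;
            best = Math.max(best, Math.min(h[order[k]], measure));
        }
        return best;
    }

    public static void main(String[] args) {
        double[] densities = {0.3, 0.3, 0.4};   // subjective importance, made up
        double[] support = {0.9, 0.6, 0.2};     // classifier outputs, made up
        System.out.println(integral(support, densities));
    }
}
```

To decide between Hot and Cold under this scheme, the integral would be evaluated once per class with the classifiers' supports for that class, and the class with the larger value chosen.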
Conclusion
• The fuzzy integral provides a way to express the importance of each classifier subjectively, which is especially useful in semi-supervised learning
• The fuzzy-integral-based method can be applied effectively to web content mining, predicting a user's preference as a user profile
• The fuzzy integral may also be applicable in my own research area for integrating expert or user knowledge
References
[1] Kyung-Joong Kim, Sung-Bae Cho, Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004) 43–60
[2] M. Pazzani, D. Billsus, Learning and revising user profiles: the identification of interesting web sites, Machine Learning 27 (1997) 313–331
[3] http://kdd.ics.uci.edu/databases/SyskillWebert/SyskillWebert.data.html