Comments from Pre-submission Presentation • Q: Check why kNN scores so much lower (about 10%) than SVM on the Reuters and 20 Newsgroups corpora. • A: Refer to the following four references: [Joachims 98] [Debole 03 STM] [Dumais 98 Inductive] [Yang 99 Re-examination]
[Joachims 98] [Debole 03] [Dumais 98] Results on the Reuters Corpus
[Yang 99 Re-examination] Significance Test • Micro-level analysis (s-test) • SVM > kNN >> {LLSF, NNet} >> NB • Macro-level analysis • {SVM, kNN, LLSF} >> {NB, NNet} • Error-rate based comparison • {SVM, kNN} > LLSF > NNet >> NB
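The micro-level s-test above is, in [Yang 99 Re-examination], a sign test over the documents on which two classifiers' binary decisions disagree. A minimal sketch of such a two-sided exact sign test (the function name and exact-binomial formulation are illustrative, not taken from the paper):

```python
from math import comb

# Minimal two-sided sign test, the kind of micro-level comparison
# (s-test) referenced above: compare two classifiers on the documents
# where their binary decisions differ (ties discarded beforehand).

def sign_test(wins_a, wins_b):
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # two-sided exact binomial tail under H0: P(A wins) = 0.5
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(sign_test(40, 20))  # small p-value: the difference is significant
print(sign_test(30, 30))  # p = 1.0: no evidence of a difference
```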
Comments from Pre-submission Presentation • 2. Explain why BEP & F1 are used in Chap 7 • Add references
Breakeven point (1) • BEP was first proposed by Lewis [1992]. Later, he himself pointed out that BEP is not a good effectiveness measure, because: • 1. there may be no parameter setting that yields the breakeven; in this case the final BEP value, obtained by interpolation, is artificial; • 2. having P = R is not necessarily desirable, and it is not clear that a system achieving a high BEP can be tuned to score high on other effectiveness measures.
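The interpolation Lewis criticised can be sketched as follows: given (P, R) pairs measured at successive parameter settings, find two adjacent settings where P − R changes sign and interpolate linearly; when no such pair exists, the common fallback of averaging P and R at the closest setting is exactly the artificial case noted above. Function and variable names are illustrative, not from the thesis:

```python
# Sketch of an interpolated precision/recall breakeven point (BEP).
# Input: (precision, recall) pairs observed at successive parameter
# settings of a classifier.

def interpolated_bep(pr_pairs):
    """Return the (interpolated) value where precision == recall."""
    best = min(pr_pairs, key=lambda pr: abs(pr[0] - pr[1]))
    for (p1, r1), (p2, r2) in zip(pr_pairs, pr_pairs[1:]):
        d1, d2 = p1 - r1, p2 - r2
        if d1 == 0:
            return p1  # exact breakeven observed
        if d1 * d2 < 0:  # sign change: P == R lies between the settings
            t = d1 / (d1 - d2)  # linear interpolation weight
            return p1 + t * (p2 - p1)  # P (== R) at the crossing
    # no crossing exists: fall back to (P + R) / 2 at the closest
    # setting -- the "artificial" value Lewis warned about
    return (best[0] + best[1]) / 2

print(interpolated_bep([(0.9, 0.5), (0.7, 0.7), (0.4, 0.9)]))  # 0.7
```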
Breakeven point (2) • Yang [1999 Re-examination] also noted that when no parameter setting brings P and R close enough together, the interpolated breakeven may not be a reliable indicator of effectiveness.
Comments from Pre-submission Presentation • 3. Adding more qualitative analysis would be better
Analysis and Proposal: Empirical Observation • Comparison of the idf, rf and chi^2 values of four features in two categories of the Reuters Corpus
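As a rough guide to the quantities being compared, the following sketch computes idf, chi-square and rf (relevance frequency) from the usual 2x2 contingency counts. The formulas, in particular the rf form and its log base, follow common usage in the supervised term-weighting literature and are illustrative, not quoted from the thesis:

```python
import math

# Per-term statistics from 2x2 contingency counts for one category:
#   a = positive documents containing the term
#   b = positive documents without the term
#   c = negative documents containing the term
#   d = negative documents without the term

def idf(a, c, n_docs):
    df = a + c  # document frequency over the whole collection
    return math.log(n_docs / df)

def chi_square(a, b, c, d):
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def rf(a, c):
    # relevance frequency: favours terms concentrated in the positive
    # category; max(1, c) avoids division by zero
    return math.log2(2 + a / max(1, c))
```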
Comments from Pre-submission Presentation • 4. Remove Joachims' results from Chap 7; citing them via quotation is fine
Comments from Pre-submission Presentation • 5. Tone down "best" claims • use "to our knowledge (experience, understanding)" • Pay attention to this usage when giving presentations
Introduction: Other Text Representation • Word senses (meanings) [Kehagias 2001] • the same word assumes different meanings in different contexts • Term clustering [Lewis 1992] • group words with a high degree of pairwise semantic relatedness • Semantic and syntactic representation [Scott & Matwin 1999] • relationships between words, e.g. phrases, synonyms and hypernyms
Introduction: Other Text Representation • Latent Semantic Indexing [Deerwester 1990] • a feature reconstruction technique • Combination Approach [Peng 2003] • combine two types of indexing terms, i.e. words and 3-grams • In general, these higher-level representations did not perform well in most cases
Literature Review: Knowledge-based Representation • Theme Topic Mixture Model, a graphical model [Keller 2004] • Using keywords from summarization [Li 2003]
Literature Review: 2. How to weight a term (feature) • [Salton 1988] elaborated three considerations: • 1. term occurrences should closely represent the content of a document • 2. other factors with discriminating power should help pick out relevant documents from irrelevant ones • 3. the effect of document length should be taken into account
Literature Review: 2. How to weight a term (feature) • 1. Term Frequency Factor • Binary representation (1 for present, 0 for absent) • Term frequency (tf): number of times a term occurs in a document • Log(tf): a log operation to dampen the effect of unfavorably high term frequencies • Inverse term frequency (ITF)
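The first three factors above can be sketched as follows. ITF is omitted because its definition varies by author, and 1 + log(tf) is one common form of the log variant, assumed here rather than quoted from the thesis:

```python
import math
from collections import Counter

# Term-frequency factors for one document given as a token list.

def tf_factors(tokens):
    counts = Counter(tokens)
    return {
        term: {
            "binary": 1,                    # present / absent
            "tf": count,                    # raw term frequency
            "log_tf": 1 + math.log(count),  # dampens very frequent terms
        }
        for term, count in counts.items()
    }

doc = "cocoa price cocoa export cocoa".split()
print(tf_factors(doc)["cocoa"])
```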
Literature Review: 2. How to weight a term (feature) • 2. Collection Frequency Factor • idf: the most commonly used factor • Probabilistic idf: a.k.a. term relevance weight • Feature selection metrics: chi^2, information gain, gain ratio, odds ratio, etc.
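A small sketch contrasting standard idf with the probabilistic idf (term relevance weight) mentioned above; both forms are common textbook definitions and are assumptions here, not quoted from the thesis:

```python
import math

# N = collection size, df = number of documents containing the term.

def idf(n_docs, df):
    # standard inverse document frequency: always >= 0 for df <= N
    return math.log(n_docs / df)

def prob_idf(n_docs, df):
    # probabilistic idf: log-odds that a document does NOT contain
    # the term; goes negative once df exceeds half the collection
    return math.log((n_docs - df) / df)

for df in (10, 100, 5000, 9000):
    print(df, round(idf(10000, df), 3), round(prob_idf(10000, df), 3))
```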
Literature Review: 2. How to weight a term (feature) • 3. Normalization Factor • Combine the above two factors by multiplication • To eliminate the effect of document length, we use cosine normalization, which limits each term weight to the range (0, 1]
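The combination step can be sketched as follows, with illustrative tf and idf values: the two factors are multiplied, then the weight vector is divided by its Euclidean length (cosine normalization), so every weight is at most 1 regardless of document length:

```python
import math

# Combine a term-frequency vector with an idf vector by
# multiplication, then apply cosine normalization.

def tfidf_cosine(tf_vec, idf_vec):
    raw = [tf * w for tf, w in zip(tf_vec, idf_vec)]
    norm = math.sqrt(sum(w * w for w in raw))
    # cosine normalization: the result has unit Euclidean length,
    # removing the influence of document length on the weights
    return [w / norm for w in raw] if norm else raw

weights = tfidf_cosine([3, 1, 0, 2], [1.2, 2.5, 3.0, 0.8])
print(weights, sum(w * w for w in weights))  # unit-length vector
```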