AN EFFECTIVE STATISTICAL APPROACH TO BLOG POST OPINION RETRIEVAL • Jiyin He (University of Amsterdam) • Ben He, Craig Macdonald, Iadh Ounis (University of Glasgow) • CIKM 2008
Introduction • Finding opinionated blog posts is still an open problem. • A popular solution is to rely on external resources and manual effort to identify subjective features. • The authors propose a dictionary-based statistical approach that automatically derives evidence for subjectivity from the blog collection itself, without requiring any manual effort.
TREC Opinion Finding Task (1/2) • Text REtrieval Conference. • Goal: identify sentiment at the document level. • The dataset is composed of: • Feed documents: XML format, usually a short summary of the blog post. • Permalink documents: HTML format, the complete blog post and its comments. • Homepage documents: HTML format, the main entry to the blog.
TREC Opinion Finding Task (2/2) • Sample query format: <top> <num> 863 <title> netflix <desc> Identify documents that show customer opinions of Netflix. <narr> A relevant document will indicate subscriber satisfaction with Netflix. Opinions about the Netflix DVD allocation system, promptness or delay in mailings are relevant. Indications of having been or intent to become a Netflix subscriber that do not state an opinion are not relevant. </top>
Dictionary Generation • The Skewed Query Model • Rank all terms in the collection by term frequency in descending order. • Terms whose rank falls in the range (S·#terms, U·#terms) are selected into the dictionary. • #terms: the number of unique terms in the collection. • S, U: model parameters; S=0.00007 and U=0.001 in this paper.
Dictionary Generation • Ex: with #terms=200,000, #terms × 0.00007 = 14 and #terms × 0.001 = 200, so only the terms ranked 14 to 200 are preserved (see the sketch below). • Note that the dictionary is not necessarily opinionated.
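A minimal sketch of the Skewed Query Model selection, assuming `term_freqs` maps each unique term to its collection frequency; the exact handling of the rank-range boundaries is an assumption.

```python
def skewed_query_dictionary(term_freqs, S=0.00007, U=0.001):
    """Keep terms whose frequency rank falls in (S * #terms, U * #terms)."""
    # Rank all terms by collection frequency, descending.
    ranked = sorted(term_freqs, key=term_freqs.get, reverse=True)
    n = len(ranked)                      # number of unique terms
    lo, hi = int(S * n), int(U * n)      # e.g. 14 and 200 for n = 200,000
    return ranked[lo:hi]
```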
Term Weighting (1/2) • KL divergence method (see the sketch below). • D(Rel): the set of relevant documents. • D(opRel): the set of opinionated and relevant documents. • c(D(opRel)): #tokens in the opinionated documents. • c(D(Rel)): #tokens in the relevant documents. • tf_x: the frequency of the term t in the opinionated documents. • tf_rel: the frequency of the term t in the relevant documents.
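The weighting formula itself appeared as an image on the original slide. Reconstructing it from the definitions above, a KL-divergence weight compares the term's distribution in D(opRel) against D(Rel); the base-2 logarithm and the zero-frequency handling below are assumptions.

```python
import math

def kl_weight(tf_x, tf_rel, c_oprel, c_rel):
    """KL-divergence weight of term t: p_x * log2(p_x / p_rel),
    where p_x = tf_x / c(D(opRel)) and p_rel = tf_rel / c(D(Rel))."""
    p_x = tf_x / c_oprel
    p_rel = tf_rel / c_rel
    if p_x == 0 or p_rel == 0:   # skip terms unseen in either set
        return 0.0
    return p_x * math.log2(p_x / p_rel)
```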
Term Weighting (2/2) • Bose-Einstein statistics method (see the sketch below). • Measures how informative a term is in the set D(opRel) against D(Rel). • λ = F/N • F: the frequency of the term t in D(Rel). • N: the number of documents in D(Rel). • tf_x: the frequency of the term t in D(opRel).
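As with the KL weight, the Bose-Einstein formula was lost with the slide image. The sketch below uses the standard Bo1 divergence-from-randomness form, which is consistent with the definitions above (λ = F/N as the geometric-distribution parameter); treat the exact form as an assumption.

```python
import math

def bo1_weight(tf_x, F, N):
    """Bose-Einstein (Bo1-style) weight of term t in D(opRel) vs. D(Rel)."""
    if F == 0:
        return 0.0
    lam = F / N  # mean term frequency under the geometric approximation
    return tf_x * math.log2((1 + lam) / lam) + math.log2(1 + lam)
```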
Generating the Opinion Score • Take the X top-weighted terms from the opinion dictionary (see the sketch below). • X is tuned in the training step. • Submit them to the retrieval system as a query Qopn. • Score(d,Qopn): the opinion score of document d. • Score(d,Q): the initial ranking score.
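A sketch of this step, where `retrieval_score` is a hypothetical stand-in for the underlying retrieval system's scoring function:

```python
def opinion_scores(weighted_terms, X, documents, retrieval_score):
    """Build Qopn from the X top-weighted dictionary terms and score docs."""
    top = sorted(weighted_terms.items(), key=lambda kv: kv[1], reverse=True)
    q_opn = [term for term, _ in top[:X]]
    return {d: retrieval_score(d, q_opn) for d in documents}
```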
Score Combination • Linear combination of Score(d,Q) and Score(d,Qopn). • Log combination of Score(d,Q) and Score(d,Qopn). • a and k are tuned in the training step (see the sketch below).
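The two combination formulas were images on the original slide, so the forms below are plausible reconstructions rather than the paper's verbatim definitions: `a` mixes the two scores linearly, and `k` scales the opinion score inside a logarithm.

```python
import math

def linear_combination(score_q, score_opn, a):
    # a in [0, 1]: weight placed on the opinion score
    return (1 - a) * score_q + a * score_opn

def log_combination(score_q, score_opn, k):
    # k in (0, 1000]: scales the opinion evidence before log damping
    return score_q + math.log(1 + k * score_opn)
```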
Experiment Settings (1/3) • TREC06: 50 topics for training. • TREC07: 50 topics for testing. • Only the "title" field is used (1.74 words/topic on average). • Baseline 1: apply the InLB model, a variation of the BM25 ranking function, to retrieve as many relevant documents as possible.
Experiment Settings (2/3) • Baseline 2: favor documents where the query terms appear in close proximity (see the sketch below). • Q2: the set of all query-term pairs in query Q. • N: #Docs in the collection. • T: #Tokens in the collection. • pfn: the normalized frequency of the tuple p.
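The proximity formula was also lost with the slide image. Below is a minimal sketch of the underlying idea, counting co-occurrences of each pair in Q2 within a small window; the window size and any normalization into pfn are assumptions here.

```python
from itertools import combinations

def pair_proximity_counts(doc_tokens, query_terms, window=5):
    """Count, for each query-term pair, co-occurrences within `window`."""
    positions = {t: [i for i, tok in enumerate(doc_tokens) if tok == t]
                 for t in query_terms}
    counts = {}
    for t1, t2 in combinations(query_terms, 2):   # the pairs in Q2
        counts[(t1, t2)] = sum(1 for i in positions[t1]
                               for j in positions[t2]
                               if abs(i - j) <= window)
    return counts
```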
Experiment Settings (3/3) • An external dictionary was manually collected from OpinionFinder and several other resources. • It contains approximately 12,000 English words, mostly adjectives, adverbs and nouns.
Experiment: Term Weighting (1/2) • Hypothesis: the most opinionated terms for one query set are also good indicators of opinion for other queries. • Sampling: draw 10 subsets (Set1, Set2, …, Set10) from the 50 training topics, each with 25 topics; the maximum overlap between any two subsets is 65%. • For each sample set, calculate the weight of each term.
Experiment: Term Weighting (2/2) • Compute the cosine similarity between the weights of the top 100 weighted terms from each pair of samples (see the sketch below).
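A sketch of this consistency check, comparing two samples' top-100 term weights over the union of their vocabularies:

```python
import math

def cosine_similarity(weights1, weights2):
    """Cosine similarity between two {term: weight} vectors."""
    vocab = set(weights1) | set(weights2)
    dot = sum(weights1.get(t, 0.0) * weights2.get(t, 0.0) for t in vocab)
    n1 = math.sqrt(sum(w * w for w in weights1.values()))
    n2 = math.sqrt(sum(w * w for w in weights2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```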
Experiment: Validation (1/3) • Tune the parameters X, a and k mentioned before. • Tune X by maximizing the mean MAP over the 10 samples: each Set_i is used for assigning term weights, and the held-out remainder Set_i' of the 50 training topics is used for validation.
Experiment: Validation (3/3) • Fix X=100, then tune a and k (see the grid-search sketch below). • a within [0, 1], step=0.05. • k within (0, 1000], step=50.
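A sketch of this grid search, where `evaluate_map` is a hypothetical helper that runs the combined ranking on the validation topics and returns its MAP:

```python
def tune_combination(evaluate_map):
    """Return the (a, k, MAP) triple maximizing MAP on the validation topics."""
    best = (None, None, -1.0)
    for a in [i * 0.05 for i in range(21)]:        # a in [0, 1], step 0.05
        for k in [j * 50 for j in range(1, 21)]:   # k in (0, 1000], step 50
            m = evaluate_map(a, k)
            if m > best[2]:
                best = (a, k, m)
    return best
```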
Experiment: Evaluation (3/3) • Comparison with OpinionFinder. • All else being equal, replace the opinion score Score(d,Qopn) with the score derived from the manually collected external dictionary.
Conclusion • An effective and practical approach to retrieving opinionated blog posts without manual effort. • Opinion scores are computed during indexing, so the computational cost at retrieval time is negligible. • The automatically generated internal dictionary performs as well as the external dictionary. • Different random samples from the collection reach a high consensus on the opinionated terms when the Bose-Einstein statistics given by the geometric distribution are applied.