
A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval


Presentation Transcript


  1. A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval • Min Zhang, Xinyao Ye (Tsinghua University), SIGIR 2008 • Summarized & presented by Jung-Yeon Yang, IDS Lab, 2009. 01. 14.

  2. Introduction • There is growing interest in finding out people’s opinions from web data • Product surveys • Advertisement analysis • Political opinion polls • TREC started a special track on blog data in 2006 – blog opinion retrieval • It was the track with the most participants in 2007 • This paper focuses on the problem of searching for opinions on general topics

  3. Related Work • Popular opinion identification approaches • Text classification • Lexicon-based sentiment analysis • Opinion retrieval • Opinion retrieval: finding the sentimentally relevant documents for a user’s query • One of the key problems: how to combine each document’s opinion score with its relevance score for ranking

  4. Related Work (cont.) • Topic-relevance search is carried out using relevance ranking (e.g. TF*IDF ranking) • Ad hoc solutions combine the relevance ranking with the opinion detection result • 2 steps: rank by relevance, then re-rank by sentiment score • Most existing approaches use a linear combination: α·Score_rel + β·Score_opn
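As a rough illustration of this baseline, here is a minimal sketch of the two-step re-ranking with a linear combination. All function and parameter names are illustrative assumptions, not the setup of any particular TREC system:

```python
# Minimal sketch of the ad hoc two-step baseline described above:
# rank by relevance first, then re-rank with alpha*Score_rel + beta*Score_opn.

def linear_rerank(relevance_scores, opinion_scores, alpha=0.7, beta=0.3, top_k=1000):
    """relevance_scores / opinion_scores: dict mapping doc_id -> score."""
    # Step 1: keep the top-k documents by topic relevance only.
    candidates = sorted(relevance_scores, key=relevance_scores.get, reverse=True)[:top_k]
    # Step 2: re-rank the candidates with the linear combination.
    combined = {d: alpha * relevance_scores[d] + beta * opinion_scores.get(d, 0.0)
                for d in candidates}
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)
```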

  5. Backgrounds • Statistical Language Model (LM) • A probability distribution over word sequences • p(“Today is Wednesday”) ≈ 0.001 • p(“Today Wednesday is”) ≈ 0.0000000000001 • Unigram: P(w1,w2,w3) = P(w1)*P(w2)*P(w3) • Bigram: P(w1,w2,w3) = P(w1)*P(w2|w1)*P(w3|w2) • Can also be regarded as a probabilistic mechanism for “generating” text, thus also called a “generative” model • LM allows us to answer questions like: • Given that we see “John” and “feels”, how likely are we to see “happy” as opposed to “habit” as the next word? (speech recognition) • Given that we observe “baseball” three times and “game” once in a news article, how likely is it about “sports”? (text categorization, information retrieval)
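A toy sketch of the two factorizations on this slide; the probabilities are made up purely for illustration:

```python
# Unigram vs. bigram factorization of a three-word sequence.
# All probability values below are invented for demonstration only.

unigram = {"Today": 0.01, "is": 0.05, "Wednesday": 0.002}
bigram = {("Today", "is"): 0.3, ("is", "Wednesday"): 0.02}

def p_unigram(words):
    # P(w1, w2, w3) = P(w1) * P(w2) * P(w3)
    p = 1.0
    for w in words:
        p *= unigram[w]
    return p

def p_bigram(words):
    # P(w1, w2, w3) = P(w1) * P(w2|w1) * P(w3|w2)
    p = unigram[words[0]]
    for prev, w in zip(words, words[1:]):
        p *= bigram[(prev, w)]
    return p

print(p_unigram(["Today", "is", "Wednesday"]))  # 1e-06
print(p_bigram(["Today", "is", "Wednesday"]))   # 6e-05
```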

  6. Backgrounds (cont.) • The notion of relevance • [The slide shows a taxonomy of retrieval models] • Similarity between Rep(q) and Rep(d): vector space model, probabilistic distribution model • Probability of relevance P(r=1|q,d), r ∈ {0,1}: regression model; generative models – document generation (classical probabilistic model), query generation (LM approach) • Probabilistic inference P(d→q) or P(q→d): different inference systems – inference network model, probabilistic concept space model

  7. Backgrounds (cont.) • Retrieval as Language Model Estimation • Document ranking based on query likelihood • Retrieval problem → estimation of p(wi|d) • [The slide annotates the smoothed estimate with the labels: document language model, discounted LM estimate, collection language model] • Smoothing is an important issue • Problem: if tf = 0, then p(wi|d) = 0 • Smoothing methods try to • Discount the probability of words seen in a document • Re-allocate the extra probability so that unseen words have a non-zero probability • Most use a reference model (the collection language model) to discriminate unseen words
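A minimal sketch of query-likelihood scoring with Jelinek-Mercer smoothing against a collection (reference) model, as described above. Tokenization, the interpolation weight, and all names are assumptions for illustration:

```python
import math
from collections import Counter

def jm_query_likelihood(query_terms, doc_terms, collection_counts, collection_len, lam=0.7):
    """log p(q|d) with Jelinek-Mercer smoothing:
    p(w|d) = lam * p_ml(w|d) + (1 - lam) * p(w|C)."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_ml = doc_counts[w] / doc_len if doc_len else 0.0    # seen-word (document) estimate
        p_coll = collection_counts.get(w, 0) / collection_len  # collection language model
        score += math.log(lam * p_ml + (1 - lam) * p_coll + 1e-12)  # guard against log(0)
    return score
```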

  8. Generation Model for Opinion Retrieval • The Generation Model • To find and rank documents that are both opinionated and relevant • Topic Relevance Ranking • Opinion Generation Model and Ranking • Ranking function of the generation model for opinion retrieval

  9. The Proposed Generation Model • Document generation model • How well the document d “fits” the particular query q: p(d|q) ∝ p(q|d)·p(d) • In opinion retrieval, rank by p(d|q,s) • The user’s information need is restricted to an opinionated subset of the relevant documents • This subset is characterized by sentiment expressions s towards topic q

  10. The Proposed Generation Model (cont.) • The model leads to a quadratic relationship between topic relevance and opinion scores • This work discusses lexicon-based sentiment analysis • Assumptions • The latent variable s is estimated with a pre-constructed bag-of-words sentiment thesaurus • All sentiment words si are uniformly distributed • The final generation model [equation shown on the slide]
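The equations on these two slides did not survive the transcript. The following is a hedged reconstruction of the generation model as the surrounding text describes it (a Bayes decomposition of p(d|q,s) plus the uniform sentiment-word assumption); the notation is a sketch, not necessarily the paper's exact formulation:

```latex
% Document generation restricted to opinionated relevant documents:
p(d \mid q, s) \;\propto\; p(q, s \mid d)\, p(d) \;=\; p(s \mid q, d)\, p(q \mid d)\, p(d)

% With a bag-of-words sentiment thesaurus S = \{s_1, \dots, s_{|S|}\}
% and all sentiment words assumed uniformly distributed:
p(s \mid q, d) \;\approx\; \frac{1}{|S|} \sum_{i=1}^{|S|} p(s_i \mid d, q)
```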

  11. Topic Relevance Ranking Irel(d,q) • [The slide shows the BM25 formula and the IDF(w) weight] • The Binary Independence Retrieval (BIR) model is one of the most famous models in this branch • The heuristic ranking function BM25 is used for Irel(d,q) • TREC tests have shown it to be the best of the known probabilistic weighting schemes
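A minimal sketch of BM25 with the common k1/b parametrization. The constants, tokenization, and IDF variant are assumptions, not necessarily the paper's exact settings:

```python
import math
from collections import Counter

def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Heuristic BM25 relevance score, playing the role of I_rel(d, q).
    doc_freq: dict mapping term -> number of documents containing it."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        df = doc_freq.get(w, 0)
        if df == 0 or tf[w] == 0:
            continue
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)          # IDF(w)
        norm = tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```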

  12. Opinion Generation Model Iop(d,q,s) • Iop(d,q,s) focuses on how likely a document d is to generate a sentiment expression s, given the query q • Sparseness problem → smoothing • Jelinek-Mercer smoothing is applied

  13. Opinion Generation Model Iop(d,q,s) • The co-occurrence of si and q inside d within a window W is used as the ranking measure for Pml(si|d,q)
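A sketch of the window-based maximum-likelihood estimate with Jelinek-Mercer smoothing described on the two slides above. The tokenization, the length normalization, and the window handling are assumptions for illustration:

```python
def windowed_cooccurrence(doc_terms, sentiment_word, query_terms, window=10):
    """Count occurrences of sentiment_word that co-occur with a query term
    inside a window of +/- `window` tokens in the document."""
    count = 0
    query = set(query_terms)
    for i, w in enumerate(doc_terms):
        if w != sentiment_word:
            continue
        lo, hi = max(0, i - window), min(len(doc_terms), i + window + 1)
        if any(t in query for t in doc_terms[lo:hi]):
            count += 1
    return count

def p_sentiment_given_doc_query(sentiment_word, doc_terms, query_terms,
                                p_collection, lam=0.8, window=10):
    """Jelinek-Mercer smoothed estimate of p(s_i | d, q):
    lam * P_ml(s_i | d, q) + (1 - lam) * p(s_i | C)."""
    co = windowed_cooccurrence(doc_terms, sentiment_word, query_terms, window)
    p_ml = co / len(doc_terms) if doc_terms else 0.0  # simple length normalization (assumption)
    return lam * p_ml + (1 - lam) * p_collection
```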

  14. Ranking function of the generation model for opinion retrieval • [The slide shows the final ranking function] • When no opinion evidence is available, the ranking reduces to topic relevance only • Logarithm normalization is applied to reduce the impact of the imbalance between #(sentiment words) and #(query terms)
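A sketch of the combination described on this slide, assuming the quadratic (product) form of relevance and opinion scores and a logarithmic normalization of the opinion side; this is an illustrative reading of the slide, not the paper's exact formula:

```python
import math

def opinion_retrieval_score(i_rel, sentiment_probs):
    """Combine topic relevance I_rel(d, q) with the opinion generation score.
    sentiment_probs: smoothed p(s_i | d, q) for each lexicon word found in d.
    The logarithm dampens the imbalance between the number of sentiment
    words and the number of query terms."""
    if not sentiment_probs:
        return i_rel                     # no opinion evidence: topic relevance only
    i_op = sum(sentiment_probs)          # opinion generation score I_op(d, q, s)
    return i_rel * (1.0 + math.log(1.0 + i_op))  # quadratic (product) combination
```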

  15. Experimental Setup • Data set • TREC Blog 06 & 07 • 100,649 blogs collected over 2.5 months • Strategy: retrieve the top 1,000 relevant documents, then re-rank the list with the proposed model • Models • General linear combination • Proposed generation model with smoothing • Proposed generation model with smoothing and normalization

  16. Experimental Setup (cont.) • Sentiment Lexicons

  17. Experiment Results

  18. Conclusion • Proposed a formal generation model for opinion retrieval • Topic relevance & sentiment scores are integrated with a quadratic combination • Opinion generation ranking functions are derived • Discussed the roles of the sentiment lexicon and the matching window • It is a general model for opinion retrieval • Uses domain-independent lexicons • No assumption is made about the structure of blog text

  19. My opinions • Good points • Proposed an opinion retrieval model and ranking function • Conducted various experiments • Lacking points • Only finds documents that contain some opinion • Does not identify what kinds of opinions the documents contain • Does not use the sentiment polarity of words
