Opinion Retrieval from Blogs


Presentation Transcript


  1. Opinion Retrieval from Blogs • Wei Zhang 1 (wzhang@cs.uic.edu), Clement Yu 1 (yu@cs.uic.edu), Weiyi Meng 2 (meng@cs.binghamton.edu) • 1 Department of Computer Science, University of Illinois at Chicago • 2 Department of Computer Science, Binghamton University • CIKM 2007

  2. Outline • Overview of opinion retrieval • Topic retrieval • Opinion identification • Ranking documents by opinion similarity • Experimental results

  3. Overview of Opinion Retrieval • Opinion retrieval: given a query, find documents that express subjective opinions about the query • Example query: “book” • Relevant: “This is a very good book.” • Irrelevant: “This book has 123 pages.”

  4. Overview of Opinion Retrieval • Introduced at the TREC 2006 Blog Track • 14 groups, 57 submitted runs in TREC 2006 • 20 groups, 104 runs in TREC 2007 (ongoing) • Key problems • Identifying opinion features • Finding query-related opinions • Ranking the retrieved documents

  5. Our Algorithm • A three-stage pipeline, sketched below: Document set + Query → Retrieved documents → Opinionative documents → Query-related opinionative documents
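A minimal Python skeleton of this pipeline; the stage functions are placeholders for the components described on the following slides, not the authors' implementation:

```python
def opinion_retrieval(query, documents, topic_retrieve, is_opinionative, opinion_rank):
    """Three-stage pipeline from the slide. The stage functions are
    supplied by the caller, since the slides define them separately."""
    # Stage 1: topic retrieval -- rank documents by relevance to the query.
    retrieved = topic_retrieve(query, documents)
    # Stage 2: opinion identification -- keep documents with subjective content.
    opinionative = [d for d in retrieved if is_opinionative(d)]
    # Stage 3: rank the survivors by query-related opinion similarity.
    return opinion_rank(query, opinionative)
```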

  6. Topic Retrieval • Retrieve query-relevant documents • No opinion involved • Features • Phrase recognition • Query expansion • Two document-query similarities

  7. Topic Retrieval – Phrase Recognition • Identifies semantic relationships among the query words • Used for the phrase-similarity calculation • 4 types • Proper noun: “University of Lisbon” • Dictionary phrase: “computer science” • Simple phrase: “white car” • Complex phrase: “small white car”

  8. Topic Retrieval – Query Expansion • Find synonyms • “wto” → “world trade organization” • Synonyms are given the same importance as the original term • Add additional related terms • “wto” → negotiate, agreements, tariffs, …
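The slides do not name the synonym source; WordNet via NLTK is a plausible stand-in, so the sketch below is an assumption rather than the authors' method:

```python
# Assumed stand-in: WordNet synonyms via NLTK (requires a one-time
# nltk.download("wordnet")); the slides do not name their synonym source.
from nltk.corpus import wordnet as wn

def expand_query(terms):
    expanded = set(t.lower() for t in terms)
    for term in terms:
        for synset in wn.synsets(term):
            for name in synset.lemma_names():
                # WordNet joins multi-word lemmas with underscores,
                # e.g. "world_trade_organization".
                expanded.add(name.replace("_", " ").lower())
    return expanded
```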

  9. Topic Retrieval – Similarity • Sim(Query, Doc) = <Sim_P, Sim_T> • Phrase similarity is binary per phrase: the document either contains a query phrase or it does not • Sim_P = sum( idf(P_i) ) over the query phrases present in the document • Term similarity • Sim_T = sum of the Okapi scores of all the query terms • Document ranking • D1 is ranked higher than D2 if (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
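In other words, documents are compared lexicographically on the (Sim_P, Sim_T) pair. A small sketch, with an assumed log(N/df) form of idf since the slide does not spell it out:

```python
import math

def phrase_similarity(query_phrases, doc_phrases, df, num_docs):
    # Sim_P: sum of idf over the query phrases that appear in the document.
    # The exact idf formula is an assumption; the slide only says idf.
    return sum(math.log(num_docs / df[p]) for p in query_phrases if p in doc_phrases)

def rank(scored):
    # scored: list of (doc_id, sim_p, sim_t) triples, where sim_t is the
    # summed Okapi score of the query terms. Sorting on the (sim_p, sim_t)
    # tuple implements exactly the rule above: higher phrase similarity
    # wins, and term similarity breaks ties.
    return sorted(scored, key=lambda x: (x[1], x[2]), reverse=True)
```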

  10. Opinion Identification • Training: subjective training data + objective training data → feature selection → SVM classifier • Application: retrieved documents (from topic retrieval) → SVM classifier → opinionative documents (to opinion ranking)

  11. Opinion Identification – Training Data • Subjective training data • Review web sites • Documents having opinionative phrases • Objective training data • Dictionary entries • Documents not having opinionative phrases

  12. Opinion Identification – Feature Selection • Goal: find the words that express opinions • Pearson’s chi-square test • Tests the independence between a candidate feature and the subjectivity label via a 2×2 contingency table • Counts are numbers of sentences • Features are unigrams and bigrams
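A direct transcription of Pearson's chi-square statistic for a 2×2 table of sentence counts; the slide gives the test but not a cutoff, so the threshold noted in the comment is a standard default, not the authors':

```python
def chi_square(n11, n10, n01, n00):
    # 2x2 contingency table of sentence counts for one candidate feature:
    #   n11 subjective sentences containing it, n10 objective containing it,
    #   n01 subjective sentences without it,    n00 objective without it.
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

# With one degree of freedom, a statistic above 3.84 rejects independence
# at the 5% level; the slides do not state the threshold actually used.
```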

  13. Opinion Identification – Classifier • A support vector machine (SVM) classifier • Training: subjective sentences + objective sentences → feature-vector representation over the selected features → trained SVM classifier

  14. Opinion Identification – Classifier • Applying the SVM classifier: each sentence of a retrieved document is labeled individually, e.g. Sentence 1 → objective, Sentence 2 → subjective, …, Sentence n → objective
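A sketch of both steps using scikit-learn as a stand-in; the slides do not say which SVM package or feature weighting the authors used, and `selected_features` is assumed to be the chi-square-selected vocabulary from the previous step:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_sentence_classifier(subjective_sents, objective_sents, selected_features):
    # Build labeled training data: 1 = subjective, 0 = objective.
    sents = subjective_sents + objective_sents
    labels = [1] * len(subjective_sents) + [0] * len(objective_sents)
    # Unigram + bigram features, restricted to the selected vocabulary.
    vec = CountVectorizer(ngram_range=(1, 2), vocabulary=selected_features)
    clf = LinearSVC().fit(vec.transform(sents), labels)
    return vec, clf

def label_sentences(vec, clf, sentences):
    # Returns one label per sentence of a document: 1 = subjective, 0 = objective.
    return clf.predict(vec.transform(sentences))
```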

  15. Opinion Similarity – Query-Related Opinions • Find the query-related opinions • An opinionative sentence counts as query-related if a query term occurs within a text window around the sentence in the document
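A minimal sketch of the window check; the slide does not give the window size, so `window` (tokens on each side of the sentence) is an assumed parameter:

```python
def is_query_related(tokens, sent_start, sent_end, query_terms, window=5):
    # tokens is the tokenized document; the opinionative sentence occupies
    # tokens[sent_start:sent_end]. The sentence is query-related if any
    # query term falls inside the surrounding text window.
    lo = max(0, sent_start - window)
    hi = min(len(tokens), sent_end + window)
    return any(t.lower() in query_terms for t in tokens[lo:hi])
```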

  16. Opinion Similarity – Similarity 1 • Assumption 1: the higher the topic relevance, the higher the rank • OSim_ir = Sim(Query, Doc)

  17. Opinion Similarity – Similarity 2 • Assumption 2: the more query-related opinions a document contains, the higher its rank • OSim_stcc: total number of query-related opinionative sentences • OSim_stcs: total score of those sentences

  18. Opinion Similarity – Similarity 3 • A linear combination of similarities 1 and 2 • a * OSim_ir + (1 - a) * OSim_stcc • b * OSim_ir + (1 - b) * OSim_stcs
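As a formula sketch; the tuned values of a and b are not reported on the slides, so 0.5 below is a placeholder:

```python
def combined_scores(osim_ir, osim_stcc, osim_stcs, a=0.5, b=0.5):
    # Two alternative combined opinion-similarity scores, mixing topic
    # relevance with the count-based and score-based opinion measures.
    # In practice the components would need normalizing to comparable
    # ranges before mixing; the slides do not describe that step.
    return (a * osim_ir + (1 - a) * osim_stcc,
            b * osim_ir + (1 - b) * osim_stcs)
```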

  19. Experimental Results • TREC 2006 Blog Track data • 50 queries, 3.2 million blog documents • UIC at the TREC 2006 Blog Track • Title-only queries: ranked first • 28%–32% higher than the best TREC 2006 scores • Lessons learned • More training data helps • A combined similarity function helps

  20. Conclusions • Designed and implemented an opinion retrieval system: IR + text classification • Achieved the best known retrieval effectiveness on the TREC 2006 blog data • Extending to polarity classification: positive/negative/mixed • Plan to improve feature selection

  21. Questions? • wzhang@cs.uic.edu • http://www.cs.uic.edu/~wzhang/
