Opinion Retrieval from Blogs


Presentation Transcript


  1. Opinion Retrieval from Blogs • Wei Zhang 1 (wzhang@cs.uic.edu), Clement Yu 1 (yu@cs.uic.edu), Weiyi Meng 2 (meng@cs.binghamton.edu) • 1 Department of Computer Science, University of Illinois at Chicago • 2 Department of Computer Science, Binghamton University • CIKM 2007

  2. Outline • Overview of opinion retrieval • Topic retrieval • Opinion identification • Ranking documents by opinion similarity • Experimental results

  3. Overview of Opinion Retrieval • Opinion retrieval: given a query, find documents that express subjective opinions about the query • Example query: “book” • Relevant: “This is a very good book.” • Irrelevant: “This book has 123 pages.”

  4. Overview of Opinion Retrieval • Introduced at the TREC 2006 Blog Track • 14 groups, 57 submitted runs in TREC 2006 • 20 groups, 104 runs in TREC 2007 (ongoing) • Key problems • Identifying opinion features • Finding query-related opinions • Ranking the retrieved documents

  5. Our Algorithm • A three-stage pipeline, sketched below: Document set + Query → Retrieved documents → Opinionative documents → Query-related opinionative documents
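A minimal Python skeleton of this pipeline; the stage functions are placeholders for the components described on the following slides, not the authors' implementation:

```python
def opinion_retrieval(query, documents, topic_retrieve, is_opinionative, opinion_rank):
    """Three-stage pipeline from the slide. The stage functions are
    supplied by the caller, since the slides define them separately."""
    # Stage 1: topic retrieval -- rank documents by relevance to the query.
    retrieved = topic_retrieve(query, documents)
    # Stage 2: opinion identification -- keep documents with subjective content.
    opinionative = [d for d in retrieved if is_opinionative(d)]
    # Stage 3: rank the survivors by query-related opinion similarity.
    return opinion_rank(query, opinionative)
```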

  6. Topic Retrieval • Retrieve query-relevant documents • No opinion involved • Features • Phrase recognition • Query expansion • Two document-query similarities

  7. Topic Retrieval – Phrase Recognition • Identifies semantic relationships among the query words • Used for the phrase-similarity calculation • 4 types • Proper noun: “University of Lisbon” • Dictionary phrase: “computer science” • Simple phrase: “white car” • Complex phrase: “small white car”

  8. Topic Retrieval – Query Expansion • Find synonyms • “wto” → “world trade organization” • Synonyms are given the same importance as the original term • Add additional related terms • “wto” → negotiate, agreements, tariffs, …
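The slides do not name the synonym source; WordNet via NLTK is a plausible stand-in, so the sketch below is an assumption rather than the authors' method:

```python
# Assumed stand-in: WordNet synonyms via NLTK (requires a one-time
# nltk.download("wordnet")); the slides do not name their synonym source.
from nltk.corpus import wordnet as wn

def expand_query(terms):
    expanded = set(t.lower() for t in terms)
    for term in terms:
        for synset in wn.synsets(term):
            for name in synset.lemma_names():
                # WordNet joins multi-word lemmas with underscores,
                # e.g. "world_trade_organization".
                expanded.add(name.replace("_", " ").lower())
    return expanded
```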

  9. Topic Retrieval – Similarity • Sim(Query, Doc) = <Sim_P, Sim_T> • Phrase similarity is binary per phrase: the document either contains a query phrase or it does not • Sim_P = sum( idf(P_i) ) over the query phrases present in the document • Term similarity • Sim_T = sum of the Okapi scores of all the query terms • Document ranking • D1 is ranked higher than D2 if (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
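In other words, documents are compared lexicographically on the (Sim_P, Sim_T) pair. A small sketch, with an assumed log(N/df) form of idf since the slide does not spell it out:

```python
import math

def phrase_similarity(query_phrases, doc_phrases, df, num_docs):
    # Sim_P: sum of idf over the query phrases that appear in the document.
    # The exact idf formula is an assumption; the slide only says idf.
    return sum(math.log(num_docs / df[p]) for p in query_phrases if p in doc_phrases)

def rank(scored):
    # scored: list of (doc_id, sim_p, sim_t) triples, where sim_t is the
    # summed Okapi score of the query terms. Sorting on the (sim_p, sim_t)
    # tuple implements exactly the rule above: higher phrase similarity
    # wins, and term similarity breaks ties.
    return sorted(scored, key=lambda x: (x[1], x[2]), reverse=True)
```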

  10. Opinion Identification • Training: subjective training data + objective training data → feature selection → SVM classifier • Application: retrieved documents (from topic retrieval) → SVM classifier → opinionative documents (to opinion ranking)

  11. Opinion Identification – Training Data • Subjective training data • Review web sites • Documents having opinionative phrases • Objective training data • Dictionary entries • Documents not having opinionative phrases

  12. Opinion Identification – Feature Selection • Goal: find the words that express opinions • Pearson’s chi-square test • Tests the independence between a candidate feature and the subjectivity label via a 2×2 contingency table • Counts are numbers of sentences • Features are unigrams and bigrams
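A direct transcription of Pearson's chi-square statistic for a 2×2 table of sentence counts; the slide gives the test but not a cutoff, so the threshold noted in the comment is a standard default, not the authors':

```python
def chi_square(n11, n10, n01, n00):
    # 2x2 contingency table of sentence counts for one candidate feature:
    #   n11 subjective sentences containing it, n10 objective containing it,
    #   n01 subjective sentences without it,    n00 objective without it.
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

# With one degree of freedom, a statistic above 3.84 rejects independence
# at the 5% level; the slides do not state the threshold actually used.
```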

  13. Opinion Identification – Classifier • A support vector machine (SVM) classifier • Training: subjective sentences + objective sentences → feature-vector representation over the selected features → trained SVM classifier

  14. Opinion Identification – Classifier • Applying the SVM classifier: each sentence of a retrieved document is labeled individually, e.g. Sentence 1 → objective, Sentence 2 → subjective, …, Sentence n → objective
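A sketch of both steps using scikit-learn as a stand-in; the slides do not say which SVM package or feature weighting the authors used, and `selected_features` is assumed to be the chi-square-selected vocabulary from the previous step:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_sentence_classifier(subjective_sents, objective_sents, selected_features):
    # Build labeled training data: 1 = subjective, 0 = objective.
    sents = subjective_sents + objective_sents
    labels = [1] * len(subjective_sents) + [0] * len(objective_sents)
    # Unigram + bigram features, restricted to the selected vocabulary.
    vec = CountVectorizer(ngram_range=(1, 2), vocabulary=selected_features)
    clf = LinearSVC().fit(vec.transform(sents), labels)
    return vec, clf

def label_sentences(vec, clf, sentences):
    # Returns one label per sentence of a document: 1 = subjective, 0 = objective.
    return clf.predict(vec.transform(sentences))
```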

  15. Opinion Similarity – Query-Related Opinions • Find the query-related opinions • An opinionative sentence counts as query-related if a query term occurs within a text window around the sentence in the document
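A minimal sketch of the window check; the slide does not give the window size, so `window` (tokens on each side of the sentence) is an assumed parameter:

```python
def is_query_related(tokens, sent_start, sent_end, query_terms, window=5):
    # tokens is the tokenized document; the opinionative sentence occupies
    # tokens[sent_start:sent_end]. The sentence is query-related if any
    # query term falls inside the surrounding text window.
    lo = max(0, sent_start - window)
    hi = min(len(tokens), sent_end + window)
    return any(t.lower() in query_terms for t in tokens[lo:hi])
```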

  16. Opinion Similarity – Similarity 1 • Assumption 1: the higher the topic relevance, the higher the rank • OSim_ir = Sim(Query, Doc)

  17. Opinion Similarity – Similarity 2 • Assumption 2: the more query-related opinions a document contains, the higher its rank • OSim_stcc: total number of query-related opinionative sentences • OSim_stcs: total score of those sentences

  18. Opinion Similarity – Similarity 3 • A linear combination of similarities 1 and 2 • a * OSim_ir + (1 - a) * OSim_stcc • b * OSim_ir + (1 - b) * OSim_stcs
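As a formula sketch; the tuned values of a and b are not reported on the slides, so 0.5 below is a placeholder:

```python
def combined_scores(osim_ir, osim_stcc, osim_stcs, a=0.5, b=0.5):
    # Two alternative combined opinion-similarity scores, mixing topic
    # relevance with the count-based and score-based opinion measures.
    # In practice the components would need normalizing to comparable
    # ranges before mixing; the slides do not describe that step.
    return (a * osim_ir + (1 - a) * osim_stcc,
            b * osim_ir + (1 - b) * osim_stcs)
```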

  19. Experimental Results • TREC 2006 Blog Track data • 50 queries, 3.2 million blog documents • UIC at the TREC 2006 Blog Track • Title-only queries: ranked first • 28%–32% higher than the best TREC 2006 scores • Lessons learned • More training data helps • A combined similarity function helps

  20. Conclusions • Designed and implemented an opinion retrieval system: IR + text classification • Achieved the best known retrieval effectiveness on the TREC 2006 blog data • Extending to polarity classification: positive/negative/mixed • Plan to improve feature selection

  21. Questions? • wzhang@cs.uic.edu • http://www.cs.uic.edu/~wzhang/
