
UIC at TREC 2006: Blog Track


Presentation Transcript


  1. UIC at TREC 2006: Blog Track. Wei Zhang, Clement Yu. Department of Computer Science, University of Illinois at Chicago

  2. Summary • Overview of the opinion retrieval task • Relevant document retrieval • Opinion relevant document retrieval • Opinion system • Subjective/objective training data • Feature extraction • Subjectivity classifier • Opinion document ranking

  3. Opinion Document Retrieval [Diagram: within the document space, the set of relevant documents retrieved for the query overlaps the set of opinion documents; their intersection is the set of opinion relevant documents.]

  4. Opinion Document Retrieval • Relevant documents: an IR approach • Opinion relevant documents: a classification approach

  5. Relevant Document Retrieval • The UIC IR system from the TREC 2005 Robust Track, used without word sense disambiguation (WSD) and without adding synonyms/hyponyms • Phrase recognition: proper names, dictionary phrases, simple phrases, complex phrases • Query expansion: pseudo relevance feedback, Wikipedia, the Web • Document-query similarity: phrase similarity and term similarity (combined as sketched below)
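The slide lists phrase similarity and term similarity but does not show how the two are combined. A minimal sketch, assuming a lexicographic combination in which phrase similarity dominates and term similarity breaks ties; the overlap-count scoring below is purely illustrative and not the actual UIC weighting:

```python
# Illustrative sketch only: both similarity components are reduced to simple
# overlap counts here; the real system uses its own weighting.

def similarity(doc_terms, doc_phrases, query_terms, query_phrases):
    # Returns a (phrase_sim, term_sim) pair for one document.
    phrase_sim = sum(1 for p in query_phrases if p in doc_phrases)
    term_sim = sum(1 for t in query_terms if t in doc_terms)
    return (phrase_sim, term_sim)

def rank(documents, query_terms, query_phrases):
    # Python compares the pairs lexicographically, so phrase similarity is
    # the primary sort key and term similarity only breaks ties.
    return sorted(
        documents,
        key=lambda d: similarity(d["terms"], d["phrases"], query_terms, query_phrases),
        reverse=True,
    )
```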

  6. Opinion Relevant Document Retrieval [Example: among the retrieved documents, opinion sentences such as "… another bad thing about march of the penguins - I totally agree. For a documentary, it carried just about no information …" and "… 'march of the penguins,' which was excellent yet really pretty disturbing …" mark a document as an opinion relevant document.]

  7. The Opinions • Opinions are query dependent (e.g., the opinion vocabulary for food differs from that for automobiles) • Opinion features should be learned and tested per query • Opinions should be analyzed at the sentence level

  8. Opinion System Overview [Flowchart: the query is used to collect objective sentences from Wikipedia.org and subjective sentences from Rateitall.com; feature extraction over these sentences feeds the training of an SVM classifier; the classifier labels the retrieved documents to identify opinion documents; the opinion-query connection selects the opinion relevant documents, which are re-ranked to produce the final answers.]

  9. The Objective Sentences • Wikipedia.org pages as primary source • every sentence is treated as objective • multiple pages for multiple query phrases • Web pages as secondary source, from a web search engine • restriction: -comment, -review, -"I think"

  10. The Subjective Sentences • Rateitall.com pages as primary source • every comment sentence is treated as subjective • Web pages as secondary source, from a web search engine • restriction: +comment, +review, +"I think" (see the query sketch below)
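Slides 9 and 10 only give the +/- restriction terms; a minimal sketch of how such restricted web queries might be assembled. The "+term"/"-term" syntax and the helper names are assumptions, since the slides do not specify the search engine or its query language:

```python
# Hypothetical helpers for the restricted web queries on slides 9 and 10.

SUBJECTIVE_MARKERS = ['comment', 'review', '"I think"']

def objective_query(query: str) -> str:
    # Secondary source of objective sentences: exclude review-like pages.
    return query + ' ' + ' '.join('-' + m for m in SUBJECTIVE_MARKERS)

def subjective_query(query: str) -> str:
    # Secondary source of subjective sentences: require review-like pages.
    return query + ' ' + ' '.join('+' + m for m in SUBJECTIVE_MARKERS)

print(objective_query('march of the penguins'))
# march of the penguins -comment -review -"I think"
print(subjective_query('march of the penguins'))
# march of the penguins +comment +review +"I think"
```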

  11. The Featured Terms • Candidate features: unigrams and bigrams • Chi-square test: tests the hypothesis that a term t is distributed unevenly between the objective text set and the subjective text set; unevenly distributed terms are kept as featured terms (see the sketch below)
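A minimal sketch of the chi-square statistic for one term over the two text sets, using the standard 2x2 contingency table; the counting scheme and the selection threshold are illustrative assumptions, as the slide gives neither:

```python
def chi_square(term_subj: int, term_obj: int, total_subj: int, total_obj: int) -> float:
    """Chi-square statistic for the 2x2 table with rows
    (contains term t, does not contain t) and columns
    (subjective set, objective set)."""
    a = term_subj                # subjective sentences containing t
    b = term_obj                 # objective sentences containing t
    c = total_subj - term_subj   # subjective sentences without t
    d = total_obj - term_obj     # objective sentences without t
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# A term could be kept as a featured term when the statistic exceeds a chosen
# critical value, e.g. 3.84 (0.05 significance, 1 degree of freedom); the
# actual threshold is not stated on the slide.
print(chi_square(term_subj=40, term_obj=5, total_subj=200, total_obj=300) > 3.84)
```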

  12. The Sentence Classifier • A Support Vector Machine (SVM) sentence classifier [Diagram: the subjective and objective training sentences are mapped to featured-term vector representations; SVM training on these vectors yields the SVM classifier.]
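A minimal training sketch. The slides only say "SVM" over featured unigrams/bigrams; the use of scikit-learn's LinearSVC and binary term-presence features is an assumption:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

def train_sentence_classifier(subjective, objective, featured_terms):
    # Represent each sentence as a binary vector over the featured terms
    # (unigrams and bigrams selected by the chi-square test).
    vectorizer = CountVectorizer(vocabulary=featured_terms,
                                 ngram_range=(1, 2), binary=True)
    sentences = subjective + objective
    labels = [1] * len(subjective) + [0] * len(objective)  # 1 = subjective
    X = vectorizer.fit_transform(sentences)
    classifier = LinearSVC()
    classifier.fit(X, labels)
    return vectorizer, classifier
```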

  13. Find the Opinion Documents • An opinion document is a retrieved document that contains at least one opinion sentence • Split each document into sentences • Label each sentence with the classifier [Diagram: sentences 1..n of a document pass through the SVM classifier and receive labels such as objective, subjective, ..., objective.]
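A minimal sketch of this step, reusing the vectorizer and classifier from the training sketch above; the regex sentence splitter is a simplification and not part of the original system:

```python
import re

def split_sentences(text):
    # Crude sentence splitter, for illustration only.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def opinion_sentences(document_text, vectorizer, classifier):
    sentences = split_sentences(document_text)
    if not sentences:
        return []
    labels = classifier.predict(vectorizer.transform(sentences))
    return [s for s, label in zip(sentences, labels) if label == 1]

def is_opinion_document(document_text, vectorizer, classifier):
    # An opinion document contains at least one subjective sentence.
    return len(opinion_sentences(document_text, vectorizer, classifier)) > 0
```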

  14. Find the Opinion Relevant Documents • An opinion relevant document is a retrieved document that contains at least one opinion "relevant" sentence • An opinion sentence is relevant if query terms occur in it or within a text window near it [Diagram: two documents, one with the query terms inside the opinion sentence, the other with the query terms in a text window around the opinion sentence.]
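A minimal sketch of the opinion-query proximity check; measuring the text window in sentences, and the default window size, are illustrative assumptions since the slide does not specify the window:

```python
def is_opinion_relevant_sentence(sentences, index, opinion_flags, query_terms, window=2):
    """True if sentence `index` is an opinion sentence and some query term
    appears in it or in a nearby sentence (the window of 2 sentences on
    each side is an assumption)."""
    if not opinion_flags[index]:
        return False
    lo = max(0, index - window)
    hi = min(len(sentences), index + window + 1)
    nearby_text = ' '.join(sentences[lo:hi]).lower()
    return any(term.lower() in nearby_text for term in query_terms)

def has_opinion_relevant_sentence(sentences, opinion_flags, query_terms, window=2):
    return any(
        is_opinion_relevant_sentence(sentences, i, opinion_flags, query_terms, window)
        for i in range(len(sentences))
    )
```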

  15. Rank the Opinion Relevant Documents • Strategy 1 • Use the document retrieval ranking • Remove documents that do not contain an opinion relevant sentence • Sim(D, Q): query-document similarity • I(D, Q) = 1 if D contains an opinion relevant sentence, 0 otherwise • Ranking score: Sim(D, Q) × I(D, Q)
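A minimal sketch of strategy 1, taking the ranking score to be Sim(D, Q) × I(D, Q) as above, so documents without an opinion relevant sentence score 0 and drop out; the document dictionary layout is assumed for illustration:

```python
def rank_strategy1(documents):
    """documents: list of dicts with keys 'id', 'sim' (Sim(D, Q)) and
    'has_opinion_relevant' (the indicator I(D, Q))."""
    scored = [
        (doc['sim'] * (1 if doc['has_opinion_relevant'] else 0), doc['id'])
        for doc in documents
    ]
    # Keep only documents containing an opinion relevant sentence,
    # ranked by their original retrieval similarity.
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
```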

  16. Rank the Opinion Relevant Documents • Strategy 2 • Calculate a document opinion score • OS(D): the set of opinion sentences in document D • Score_classification(s): the SVM classifier score of opinion sentence s • Relevant(s, Q) = 1 if s is an opinion relevant sentence, 0 otherwise • Opinion score: OpinionScore(D) = Σ over s in OS(D) of Score_classification(s) × Relevant(s, Q)
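A minimal sketch of strategy 2 under the reading above, summing the classifier scores of the opinion relevant sentences; using the raw SVM decision value as Score_classification(s) and the document dictionary layout are assumptions:

```python
def opinion_score(sentence_scores, relevant_flags):
    """sentence_scores: classifier scores for the opinion sentences OS(D);
    relevant_flags: Relevant(s, Q) for each of those sentences."""
    return sum(
        score for score, relevant in zip(sentence_scores, relevant_flags)
        if relevant
    )

def rank_strategy2(documents):
    """documents: list of dicts with 'id', 'scores' (per opinion sentence)
    and 'relevant' (per opinion sentence)."""
    ranked = sorted(
        documents,
        key=lambda d: opinion_score(d['scores'], d['relevant']),
        reverse=True,
    )
    return [d['id'] for d in ranked]
```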

  17. Blog Track Results

  18. Thanks, and questions?
