
Building Sentiment Resources On Chinese Reviews



Presentation Transcript


  1. Building Sentiment Resources On Chinese Reviews Zhang Haochen

  2. Self Introduction Zhang Haochen(张昊辰) Ph.D student THUIR, Tsinghua University, China Football, Cooking

  3. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  4. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  5. Introduction • Content: factual vs. subjective • UGC in Web 2.0 • Reviews on entities: product, movie, news … • Opinionated information: tweet, BBS, … • Applications • E-commerce • Public opinion • Recommendation

  6. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  7. Related work • Typical tasks (Pang and Lee, 2008): • Extraction: feature / aspect, opinion • Classification: subjective, polarity • Summarization • Search and Comparison • Approaches: • syntax-based • supervised vs. unsupervised • bootstrap / propagation

  8. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  9. Issue description • I/O • Input: reviews of a particular domain / product • Output: sentiment dictionary for that domain / product • Corpus • Chinese: segmentation, POS tagging • Internet: spam, OOV, colloquial language • Difficulties • Noise • Various patterns • Colloquial language and OOV • Solution • Syntax-based + OOV • Pruning

  10. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  11. Prototype design

  12. Pre-processing • Filter noise in POS tagging results • If A is a subset of B, take A • For completely unmatched tags, annotate as unknown (z) • Same segmentation but different tags: annotate as unknown (z) • Remove redundant sentences • Remove sentences with too much punctuation
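
The two sentence-level filters on this slide could be sketched as follows. This is only an illustration, not the author's implementation: the names `punctuation_ratio` and `filter_sentences`, the punctuation set, and the 0.3 threshold are all assumptions, since the slides do not say how "too much punctuation" was defined.

```python
# Illustrative sketch only: names, punctuation set and threshold are assumptions.
PUNCTUATION = set(",.!?;:~…，。！？；：、")

def punctuation_ratio(sentence: str) -> float:
    """Fraction of non-space characters that are punctuation."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return 1.0
    return sum(c in PUNCTUATION for c in chars) / len(chars)

def filter_sentences(sentences, max_punct_ratio=0.3):
    """Drop exact duplicates and punctuation-heavy sentences, keeping order."""
    seen = set()
    kept = []
    for s in sentences:
        s = s.strip()
        if not s or s in seen:
            continue                      # empty or redundant sentence
        seen.add(s)
        if punctuation_ratio(s) > max_punct_ratio:
            continue                      # too much punctuation
        kept.append(s)
    return kept
```

Exact-match deduplication is the simplest reading of "redundant"; near-duplicate detection would need a similarity measure that the slides do not describe.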

  13. Feature extraction • Specific patterns • more than nouns • verbs and morphemes involved • with frequency greater than a given threshold • more noise • Verbal stop words • verb as part of a phrase • verb as predicate
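
A minimal sketch of pattern-based candidate extraction with a frequency threshold is given here. The tag labels (`n`, `v`, `g`), the function name `candidate_features`, and the default thresholds are hypothetical; the slides do not list the actual patterns or the threshold value.

```python
from collections import Counter

# Assumed tag labels: n = noun, v = verb, g = morpheme.
CANDIDATE_TAGS = {"n", "v", "g"}

def candidate_features(tagged_sentences, min_freq=2, max_len=3):
    """Count contiguous runs of candidate-tagged words and keep frequent ones.

    tagged_sentences: list of sentences, each a list of (word, tag) tuples.
    """
    counts = Counter()
    for sent in tagged_sentences:
        run = []
        # A sentinel token flushes the run that ends the sentence.
        for word, tag in list(sent) + [("", "END")]:
            if tag in CANDIDATE_TAGS:
                run.append(word)
                continue
            if run and len(run) <= max_len:
                counts["".join(run)] += 1
            run = []
    return {phrase: c for phrase, c in counts.items() if c >= min_freq}
```

Admitting verbs and morphemes widens coverage but, as the slide notes, also brings in more noise, which is why a verbal stop-word list would be applied before or after this step.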

  14. Feature extraction • OOV • context entropy gain • whether B should form a phrase with A • mutual information • whether A and B should be combined into AB • applied iteratively
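
The two statistics named on this slide can be illustrated roughly as below. The slides do not define "context entropy gain", so this sketch substitutes plain right-context entropy as a stand-in; the function names and the use of pointwise mutual information over adjacent tokens are likewise assumptions.

```python
import math
from collections import Counter

def pmi(tokens, a, b):
    """Pointwise mutual information of the adjacent pair (a, b)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    p_ab = bigrams[(a, b)] / n_bi
    if p_ab == 0:
        return float("-inf")              # the pair never occurs
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log2(p_ab / (p_a * p_b))

def right_entropy(tokens, a):
    """Entropy (in bits) of the distribution of words that follow `a`."""
    followers = Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == a)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())
```

A high PMI suggests A and B belong together as AB, while a low right-context entropy for A suggests that A rarely stands alone. Merging the best-scoring pair and recomputing would give the iterative behaviour the slide mentions.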

  15. Feature extraction • Co-occurrence frequency with adjective words • Sectional threshold • Filter common words with background corpus (from SogouT, 20M size)
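
One way to filter common words against a background corpus is to compare relative frequencies, as sketched below. The ratio test, the add-one smoothing, the `min_ratio` value, and the function name `filter_common_words` are assumptions for illustration; the slides state only that a background corpus from SogouT was used.

```python
def filter_common_words(domain_counts, domain_total,
                        background_counts, background_total,
                        min_ratio=5.0):
    """Keep words that are much more frequent in the domain than in the background.

    Counts are plain dicts of word -> frequency; totals are corpus sizes in words.
    """
    kept = {}
    for word, count in domain_counts.items():
        p_domain = count / domain_total
        # Add-one smoothing so unseen background words do not divide by zero.
        p_background = (background_counts.get(word, 0) + 1) / (background_total + 1)
        ratio = p_domain / p_background
        if ratio >= min_ratio:
            kept[word] = ratio
    return kept
```

A word such as "镜头" (lens) that is rare in general text but frequent in camera reviews would score high and be kept, while a generic noun would be dropped.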

  16. Opinion extraction • Syntax-based • adjacent adjectives • ignore adverbs • within a specific window • contributes about 70% of the final results
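
The windowed adjacency rule could look roughly like the following sketch. It searches only to the right of the feature word, which is a simplifying assumption, and the tag labels (`a` for adjective, `d` for adverb), the window size, and the name `extract_pairs` are hypothetical.

```python
def extract_pairs(tagged_sentence, features, window=3):
    """Pair each feature word with the nearest following adjective.

    Adverbs (tag "d") do not count against the window; any other word does.
    tagged_sentence: list of (word, tag) tuples; features: set of feature words.
    """
    pairs = []
    for i, (word, _tag) in enumerate(tagged_sentence):
        if word not in features:
            continue
        distance = 0
        for nxt_word, nxt_tag in tagged_sentence[i + 1:]:
            if nxt_tag == "d":            # ignore adverbs
                continue
            distance += 1
            if distance > window:
                break
            if nxt_tag == "a":            # first adjective inside the window
                pairs.append((word, nxt_word))
                break
    return pairs
```

For the tagged sentence 屏幕/n 非常/d 清晰/a, the adverb 非常 is skipped and the pair (屏幕, 清晰) is produced.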

  17. Opinion extraction • OOV • assumption: F + adv. + O + func. • adv. and func. sets • between F and Adj. • between Adj. and Punc. • phrases between adv. and func. • Pruning • frequency • co-occurrence with features
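
The F + adv. + O + func. assumption can be illustrated with the sketch below. The adverb and function-word sets are hard-coded here purely for illustration (the slide indicates they are gathered from the corpus), and the function name and `min_freq` value are assumptions. Only the frequency part of the pruning step is shown.

```python
from collections import Counter

ADVERBS = {"很", "非常", "太", "比较"}        # assumed adv. set
FUNCTION_WORDS = {"的", "了", "啊"}            # assumed func. set

def oov_opinion_candidates(sentences, features, min_freq=2):
    """Collect spans matching: feature + adverb + span + function word.

    sentences: list of sentences, each a list of segmented words.
    Returns the set of spans seen at least `min_freq` times.
    """
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - 1):
            if words[i] not in features or words[i + 1] not in ADVERBS:
                continue
            start = i + 2
            for end in range(start, len(words)):
                if words[end] in FUNCTION_WORDS:
                    if end > start:       # skip empty spans
                        counts["".join(words[start:end])] += 1
                    break
    return {span for span, c in counts.items() if c >= min_freq}
```

Because the span between the adverb and the function word is taken as-is, this can recover slang opinion words that a segmenter or dictionary would miss, at the cost of noise that the pruning step has to remove.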

  18. Polarity classification • Feature-opinion vs. opinion • high - ? • high price - negative • Initialize with the polarity of words from: • HowNet • Tsinghua • NTU Sentiment Dictionary

  19. Polarity classification • Classify iteratively • Classify unlabeled FO pairs with adjacent FO pairs in one sentence • Classify FO pairs in the entire corpus
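
A rough sketch of the iterative step is shown here. It assumes that feature-opinion (FO) pairs appearing in the same sentence tend to share polarity, uses a simple vote, and ignores contrastive conjunctions; none of these details, nor the name `propagate_polarity`, come from the slides.

```python
from collections import Counter

def propagate_polarity(sentences, seed_labels, max_iters=10):
    """Spread polarity labels between FO pairs that share a sentence.

    sentences: list of sentences, each a list of (feature, opinion) pairs.
    seed_labels: dict mapping some pairs to +1 (positive) or -1 (negative).
    """
    labels = dict(seed_labels)
    for _ in range(max_iters):
        votes = {}
        for pairs in sentences:
            known = [labels[p] for p in pairs if p in labels]
            if not known:
                continue
            for p in pairs:
                if p not in labels:
                    votes.setdefault(p, Counter()).update(known)
        new = {}
        for p, counter in votes.items():
            score = sum(label * n for label, n in counter.items())
            if score != 0:                # ties stay unlabeled
                new[p] = 1 if score > 0 else -1
        if not new:
            break                         # converged
        labels.update(new)
    return labels
```

Votes are accumulated over the whole corpus before any label is assigned, so a pair seen in many sentences is labeled by its overall majority rather than by the first sentence it appears in.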

  20. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  21. Evaluation • Reviews in the camera domain • 100,000+ sentences • 769 feature phrases • 806 opinion phrases • 8640 feature-opinion pairs • 5745 positive • 315 neutral • 1948 negative • 632 unknown (treated as neutral in final results) • Performance • feature extraction • opinion extraction • polarity classification
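
The label counts on this slide can be checked and turned into shares with a few lines. The counts are taken from the slide; the variable and function names are illustrative.

```python
# Counts as reported on the slide.
TOTAL_PAIRS = 8640
COUNTS = {"positive": 5745, "neutral": 315, "negative": 1948, "unknown": 632}

def final_distribution(counts):
    """Merge 'unknown' into 'neutral' and return each label's share of all pairs."""
    merged = {
        "positive": counts["positive"],
        "neutral": counts["neutral"] + counts["unknown"],
        "negative": counts["negative"],
    }
    total = sum(merged.values())
    return {label: n / total for label, n in merged.items()}

# The four categories account for every feature-opinion pair.
assert sum(COUNTS.values()) == TOTAL_PAIRS
```

With unknown pairs folded into neutral, positive pairs make up roughly two thirds of the dictionary, negative pairs a little under a quarter, and neutral pairs about a tenth.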

  22. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  23. Conclusion • Chinese corpora differ from English corpora and are more troublesome to process. • The syntax-based method proves simple yet effective for extracting explicit features and opinions from well-written corpora. • The syntax-based method may perform poorly on colloquial corpora.

  24. Overview • Introduction • Related work • Issue description • Approach • Prototype design • Pre-processing • Feature extraction • Opinion extraction • Polarity classification • Evaluation • Conclusion • Future work

  25. Future work • A more accurate and appropriate model • Draw on approaches from other AI research work • Apply learning methods • Implicit features and opinions • Across different domains

  26. Q & A

  27. Thank you
