Automatic Domain Adaptive Sentiment Analysis Phase 1

Justin Martineau. Automatic Domain Adaptive Sentiment Analysis Phase 1. Outline. Introduction Problem Definition Thesis Statement Motivation Background and Related Work Challenges Approaches Research Plan Approach Evaluation Timeline Conclusion.

Presentation Transcript


  1. Justin Martineau Automatic Domain Adaptive Sentiment Analysis Phase 1

  2. Outline • Introduction • Problem Definition • Thesis Statement • Motivation • Background and Related Work • Challenges • Approaches • Research Plan • Approach • Evaluation • Timeline • Conclusion

  3. 1. Intro- 2. Related Work - 3. Research Plan - 4. Conclusion Problem Definition • Sentiment Analysis is the automatic detection and measurement of sentiment in text segments by machines. • 3 Sub Tasks • Objective vs. Subjective • Topic Detection • Positive vs. Negative • Commonly applied to web data • Very Domain Dependent

  4. Sentiment Analysis Example

  5. Thesis Statement This dissertation will develop and evaluate techniques to discover and encode domain-specific, domain-independent, and semantic knowledge to improve both single and multiple domain sentiment analysis problems on textual data given low labeled data conditions.

  6. Motivation: Private Sector • Market Research • Surveys • Focus Groups • Feature Analysis • Customer targeting (Free samples etc…) • Consumer Sentiment Search • Compare pros and cons • Overall opinion of products/services

  7. Motivation: Public Sector • Political • Alternative Polling • Determine popular support for legislation • Choose campaign issues • National Security • Detect individuals at risk for radicalization • Determine local sentiment about US policy • Determine local values and sentimental icons • Portray actions positively using local flavor • Public Health • Detect potential suicide victims • Detect mentally unstable people

  8. Challenges • Text Representation • Unedited Text • Sentiment Drift • Negation • Sarcasm • Sentiment Target Identification • Granularity • Domain Dependence

  9. Domain Dependence 1: Domain-Dependent Sentiment • The same sentence can mean two very different things in different domains • Ex: “Read the book.” (good for books, bad for movies) • Ex: “Jolting, heart pounding, You’re in for one hell of a bumpy ride!” (good for movies and books, bad for cars) • Sentimental word associations change with domain • Fuzzy cameras are bad, but fuzzy teddy bears are good. • Big trucks are good, but big iPods are bad. • Bad is bad, but bad villains are good.

  10. Domain Dependence 2: Endless Possibilities

  11. Domain Dependence 3: Organization and Granularity

  12. Theory of the Three Signals • Authors communicate messages using three types of signals • Domain-Specific Signals • Domain-Independent Signals • Semantic Signals • More specific signals are generally more powerful than more generic signals

  13. Domain-Specific Signals • Fuzzy teddy bears • Sharp pictures • Sharp knives • Smooth rides • New ideas • Fast servers • Fast cars • Slow roasted burgers • Slow motion • Small cameras • Big cars • Dependent on problem and domain • Considered more useful by readers • Tell what is good or bad about the topic • Domain knowledge determines sentiment orientation • Very strong in context, but weak or misleading out of context • Can cause overgeneralization errors when overvalued • New domain-specific signal words are ignored in CDT

  14. Proposed Approach • Sentiment Search is more than just a classification problem • Detecting and Using the three signals • Dynamic Domain Adapting Classifiers • Generic Feature Detection using unlabeled data • Semantic Feature Spaces

  15. Dynamic Domain Adapting Classifiers • A (preferably domain-independent) model is built before query time on a set of labeled data, using computationally intensive algorithms • Users interact at the query-box level • Query results define the domain of interest • Domain-specific adaptations are calculated • They compare how the domain of interest differs from known cases • They use semantic knowledge about word senses and relations • The algorithm must be fast: users are waiting • Domain-specific adaptations are woven into the domain-independent model • The resulting model is temporary • It is used to classify documents as positive, negative, or objective • Sentimental search results are processed for significant components and presented for human consumption
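The "weaving" step on this slide could be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how adaptations are combined with the general model, so the additive update, the bag-of-words linear scoring, the `margin` threshold for "objective", and the names `weave` and `classify` are all hypothetical.

```python
def weave(base_weights, adaptations):
    """Build a temporary model: a copy of the general (domain-independent)
    term weights with domain-specific adjustments added on top.
    Additive combination is an assumption, not the slides' stated method."""
    temp = dict(base_weights)  # the general model itself is left unchanged
    for term, delta in adaptations.items():
        temp[term] = temp.get(term, 0.0) + delta
    return temp


def classify(tokens, weights, margin=0.5):
    """Label a document by summing term weights.
    Scores within +/- margin are treated as 'objective' (a simplification)."""
    score = sum(weights.get(t, 0.0) for t in tokens)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "objective"


# Hypothetical weights illustrating the "fuzzy" example from slide 9.
general_model = {"good": 1.0, "bad": -1.0, "fuzzy": 0.0}
camera_adaptation = {"fuzzy": -1.0}
teddy_bear_adaptation = {"fuzzy": 1.0}

camera_model = weave(general_model, camera_adaptation)
teddy_model = weave(general_model, teddy_bear_adaptation)
print(classify(["fuzzy", "pictures"], camera_model))
print(classify(["fuzzy", "bear"], teddy_model))
```

Because `weave` returns a new dictionary, the temporary model can be discarded after the query without touching the general model.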

  16. Overview [Figure: system diagram. Labels: Query; Business Intelligence; Query Results Define a New Domain; Lucene Index; Labeled Data from Known Domain; Dynamic Domain Adapter; Component Analysis; Semantic Knowledge; General Model; Context Specific Model; Sentiment Classifier; Sentimental Search Results (- / +). Key: User Level, Source Data, Knowledge, Labeled Data, Algorithms, Search Results.]

  17. Subjective Context Scoring • Multiply: • PMI(Word, Context) • IDF • Co-occurrence with known generic sentiment seed words times their bias (from movie reviews) • Seeds: • bad, worst, stupid, ridiculous, terrible, poorly • great, best, perfect, wonderful, excellent, effective
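A minimal sketch of this product is shown below. The seed lists come from the slide; everything else is an assumption made for illustration: documents are sets of tokens, the context is the set of query-result documents (a subset of the corpus), probabilities are estimated from document frequencies, and each seed's bias is simply +1 or -1 because the slide does not give the values learned from movie reviews.

```python
import math

POSITIVE_SEEDS = {"great", "best", "perfect", "wonderful", "excellent", "effective"}
NEGATIVE_SEEDS = {"bad", "worst", "stupid", "ridiculous", "terrible", "poorly"}
# Assumed biases: +1 / -1 stand in for the movie-review biases on the slide.
SEED_BIAS = {**{s: 1.0 for s in POSITIVE_SEEDS}, **{s: -1.0 for s in NEGATIVE_SEEDS}}


def doc_freq(word, docs):
    """Number of documents (token sets) containing the word."""
    return sum(1 for d in docs if word in d)


def subjective_context_score(word, context_docs, corpus_docs):
    """PMI(word, context) * IDF(word) * bias-weighted seed co-occurrence.
    context_docs is assumed to be a subset of corpus_docs."""
    df_context = doc_freq(word, context_docs)
    df_corpus = doc_freq(word, corpus_docs)
    if df_context == 0 or df_corpus == 0:
        return 0.0
    # PMI(word, context) = log( P(word | context) / P(word) )
    p_word_given_context = df_context / len(context_docs)
    p_word = df_corpus / len(corpus_docs)
    pmi = math.log(p_word_given_context / p_word)
    idf = math.log(len(corpus_docs) / df_corpus)
    # Count context documents where the word appears together with each seed.
    seed_term = sum(
        bias * sum(1 for d in context_docs if word in d and seed in d)
        for seed, bias in SEED_BIAS.items()
    )
    return pmi * idf * seed_term


context = [{"ipod", "battery", "great"}, {"ipod", "battery", "excellent"}]
corpus = context + [{"movie", "bad"}, {"movie", "great"}]
print(subjective_context_score("battery", context, corpus))
```

In this sketch the sign of the score follows the seed term only when the PMI is positive, that is, when the word is more common in the context than in the corpus as a whole.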

  18. Rocchio Baseline • Rocchio: a query expansion algorithm for search • Similar goals to ours: find more relevant words • Does not account for sentiment • The new query is a weighted sum of • Matching document vectors • Query vector • Non-matching document vectors (negative weight)
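The weighted sum above can be sketched in its textbook form. The weight names `alpha`, `beta`, and `gamma`, their default values, and the use of centroids over sparse term-weight dictionaries are conventional choices assumed here; the slide does not state the settings used for the baseline.

```python
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse term-weight dicts; empty input gives {}."""
    if not vectors:
        return {}
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, matching, non_matching, alpha=1.0, beta=0.75, gamma=0.15):
    """Expanded query = alpha * query
                      + beta  * centroid(matching documents)
                      - gamma * centroid(non-matching documents)."""
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for term, weight in centroid(matching).items():
        new_query[term] += beta * weight
    for term, weight in centroid(non_matching).items():
        new_query[term] -= gamma * weight
    return dict(new_query)


expanded = rocchio(
    query={"ipod": 1.0},
    matching=[{"ipod": 1.0, "battery": 2.0}, {"battery": 2.0}],
    non_matching=[{"zune": 4.0}],
    alpha=1.0, beta=0.5, gamma=0.25,
)
print(expanded)
```

Terms that gain weight from the matching documents (here `battery`) are the expansion candidates; nothing in the update distinguishes positive from negative sentiment, which is the limitation the slide points out.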

  19. Papa John’s According to TFIDF

  20. Papa John’s According to Subjective Context

  21. George Bush According to TFIDF

  22. George Bush According to Subjective Context

  23. iPod according to Rocchio

  24. iPod According to TFIDF [Figure legend: Positive Sentiment in Movie Reviews; Negative Sentiment in Movie Reviews]

  25. Sentimental Context • Components: • PMI(Word, Context) • TF • IDF • log(actual co-occurrence of Word and Seed in Context / probability by chance) • Values: • Abnormality relative to other docs • Popular words in context • Rare words in the corpus • Words that occur with sentiment words in the query documents
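The four components can be computed as in the sketch below. The slide lists the components but not how they are combined, so the function only returns them individually. The estimates are assumptions made for illustration: documents are token lists, probabilities come from document frequencies, the context is a subset of the corpus, and "chance" means independence of the word and the seed within the context. The function name is hypothetical.

```python
import math


def sentimental_context_components(word, seed, context_docs, corpus_docs):
    """Return the four slide components for one (word, seed) pair,
    or None when any of the required counts is zero."""
    n_ctx, n_all = len(context_docs), len(corpus_docs)
    df_ctx = sum(1 for d in context_docs if word in d)
    df_all = sum(1 for d in corpus_docs if word in d)
    seed_ctx = sum(1 for d in context_docs if seed in d)
    both_ctx = sum(1 for d in context_docs if word in d and seed in d)
    if 0 in (df_ctx, df_all, seed_ctx, both_ctx):
        return None
    p_word, p_seed, p_both = df_ctx / n_ctx, seed_ctx / n_ctx, both_ctx / n_ctx
    return {
        # Abnormality: how much more likely the word is in the context.
        "pmi": math.log(p_word / (df_all / n_all)),
        # Popularity: raw occurrences of the word in the context.
        "tf": sum(d.count(word) for d in context_docs),
        # Rarity: inverse document frequency over the whole corpus.
        "idf": math.log(n_all / df_all),
        # Observed word/seed co-occurrence versus chance (independence).
        "seed_cooccurrence": math.log(p_both / (p_word * p_seed)),
    }


context = [["battery", "great", "battery"], ["battery", "great"], ["nano"], ["screen"]]
corpus = context + [["movie"]] * 4
parts = sentimental_context_components("battery", "great", context, corpus)
print(parts)
```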

  26. iPod according to Sentimental Context

  27. iPod Nike according to Sentimental Context

  28. iPod+Nike According to Apple

  29. iPod Audio according to Sentiment Context

  30. iPod Shuffle According to Sentiment Context

  31. iPod Warranty According to Sentimental Context

  32. iPod Battery according to Sentiment Context

  33. iPod nano battery According to Sentimental Context

  34. Google Hits (Battery Related): • iPod battery good ~ 13.5 Mill • iPod battery bad ~ 900 K • iPod nano battery good ~ 3 Mill • iPod nano battery bad ~ 785 K • iPod shuffle battery good ~ 1.6 Mill • iPod shuffle battery bad ~ 230 K • iPod shuffle battery price good ~ 2.6 Mill (not a typo) • iPod shuffle battery price bad ~ 230 K • iPod battery price good ~ 13.5 Mill • iPod battery price bad ~ 850 K • iPod nano battery price good ~ 3 Mill • iPod nano battery price bad ~ 785 K
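One way to read these counts is as good-to-bad ratios per query. The short sketch below does that arithmetic using the approximate figures from the slide; treating the ratio as the quantity of interest is an assumption, since the slide only lists the raw counts.

```python
# Approximate hit counts from the slide: query -> (hits with "good", hits with "bad").
HITS = {
    "iPod battery": (13_500_000, 900_000),
    "iPod nano battery": (3_000_000, 785_000),
    "iPod shuffle battery": (1_600_000, 230_000),
    "iPod shuffle battery price": (2_600_000, 230_000),
    "iPod battery price": (13_500_000, 850_000),
    "iPod nano battery price": (3_000_000, 785_000),
}


def good_to_bad_ratio(good, bad):
    """How many 'good' hits there are for every 'bad' hit."""
    return good / bad


for query, (good, bad) in HITS.items():
    print(f"{query}: {good_to_bad_ratio(good, bad):.1f} good hits per bad hit")
```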

  35. Summary • Interesting problem with many potential applications • Domain dependence is the core challenge • The keys to success are: • Vast quantities of unlabeled data • Semantic knowledge from freely available sources • Semantics must guide and influence but not overrule the statistics

  36. Questions?

  37. BACKUP SLIDES

  38. PMI - Pointwise Mutual Information • a.k.a. Specific Mutual Information • Do two variables co-occur more often than expected by chance?
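For reference, the standard definition of PMI for two outcomes is given below. The symbols x and y are not from the slide; they stand for any two events, such as a word and a context.

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```

The value is positive when x and y co-occur more often than they would if independent, zero when they are independent, and negative when they co-occur less often than chance.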
