Content Analytics Studio: Sentiment as an Example of an Interesting Annotator. Ken Nelson, WW Solution Consultant
Requirements for Sentiment Analysis • Correctly detect positive and negative sentiment • e.g., happy, angry • Handle negated sentiment • e.g., I am not happy; I will never be happy; I have never been less happy • Index the normal form of terms • great, greater, greatest: no need to distinguish • Substitute “NOT” for the actual negation term or phrase • Which negation term or phrase was used probably doesn’t matter • Create annotations compatible with ICA V3 Sentiment • OpinionPhrase with sentimentMatch, sentimentTerm, polarity, and ruleName features • Requires Fixpack 1
Create a sample document • The sample document name must end with .txt • It should contain enough examples to create and validate the model • I am happy with the result. • We were ANGRY about not being included. • Negation: I am not happy with that product. I will never be HAPPY with that product. I was appalled with the result. I don't expect ever to be happy. I am not angry. I will never be angry. • Some cases don't follow the simple negation rule: I have never been happier. I have never been more pleased.
Create Dictionaries • Positive and negative terms • Normal plus surface forms • Many use cases should index only the normal (common) form, map synonyms to a common term, handle spelling variations, etc. • Languages that use the ending (suffix) of a term for negation will require a different approach than the one outlined here
Two approaches to the dictionaries • Separate dictionaries for Positive vs Negative • Since there will probably be additional dictionaries beyond simple terms, this might be the best choice • Single dictionary with a “Polarity” column • Any feature of an annotation can be used in Rules • Performance should be slightly better with a combined dictionary • For this example, a single dictionary will be used
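To make the single-dictionary option concrete, here is a plain-Python sketch (not Studio's dictionary format) in which each normal form carries a polarity and a list of surface forms, so that "greatest" is indexed as "great". The `Entry` type, the `ROWS` sample data and `lookup_term` are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    normal_form: str
    polarity: str  # "Positive" or "Negative"

# Hypothetical rows: (normal form, polarity, surface forms).
ROWS = [
    ("great", "Positive", ["greater", "greatest"]),
    ("happy", "Positive", []),
    ("angry", "Negative", []),
    ("appalled", "Negative", []),
]

def build_lookup(rows):
    """Map each lowercased surface form, and the normal form itself, to its Entry."""
    lookup = {}
    for normal_form, polarity, surface_forms in rows:
        entry = Entry(normal_form, polarity)
        for form in [normal_form, *surface_forms]:
            lookup[form.lower()] = entry
    return lookup

LOOKUP = build_lookup(ROWS)

def lookup_term(token):
    """Return the Entry for a token (case-insensitive), or None if it is not a sentiment term."""
    return LOOKUP.get(token.lower())

print(lookup_term("Greatest"))  # Entry(normal_form='great', polarity='Positive')
print(lookup_term("product"))   # None
```

Because polarity is just another feature of the entry, a rule can filter on it, which is what makes one combined dictionary workable.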
Is there a source for dictionaries? • Sometimes dictionaries can be found online • There are many sources of sentiment dictionaries • e.g., a Google search for: free download german "sentiment dictionary" • We can’t use online sources without Legal approval • For an interesting discussion and a source of data: http://provalisresearch.com/wordstat/Sentiment-Analysis.html • Sometimes we can get data from other products • The ICA V3 (CCI) Annotator is a possible source • configurations\indexservice\data\Sentiment\languages\en\dictionaries
ICA V3 Sentiment Dictionaries (size, file name)
1,741 AA.dict
1,748 ADDRESS_KW.dict
1,973 Adverb.dict
119,668 AlwaysNegatives.dict
7,967 AlwaysNegatives_MWE.dict
57,747 AlwaysPositives.dict
6,357 AlwaysPositives_MWE.dict
1,663 AM_PM.dict
1,970 anaphoraDict_en.dict
1,730 Be.dict
3,588 Budget-Budget.dict
1,893 CARD.dict
1,697 Core-Location.dict
1,856 Core-Organization.dict
1,701 Core-Person.dict
1,674 Core-Product.dict
1,659 Core-Unknown.dict
1,838 Det.dict
1,953 Emoticon-NegativeFeeling_Emoticon.dict
1,811 Emoticon-PositiveFeeling_Emoticon.dict
1,734 FULL_MONTH.dict
1,695 Have.dict
2,632 LatentSentimentModifiers_Less.dict
2,911 LatentSentimentModifiers_More.dict
2,923 LatentSentiment_Less.dict
2,733 LatentSentiment_More.dict
2,336 LatentSentiment_Negation.dict
1,743 NotNounWords.dict
4,562 Opinions-Contextual.dict
6,649 Opinions-NegativeAttitude.dict
4,511 Opinions-NegativeBudget.dict
7,127 Opinions-NegativeCompetence.dict
4,699 Opinions-NegativeFeeling.dict
10,878 Opinions-NegativeFunctioning.dict
5,598 Opinions-PositiveAttitude.dict
2,979 Opinions-PositiveBudget.dict
12,817 Opinions-PositiveCompetence.dict
8,666 Opinions-PositiveFeeling.dict
6,809 Opinions-PositiveFunctioning.dict
53,272 Opinions-Uncertain.dict
1,914 ORDINAL.dict
1,819 SentimentBlockerFilterDict.dict
2,207 SentimentBlockersDictionary.dict
3,765 SentimentBlockersDictionary_Phrases.dict
3,078 SentimentIntensifierDictionary.dict
9,556 SentimentNegationTriggersDictionary.dict
3,667 SentimentNegationTriggersDictionary_Verbs.dict
1,651 Slang-Contextual.dict
1,882 Slang-Negative.dict
1,746 Slang-NegativeAttitude.dict
1,652 Slang-NegativeFunctioning.dict
1,899 Slang-Positive.dict
1,934 Slang-PositiveAttitude.dict
1,646 Slang-PositiveBudget.dict
1,795 Slang-PositiveFeeling.dict
1,646 Slang-PositiveFunctioning.dict
1,820 Slang-Uncertain.dict
1,864 Slang-Unknown.dict
2,317 STATE.dict
2,278 SupportNegPart.dict
2,196 SupportWords.dict
2,933 TaggerDependentDictionary.dict
1,684 Variations-Unknown.dict
ICA V3 Sentiment Dictionaries – Always Positive (sample entries): abiding, abidingly, abound, abounded, abounding, abounds, absolve, absolved, absolvedly, absolvely, absolves, absolving, absorbing, absorbingly, abundance, abundances, abundant, abundantly, accedely, acceptable, acceptably, accessible, accessiblely, acclaim, acclaimed, acclaiming, acclaimly, acclaims, acclamation, acclamations, accolade, accolades, accommodative, accommodatively, accomplish, accomplished, accomplishedly, accomplishes, accomplishing, accomplishly, accomplishment, accomplishments, accordance, accountable, accountably, accurate • Are these really positive terms? • Are they ALWAYS positive, as the file names imply? (NO) • The Always Positive and Always Negative dictionaries did a reasonable job with the Consumer Review data • Better accuracy would be possible by converting additional ICA V3 dictionaries into Studio dictionaries
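If you convert the V3 Always Positive and Always Negative lists into one dictionary with a Polarity column, a merge step like the following sketch could help. It assumes, hypothetically, that each .dict file holds one term per line (the file format is not shown here) and produces a generic CSV; adapt the columns to whatever import format your Studio version expects. Terms found in both lists are set aside for review rather than guessed.

```python
import csv
import io

def merge_word_lists(positive_lines, negative_lines):
    """Combine two term lists into sorted (term, polarity) rows.

    Assumes one term per line (hypothetical; check the real .dict format).
    Terms are lower-cased, blank lines and duplicates are skipped, and a term
    present in both lists is returned separately instead of being assigned.
    """
    positives = {line.strip().lower() for line in positive_lines if line.strip()}
    negatives = {line.strip().lower() for line in negative_lines if line.strip()}
    conflicts = positives & negatives
    rows = [(term, "Positive") for term in sorted(positives - conflicts)]
    rows += [(term, "Negative") for term in sorted(negatives - conflicts)]
    return rows, sorted(conflicts)

def to_csv(rows):
    """Render rows as CSV text with a term,polarity header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "polarity"])
    writer.writerows(rows)
    return buffer.getvalue()

rows, conflicts = merge_word_lists(["abiding", "acclaim", "accurate"],
                                   ["appalled", "angry"])
print(to_csv(rows), conflicts)
```

Reading the real files (for example with `Path(...).read_text(...).splitlines()`) is left to the caller, because the encoding of the V3 dictionaries is not stated here.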
Verify the result • After creating dictionaries, adding the jar(s) to the pipeline, and building resources, analyze the sample document
Negation • Drag a negation phrase with one intervening token into the Rule Builder • Change the number of occurrences and the feature type for the token • Check the polarity feature to limit matches to Positive (or Negative) terms
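The rule built above (a negation trigger, a configurable number of intervening tokens, then a sentiment term of the chosen polarity) can be modeled outside Studio to check which sample sentences it ought to match. The trigger list, the small sentiment table and `max_gap` below are hypothetical stand-ins for your dictionaries and for the occurrence count set in the Rule Builder.

```python
import re

# Hypothetical sample data standing in for the negation and sentiment dictionaries.
NEGATION_TRIGGERS = {"not", "never", "don't", "cannot", "won't"}
SENTIMENT = {"happy": "Positive", "pleased": "Positive", "angry": "Negative"}

def tokenize(text):
    """Very rough tokenizer: runs of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text)

def find_negated(tokens, polarity, max_gap=2):
    """Return (trigger_index, term_index) pairs where a negation trigger is followed,
    after 0..max_gap intervening tokens, by a sentiment term of the given polarity."""
    matches = []
    for i, token in enumerate(tokens):
        if token.lower() not in NEGATION_TRIGGERS:
            continue
        # There are j - i - 1 intervening tokens, so j runs from i + 1 to i + 1 + max_gap.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            if SENTIMENT.get(tokens[j].lower()) == polarity:
                matches.append((i, j))
                break
    return matches

for sentence in ["I will never be HAPPY with that product",
                 "I don't expect ever to be happy"]:
    print(sentence, "->", find_negated(tokenize(sentence), "Positive"))
```

With the default `max_gap=2`, "I don't expect ever to be happy" is not matched, because four tokens separate the trigger from the term; raising the occurrence count catches it but also increases the risk of false matches.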
Insert the Annotation • Select the Sentiment term, then Insert Annotation
Add features to the annotation • Create a new feature using Normalization • Apply ConvertToLowerCase to _coveredText • Ideally, the lemma would be used, but there is a problem in Studio
Avoid redundancy • In most cases, you should remove the annotation consumed by the rule • The terms not consumed by the rule will remain • The remaining terms will be converted to OpinionPhrase • Studio bug: the Sentiment annotation isn’t removed if the lemma is used in the new feature, but is removed if Covered Text is used • If removal worked with the lemma, the Negation rules could produce OpinionPhrase directly
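The "consume and remove" bookkeeping can be sketched as an offset check: any Sentiment annotation lying inside a negation match is dropped, and whatever is left goes on to become an OpinionPhrase. This is a plain-Python model, not Studio's implementation; the `Annotation` class and the `NegatedSentiment` type name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    begin: int      # character offset, inclusive
    end: int        # character offset, exclusive
    type_name: str  # e.g. "Sentiment" or a hypothetical "NegatedSentiment"

def covers(outer, inner):
    """True if inner lies entirely within outer."""
    return outer.begin <= inner.begin and inner.end <= outer.end

def remove_consumed(sentiments, negations):
    """Keep only Sentiment annotations not consumed by any negation match."""
    return [s for s in sentiments if not any(covers(n, s) for n in negations)]

text = "I am not happy. I am happy."
first = text.index("happy")
second = text.index("happy", first + 1)
sentiments = [Annotation(first, first + 5, "Sentiment"),
              Annotation(second, second + 5, "Sentiment")]
# The negation match spans "not happy" in the first sentence.
negations = [Annotation(text.index("not"), first + 5, "NegatedSentiment")]
remaining = remove_consumed(sentiments, negations)
print(remaining)
```

Only the second "happy" survives, which is exactly the set of terms the next step converts to OpinionPhrase.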
Create OpinionPhrase for non-negated terms • Annotations for negated terms were removed by the Negation rule • The remainder should be converted to OpinionPhrase • Drag a Sentiment term to the Builder • On the Annotation tab, create OpinionPhrase (without com.etc) • Drag the lemma to the Feature to create sentimentTerm and sentimentMatch • Drag the polarity to the Feature • Create a string feature for ruleName (the property is required, but any value can be used) • Delete the Sentiment annotation (this doesn’t work, but it is good practice)
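Here is a minimal sketch of the OpinionPhrase produced for a non-negated term, using the four feature names required for ICA V3 compatibility. Filling both sentimentTerm and sentimentMatch from the lemma follows the slide; the `to_opinion_phrase` helper and the default ruleName value are hypothetical (any value is acceptable).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpinionPhrase:
    # Field names mirror the ICA V3 OpinionPhrase features, hence camelCase.
    sentimentMatch: str
    sentimentTerm: str
    polarity: str
    ruleName: str

def to_opinion_phrase(lemma, polarity, rule_name="simpleSentiment"):
    """Non-negated term: the lemma fills both sentimentTerm and sentimentMatch.
    The default rule_name is a hypothetical placeholder; any value works."""
    return OpinionPhrase(sentimentMatch=lemma, sentimentTerm=lemma,
                         polarity=polarity, ruleName=rule_name)

# Hypothetical (lemma, polarity) pairs left over after the negation rules ran.
remaining = [("happy", "Positive"), ("angry", "Negative")]
phrases = [to_opinion_phrase(lemma, polarity) for lemma, polarity in remaining]
for phrase in phrases:
    print(phrase)
```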
Convert negated sentiment to OpinionPhrase • Drag the negated sentiment to the Builder • Insert the annotation as OpinionPhrase • Add a feature for “NOT” • Drag the lower-case term over Feature • Concatenate “NOT” with the term • Add the polarity value and ruleName • Studio bug? • When creating a feature with an existing feature name, an error is sometimes displayed even before the type is selected • Workaround: the feature can be renamed after it is created
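The negated case differs mainly in the term: "NOT" concatenated with the lower-cased covered text. Several details are assumptions, since the slide leaves them open: the sketch joins the parts with a space (so the term is "NOT happy"), records the opposite polarity, and writes the concatenated term to both sentimentTerm and sentimentMatch. `negated_opinion_phrase` and its default ruleName are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OpinionPhrase:
    sentimentMatch: str
    sentimentTerm: str
    polarity: str
    ruleName: str

# Assumption: a negated term takes the opposite polarity (the slide leaves the value open).
OPPOSITE = {"Positive": "Negative", "Negative": "Positive"}

def negated_opinion_phrase(covered_text, polarity,
                           rule_name="negatedSentiment", separator=" "):
    """Build an OpinionPhrase whose term is "NOT" + the lower-cased covered text.
    separator, rule_name, and the opposite-polarity choice are assumptions."""
    term = "NOT" + separator + covered_text.lower()
    return OpinionPhrase(sentimentMatch=term, sentimentTerm=term,
                         polarity=OPPOSITE[polarity], ruleName=rule_name)

print(negated_opinion_phrase("HAPPY", "Positive"))
```

Because the actual negation word is replaced by "NOT", "not happy" and "never be HAPPY" index to the same term, which matches the requirement that the specific negation phrase probably doesn't matter.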
Cleanup • Remove annotations that are not needed in the index
Disable the built-in Sentiment Annotator • Enable the custom Sentiment Annotator instead of the SystemT annotator by copying NullAnnotator.xml over Sentiment.xml: copy ES_NODE_ROOT/master_config/<collectionID>.indexservice/specifiers/lexical/NullAnnotator.xml ES_NODE_ROOT/master_config/<collectionID>.indexservice/specifiers/analytics/Sentiment.xml • Be sure to save the original Sentiment.xml before copying • Fixpack 1 is required • Export to ICA (no index field or facet mapping is required) • Rebuild the index
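The copy step can be scripted so that the original Sentiment.xml is always saved first. This standard-library Python sketch follows the paths on the slide; the function name and the `backup_dir` parameter (kept outside master_config as a precaution) are hypothetical.

```python
import shutil
from pathlib import Path

def disable_builtin_sentiment(es_node_root, collection_id, backup_dir):
    """Save the original Sentiment.xml, then overwrite it with NullAnnotator.xml."""
    specifiers = (Path(es_node_root) / "master_config"
                  / f"{collection_id}.indexservice" / "specifiers")
    null_annotator = specifiers / "lexical" / "NullAnnotator.xml"
    sentiment = specifiers / "analytics" / "Sentiment.xml"

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / "Sentiment.xml"
    # Back up only once, so a second run cannot replace the real original
    # with the NullAnnotator copy.
    if not backup.exists():
        shutil.copy2(sentiment, backup)

    shutil.copy2(null_annotator, sentiment)
    return backup

# Example (hypothetical backup location; needs `import os`):
# disable_builtin_sentiment(os.environ["ES_NODE_ROOT"], "<collectionID>", "/tmp/ica-backup")
```

To restore the built-in annotator, copy the saved file back over Sentiment.xml.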
Refine the model • “I cannot believe the poor drying performance …” “Won’t open” • Performance is not a sentiment term, so nothing is negated • Should negation terms that are not applied to a sentiment term be treated as negative sentiment? • How would you pick up the term(s) being modified? • “I have never been happier” • Can comparatives (faster, bigger, hotter) be negated? • “I have never been more happy” • How would you handle this type of statement? • “I have never been less happy” • Does reducing the degree of a term make it negative? (Always?) • less, lessen, reduce, decrease, minimize, fewer … • Domain-specific terms • Agitator is not negative when associated with washing machines • Broad coverage vs. confidence • The V3 annotator has many terms that might not belong in a Sentiment dictionary • abound, absolve, absorbing, abundant, acceptable
Refine the Model • Specific phrases might be important to cover text that can’t be handled by simple dictionaries and rules, or for some use cases • Would you handle these with dictionaries (resolve x, address x) plus rules, or with a phrase dictionary? Remember that these can be negated! • Sample phrases: able to address all of my questions; able to get them worked out; able to resolve; able to resolve my issue; able to resolve my issues; able to resolve my problem; able to resolve my problems; able to resolve the issue; able to resolve the problem; able to resolve the problems; address my issues; address my needs; address our needs; address the issue; address your issues; addressed my concerns; addressed my issue; addressed my issues; addressed my problem; addressed properly
Refine the Model • Are there other challenging phrases or uses of language? • Sarcasm? • “Yeah, right”: probably negative • “That’s just great”: you can only tell from the context, and it may not be possible at all with written text • Slang is common, and probably not included in many dictionaries • Slang also evolves over time • Non-textual sentiment • e.g., emoticons such as ;) :)
Lab Assignment • Create a Sentiment annotator • Handle “always negative” or “always positive” terms • Handle negation blocking terms or phrases • Handle unassigned negation terms (if you think that is appropriate) • If you used English, export to the HappyHome collection and rebuild the index • Be ready to discuss your approach and results