250 likes | 266 Views
A comprehensive lexico-semantic system for Polish with a rich set of relations and the largest wordnet in the world. It includes sentiment annotation for around 30,000 lexical units and is available as a web page, web service, mobile app, and downloadable files.
E N D
A Large Wordnet-based Sentiment Lexicon for Polish Monika Zaśko-Zielińska University of Wrocław Maciej Piasecki Wrocław University of Technology G4.19 Research Group Stanisław Szpakowicz University of Ottawa
RANLP 2015Hissar2015-09-07 CLARIN-PL Background plWordNet 2.3 • a comprehensive description of the lexico-semantic system of Polish • a rich set of lexico-semantic relations described • the largest wordnet in the world 171 000 lemmas 244 000 word senses 184 000 synsets • most of it (150 000 synsets) manually mapped to Princeton WordNet 3.1 • mapped to SUMO • includes enWordNet 0.1 — WordNet 3.1 extended with 8 000 lemmas and 9 000 word senses • everything is available on open licences
RANLP 2015Hissar2015-09-07 CLARIN-PL Background Effect • available as: a Web page, Web service, mobile application, files to download, WordnetLoom Viewer desktop application • hundreds of thousands of Web visits • > 700 registered users • scientific, commercial, educational • > 100 citations • WoSeDon – a Word Sense Disambiguation tool of practical accuracy One, but significant limitation • lack of support for the crucially important area of sentiment analysis and opinion mining
RANLP 2015Hissar2015-09-07 CLARIN-PL Assumptions and Goal Assumptions • wordnets are reference resources, relied upon for the absence of lexical errors • automatic annotation of lexical material is not a viable option • plWordNet is much too large for complete, affordable manual annotation Goal • to annotate manually a large portion of plWordNet with sentiment and basic emotions: around 30 000 lexical units,15% (a scale several times larger than SentiWordNet) • as a pilot study • and to build a core for further automated expansion
RANLP 2015Hissar2015-09-07 CLARIN-PL Annotation Model: Principles The basis: plWordNet • not based on a translation from Princeton WordNet • lexical units (lemma plus sense ID) are basic building blocks • a synset groups lexical units which share relations • glosses and examples assigned to lexical units Lexical units are a natural place for sentiment and emotion annotation Speaker perspective for annotation • it is difficult to describe sentiment polarity in isolation from context • too many factors govern sentiment perception • focus on the word sense intended by the speaker and the intended sentiment polarity • emotions which typify the source of the polarity
RANLP 2015Hissar2015-09-07 CLARIN-PL Annotation Model: Attributes • Sentiment polarity • Basic emotions • Fundamental values
RANLP 2015Hissar2015-09-07 CLARIN-PL Annotation Model: Attributes • Sentiment polarity • neutral versus positive or negative • strong versus weak • => +strong +weak, neutral, -weak, -strong • Basic emotions • the set of basic emotions proposed by Plutchik (1980): joy, trust, fear, surprise, sadness, disgust, anger, anticipation • compatibility • this set is used in much current research • applied in several resources, e.g., the NRC emotion lexicon (Mohammad and Turney, 2013) and the SentiSense Affective Lexicon (Carrillo de Albornoz et al., 2012)
RANLP 2015Hissar2015-09-07 CLARIN-PL Annotation Model: Attributes Fundamental human values Polish linguistic tradition: description of basic emotions is often associated with fundamental values, e.g.,(Puzynina, 1992) • użyteczność `utility’ • dobro drugiego człowieka `another's good’ • prawda `truth’ • wiedza `knowledge’ • piękno `beauty’ • szczęście `happiness' (all of them positive) • nieużyteczność `futility’ • krzywda `harm’ • niewiedza `ignorance’ • błąd `error', • brzydota `ugliness', • nieszczęście `misfortune' (all negative)
RANLP 2015Hissar2015-09-07 CLARIN-PL Annotation Procedure • Decide if x is marked with respect to sentiment polarity, or neutral; if x is neutral, skip the remaining steps • Assign the basic emotions and fundamental human values which appear to be associated with x • Mark x as negative, positive or ambiguous • Evaluate the intensity of x's sentiment polarity: strong or weak • Give example sentences: • one for x with a positive or negative polarity, • two for an ambiguous x.
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Markedness Procedure 1 • implicit • e.g., names of emotional states – can be recognised without referring to context • explicit • motivated by form or meaning • co-occurence in corpus with (Markowski, 1992): • deictic and possessive pronouns and operators which specify markedness, for example: *pomyśl o… ‘please think of…’starociu ‘a relic’ pomyśl o… ‘please think of…’tym naszym starociu ‘this relic of ours’ • concreteness test (Markowski, 1992) • modification by the pronouns ten ‘this, the’, taki ‘such, such as’, twój ‘yourpossessive’ and jakiś ‘some, areferential, one’
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Markedness Procedure 2 • the presence of pragmatic elements in the wordnet glosses for the analysed lexical units and in their definitions in various dictionaries. • the presence of qualifiers for genres in the wordnet glosses of the analysed lexical units . Result: neutral versus marked
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Emotions and Values • A supplementary step in relation to sentiment • Assignment of fundamental values was optional but rarely omitted • Complex emotions expressed by combinations of the basic ones • Perfect agreement was not expected, but the overlap was very high, e.g., • A1: {smutek ‘sadness’, wstręt ‘disgust’}; {nieużyteczność ‘futility’, niewiedza ‘ignorance’} • A2: {smutek ‘sadness’, złość ‘anger’, wstręt ‘disgust’}; {nieużyteczność ‘futility’, niewiedza ‘ignorance’}
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Sentiment polarity • Congruence test • Discord test • Test of collocations • Test of dictionary definitions Congruence test • all occurrences of the given lexical unit x (not a lemma/word) in the usage examples to have the same sentiment polarity as that considered for x • co-occurring adjectives, nouns and verbs do not change the polarity value, but support it • diverse examples for the value ambiguous
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Sentiment polarity Discord test • the presence of a proper antonymy link between the lexical unit considered and some other lexical units with clear sentiment polarity • e.g., nadzieja‘hope’ [positive] – rozczarowanie‘disappointment’ Test of collocations • words included in collocations for the given lexical units are examined with respect to their sentiment polarity • the strength of the observed tendency
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Sentiment polarity Test of dictionary definitions • checking if all components of the definition (definition parts) are clearly positive, negative or mixed • e.g., positive • szatan ‘devil’ – z podziwem o człowieku bardzo zdolnym, sprytnym, odważnym ‘admiringly about someone very capable, canny, courageous’ [plWordNet gloss] • e.g., negative • bubek ‘a kind of ass and upstart’ – z niechęcią o mężczyźnie mało wartym, ale mającym wygórowane mniemanie o sobie‘with dislike about a man worth little but with an excessively high opinion of himself’
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Sentiment strength • Distance on an intensity scale between a given lexical unit and basic emotions assigned to it e.g.,niezadowolenie ‘dissatisfaction’ vs {smutek ‘saddness’ and złość ‘pique’} => full description => strong • Comparison of a given lexical unit with another one with a similar meaning. • If the given lexical unit seems to have negative polarity but it is used to characterise a child humorously, we assign it weak polarity • Time drift: the contemporary state and the contemporary polarity of lexical units is described
RANLP 2015Hissar2015-09-07 CLARIN-PL Procedure: Usage Examples • Double role • illustration: the annotations and the related aspects of the lexical unit’s meaning, and they • verification: the earlier decisions • Selected or created • Focused on • frequent collocations of the LU under consideration • unofficial situations that are not frequent in dictionaries
RANLP 2015Hissar2015-09-07 CLARIN-PL Process • Team • six annotators, 3 linguists and 3 psychologists • one super-annotator • verification: the earlier decisions • Organisation of the work • two annotations per lexical unit – mixed pairs • first annotator: assigns basic emotions, fundamental values, sentiment polarity values and examples • second annotator: • processes the same lexical unit independently • reviews the result of the first one • can adjust his decisions
RANLP 2015Hissar2015-09-07 CLARIN-PL Process • Organisation of the work • if the second annotator disagreed, a report went to the coordinator • if the coordinator found an annotator’s error, a re-analysis was requested. • if potential error is found in plWordNet, the annotation of the given lexical unit is suspended till the error has been corrected • Scope • Nouns • hypernymy subgraphs, more significant for sentiment polarity • affect, feelings and emotions, people, features of people and animals, events rated negatively, evaluated as negative • Selected adjectives
RANLP 2015Hissar2015-09-07 CLARIN-PL plWordNet 2.3-emo • Annotation coverage • 27% of adjectival LUs • 12% of noun LUs • the domains most likely to include LUs with non-neutral sentiment polarity • Statistics
RANLP 2015Hissar2015-09-07 CLARIN-PL plWordNet 2.3-emo: Examples • Ambiguous: {starzec 1, staruszek 1, dziad 1, dziadek 1} (per) gloss: • ##A1: {zaufanie, smutek, złość; dobro, wiedza, nieużyteczność, nieszczęście} amb[Chętnie pomagam temu starcowi, ponieważ zawsze opowiada mi niezwykłe historie z lat swej młodości.] [Ten starzec wyglądał coraz gorzej, było mi go żal.] • ##A2: {zaufanie, smutek, wstręt; wiedza, nieużyteczność, brzydota} amb[W pierwszym rzędzie, tuż przed ołtarzem, zasiadł nobliwy starzec - gość biskupa.][Jadwiga szukała sposobu, jak może pomóc sponiewieranemu, ubogiemu starcowi.]
RANLP 2015Hissar2015-09-07 CLARIN-PL plWordNet 2.3-emo: Examples • Ambiguous: {starzec 1, staruszek 1, dziad 1, dziadek 1} (per), gloss: “stary mężczyzna” `old man’ • ##A1: {złość, wstręt; nieużyteczność, niewiedza} - m(`-strong’)[Stary dziad nie powinien podrywać młodych dziewczyn.] • ##A2: {wstręt; nieużyteczność, brzydota} - s(`-weak’)[Jakiś dziad się dosiadł do naszego przedziału i wyciągnął śmierdzące kanapki z jajkiem.] • ##A3: {wstręt; nieużyteczność, brzydota} - s(`-weak’)[Kilkanaście lat minęło i zrobił się z niego stary dziad.]
RANLP 2015Hissar2015-09-07 CLARIN-PL Inter-annotator Agreement • The second annotator did not look at the first annotator’s decision before having her own • Only in the case of evident errors did the coordinator ask the annotators to analyse the meaning of the given lexical unit and to rethink the decision • All final decisions stored • Fleiss’ κ (Fleiss, 1971) higher agreement for adjectives due to the previous practice with annotating nouns
RANLP 2015Hissar2015-09-07 CLARIN-PL Conclusions • A first, important step towards sentiment annotation of the whole plWordNet. • The achieved size is very high in comparison to other manual annotation projects. • Good starting point for algorithms of spreading the sentiment polarity • Because plWordNet was developed independently from Princeton WordNet, it might be interesting to compare our annotation with the automatic annotation added to PWN in other projects
Thank you very much for your attention! http://www.clarin-pl.eu Supported by the Polish Ministry of Science and Higher Education [CLARIN-PL]