Deeper Sentiment Analysis Using Machine Translation Technology. Kanayama Hiroshi, Nasukawa Tetsuya. Tokyo Research Laboratory, IBM Japan. COLING 2004
Abstract • This paper proposes a new paradigm for sentiment analysis: translation from text documents to a set of sentiment units. • The approach makes use of an existing transfer-based machine translation engine.
Introduction • Sentiment analysis (SA) is the task of identifying someone's feelings as expressed in positive or negative (favorable or unfavorable) comments, questions, and requests. • SA is becoming a useful tool for commercial activities. • This paper describes a method to extract a set of sentiment units from sentences, which is the key component of SA.
Introduction • A sentiment unit is a tuple of a sentiment, a predicate, and its arguments. • Example: “It has an excellent lens, but the price is too high. I don’t think the quality of the recharger has any problem.” yields [favorable] excellent (lens), [unfavorable] high (price), and [favorable] problematic+neg (recharger). • These three sentiment units indicate that the camera has good features in its lens and recharger, and a bad feature in its price. • Extracting these sentiment units is not a trivial task, because many syntactic and semantic operations are required. • A sentiment unit should be constructed as the smallest possible informative unit, so that it is easy to handle in the organizing processes after extraction.
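To make the tuple structure concrete, here is a minimal Python sketch of a sentiment unit as a small immutable record. The class name `SentimentUnit`, its field names, and the `[polarity] predicate + feature <arguments>` rendering are assumptions modeled on the notation in these slides, not the authors' data structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SentimentUnit:
    """Hypothetical record for a (sentiment, predicate, arguments) tuple."""
    polarity: str                     # "fav" or "unf"
    predicate: str                    # e.g. "excellent"
    arguments: Tuple[str, ...]        # e.g. ("lens",)
    features: Tuple[str, ...] = ()    # e.g. ("neg",) for a negated predicate

    def __str__(self) -> str:
        # Render in the slides' notation: [polarity] predicate + features <args>
        pred = " + ".join((self.predicate,) + self.features)
        return f"[{self.polarity}] {pred} <{', '.join(self.arguments)}>"


# The three units from the camera review example above.
units = [
    SentimentUnit("fav", "excellent", ("lens",)),
    SentimentUnit("unf", "high", ("price",)),
    SentimentUnit("fav", "problematic", ("recharger",), ("neg",)),
]

for u in units:
    print(u)
```

Making the record frozen keeps units hashable, which is convenient when they are later counted or grouped.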
Introduction • We implemented an accurate sentiment analyzer by making use of an existing transfer-based machine translation engine (Watanabe, 1992), replacing the translation patterns and bilingual lexicons with sentiment patterns and a sentiment polarity lexicon. • We use deep analysis techniques like those used for machine translation, where all of the syntactic and semantic phenomena must be handled.
Introduction • Our SA system attaches importance to each individual sentiment expression, rather than to the quantitative tendencies of reputation.
Sentiment Unit • A predicate is a word, typically a verb or an adjective, that conveys the main notion of the sentiment unit. • An argument is also a word, typically a noun, that modifies the predicate with a case postposition (particle) in Japanese. Arguments roughly correspond to the subject and object of the predicate in English. • For example, the sentence “ABC123 has an excellent lens.” yields [fav] excellent <ABC123, lens>.
Sentiment Unit • Semantically similar representations should be aggregated to organize the extracted sentiments. • Predicates may have features, such as negation, facility, and difficulty. • “ABC123 doesn’t have an excellent lens.” → [unf] excellent + neg <ABC123, lens> • “Easy to break.” → [unf] break + facil • “Difficult to learn.” → [unf] learn + diff • Each unit also keeps its surface string, the corresponding part of the original text, which is used for reference when viewing the output of SA.
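The aggregation idea can be sketched by grouping units on everything except the surface string, so that different wordings of the same sentiment collapse into one entry. This is a hypothetical illustration: the `SentimentUnit` fields and the `aggregate` helper are assumptions, and the second surface string (“It breaks easily.”) is an invented paraphrase, not data from the paper.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class SentimentUnit(NamedTuple):
    polarity: str
    predicate: str
    features: Tuple[str, ...]
    arguments: Tuple[str, ...]
    surface: str  # matching span of the original text, kept for reference only


def aggregate(units):
    """Count units sharing polarity, predicate, features and arguments,
    ignoring the surface string so that wording variants are merged."""
    return Counter((u.polarity, u.predicate, u.features, u.arguments) for u in units)


units = [
    SentimentUnit("unf", "break", ("facil",), (), "Easy to break."),
    SentimentUnit("unf", "break", ("facil",), (), "It breaks easily."),  # invented paraphrase
    SentimentUnit("unf", "learn", ("diff",), (), "Difficult to learn."),
    SentimentUnit("unf", "excellent", ("neg",), ("ABC123", "lens"),
                  "ABC123 doesn't have an excellent lens."),
]

for key, count in aggregate(units).items():
    print(count, key)
```

Keeping the surface string on each unit, but out of the grouping key, lets a viewer still show every original wording behind an aggregated count.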
Implementation: Transfer-based Machine Translation Engine • The transfer-based machine translation system consists of three parts: • a source language syntactic parser, • a bilingual transfer component, which handles the syntactic tree structures, • a target language generator.
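The point of reusing the engine is that the three-stage structure stays fixed while the transfer resources change. The toy sketch below assumes a one-rule parser and two tiny hand-written resource sets (the `parse`, `run`, `mt_*`, and `sa_*` names and all dictionary entries are hypothetical); it sends the same romanized input “nedan ga warui” (“the price is bad”) through translation resources and through sentiment resources.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a parse tree: (predicate, [arguments]).
Tree = Tuple[str, List[str]]


def parse(sentence: str) -> Tree:
    """Hypothetical toy parser that only understands 'NOUN ga PREDICATE'."""
    noun, particle, predicate = sentence.split()
    if particle != "ga":
        raise ValueError(f"unsupported particle: {particle}")
    return predicate, [noun]


def run(sentence: str, transfer: Callable[[Tree], Tree],
        generate: Callable[[Tree], str]) -> str:
    """Parser -> transfer -> generator; the parser is shared by both setups."""
    return generate(transfer(parse(sentence)))


# Translation resources: a tiny illustrative bilingual lexicon.
BILINGUAL: Dict[str, str] = {"nedan": "price", "warui": "bad"}


def mt_transfer(tree: Tree) -> Tree:
    pred, args = tree
    return BILINGUAL[pred], [BILINGUAL[a] for a in args]


def mt_generate(tree: Tree) -> str:
    pred, args = tree
    return f"The {args[0]} is {pred}."


# Sentiment resources: an illustrative sentiment pattern with its polarity.
SENTIMENT: Dict[str, Tuple[str, str]] = {"warui": ("unf", "bad")}


def sa_transfer(tree: Tree) -> Tree:
    pred, args = tree
    polarity, english = SENTIMENT[pred]
    # Arguments are glossed into English only to keep the output readable.
    return f"[{polarity}] {english}", [BILINGUAL.get(a, a) for a in args]


def sa_generate(tree: Tree) -> str:
    pred, args = tree
    return f"{pred} <{', '.join(args)}>"


print(run("nedan ga warui", mt_transfer, mt_generate))  # The price is bad.
print(run("nedan ga warui", sa_transfer, sa_generate))  # [unf] bad <price>
```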
Techniques Required for Sentiment Analysis • Full syntactic parsing plays an important role in extracting sentiments correctly, because the results of a shallow parser alone are not always reliable. For example, an expression such as “I don’t think X is good” is not a favorable opinion about X, even though “X is good” appears on the surface. Therefore, we use top-down pattern matching on the tree structures produced by full parsing to find each sentiment fragment. • In our method, the top node is examined first to see whether the node and its combination of child nodes match one of the patterns in the pattern repository. In this top-down manner, the node “don’t think” in the above example is examined before “X is good”.
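A minimal sketch of top-down matching, assuming a hand-built tree rather than real parser output: each node is checked against the patterns before its children, so a negated “think” node is seen first and flips the polarity of the unit found beneath it. The `Node` and `match` names, the two hard-coded patterns, and the lexicon entries are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)
    neg: bool = False  # True if this word is negated, e.g. "don't think"


Unit = Tuple[str, str, Tuple[str, ...], str]  # (polarity, predicate, features, argument)

LEXICON = {"good": "fav", "bad": "unf"}  # illustrative polarity entries


def flip(polarity: str) -> str:
    return "unf" if polarity == "fav" else "fav"


def match(node: Node) -> Optional[Unit]:
    """Check this node against the patterns before descending to its children."""
    # Auxiliary pattern (hypothetical form): "think" governing a clause.
    if node.word == "think" and node.children:
        inner = match(node.children[0])
        if inner and node.neg:
            pol, pred, feats, arg = inner
            return flip(pol), pred, feats + ("neg",), arg
        return inner
    # Principal pattern: a lexicon adjective with one noun argument.
    if node.word in LEXICON and node.children:
        pol = LEXICON[node.word]
        if node.neg:
            return flip(pol), node.word, ("neg",), node.children[0].word
        return pol, node.word, (), node.children[0].word
    return None  # no pattern covers this node, so nothing is extracted


# "I don't think X is good."
tree = Node("think", [Node("good", [Node("X")])], neg=True)
print(match(tree))  # ('unf', 'good', ('neg',), 'X')
```

Because matching starts at the root, a sentiment word is only reported when a chain of patterns connects it to the top of the tree.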
Techniques Required for Sentiment Analysis • There are three types of patterns. • Principal patterns: • One pattern converts the Japanese expression “noun ga warui” (“noun is bad”) to the sentiment unit “[unf] bad <noun>”. • Another converts the expression “noun wo ki-ni iru” (“like noun”) to the sentiment unit “[fav] like <noun>”.
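Principal patterns can be pictured as templates with a noun slot. For brevity, this hypothetical sketch matches flat romanized token sequences instead of the parse-tree fragments the real system uses; `PRINCIPAL_PATTERNS` and `apply_principal` are illustrative names, and the nouns in the usage lines (batterii, dezain) are invented inputs.

```python
from typing import List, Optional, Tuple

# Illustrative principal patterns: (token template, polarity, English predicate).
# "NOUN" is a slot; every other token must match literally.
PRINCIPAL_PATTERNS: List[Tuple[List[str], str, str]] = [
    (["NOUN", "ga", "warui"], "unf", "bad"),
    (["NOUN", "wo", "ki-ni", "iru"], "fav", "like"),
]


def apply_principal(tokens: List[str]) -> Optional[str]:
    """Return a sentiment unit string for the first matching pattern, else None."""
    for template, polarity, predicate in PRINCIPAL_PATTERNS:
        if len(template) != len(tokens):
            continue
        slot = None
        for expected, token in zip(template, tokens):
            if expected == "NOUN":
                slot = token
            elif expected != token:
                break  # literal token mismatch: try the next pattern
        else:
            return f"[{polarity}] {predicate} <{slot}>"
    return None


print(apply_principal("batterii ga warui".split()))    # [unf] bad <batterii>
print(apply_principal("dezain wo ki-ni iru".split()))  # [fav] like <dezain>
```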
Techniques Required for Sentiment Analysis • Auxiliary patterns expand the scope of matching. • For example, one pattern matches phrases such as “X-wa yoi-to omowa-nai.” (“I don’t think X is good.”) and produces a sentiment unit with the negation feature. When this pattern is attached to a principal pattern, the favorability is inverted. • Nominal patterns: • One pattern converts the noun phrase “renzu-no shitsu” (“quality of the lens”) into just “lens”. • Example: “The quality of the lens is good.” yields [fav] good <lens> rather than the less informative [fav] good <quality>. • Another is used for compound nouns such as “juuden jikan” (“recharging time”). The sentiment unit “long <time>” is not informative, but “long <recharging time>” can be regarded as an [unf] sentiment.
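Nominal patterns rewrite the argument rather than the predicate. The sketch below assumes tokenized romanized noun phrases and two hypothetical rule tables (`UNINFORMATIVE_HEADS`, `COMPOUNDS`); it drops an uninformative head such as “shitsu” (“quality”), keeps a registered compound such as “juuden jikan” whole, and otherwise falls back to the head noun.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical nominal-pattern tables; entries are illustrative, not the paper's.
UNINFORMATIVE_HEADS: Set[str] = {"shitsu"}  # "quality": "renzu-no shitsu" -> "renzu"
COMPOUNDS: Dict[Tuple[str, ...], str] = {("juuden", "jikan"): "juuden-jikan"}


def reduce_argument(tokens: List[str]) -> str:
    """Rewrite a tokenized noun phrase into the most informative argument."""
    # Pattern 1: "X-no HEAD" where HEAD carries little information -> X
    if len(tokens) == 2 and tokens[0].endswith("-no") and tokens[1] in UNINFORMATIVE_HEADS:
        return tokens[0][: -len("-no")]
    # Pattern 2: a registered compound noun is kept whole, not just its head
    if tuple(tokens) in COMPOUNDS:
        return COMPOUNDS[tuple(tokens)]
    # Default: the head noun, which is the last token in Japanese
    return tokens[-1]


print(reduce_argument(["renzu-no", "shitsu"]))  # renzu
print(reduce_argument(["juuden", "jikan"]))     # juuden-jikan
```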
Disambiguation of Sentiment Polarity • Some adjectives and verbs may be used as both favorable and unfavorable predicates. This variation in sentiment polarity can be disambiguated naturally, in the same manner as word sense disambiguation in machine translation. • “The resolution is high.” → [fav] • “ABC123 is expensive.” → [unf] (in Japanese, both sentences use the same adjective, takai) • The semantic category assigned to a noun holds the information used for this type of disambiguation.
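Because the deciding information lives in the noun's semantic category, disambiguation can be sketched as a lookup keyed on (predicate, category). The category names, the table entries, and the `disambiguate` helper below are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical semantic categories for nouns.
NOUN_CATEGORY: Dict[str, str] = {
    "resolution": "performance",
    "ABC123": "product",
    "price": "cost",
}

# Hypothetical polarity table keyed on (predicate, noun category).
POLARITY: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("takai", "performance"): ("fav", "high"),
    ("takai", "product"): ("unf", "expensive"),
    ("takai", "cost"): ("unf", "high"),
}


def disambiguate(predicate: str, noun: str) -> Optional[Tuple[str, str]]:
    """Pick polarity and English sense from the argument's category; None if unknown."""
    category = NOUN_CATEGORY.get(noun)
    if category is None:
        return None
    return POLARITY.get((predicate, category))


print(disambiguate("takai", "resolution"))  # ('fav', 'high')
print(disambiguate("takai", "ABC123"))      # ('unf', 'expensive')
```

This mirrors how an MT transfer step picks a target word sense from argument categories, which is the analogy the slide draws.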
Resources • Principal patterns: verbal and adjectival patterns, with a sentiment polarity assigned to each word (3,752 words in total). • Auxiliary and nominal patterns: 95 auxiliary patterns and 36 nominal patterns were created manually. • Polarity lexicon: some nouns were assigned a sentiment polarity, e.g. [unf] for ‘noise’. (There are many ...) • Some patterns and lexicons are domain-dependent. Fortunately, the translation engine used here can selectively use domain-dependent dictionaries, so we can prepare patterns that are especially suited to the domain of digital cameras.
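Selective use of domain-dependent dictionaries can be approximated with Python's `collections.ChainMap`, which searches a sequence of mappings in order. This is a sketch under assumptions: the dictionary contents and the `lexicon_for` helper are hypothetical, and the engine's actual dictionary-selection mechanism is not described on the slide.

```python
from collections import ChainMap
from typing import Dict, List

# Illustrative entries only.
GENERAL: Dict[str, str] = {"noise": "unf", "bad": "unf"}
DIGITAL_CAMERA: Dict[str, str] = {"blurry": "unf"}  # hypothetical domain entry


def lexicon_for(domain_dicts: List[Dict[str, str]]) -> ChainMap:
    """Consult the selected domain dictionaries first, then the general one."""
    return ChainMap(*domain_dicts, GENERAL)


camera = lexicon_for([DIGITAL_CAMERA])
print(camera["blurry"], camera["noise"])  # unf unf
print("blurry" in lexicon_for([]))        # False
```

Since the domain dictionaries come first in the chain, a domain entry would also shadow a general entry for the same word.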
Evaluation • Data: bulletin boards on the WWW discussing digital cameras. • A total of 200 randomly selected sentences were analyzed by our system. • The resources were created by examining other parts of the texts from the same domain.
Experiment 1 • To check the reliability of the extracted sentiment polarity, three metrics were used: weak precision, strong precision, and recall. • Two methods were compared: • (a) the method based on the machine translation engine; • (b) the lexicon-only method, which emulates the shallow parsing approach: • it uses a simple polarity lexicon of adjectives and verbs; • no disambiguation is done; • direct negation of an adjective or verb is handled.
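The slide does not define how weak and strong precision differ, so the sketch below computes only ordinary set-based precision and recall over extracted units. The unit tuples are invented examples, and `precision_recall` is a hypothetical helper, not the paper's scoring procedure.

```python
from typing import Iterable, Tuple

Unit = Tuple[str, str, str]  # (polarity, predicate, argument)


def precision_recall(system: Iterable[Unit], gold: Iterable[Unit]) -> Tuple[float, float]:
    """Set-based precision and recall of system units against gold units."""
    system, gold = set(system), set(gold)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {("fav", "excellent", "lens"), ("unf", "high", "price")}
system = {("fav", "excellent", "lens"), ("unf", "high", "capabilities")}
print(precision_recall(system, gold))  # (0.5, 0.5)
```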
Experiment 1 • The MT method outputs a sentiment unit only when the expression is reachable from the root node of the syntactic tree through a combination of sentiment fragments, while the lexicon-only method picks up sentiment units from any node in the syntactic tree. • In the following sentence, the lexicon-only method output a wrong sentiment unit, while the MT method did not output it: • “gashitsu-ga kirei-da-to iu hyouka-ha uke-masen-deshi-ta.” (‘There was no opinion that the picture was sharp.’) → lexicon-only output: [fav] clear <picture> • In the lexicon-only method, some errors occurred due to the ambiguity in the sentiment polarity of an adjective or a verb, e.g. “Capabilities are high.”, since high/expensive is always assigned [unf].
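The lexicon-only baseline can be sketched as a scan over every node of the tree, which illustrates why it picks up “clear <picture>” from inside a negated reporting clause and always treats “high” as unfavorable. The tree shape below is a hypothetical simplification of the Japanese sentence, and `Node`, `LEXICON`, and `lexicon_only` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    word: str
    children: List["Node"] = field(default_factory=list)
    neg: bool = False


# Simple polarity lexicon with no disambiguation: "high" is always unfavorable.
LEXICON = {"clear": "fav", "high": "unf"}


def lexicon_only(root: Node) -> List[Tuple[str, str, str]]:
    """Visit every node; emit a unit for any lexicon word that has an argument,
    flipping polarity only when that word itself is directly negated."""
    units, stack = [], [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.word in LEXICON and node.children:
            pol = LEXICON[node.word]
            if node.neg:
                pol = "unf" if pol == "fav" else "fav"
            units.append((pol, node.word, node.children[0].word))
    return units


# 'There was no opinion that the picture was sharp': the negation sits on the
# reporting verb, not on "clear", so the baseline still reports a favorable unit.
sentence = Node("receive", [Node("opinion", [Node("clear", [Node("picture")])])], neg=True)
print(lexicon_only(sentence))                              # [('fav', 'clear', 'picture')]
print(lexicon_only(Node("high", [Node("capabilities")])))  # [('unf', 'high', 'capabilities')]
```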
Experiment 2 • The scope of the extracted sentiment units was compared between the MT method (a) and method (c), which supports only naïve predicate-argument structures and does not use nominal patterns. • The output of the MT method was less redundant and more informative than that of the naïve method. • Example: “It seems the function was enhanced last May.” • (a) [fav] enhance <function> • (c) [fav] enhance <function, May> • Example: “A zoom is more desirable.” • (a) [fav] desirable <zoom> • (c) [fav] desirable <hou>
Conclusion • We have shown that deep syntactic and semantic analysis makes reliable extraction of sentiment units possible, and that the outlining of sentiments became useful thanks to the aggregation of variant expressions and the informative outputs for the arguments. • When we regard the extraction of sentiment units as a kind of translation, many techniques that have been studied for machine translation, such as word sense disambiguation and anaphora resolution, can accelerate further enhancement of sentiment analysis.