Identifying Expressions of Opinion in Context Eric Breck, Yejin Choi, and Claire Cardie IJCAI 2007
Introduction • Traditional information extraction answers questions about facts • The goal here is to extract answers to subjective questions: how does X feel about Y? • Subjective information extraction and question answering will require techniques that analyze text below the sentence level
Introduction: System Requirements • For each expression of opinion, a system must determine: • Is its polarity positive, negative, or neutral? • With what strength or intensity is the opinion expressed: mild, medium, strong, or extreme? • Who or what is the source, or holder, of the opinion? • What is its target, i.e., what is the opinion about? (a sketch of these attributes follows)
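These four questions amount to a small record attached to each opinion expression. A minimal sketch, assuming a simple span-plus-attributes schema; the field names are mine, the paper does not define a concrete data structure:

```python
# Hypothetical record of the attributes each opinion expression carries
# (field names are illustrative; the paper defines no schema).
from dataclasses import dataclass

@dataclass
class OpinionExpression:
    span: tuple[int, int]   # token offsets of the expression in the text
    polarity: str           # "positive" | "negative" | "neutral"
    intensity: str          # "mild" | "medium" | "strong" | "extreme"
    source: str             # the opinion holder, e.g. "Minister Vedrine"
    target: str             # what the opinion is about
```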
Introduction: Examples • Minister Vedrine criticized the White House reaction. • the agent role = “Minister Vedrine” • the object/theme role = “White House reaction” • 17 persons were killed by sharpshooters faithful to the president. • Tsvangirai said the election result was “illegitimate” and a clear case of “highway robbery”. • Criminals have been preying on Korean travelers in China.
Introduction • Direct subjective expressions (DSEs) • criticized, faithful to • said (a speech event, when it is subjective) • Expressive subjective elements (ESEs) • illegitimate, highway robbery • preying on (rather than the neutral mugging) • No prior work has directly tackled the problem of opinion expression identification.
Subjective Expressions • Expressions can vary in length from a single word to over twenty words. • They may be verb phrases, noun phrases, or strings of words that do not correspond to any linguistic constituent. • Subjectivity is a realm of expression where writers get quite creative, so no short fixed list can capture all expressions of interest. • Moreover, an expression that is subjective in one context is not always subjective in another.
Approach • The task is treated as a sequence tagging problem • Model: a conditional random field (CRF) • Class variable: each token is labeled with an IOB or IO encoding (see the sketch below) • Features: lexical, syntactic, and dictionary-based • A linear-chain CRF is trained using the MALLET toolkit
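The two encodings differ only in whether the first token of an expression receives a distinct B (begin) tag. A minimal sketch, assuming half-open token spans; this is illustrative only, not the authors' code:

```python
# Sketch of the IOB vs. IO token encodings for expression spans.
def encode(tokens, spans, scheme="IOB"):
    """Tag tokens as begin (B), inside (I), or outside (O) a span.
    spans: list of (start, end) token indices, end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        for i in range(start, end):
            if scheme == "IOB" and i == start:
                tags[i] = "B"
            else:
                tags[i] = "I"
    return tags

tokens = "Minister Vedrine criticized the White House reaction .".split()
dse_span = [(2, 3)]                     # "criticized" is a DSE
print(encode(tokens, dse_span, "IOB"))  # ['O','O','B','O','O','O','O','O']
print(encode(tokens, dse_span, "IO"))   # ['O','O','I','O','O','O','O','O']
```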
Features (1) • Lexical features • The word at position i relative to the current token, for i from −4 to +4 (Lex−4 … Lex+4); each position contributes roughly 18,000 binary features (the vocabulary size) • Syntactic features • Part-of-speech tag of the current token (45 binary features) • Constituent type of the previous, current, and next token from the CASS partial parser (roughly 100 binary features each) • Dictionary-based features (next slide); a sketch of the lexical and POS features follows
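A rough sketch of how the windowed lexical features and the current-token POS feature might be assembled per token; the feature-name scheme is my own assumption, not the paper's:

```python
# Hypothetical per-token feature extraction for the windowed lexical
# features and the POS feature described above.
def lexical_features(tokens, pos_tags, i, window=4):
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):          # skip positions off either end
            feats[f"lex[{offset}]={tokens[j].lower()}"] = 1.0
    feats[f"pos={pos_tags[i]}"] = 1.0     # current token's POS tag
    return feats

tokens = "Minister Vedrine criticized the White House reaction .".split()
pos = ["NNP", "NNP", "VBD", "DT", "NNP", "NNP", "NN", "."]
feats = lexical_features(tokens, pos, i=2)   # features for "criticized"
```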
Features (2) • Dictionary-based features, drawn from 4 sources • WordNet: hypernyms of the current word (29,989 binary features) • Levin: Levin’s categorization of English verbs • FrameNet: the word’s category among the nouns and verbs in FrameNet • Wilson subjectivity clues: strong or weak (two binary features) • A sketch of the WordNet feature follows
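A sketch of the WordNet-hypernym feature, using NLTK's WordNet interface as a stand-in for whatever lexicon access the authors actually used (an assumption on my part):

```python
# Hypothetical WordNet-hypernym binary features: one feature per
# hypernym synset on each path from the word's senses to the root.
from nltk.corpus import wordnet as wn

def hypernym_features(word):
    feats = {}
    for synset in wn.synsets(word):
        for path in synset.hypernym_paths():
            for hyper in path:
                feats[f"wn_hyper={hyper.name()}"] = 1.0
    return feats

# e.g. hypernym_features("criticize") fires one binary feature for each
# hypernym synset of every sense of "criticize"
```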
Statistics of Data • MPQA corpus: 535 documents • 135 documents for development, 400 for evaluation • Results are reported using 10-fold cross-validation
Evaluation • Metrics: precision, recall, and F-measure • Exact matching: a predicted expression must match the annotated span exactly • Overlap matching: a prediction is credited if it overlaps the annotated span at all • Baselines: dictionary-based, built from two dictionaries of subjectivity clues (Wiebe vs. Wilson) • The Wilson clues are the ones incorporated in this experiment • A sketch of the two matching criteria follows
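A minimal sketch of the two matching criteria over half-open token spans; this is my formulation of the metrics named above, not the authors' evaluation code:

```python
# Exact vs. overlap matching for span-level precision/recall/F-measure.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]   # half-open token spans

def prf(pred, gold, match):
    tp_p = sum(any(match(p, g) for g in gold) for p in pred)  # matched predictions
    tp_g = sum(any(match(p, g) for p in pred) for g in gold)  # matched gold spans
    prec = tp_p / len(pred) if pred else 0.0
    rec = tp_g / len(gold) if gold else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f

gold = [(2, 3), (10, 14)]
pred = [(2, 3), (11, 13)]
print(prf(pred, gold, lambda p, g: p == g))   # exact:   (0.5, 0.5, 0.5)
print(prf(pred, gold, overlaps))              # overlap: (1.0, 1.0, 1.0)
```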
Results (Dictionary-based) • WordNet is the most useful dictionary • The other dictionaries help only a little
Discussion • Rules for boundary agreement were not defined for the annotations: the order-1 model outperforms order 0 • DSEs include speech events like “said” or “a statement”, which may be objective • Expressions of subjectivity tend to cluster, so density-based features might help • Inter-annotator agreement: 0.75 for DSEs, 0.72 for ESEs