Learning Subjective Nouns using Extraction Pattern Bootstrapping Ellen Riloff, Janyce Wiebe, Theresa Wilson Presenter: Gabriel Nicolae
Subjectivity – the Annotation Scheme • http://www.cs.pitt.edu/~wiebe/pubs/ardasummer02/ • Goal: to identify and characterize expressions of private states in a sentence. • Private state = opinions, evaluations, emotions, and speculations. • The strength of each private state is also judged: low, medium, high, or extreme. • Annotation gold standard: a sentence is • subjective if it contains at least one private-state expression of medium or higher strength • objective otherwise • Example of a subjective sentence: The time has come, gentlemen, for Sharon, the assassin, to realize that injustice cannot last long.
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (1/2) • (Riloff and Jones 1999) • Mutual bootstrapping: • Begin with a small set of seed words that represent a targeted semantic category • (e.g., begin with 10 words that represent LOCATIONS) • and an unannotated corpus. • Generate thousands of extraction patterns from the entire corpus (e.g., “<subject> was hired”). • Compute a score for each pattern based on the number of seed words among its extractions. • Select the best pattern; all of its extracted noun phrases are labeled with the targeted semantic category. • Re-score the extraction patterns (using the original seed words + the newly labeled words) and repeat.
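The mutual-bootstrapping loop above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: the function names, the toy patterns, and the RlogF-style scoring formula are assumptions of this sketch.

```python
import math

def score_pattern(extractions, lexicon):
    """RlogF-style score: reward patterns that extract many known category words."""
    f = len(extractions & lexicon)   # known category words among the extractions
    n = len(extractions)             # total unique extractions
    return (f / n) * math.log2(f) if f > 0 else 0.0

def mutual_bootstrap_step(patterns, lexicon):
    """patterns: dict mapping a pattern string to the set of NPs it extracts."""
    best = max(patterns, key=lambda p: score_pattern(patterns[p], lexicon))
    lexicon |= patterns[best]        # accept everything the best pattern extracts
    return best, lexicon

patterns = {
    "<subject> was hired": {"the company", "mr. smith"},
    "traveled to <np>":    {"paris", "the city", "japan"},
    "offices in <np>":     {"paris", "london", "the region"},
}
seeds = {"paris", "london", "japan"}
best, lexicon = mutual_bootstrap_step(patterns, set(seeds))
print(best, sorted(lexicon))
```

Note the weakness this exposes: accepting every extraction of the best pattern lets errors like "the city" enter the lexicon, which is exactly what the meta-bootstrapping filter on the next slide addresses.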
Using Extraction Patterns to Learn Subjective Nouns – Meta-Bootstrapping (2/2) • Meta-bootstrapping: • After the normal bootstrapping, • all nouns that were put into the semantic dictionary are re-evaluated: • each noun is assigned a score based on how many different patterns extracted it. • Only the 5 best nouns are kept in the dictionary; the others are discarded. • Then mutual bootstrapping restarts.
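The meta-level filter reduces to ranking each noun by how many distinct patterns extracted it and keeping the top few. A minimal sketch, with invented data and hypothetical names (the paper keeps the 5 best; here `keep=2` just for the toy example):

```python
def meta_bootstrap_filter(extracted_by, candidates, keep=5):
    """extracted_by: dict mapping noun -> set of patterns that extracted it."""
    # score each candidate noun by how many different patterns found it
    scored = sorted(candidates,
                    key=lambda n: len(extracted_by.get(n, ())),
                    reverse=True)
    return scored[:keep]             # the rest are discarded before the restart

extracted_by = {
    "paris":    {"traveled to <np>", "offices in <np>", "flew to <np>"},
    "london":   {"offices in <np>", "flew to <np>"},
    "the city": {"traveled to <np>"},
}
kept = meta_bootstrap_filter(extracted_by, list(extracted_by), keep=2)
print(kept)
```

Nouns backed by many independent patterns ("paris") survive, while one-off extraction errors ("the city") are dropped.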
Using Extraction Patterns to Learn Subjective Nouns – Basilisk • (Thelen and Riloff 2002) • Begin with • an unannotated text corpus and • a small set of seed words for a semantic category • Bootstrapping: • Basilisk automatically generates a set of extraction patterns for the corpus and scores each pattern based upon the number of seed words among its extractions; the best patterns are placed in the Pattern Pool. • All nouns extracted by a pattern in the Pattern Pool are placed in the Candidate Word Pool. Basilisk scores each noun based upon the set of patterns that extracted it and their collective association with the seed words. • The top 10 nouns are labeled as the targeted semantic class and are added to the dictionary. • The bootstrapping process then repeats.
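Basilisk's key difference from meta-bootstrapping is that it scores candidate nouns directly, by the collective seed association of all patterns that extract them. A sketch of an AvgLog-style noun score: average log2(F_j + 1) over the patterns that extracted the noun, where F_j counts seed words among pattern j's extractions. The pattern-pool construction is omitted and the example data is invented.

```python
import math

def avglog(noun, patterns, seeds):
    """patterns: dict mapping a pattern string to the set of nouns it extracts."""
    hits = [p for p, ext in patterns.items() if noun in ext]
    if not hits:
        return 0.0
    # average seed-association of the patterns that extracted this noun
    return sum(math.log2(len(patterns[p] & seeds) + 1) for p in hits) / len(hits)

patterns = {
    "expressed <np>": {"hope", "concern", "support"},
    "voiced <np>":    {"concern", "support", "objections"},
}
seeds = {"hope", "concern"}
print(round(avglog("support", patterns, seeds), 3))
```

Because the score pools evidence across all of a noun's patterns rather than trusting a single best pattern, one noisy pattern does less damage than in mutual bootstrapping.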
Using Extraction Patterns to Learn Subjective Nouns – Experimental Results • The graph tracks accuracy as bootstrapping progressed. • Accuracy was high during the initial iterations but tapered off as bootstrapping continued. After 20 words, both algorithms were 95% accurate. After 100 words, Basilisk was 75% accurate and MetaBoot 81%. After 1000 words, MetaBoot was 28% accurate and Basilisk 53%.
Creating Subjectivity Classifiers – Subjective Noun Features • Naïve Bayes classifier using the nouns as features. Sets: • BA-Strong: the set of StrongSubjective nouns generated by Basilisk • BA-Weak: the set of WeakSubjective nouns generated by Basilisk • MB-Strong: the set of StrongSubjective nouns generated by Meta-Bootstrapping • MB-Weak: the set of WeakSubjective nouns generated by Meta-Bootstrapping • For each set – a three-valued feature: • presence of 0, 1, or ≥2 words from that set
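One possible encoding of the three-valued feature: for each lexicon (e.g. BA-Strong), record whether the sentence contains 0, 1, or ≥2 of its words. The lexicon entries below are invented for illustration.

```python
def three_valued(tokens, lexicon):
    """Return 0, 1, or 2, where 2 means 'two or more' lexicon words present."""
    count = sum(1 for t in tokens if t in lexicon)
    return min(count, 2)

ba_strong = {"outrage", "hypocrisy", "slander"}   # hypothetical entries
sent = "the outrage over this slander grew".split()
print(three_valued(sent, ba_strong))
```

Capping the count at 2 keeps the feature space small for Naïve Bayes while still distinguishing sentences with multiple subjective nouns from those with just one.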
Creating Subjectivity Classifiers – Previously Established Features • (Wiebe, Bruce, O’Hara 1999) • Sets: • a set of stems positively correlated with the subjective training examples – subjStems • a set of stems positively correlated with the objective training examples – objStems • For each set – a three-valued feature: • the presence of 0, 1, or ≥2 members of the set. • A binary feature for each of: • presence in the sentence of a pronoun, an adjective, a cardinal number, a modal other than will, or an adverb other than not. • Additional features drawn from other researchers’ work.
Creating Subjectivity Classifiers – Discourse Features • subjClues = all sets defined before except objStems • Four features: • ClueRate_subj for the previous and following sentences • ClueRate_obj for the previous and following sentences • A feature for sentence length.
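The discourse features measure how clue-rich the neighboring sentences are. A minimal sketch: normalizing the clue count by sentence length is an assumption of this sketch (the slide does not define ClueRate), and the clue sets are invented.

```python
def clue_rate(tokens, clues):
    # assumed definition: fraction of tokens that are clue words
    return len([t for t in tokens if t in clues]) / max(len(tokens), 1)

def discourse_features(prev_toks, next_toks, subj_clues, obj_stems):
    """The four ClueRate features for the sentences around the current one."""
    return {
        "cluerate_subj_prev": clue_rate(prev_toks, subj_clues),
        "cluerate_subj_next": clue_rate(next_toks, subj_clues),
        "cluerate_obj_prev":  clue_rate(prev_toks, obj_stems),
        "cluerate_obj_next":  clue_rate(next_toks, obj_stems),
    }

subj_clues = {"outrage", "terrible"}    # hypothetical subjClues entries
obj_stems = {"reported", "percent"}     # hypothetical objStems entries
feats = discourse_features("it was terrible".split(),
                           "prices rose three percent".split(),
                           subj_clues, obj_stems)
print(feats)
```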
Creating Subjectivity Classifiers – Classification Results • The results of Naïve Bayes classifiers trained with different combinations of features. • Using both WBO and SubjNoun achieves better performance than either one alone. • The best results are achieved with all the features combined. • Another classifier, with higher precision, can be obtained by classifying a sentence as subjective if it contains any of the StrongSubjective nouns: • 87% precision • 26% recall
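The high-precision rule at the end needs no trained model at all; it reduces to a lexicon lookup. The lexicon entries below are invented for illustration.

```python
strong_subjective = {"outrage", "hypocrisy", "slander"}   # hypothetical entries

def is_subjective(tokens):
    """Flag a sentence as subjective if any StrongSubjective noun appears."""
    return any(t in strong_subjective for t in tokens)

print(is_subjective("the outrage continued".split()))
print(is_subjective("prices rose today".split()))
```

The trade-off reported on the slide follows directly from the rule's shape: matches are almost always genuinely subjective (87% precision), but most subjective sentences contain none of the listed nouns and are missed (26% recall).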