Document-level Semantic Orientation and Argumentation Presented by Marta Tatu CS7301 March 15, 2005
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews Peter D. Turney ACL-2002
Overview • Unsupervised learning algorithm for classifying reviews as recommended or not recommended • The classification is based on the semantic orientation of the phrases in the review that contain adjectives or adverbs
Algorithm Input: review • Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger • Estimate the semantic orientation of each phrase • Assign a class to the given review based on the average semantic orientation of its phrases Output: classification (recommended or not recommended)
Step 1 • Apply Brill’s part-of-speech tagger to the review • Adjectives are good indicators of subjective sentences, but in isolation they can be ambiguous: • unpredictable steering (negative, in a car review) / unpredictable plot (positive, in a movie review) • Extract two consecutive words: one is an adjective or adverb, the other provides the context
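A minimal sketch of this extraction step, assuming NLTK's tagger as a stand-in for Brill's and collapsing Turney's five exact two-word POS patterns into a simpler adjective/adverb-plus-neighbour rule:

# Sketch of Step 1: extract two-word phrases around adjectives/adverbs.
# Assumes NLTK is installed; one-time setup:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk

ADJ_ADV = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}
CONTEXT = {"NN", "NNS", "VB", "VBD", "VBN", "VBG", "JJ"}

def extract_phrases(review_text):
    """Return two-word phrases where the first word is an adjective/adverb
    and the neighbouring word supplies context (simplified patterns)."""
    tokens = nltk.word_tokenize(review_text)
    tagged = nltk.pos_tag(tokens)
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in ADJ_ADV and t2 in CONTEXT:
            phrases.append(f"{w1} {w2}")
    return phrases

print(extract_phrases("The steering was unpredictable and the seats were very uncomfortable."))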
Step 2 • Estimate the semantic orientation of the extracted phrases using PMI-IR (Turney, 2001) • Pointwise Mutual Information (Church and Hanks, 1989): PMI(word1, word2) = log2 [ p(word1 & word2) / (p(word1) p(word2)) ] • Semantic Orientation: SO(phrase) = PMI(phrase, “excellent”) − PMI(phrase, “poor”) • PMI-IR estimates PMI by issuing queries to a search engine and counting hits (AltaVista, ~350 million pages)
Step 2 – continued • 0.01 is added to the hit counts to avoid division by zero • If both hits(phrase NEAR “excellent”) ≤ 4 and hits(phrase NEAR “poor”) ≤ 4, the phrase is eliminated • “AND (NOT host:epinions)” is appended to the queries to exclude pages from the Epinions website itself
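A sketch of the SO estimate from raw hit counts, following the formulas above; the counts would come from the NEAR queries against the search engine, which are not reproduced here:

import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    which reduces to a single log-ratio of hit counts.
    0.01 is added to the NEAR counts to avoid division by zero."""
    return math.log2(
        ((hits_near_excellent + 0.01) * hits_poor) /
        ((hits_near_poor + 0.01) * hits_excellent)
    )

def keep_phrase(hits_near_excellent, hits_near_poor, threshold=4):
    """Phrases whose NEAR counts are both <= 4 are too rare to estimate reliably."""
    return hits_near_excellent > threshold or hits_near_poor > threshold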
Step 3 • Calculate the average semantic orientation of the phrases in the given review • If the average is positive, the review is classified as recommended • If the average is negative, the review is classified as not recommended
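A minimal sketch of this final classification step:

def classify_review(phrase_orientations):
    """Step 3: average the semantic orientations of the review's phrases;
    a positive average means recommended, a negative one not recommended."""
    avg = sum(phrase_orientations) / len(phrase_orientations)
    return "recommended" if avg > 0 else "not recommended"

print(classify_review([1.3, -0.4, 2.1]))   # recommended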
Experiments • 410 reviews from Epinions • 170 (41%) not recommended • 240 (59%) recommended • Average phrases per review: 26 • Baseline accuracy (always guessing the majority class): 59%
Discussion • What makes movies hard to classify? • The average SO tends to classify a recommended movie as not recommended • Evil characters make good movies • The whole is not necessarily the sum of the parts • Good beaches do not necessarily add up to a good vacation • But good automobile parts usually add up to a good automobile
Applications • Summary statistics for search engines • Summarization of reviews • Pick out the sentence with the highest positive/negative semantic orientation given a positive/negative review • Filtering “flames” for newsgroups • When the semantic orientation drops below a threshold, the message might be a potential flame
Questions ? • Comments ? • Observations ?
Thumbs up? Sentiment Classification using Machine Learning Techniques Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan EMNLP-2002
Overview • Consider the problem of classifying documents by overall sentiment • Three machine learning methods, compared against baselines built from human-generated lists of words • Naïve Bayes • Maximum Entropy • Support Vector Machines
Experimental Data • Movie-review domain • Source: Internet Movie Database (IMDb) • Star or numerical ratings automatically converted into positive, negative, or neutral labels » no need to hand-label the data for training or testing • Maximum of 20 reviews per author per sentiment category • 752 negative reviews • 1301 positive reviews • 144 reviewers
List of Words Baseline • Maybe there are certain words that people tend to use to express strong sentiments • Classification done by counting the number of positive and negative words in the document • Random-choice baseline: 50%
Machine Learning Methods • Bag-of-features framework: • {f1,…,fm} predefined set of m features • ni(d) = number of times fi occurs in document d • Naïve Bayes: assign the class c* = arg maxc P(c | d), where PNB(c | d) = P(c) ( Πi=1..m P(fi | c)^ni(d) ) / P(d)
Machine Learning Methods – continued • Maximum Entropy: PME(c | d) = (1 / Z(d)) exp( Σi λi,c Fi,c(d, c) ), where Fi,c is a feature/class function: Fi,c(d, c') = 1 if ni(d) > 0 and c' = c, and 0 otherwise • Support vector machines: find the hyperplane that maximizes the margin. The constrained optimization problem yields w = Σj αj cj dj, with αj ≥ 0 • cj ∈ {1, −1} is the correct class of document dj
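A minimal sketch of the bag-of-features setup with the three learners, assuming scikit-learn: CountVectorizer(binary=True) gives presence features, MultinomialNB is the Naïve Bayes model, LogisticRegression stands in for maximum entropy, and LinearSVC for the SVM; the tiny corpus is purely illustrative.

# Presence-feature document vectors fed to three classifiers (illustrative data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

docs = ["a gripping , moving film", "dull , lifeless and predictable"]
labels = ["pos", "neg"]

# binary=True gives presence features ni(d) in {0, 1}; drop it for frequency counts
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs)

for clf in (MultinomialNB(), LogisticRegression(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(vectorizer.transform(["a moving , gripping story"])))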
Evaluation • 700 positive-sentiment and 700 negative-sentiment documents • 3 equal-sized folds • The tag “NOT_” was added to every word between a negation word (“not”, “isn’t”, “didn’t”) and the first punctuation mark following it, so that “good” and the “good” in “not very good” become different features • Features: • 16,165 unigrams appearing at least 4 times in the 1,400-document corpus • 16,165 most frequently occurring bigrams in the same data
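A minimal sketch of this negation-tagging preprocessing, assuming a short illustrative negation list (the paper's own list is longer):

import re

NEGATIONS = {"not", "isn't", "didn't", "no", "never"}   # illustrative subset

def add_not_tags(text):
    """Prefix "NOT_" to every token between a negation word and the next
    punctuation mark, so "good" and "not very good" yield different features."""
    out, negating = [], False
    for token in text.split():
        if re.fullmatch(r"[.,!?;:]+", token):
            negating = False
            out.append(token)
        elif negating:
            out.append("NOT_" + token)
        else:
            out.append(token)
            if token.lower() in NEGATIONS:
                negating = True
    return " ".join(out)

print(add_not_tags("this movie is not very good , but the cast is great"))
# this movie is not NOT_very NOT_good , but the cast is great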
Results • POS information was added to differentiate between uses such as “I love this movie” (verb) and “This is a love story” (noun)
Conclusion • Results produced by the machine learning techniques are better than the human-generated baselines • SVMs tend to do the best • Unigram presence information is the most effective • Frequency vs. presence: in “thwarted expectation” narratives, many words indicate a sentiment opposite to that of the review as a whole • Some form of discourse analysis is necessary
Questions ? • Comments ? • Observations ?
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status Simone Teufel and Marc Moens CL-2002
Overview • Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document • Gold-standard data: computational linguistics articles annotated with the rhetorical status and relevance of each sentence • Supervised learning algorithm that classifies sentences into 7 rhetorical categories
Why? • Knowledge about the rhetorical status of a sentence enables tailoring summaries to the user’s expertise and task • Nonexpert summary: background information and the general purpose of the paper • Expert summary: no background; instead, the differences between this approach and similar ones • Contrasts or complementarity among articles can be expressed
Rhetorical Status • Generalizations about the nature of scientific texts + information to enable the construction of better summaries • Problem structure: problems (research goals), solutions (methods), and results • Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements) • Scientific argumentation • Attitude toward other people’s work: rival approach, prior approach with a fault, or an approach contributing parts of the authors’ own solution
Metadiscourse and Agentivity • Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work • “we argue that”, “in contrast to common belief, we” • Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)
Citations and Relatedness • Just knowing that an article cites another is often not enough • One needs to read the context of the citation to understand the relation between the articles • Article cited negatively or contrastively • Article cited positively or in which the authors state that their own work originates from the cited work
Rhetorical Annotation Scheme • Only one category assigned to each full sentence • Nonoverlapping, nonhierarchical scheme • The rhetorical status is determined on the basis of the global context of the paper
Relevance • Select important content from text • Highly subjective » low human agreement • Sentence is considered relevant if it describes the research goal or states a difference with a rival approach • Other definitions: relevant sentence if it shows a high level of similarity with a sentence in the abstract
Corpus • 80 conference articles • Association for Computational Linguistics (ACL) • European Chapter of the Association for Computational Linguistics (EACL) • Applied Natural Language Processing (ANLP) • International Joint Conference on Artificial Intelligence (IJCAI) • International Conference on Computational Linguistics (COLING) • XML markup added
The Gold Standard • 3 task-trained annotators • 17 pages of guidelines • 20 hours of training • No communication between annotators • Evaluation measures of the annotation: • Stability • Reproducibility
Results of Annotation • Kappa coefficient K (Siegel and Castellan, 1988): K = (P(A) − P(E)) / (1 − P(E)), where P(A) = pairwise agreement and P(E) = random agreement • Stability: K = .82, .81, .76 (N = 1,220 and k = 2) • Reproducibility: K = .71
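A minimal sketch of the kappa computation, with illustrative numbers rather than the paper's:

def kappa(p_agreement, p_expected):
    """K = (P(A) - P(E)) / (1 - P(E)): observed pairwise agreement
    corrected for the agreement expected by chance."""
    return (p_agreement - p_expected) / (1 - p_expected)

# Toy numbers: 85% observed agreement, 30% chance agreement
print(round(kappa(0.85, 0.30), 2))   # 0.79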
The System • Supervised machine learning: a Naïve Bayes classifier over the features described below
Features • Absolute location of a sentence • Limitations of the author’s own method can be expected to be found toward the end, while limitations of other researchers’ work are discussed in the introduction
Features – continued • Section structure: relative and absolute position of the sentence within its section: • First, last, second or third, second-last or third-last, or somewhere in the first, second, or last third of the section • Paragraph structure: relative position of the sentence within its paragraph • Initial, medial, or final
Features – continued • Headlines: type of headline of current section • Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical • Sentence length • Longer or shorter than 12 words (threshold)
Features – continued • Title word contents: does the sentence contain words that also occur in the title? • TF*IDF word contents • High values for words that occur frequently in one document but rarely in the overall collection of documents • Does the sentence contain any of the 18 highest-scoring TF*IDF words? • Verb syntax: voice, tense, and modal linguistic features
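A sketch of the TF*IDF word-contents feature, assuming scikit-learn's TfidfVectorizer as a stand-in for the paper's exact weighting scheme:

from sklearn.feature_extraction.text import TfidfVectorizer

def has_top_tfidf_word(sentence, document_text, other_documents, k=18):
    """Does the sentence contain one of the document's k highest-scoring TF*IDF words?"""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(other_documents + [document_text])
    doc_scores = matrix[-1].toarray().ravel()        # scores for the target document
    vocab = vectorizer.get_feature_names_out()
    top_words = {vocab[i] for i in doc_scores.argsort()[-k:]}
    return any(w in top_words for w in sentence.lower().split())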
Features – continued • Citation • Citation (self), citation (other), author name, or none + location of the citation in the sentence (beginning, middle, or end) • History: most probable previous category • E.g., AIM tends to follow CONTRAST • Calculated in a second pass during training
Features – continued • Formulaic expressions: list of phrases described by regular expressions, divided into 18 classes, comprising a total of 644 patterns • Clustering prevents data sparseness
Features – continued • Agent: 13 types, 167 patterns • The placeholder WORK_NOUN can be replaced by any of a set of 37 nouns including theory, method, prototype, algorithm • Agent classes whose distribution was very similar to the overall distribution of target categories were excluded
Features – continued • Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity, contrast • PRESENTATION_ACTIONs: present, report, state • RESEARCH_ACTIONs: analyze, conduct, define, and observe • Negation is considered
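A sketch of how nominal sentence features of this kind could feed the Naïve Bayes classifier mentioned on the system slide; the feature values, category labels shown, and the scikit-learn CategoricalNB/OrdinalEncoder pairing are illustrative assumptions, not the authors' implementation.

# Nominal sentence features -> Naive Bayes rhetorical category (illustrative).
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# (location, headline type, citation, agent) per sentence
X_raw = [
    ["intro",      "Introduction", "citation_other", "them"],
    ["method",     "Method",       "none",           "us"],
    ["conclusion", "Conclusion",   "citation_self",  "us"],
]
y = ["BACKGROUND", "OWN", "AIM"]

encoder = OrdinalEncoder()               # maps each nominal value to an integer code
X = encoder.fit_transform(X_raw)

clf = CategoricalNB().fit(X, y)
print(clf.predict(encoder.transform([["intro", "Introduction", "citation_other", "them"]])))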
System Evaluation • 10-fold cross-validation
Feature Impact • The most distinctive single feature is Location, followed by SegAgent, Citations, Headlines, Agent and Formulaic
Questions ? • Comments ? • Observations ?