This paper presents an improved approach for stance classification in online debate forums. It addresses the challenge of determining the stance expressed in two-sided debates and proposes enhancements in data, features, models, and constraints. Experimental results demonstrate the effectiveness of the proposed approach.
Stance Classification of Ideological Debates • Sen Han • leonihsam@gmail.com • 4 June 2019
Outline • Abstract • Problem • Introduction • Previous approach • Improved approach • Improvements in stance classification • Models • Features • Data • Constraints • Experiments and evaluation • Results • Discussion
Problem • Determining the stance expressed in a post written for a two-sided debate in an online debate forum is a relatively new and challenging problem in opinion mining. • Goal: improve the performance of learning-based stance classification along several dimensions
Previous approach • Example debate: “Should homosexual marriage be legal?” • The goal of debate stance classification is to determine which of the two sides (i.e., for and against) the author of a post is taking • Challenges: • colorful and emotional language used to express one's points, which may involve sarcasm, insults, and questioning another debater's assumptions and evidence (acting as noise and disturbance terms for a classifier) • limited stance-annotated debate data
Improvements • Data: increase the number of stance-annotated debate posts by collecting training data from different sources • Features: add frame-semantic features to an n-gram-based stance classifier • Models: exploit the linear structure inherent in a post's sentence sequence; train a better model by learning only from the stance-related sentences, without relying on sentences manually annotated with stance labels • Constraints: apply extra-linguistic inter-post constraints, such as author constraints, by postprocessing the output of a stance classifier
Models • Binary classifiers • Naive Bayes (NB) • Support Vector Machines (SVMs) • Sequence labelers • first-order Hidden Markov Models (HMMs) • linear-chain Conditional Random Fields (CRFs) • Our models • unigram-based baseline • fine-grained models: jointly model the stance label of a debate post and the stance label of each of its sentences
Fine-grained model • Document: di • A document stance c with probability P(c) • Sentence: em • A sentence stance s with probability P(s|c) • n-th feature representing em: fn, with probability P(fn|s,c) • Sentence stance: P(s|em,di,c)
Fine-grained model • Classify each test post di using the fine-grained NB model • Choose the stance with maximum conditional probability: S_max • S(di): the set of sentences in test post di • E.g. • p(“for homosexual marriage”|d1) = 80% • p(“for abortion”|d2) = 5%
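The scoring above can be sketched as follows. This is a minimal illustration of fine-grained Naive Bayes inference, not the authors' implementation: the probability tables, the smoothing constant, and the use of a per-sentence max over sentence stances are all assumptions.

```python
import math

def sentence_score(features, s, c, p_s_given_c, p_f_given_sc):
    # log P(s|c) + sum over features of log P(f_n|s,c)
    score = math.log(p_s_given_c[(s, c)])
    for f in features:
        # tiny floor stands in for proper smoothing (assumption)
        score += math.log(p_f_given_sc.get((f, s, c), 1e-6))
    return score

def classify_post(sentences, stances, p_c, p_s_given_c, p_f_given_sc):
    # pick the document stance c maximizing log P(c) plus, for each
    # sentence, the score of its best sentence stance under c
    best_c, best = None, float("-inf")
    for c in stances:
        total = math.log(p_c[c])
        for feats in sentences:
            total += max(
                sentence_score(feats, s, c, p_s_given_c, p_f_given_sc)
                for s in stances
            )
        if total > best:
            best_c, best = c, total
    return best_c
```

With a toy table where the feature "good" is far more likely under a "for"-stanced sentence in a "for" post, the classifier prefers the "for" document stance.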
Features • N-gram features • unigrams and bigrams collected from the training posts • Anand et al.'s (2011) features • n-grams • document statistics • punctuation • syntactic dependencies • the set of features computed for the immediately preceding post in its thread • Frame-semantic features • a frame-semantic parse is produced for each sentence • for each frame that a sentence contains, we create three types of frame-semantic features
Features • Frame-word interaction feature: (frame-word1-word2) • e.g. “Possession-right-woman; Possession-woman-choose”; the word pair is unordered • Frame-pair feature: (frame2:frame1) • e.g. “Choosing:Possession”; the pair is ordered
Frame-semantic features • Frame n-gram feature: each word in an n-gram may be replaced by • its frame name (if the word is a frame target) • its frame-semantic role (if the word is present in a frame element) • e.g. “woman+has” yields woman+Possession, People+has, People+Possession, Owner+Possession, and Owner+has
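The frame n-gram expansion can be sketched like this for bigrams. The annotation format (a dict mapping a token to its frame names and frame-element roles) is a hypothetical simplification of a real frame-semantic parse, which would be keyed by token position rather than by string.

```python
def frame_ngram_features(tokens, annotations):
    """For each word bigram, emit the bigram itself plus every variant
    where a word is replaced by one of its frame labels.
    `annotations`: token -> list of frame names / frame-element roles
    (hypothetical format for this sketch)."""
    feats = []
    for w1, w2 in zip(tokens, tokens[1:]):
        left = [w1] + annotations.get(w1, [])
        right = [w2] + annotations.get(w2, [])
        for a in left:
            for b in right:
                feats.append(f"{a}+{b}")
    return feats
```

Applied to the slide's example, "woman has" with "woman" annotated as a People frame target and an Owner frame element, and "has" as a Possession target, this produces exactly the six variants listed above.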
Data • amount and quality of the training data matter • collect documents relevant to the debate domain from different sources • stance-label them heuristically • train on the combination of the noisily labeled documents and the stance-annotated debate posts
Data Roughly the same number of phrases were created for the two stances in a domain.
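A heuristic labeler over such phrase lists might look like the sketch below. The phrase lists, the count-comparison rule, and discarding ties are assumptions for illustration, not the paper's exact procedure.

```python
def noisy_label(tokens, for_phrases, against_phrases):
    """Heuristically stance-label a document by counting matches
    against hand-created phrase lists for each stance (assumed scheme)."""
    f = sum(1 for w in tokens if w in for_phrases)
    a = sum(1 for w in tokens if w in against_phrases)
    if f > a:
        return "for"
    if a > f:
        return "against"
    return None  # ambiguous: discard rather than mislabel
```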
Constraints • Author constraints (ACs) • two posts written by the same author for the same debate domain should have the same stance • enforced by post-processing the output of a stance classifier • each post casts a probabilistic vote based on the classifier's output • majority voting determines the shared stance
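The author-constraint postprocessing can be sketched as averaging each author's probabilistic votes; the exact aggregation rule and the 0.5 threshold are assumptions for this illustration.

```python
from collections import defaultdict

def apply_author_constraints(posts):
    """posts: list of (author, p_for) pairs, where p_for is the
    classifier's probability that the post takes the 'for' side.
    All of an author's posts receive the stance preferred by the
    average of their probabilistic votes (assumed scheme)."""
    votes = defaultdict(float)
    counts = defaultdict(int)
    for author, p_for in posts:
        votes[author] += p_for
        counts[author] += 1
    return [
        "for" if votes[author] / counts[author] >= 0.5 else "against"
        for author, _ in posts
    ]
```

Note how a confidently classified post can flip a weakly classified post by the same author, which is the point of the constraint.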
Experiments and evaluation • 5-fold cross-validation • accuracy: the percentage of test instances correctly classified • in each fold experiment, three folds are used for model training, one fold for development, and one fold for testing
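The 3/1/1 split rotation described above can be written out as follows; the particular rotation order (development fold immediately after the test fold) is an assumption, since the slides do not specify it.

```python
def five_fold_splits(n_folds=5):
    """Rotate folds so each serves once as the test fold:
    3 folds train, 1 fold development, 1 fold test."""
    splits = []
    for i in range(n_folds):
        test = i
        dev = (i + 1) % n_folds  # assumed rotation order
        train = [f for f in range(n_folds) if f not in (test, dev)]
        splits.append((train, dev, test))
    return splits
```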
Results • Results for three selected points on each learning curve, which correspond to the three major columns in each sub-table.
Results • ‘F’: the fine-grained model • ‘W’: only n-gram features • ‘A’: Anand et al.'s (2011) features • ‘A+FS’: Anand et al.'s features plus frame-semantic features • The last two rows: noisily labeled documents (N) and author constraints (AC) are added incrementally to A+FS.
Results • learning curves for HMM and HMMF for the four domains • the best-performing configuration is A+FS+N+AC, followed by A+FS+N and then A+FS
Discussion
Thanks
Unigram baseline • a list of words, each appearing in the training data at least 10 times and associated with document stance c at least 70% of the time • i.e., a list of words frequently appearing in the training data that are indicative of a document's stance • p(w) = #w / #(w in corpus)
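Building that stance lexicon can be sketched directly from the two thresholds; the input format (tokenized posts paired with gold stance labels) is an assumption of this sketch.

```python
from collections import Counter

def stance_lexicon(posts, min_count=10, min_assoc=0.7):
    """posts: list of (tokens, stance) pairs. Keep words seen at least
    `min_count` times whose occurrences carry a single stance at least
    `min_assoc` of the time; map each kept word to that stance."""
    total = Counter()
    by_stance = Counter()
    for tokens, stance in posts:
        for w in tokens:
            total[w] += 1
            by_stance[(w, stance)] += 1
    lexicon = {}
    for (w, stance), k in by_stance.items():
        if total[w] >= min_count and k / total[w] >= min_assoc:
            lexicon[w] = stance
    return lexicon
```

For example, a word seen 10 times, 8 of them in "against" posts, clears both thresholds (10 ≥ 10 and 0.8 ≥ 0.7) and enters the lexicon as an "against" indicator, while a frequent but stance-neutral word does not.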