
Improving Stance Classification in Ideological Debates

This paper presents an improved approach for stance classification in online debate forums. It addresses the challenge of determining the stance expressed in two-sided debates and proposes enhancements in data, features, models, and constraints. Experimental results demonstrate the effectiveness of the proposed approach.



  1. Stance Classification of Ideological Debates Sen Han leonihsam@gmail.com 4 June 2019

  2. Outline • Abstract • Problem • Introduction • Previous approach • Improved approach (for this paper) • Improvements in stance classification • Models • Features • Data • Constraints • Experiments and Evaluation • Results • Discussion 21 Sen Han 2

  3. Problem • Determining the stance expressed in a post written for a two-sided debate in an online debate forum is a relatively new and challenging problem in opinion mining. • Goal: improve the performance of learning-based stance classification along several dimensions

  4. Previous approach • Example debate question: “Should homosexual marriage be legal?” • The goal of debate stance classification is to determine which of the two sides (i.e., for and against) a post’s author is taking • Challenges: • Colorful and emotional language to express one’s points, which may involve sarcasm, insults, and questioning another debater’s assumptions and evidence (spam, disturbance terms) • Limited stance-annotated debate data

  5. Improvement • Data: Increase the number of stance-annotated debate posts from different sources for training • Features: Add semantic features to an n-gram-based stance classifier • Models: Exploit the linear structure inherent in a post sequence; train a better model by learning only from the stance-related sentences, without relying on sentences manually annotated with stance labels • Constraints: Apply extra-linguistic inter-post constraints, such as author constraints, by post-processing the output of a stance classifier

  6. Models • Binary classifiers • Naive Bayes (NB) • Support Vector Machines (SVMs) • Sequence labelers • first-order Hidden Markov Models (HMMs) • linear-chain Conditional Random Fields (CRFs) • Our model • unigram model • fine-grained models: jointly model the stance label of a debate post and the stance label of each of its sentences

  7. Fine-grained model • Document: di • A document stance c with probability P(c) • Sentence: em • A sentence stance s with probability P(s|c) • n-th feature representing em: fn, with probability P(fn|s,c) • Sentence stance: P(s|em,di,c)

  8. Fine-grained model • Classify each test post di using the fine-grained NB model • Choose the stance with maximum conditional probability: S_max • S(di): the set of sentences in test post di • E.g. p(“for homosexual marriage”|d1) = 80%, p(“for abortion”|d2) = 5%
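The fine-grained NB decision rule on this slide can be sketched as follows: score each document stance c by log P(c) plus, for each sentence, the best sentence stance s under P(s|c) and P(fn|s,c). The probability tables and feature names in the usage example are toy assumptions, not estimates from the paper's data.

```python
import math

def classify_post(sentences, stances, p_c, p_s_given_c, p_f_given_sc):
    """Assign a test post the document stance c maximizing
    log P(c) + sum over sentences of log max_s [ P(s|c) * prod_n P(fn|s,c) ]."""
    best_c, best_score = None, float("-inf")
    for c in p_c:
        score = math.log(p_c[c])
        for sent in sentences:  # each sentence is a list of features fn
            score += max(
                math.log(p_s_given_c[(s, c)])
                + sum(math.log(p_f_given_sc[(f, s, c)]) for f in sent)
                for s in stances
            )
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```

With a toy table in which the feature "good" is indicative of the "for" stance, a post whose sentences mostly contain "good" is labeled "for".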

  9. Features • N-gram features • unigrams and bigrams collected from the training posts • Anand et al.’s (2011) features • n-grams • document statistics • punctuation • syntactic dependencies • the set of features computed for the immediately preceding post in its thread • Frame-semantic features • a frame-semantic parse for each sentence • for each frame that a sentence contains, we create three types of frame-semantic features
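The baseline n-gram features can be sketched as a simple presence-based extractor; the space-joined naming of bigram features is an assumption for illustration.

```python
def ngram_features(tokens):
    """Binary unigram and bigram presence features for one post."""
    feats = set(tokens)                                        # unigrams
    feats |= {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}  # bigrams
    return feats
```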

  10. Features • Frame-word interaction feature: (frame-word1-word2) • e.g. “Possession-right-woman; Possession-woman-choose”; the word pair is unordered • Frame-pair feature: (frame2:frame1) • e.g. “Choosing:Possession”; the pair is ordered

  11. Frame-semantic features • Frame n-gram feature: replace a word in an n-gram with • its frame name (if the word is a frame target) • its frame-semantic role (if the word is part of a frame element) • E.g. “woman+has” yields woman+Possession, People+has, People+Possession, Owner+Possession and Owner+has
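The frame n-gram substitution in the slide's “woman+has” example can be sketched as below. Here `labels[i]` is a simplified stand-in for a real frame-semantic parse (a set of frame names or frame-element roles covering token i); that representation is an assumption.

```python
def frame_ngram_features(tokens, labels):
    """For each bigram, emit every variant in which at least one word is
    replaced by one of its frame labels; the all-word variant is already
    covered by the ordinary n-gram features, so it is skipped."""
    feats = set()
    for i in range(len(tokens) - 1):
        w1, w2 = tokens[i], tokens[i + 1]
        for a in {w1} | labels[i]:
            for b in {w2} | labels[i + 1]:
                if (a, b) != (w1, w2):
                    feats.add(f"{a}+{b}")
    return feats
```

With “woman” covered by People and Owner and “has” a target of Possession, this reproduces the five variants listed on the slide.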

  12. Data • Improve the amount and quality of the training data • collect documents relevant to the debate domain from different sources • stance-label them heuristically • combine the noisily labeled documents with the stance-annotated debate posts

  13. Data • Roughly the same number of phrases were created for the two stances in each domain.
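The heuristic stance labeling of collected documents can be sketched as phrase matching against the two stances' phrase lists; the phrase lists and the tie-handling rule below are illustrative assumptions, not the paper's exact procedure.

```python
def heuristic_label(text, for_phrases, against_phrases):
    """Label a collected document by which stance's phrase list it matches
    more often; return None (discard the document) on a tie, so only
    confidently labeled documents enter the noisy training set."""
    f = sum(p in text for p in for_phrases)
    a = sum(p in text for p in against_phrases)
    if f == a:
        return None
    return "for" if f > a else "against"
```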

  14. Constraints • Author constraints (ACs): two posts written by the same author for the same debate domain should have the same stance • Enforced by post-processing the output of a stance classifier • Each post casts a probabilistic vote • Majority voting determines the author’s stance
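The author-constraint post-processing can be sketched as probabilistic majority voting per (author, domain) pair; encoding each post's vote as its probability minus 0.5 is an assumption about how the votes are aggregated.

```python
from collections import defaultdict

def apply_author_constraints(authors, p_for):
    """authors[i]: (author, domain) of post i; p_for[i]: the classifier's
    probability that post i takes the 'for' stance.  Each post casts a
    probabilistic vote; every post by an author in a domain then receives
    that author's winning stance."""
    vote = defaultdict(float)
    for key, p in zip(authors, p_for):
        vote[key] += p - 0.5            # positive leans 'for', negative 'against'
    return ["for" if vote[key] > 0 else "against" for key in authors]
```

For example, an author whose posts score 0.9 and 0.4 has a net positive vote, so both posts are relabeled "for".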

  15. Experiment and evaluation • 5-fold cross-validation • Accuracy is the percentage of test instances correctly classified • In each fold experiment: three folds for model training, one fold for development, and one fold for testing
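The 3/1/1 fold rotation can be sketched as follows; the round-robin assignment of posts to folds is an assumption (any fixed partition into five folds works the same way).

```python
def five_fold_splits(posts):
    """Rotate five folds: three for training, one for development, one for testing."""
    folds = [posts[i::5] for i in range(5)]
    for t in range(5):
        d = (t + 1) % 5
        train = [p for i in range(5) if i not in (t, d) for p in folds[i]]
        yield train, folds[d], folds[t]
```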

  16. Results • Results are shown for three selected points on each learning curve, which correspond to the three major columns in each sub-table.

  17. Results • ‘F’: fine-grained model • ‘W’: only n-gram features • ‘A’: Anand et al.’s (2011) features • ‘A+FS’: Anand et al.’s features plus frame-semantic features • The last two rows: noisily labeled documents (N) and author constraints (AC) are added incrementally to A+FS

  18. Results • Learning curves for HMM and HMMF for the four domains • The best-performing configuration is A+FS+N+AC, followed by A+FS+N and then A+FS

  19. Discussion

  20. Thanks

  21. Unigram • A list of words frequently appearing in the training data that are indicative of a document’s stance • A word is kept if it appears in the training data at least 10 times and is associated with document stance c at least 70% of the time • p(w) = #w / #(w in corpus)
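The lexicon construction described on this slide can be sketched as counting word occurrences per stance and applying the two thresholds; the toy post data in the test of the thresholds is invented for illustration.

```python
from collections import Counter, defaultdict

def stance_lexicon(posts, min_count=10, min_ratio=0.7):
    """posts: (tokens, stance) pairs.  Keep words seen at least min_count
    times in total whose occurrences fall in posts of one stance at least
    min_ratio of the time; return one word set per stance."""
    total, by_stance = Counter(), defaultdict(Counter)
    for tokens, stance in posts:
        for w in tokens:
            total[w] += 1
            by_stance[stance][w] += 1
    return {s: {w for w, n in counts.items()
                if total[w] >= min_count and n / total[w] >= min_ratio}
            for s, counts in by_stance.items()}
```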
