An Attempt at Group Belief Characterization and Detection

Danny Dunlavy
Computer Science and Informatics Department (1415)
Sandia National Laboratories
Nick Pattengale, Travis Bauer
July 23, 2008
SAND2008-5426P

Disclaimers
• We do not think our problem is well formed
• We are not sure whether our approach is sound
• We are not confident an answer is in our data

Problem Description
• Given
  • Set of beliefs / statements
  • Set of groups
  • Beliefs held by groups
  • Documents associated with groups
• Tasks
  • General: Detect / track / predict beliefs and/or changes
  • Specific 1: Detect change in belief at a given point in time
    • Dates: July 2005-July 2006; split date: January 2006
    • Data marked as “Before” and “After”
  • Specific 2: Differentiate between groups by belief
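
The Before/After marking in Specific 1 can be sketched as a simple date partition. This is a minimal illustration, not the authors' code: the record layout `(date, group, text)`, the function name `split_by_date`, and the exact cutoff of January 1 are assumptions (the slide says only "January 2006"), and the sample documents are placeholders.

```python
from datetime import date

# Assumption: the slide gives only "January 2006"; the first of the month is used here.
SPLIT_DATE = date(2006, 1, 1)

def split_by_date(documents, split_date=SPLIT_DATE):
    """Partition (doc_date, group, text) records into Before and After lists."""
    before = [d for d in documents if d[0] < split_date]
    after = [d for d in documents if d[0] >= split_date]
    return before, after

# Hypothetical records; group codes follow the abbreviations on the Groups slide.
docs = [
    (date(2005, 8, 14), "F", "placeholder text one"),
    (date(2005, 12, 31), "IJ", "placeholder text two"),
    (date(2006, 1, 1), "F", "placeholder text three"),
    (date(2006, 6, 30), "PA", "placeholder text four"),
]
before, after = split_by_date(docs)
```

Documents dated exactly on the cutoff fall into "After" in this sketch; the slides do not say how boundary dates were handled.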

Could have been Jenny Holzerisms
Beliefs
• Exceptional people deserve special concessions
• Potential counts for nothing until it's realized
• Reticence and secrecy are excellent pastimes
• People won't behave if they have nothing to lose
• Fake or real indifference is a powerful weapon
• Guilt and self-laceration are indulgences
• Myth can make reality more intelligible
• To disagree presupposes moral integrity
• It is heroic to try to stop time
• It can be helpful to keep going no matter what
• Hamas is a terrorist organization
• Hamas should disarm
• Hamas should take part in government
• Hamas should take part in PNA elections
• Israel is a state
• Israel should be destroyed
• Israel should occupy Palestine
• Oslo Accords is a peace solution
• Political law is Islamic law
• There exists a two state solution

Groups
• Fatah (F)
• Islamic Jihad (IJ)
• Israel (I)
• Military Wing (MW)
• Muslim Brotherhood (MB)
• Palestinian Authority (PA)
• Political Bureau (PB)
• Quds Brigades (QB)
• Syria (S)
• United States (US)

Beliefs Held by Groups
(Table: Belief by Group)

Documents

Solution Approach
• Split data into two groups
  • Before (training) / After (testing)
• Create a weighted vector space model
  • STANLEY
  • Term space defined by “Before” split
• Create binary classifier models
  • Scenario 1: Model each group per belief
  • Scenario 2: Model all groups per belief
• Apply classifier models
  • Apply models for a group to that group’s documents
    • Do test documents align with the same beliefs in general?
  • Apply model for all groups to each group’s documents
    • Can we align beliefs and/or groups to specific documents?
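
One way to read "term space defined by the Before split" is that the vocabulary and term weights are learned from the training documents only, and test documents are then projected into that fixed space. The sketch below illustrates this under stated assumptions: the slides do not say which weighting STANLEY applies, so a smoothed TF-IDF with unit-length normalization is used as a stand-in, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def fit_term_space(before_docs):
    """Learn the vocabulary and IDF weights from the Before documents only."""
    doc_freq = Counter()
    for text in before_docs:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(before_docs)
    vocab = sorted(doc_freq)
    # Assumption: smoothed IDF stands in for the unspecified STANLEY weighting.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf

def vectorize(text, vocab, idf):
    """Map a document into the fixed term space; unseen terms are dropped."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

# Placeholder texts, not drawn from the actual corpus.
before_docs = ["elections and government talks", "government rejects talks"]
vocab, idf = fit_term_space(before_docs)
after_vec = vectorize("new elections boycott", vocab, idf)
```

In this sketch, terms that first appear after the split date ("new", "boycott") contribute nothing to the test vector, which is one practical consequence of fixing the term space on the training split.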

Identified Challenges / Issues / Problems
• Beliefs used as labels only
  • Semantics/meaning of beliefs not used in analysis
• Beliefs labeled by subject matter experts based on understanding of groups and beliefs
  • Data not considered in labeling process
• Groups are labeled by beliefs, not data
  • Documents labeled by group
  • Groups labeled by beliefs
• Data collected using keyword search related to groups only
  • Beliefs not taken into account
  • Data is about groups, not authored by groups
• Data not labeled for validation of problem we are solving
  • Detected changes cannot be validated
  • Method evaluation is difficult

Binary Classifier Methods
• Random Forest (D. Dunlavy)
  • Ensemble of decision tree base classifiers (200)
  • Data sampling with replacement to train each base classifier (10%)
  • Feature sampling at each node split in the trees (100)
  • Information gain (entropy) used to determine feature and split used
• Kernel Perceptron (T. Bauer [analysis], J. Basilico [code])
  • Classification function:
  • Linear kernel:
  • Polynomial kernel:
  • Radial Basis kernel:
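
The kernel formulas did not survive on this slide, so the following is only a textbook-style sketch of a kernel perceptron, not the code credited above. It assumes the common forms: a classification function sign(sum_j alpha_j y_j K(x_j, x)), a linear kernel x·z, a polynomial kernel (x·z + c)^d, and a radial basis kernel exp(-gamma ||x - z||^2). The parameter values (`degree`, `c`, `gamma`, `epochs`) and the toy data are assumptions chosen for illustration.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=2, c=1.0):
    # Assumed parameters; the slides do not give the degree or offset used.
    return (dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    # Assumed gamma; the slides do not give the kernel width used.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision_score(x, X, y, alpha, kernel):
    """Weighted sum of kernel evaluations against the training points."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))

def train_kernel_perceptron(X, y, kernel, epochs=20):
    """Return per-example mistake counts alpha; labels y must be +1 or -1."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, x in enumerate(X):
            if y[i] * decision_score(x, X, y, alpha, kernel) <= 0:
                alpha[i] += 1  # strengthen the example that was misclassified
                mistakes += 1
        if mistakes == 0:
            break  # every training point is classified correctly
    return alpha

def predict(x, X, y, alpha, kernel):
    return 1 if decision_score(x, X, y, alpha, kernel) > 0 else -1

# Toy XOR-style data: not linearly separable, but separable with a degree-2 kernel.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(X, y, polynomial_kernel)
preds = [predict(x, X, y, alpha, polynomial_kernel) for x in X]
```

In the belief setting, each `x` would be a document vector and `y` would mark whether the document's group holds the belief; swapping the `kernel` argument switches between the three kernels named on the slide.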

Evaluation
• Labeling statistics
  • Positive: has a belief; negative: does not have belief
  • TP: true positives (labeled +, predicted +)
  • TN: true negatives (labeled -, predicted -)
  • FP: false positives (labeled -, predicted +)
  • FN: false negatives (labeled +, predicted -)
• Performance Measures
  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
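
The labeling statistics and performance measures can be computed directly from paired label and prediction lists. The sketch below assumes the standard definitions of accuracy, precision, and recall; the function names and the five-item example are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def confusion_counts(labels, predictions):
    """Count TP, TN, FP, FN for boolean labels (True means 'has the belief')."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for l, p in pairs if l and p)
    tn = sum(1 for l, p in pairs if not l and not p)
    fp = sum(1 for l, p in pairs if not l and p)
    fn = sum(1 for l, p in pairs if l and not p)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels and predictions for one belief.
labels = [True, True, False, False, True]
predictions = [True, False, False, True, True]
tp, tn, fp, fn = confusion_counts(labels, predictions)
acc = accuracy(tp, tn, fp, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Reporting precision and recall alongside accuracy matters when few groups hold a given belief, since a model that always predicts "negative" can still score high accuracy.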

Training Results

Testing Data

Polynomial Kernel Perceptron
Percentage Correct: 68.00%
Accuracy: Green indicates that the model chose the belief that the subject matter expert (SME) chose. Red indicates that the model chose differently.

Random Forest
Percentage Correct: 72.00%
Accuracy: Green indicates that the model chose the belief that the subject matter expert (SME) chose. Red indicates that the model chose differently.

General Thoughts / Questions
• What features are important / available?
  • We used terms
  • Problems: negation, lack of context, intent
• Audience, purpose, goal, context of document
  • Would you say something different if different people were here?
• Are we modeling groups or individuals?
  • Outliers, subgroup detection
• Who/what is the source of data/documents?
  • Group members versus outsiders (reporters, etc.)
  • Level of intimacy with or knowledge of group
• Can we incorporate / model perspective into analysis?
• Can we identify / define an ideology?
  • Do we need to in order to model changes in ideology?
• Is there a topology of ideologies?
  • Are relationships between ideologies important?

Thank You
Danny Dunlavy
dmdunla@sandia.gov
http://www.cs.sandia.gov/~dmdunla
