Towards Reliable Hypothesis Validation in Social Sensing Applications

Dong Wang, Daniel Zhang, Chao Huang
Department of Computer Science and Engineering, University of Notre Dame
SECON '18, Hong Kong, China

Sensing is Evolving
• Platform: sensors are increasingly used by everyday people (smart phone)
• Application: humans are getting into the loop of sensing (health monitoring, geotagging, target tracking, environment monitoring, social sensing, smart house)
• Social (human-centric) sensing is emerging!

Social Sensing
• A set of applications where data are collected from human sources or devices on their behalf
• Human + Cyber + Physical: an emerging paradigm of cyber-physical systems with human-in-the-loop
• Examples: Twitter mood predicts stock market, 2011; help pilgrims utilize schedule in Hajj, 2012; FourSquare helps blind people navigate, 2012; Japan tsunami and nuclear event, 2011

Why Social Sensing? A Confluence of Three Trends
• Mass dissemination media
• Sensors
• Connectivity
• Examples: smart phone, cars on Internet, smart meter, GPS, cell phones

Truth Discovery in Social Sensing
• What to believe? Who to believe?
• Sources: people, smart devices
• Measurements (claims): text, numeric data, images
• Goal: reliable information for decision support!

Our Problem: Reliable Hypothesis Validation

Related Work
• Truth Discovery: Basic Model [1] (IPSN 12); Recursive Model [2] (ICDCS 13); Source Dependency [3, 4] (IPSN 14, 16); Dynamic and Scalable Model [5] (ICDCS 17)
• Reliable Hypothesis Validation [6] (SECON 18)
1. Dong Wang, Lance Kaplan, Hieu Le, and Tarek Abdelzaher. "On Truth Discovery in Social Sensing: A Maximum Likelihood Estimation Approach." IPSN 12, Beijing, China, April 2012.
2. Dong Wang, Tarek Abdelzaher, Lance Kaplan, and Charu C. Aggarwal. "Recursive Fact-finding: A Streaming Approach to Truth Estimation in Crowdsourcing Applications." ICDCS 13, Philadelphia, PA, July 2013.
3. Dong Wang, Tarek Abdelzaher, and Lance Kaplan. "Humans as Sensors: An Estimation Theoretic Perspective." IPSN 14, Berlin, Germany, April 2014.
4. Chao Huang, Dong Wang. "Topic-Aware Social Sensing with Arbitrary Source Dependency Graphs." IPSN 16, Vienna, Austria, April 2016.
5. Daniel Zhang, Chao Zhang, Dong Wang, Doug Thain, Xin Mu, Greg Madey, and Chao Huang. "Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework." ICDCS 17, Atlanta, GA, USA.
6. Dong Wang, Daniel Zhang, Chao Huang*. "Towards Reliable Hypothesis Validation in Social Sensing Applications." SECON 18, Hong Kong, June 2018.

Technical Challenges
• Challenge 1: Hypothesis-Claim Matching
  • How to match the high-level hypotheses generated by end users to the relevant low-level claims generated by social sensors?
• Challenge 2: Hypothesis Validation
  • How to reliably validate the truthfulness of the hypotheses from the estimated truthfulness of the claims?

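Basic Definitions
• Sources: S1, …, SM (M sources)
• Claims: C1, …, CN (N claims)
• Hypotheses: H1, …, HK (K hypotheses)
• Claim Truthfulness Vector: one truthfulness value per claim
• Hypothesis Truthfulness Vector: one truthfulness value per hypothesis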

Basic Definitions
Source Claim Matrix: SC (M by N)
• M: number of sources; N: number of claims
• Entry (i, j) is 1: source Si reports claim Cj
• Entry (i, j) is 0: source Si does not report claim Cj

Basic Definitions
Claim Hypothesis Matrix: CH (N by K)
• N: number of claims; K: number of hypotheses
• Entry (j, k): degree of correlation between claim Cj and hypothesis Hk
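
To make the two matrices concrete, here is a minimal Python sketch. The report pairs, the correlation values, and the helper name `build_sc_matrix` are made up for illustration. The 0/1 encoding of SC is an assumption, and the slides do not say how CH entries are scaled.

```python
def build_sc_matrix(reports, num_sources, num_claims):
    """Return an M-by-N 0/1 matrix; entry [i][j] is 1 when source i reports claim j."""
    sc = [[0] * num_claims for _ in range(num_sources)]
    for source, claim in reports:
        sc[source][claim] = 1
    return sc


# Hypothetical toy data: M = 3 sources, N = 4 claims, K = 2 hypotheses.
reports = [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3)]
SC = build_sc_matrix(reports, num_sources=3, num_claims=4)

# CH is N-by-K; entry [j][k] is the degree of correlation
# between claim j and hypothesis k (values invented for this example).
CH = [
    [0.9, 0.0],
    [0.6, 0.1],
    [0.0, 0.8],
    [0.2, 0.7],
]
```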

Our Goal
• Output: hypothesis truthfulness
• Derived from: estimated claim truthfulness

Solution: Reliable Hypothesis Validation (RHV)
1. Topic Identification from Claims
2. Hypothesis Claim Matching
3. Optimal Hypothesis Validation

RHV: Topic Identification from Claims
• Objective:
  • Identify important topics that provide clues to help end users generate relevant hypotheses
• Approach:
  • Topic Modeling and Gibbs Sampling Algorithm
• Output:
  • T topics, each associated with a list of words that are strongly correlated with the topic
  • Claim distribution over topics; topic distribution over words
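
The slide names topic modeling with Gibbs sampling but not a specific model. The sketch below assumes an LDA-style model with collapsed Gibbs sampling; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy claims are all illustrative assumptions, not the authors' implementation. It returns the two outputs the slide lists: a claim distribution over topics and a topic distribution over words.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for an LDA-style topic model (illustrative sketch).

    docs: list of token lists, one per claim.
    Returns (claim_topic, topic_word, vocab) with row-normalised distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V, T = len(vocab), num_topics

    n_dt = [[0] * T for _ in docs]      # topic counts per claim
    n_tw = [[0] * V for _ in range(T)]  # word counts per topic
    n_t = [0] * T                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(T)
            z[d].append(t)
            n_dt[d][t] += 1
            n_tw[t][wid[w]] += 1
            n_t[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dt[d][t] -= 1
                n_tw[t][v] -= 1
                n_t[t] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][v] + beta) / (n_t[k] + V * beta)
                    for k in range(T)
                ]
                t = rng.choices(range(T), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][v] += 1
                n_t[t] += 1

    claim_topic = [
        [(n_dt[d][k] + alpha) / (len(doc) + T * alpha) for k in range(T)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_tw[k][v] + beta) / (n_t[k] + V * beta) for v in range(V)]
        for k in range(T)
    ]
    return claim_topic, topic_word, vocab


# Hypothetical claims, already tokenised.
claims = [
    "police confirm shooting on campus".split(),
    "shooting suspect on campus".split(),
    "protest march downtown tonight".split(),
    "march downtown blocked by police".split(),
]
claim_topic, topic_word, vocab = lda_gibbs(claims, num_topics=2)
```

The highest-probability words in each row of `topic_word` would be the word list shown to end users for that topic.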

RHV: Hypothesis Claim Matching
• Objective:
  • Match the hypothesis from end users to the most relevant claims that can be used to validate its correctness
• Approach:
  • Compute the similarity between a hypothesis and each claim
    • Semantic similarity (words)
    • Syntactic similarity (order of words)
    • Overall claim-hypothesis similarity
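
The similarity formulas did not survive in the slide text, so the following is only a sketch of how the two components could be combined. It assumes Jaccard word overlap as a stand-in for semantic similarity, a word-order vector comparison for syntactic similarity, and a hypothetical mixing weight `delta`; none of these choices is confirmed by the slides.

```python
import math


def semantic_similarity(a, b):
    """Jaccard overlap of the two word sets (a simple stand-in for word-level similarity)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def order_vector(tokens, joint):
    """1-based position of each joint-vocabulary word in tokens, 0 if absent."""
    first = {}
    for pos, w in enumerate(tokens, start=1):
        first.setdefault(w, pos)
    return [first.get(w, 0) for w in joint]


def syntactic_similarity(a, b):
    """Word-order similarity: 1 - ||r_a - r_b|| / ||r_a + r_b||."""
    joint = sorted(set(a) | set(b))
    ra, rb = order_vector(a, joint), order_vector(b, joint)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(ra, rb)))
    return 1.0 - diff / total if total else 0.0


def overall_similarity(claim, hypothesis, delta=0.8):
    """Linear mix of the two components; delta is a hypothetical weight."""
    return (delta * semantic_similarity(claim, hypothesis)
            + (1.0 - delta) * syntactic_similarity(claim, hypothesis))


# Hypothetical hypothesis and claims, already tokenised.
hypothesis = "the suspect was arrested".split()
claim_a = "police say the suspect was arrested".split()
claim_b = "traffic is heavy downtown".split()
score_a = overall_similarity(claim_a, hypothesis)
score_b = overall_similarity(claim_b, hypothesis)
```

Each overall score could serve as one entry of the CH matrix, with claims ranked per hypothesis by that score.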

RHV: Hypothesis Claim Matching
• Approach:
  • Critical Claim Selection:
    • Multi-objective optimization with constraints
      • Maximize the relevance between claims and hypothesis
      • Minimize the dependency between claims
    • Solve the multi-objective optimization problem using linear combination
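
As an illustration of the linear-combination idea, the sketch below scalarises the two objectives into one score and searches all subsets of a fixed size exhaustively. The weight `lam`, the size constraint `k`, the pairwise dependency matrix, and the brute-force search are assumptions made for this example; the slides do not state the actual constraints or solver.

```python
from itertools import combinations


def select_critical_claims(relevance, dependency, k, lam=0.5):
    """Pick k claims maximising lam * relevance - (1 - lam) * pairwise dependency.

    relevance: relevance score of each claim to one hypothesis.
    dependency: symmetric N-by-N matrix of pairwise claim dependency.
    Exhaustive search, so only suitable for small N.
    """
    best_score, best_subset = float("-inf"), ()
    for subset in combinations(range(len(relevance)), k):
        rel = sum(relevance[j] for j in subset)
        dep = sum(dependency[a][b] for a, b in combinations(subset, 2))
        score = lam * rel - (1.0 - lam) * dep
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical scores: claims 0 and 1 are both relevant but strongly dependent.
relevance = [0.9, 0.85, 0.4, 0.3]
dependency = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
subset, score = select_critical_claims(relevance, dependency, k=2)
```

With these toy numbers the two most relevant claims are not chosen together, because their mutual dependency outweighs the relevance gained by pairing them.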

RHV: Optimal Hypothesis Validation
• Objective:
  • Validate the truthfulness of hypotheses from the estimated truthfulness of the identified critical and relevant claims
• Approach:
  • Claim Truthfulness Estimation
    • Truth Discovery Solutions
  • Optimal Hypothesis Validation
    • Reliable Hypothesis Validation Scheme

An Example of Truth Discovery Solutions: Expectation Maximization
• Observed data X: sensing observations
• Hidden variable Z = {z1, z2, …, zN}: correctness
• Apply EM: iterate the Expectation step (E-step) and the Maximization step (M-step)
• Find the MLE of the estimation parameter and the values of the hidden variables
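
The slide's own symbols for the estimation parameter were lost, so the sketch below is only one possible instance of EM-based truth discovery, written in the spirit of the maximum likelihood approach listed as reference 1 in Related Work. It assumes binary claims and independent sources. The parameter names `a` (probability a source reports a claim given the claim is true), `b` (the same given the claim is false), and `d` (prior that a claim is true), as well as the initialisation, are assumptions for this example.

```python
def em_truth_discovery(sc, iters=50, eps=1e-6):
    """EM over a 0/1 source-claim matrix sc (M sources by N claims).

    Returns (z, a, b, d): z[j] is the estimated probability that claim j is true.
    """
    M, N = len(sc), len(sc[0])

    def clamp(p):
        # Keep probabilities strictly inside (0, 1) to avoid division by zero.
        return min(max(p, eps), 1.0 - eps)

    # Initialise from each source's reporting rate, with a > b.
    rate = [sum(row) / N for row in sc]
    a = [clamp(r) for r in rate]
    b = [clamp(0.5 * r) for r in rate]
    d = 0.5
    z = [0.5] * N

    for _ in range(iters):
        # E-step: posterior probability that each claim is true.
        for j in range(N):
            p_true, p_false = d, 1.0 - d
            for i in range(M):
                if sc[i][j]:
                    p_true *= a[i]
                    p_false *= b[i]
                else:
                    p_true *= 1.0 - a[i]
                    p_false *= 1.0 - b[i]
            z[j] = p_true / (p_true + p_false)

        # M-step: re-estimate source parameters and the prior from the posteriors.
        z_sum = sum(z)
        for i in range(M):
            reported_true = sum(z[j] for j in range(N) if sc[i][j])
            reported_false = sum(1.0 - z[j] for j in range(N) if sc[i][j])
            a[i] = clamp(reported_true / max(z_sum, eps))
            b[i] = clamp(reported_false / max(N - z_sum, eps))
        d = clamp(z_sum / N)

    return z, a, b, d


# Hypothetical observations: 4 sources, 5 claims.
sc = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
]
z, a, b, d = em_truth_discovery(sc)
```

The vector `z` plays the role of the estimated claim truthfulness that the hypothesis validation step consumes.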

RHV: Optimal Hypothesis Validation
• Optimization Formulation
• Approach: Weighted Mean Algorithm (uses the CH matrix)
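
The optimization formulation itself is not preserved in the slide text. One plain reading of a weighted mean over the CH matrix is sketched below: each hypothesis score is the average of the claim truthfulness values, weighted by that hypothesis's column of CH. The function name `validate_hypotheses` and the toy numbers are assumptions, and the authors' algorithm may differ.

```python
def validate_hypotheses(claim_truth, ch):
    """Weighted mean of claim truthfulness per hypothesis.

    claim_truth: length-N list of estimated claim truthfulness values.
    ch: N-by-K claim-hypothesis matrix used as weights.
    Returns a length-K list; None marks a hypothesis with no matched claims.
    """
    num_claims, num_hypotheses = len(ch), len(ch[0])
    result = []
    for k in range(num_hypotheses):
        weight_sum = sum(ch[j][k] for j in range(num_claims))
        if weight_sum == 0:
            result.append(None)
            continue
        weighted = sum(ch[j][k] * claim_truth[j] for j in range(num_claims))
        result.append(weighted / weight_sum)
    return result


# Hypothetical inputs: 3 claims, 2 hypotheses.
claim_truth = [1.0, 0.0, 0.5]
ch = [
    [0.75, 0.0],
    [0.25, 0.0],
    [0.0, 1.0],
]
hypothesis_truth = validate_hypotheses(claim_truth, ch)  # [0.75, 0.5]
```

A decision threshold on each score (for example 0.5) would turn the result into a true/false label per hypothesis; the slides do not say which threshold, if any, is used.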

Evaluation: A Real World Application
• Unreliable and noisy tweets
• Unreliable and hypothesis-ignorant users
• Events: Oregon Shooting, Oct. 2015; Baltimore Riots, April 2015; Paris Charlie Hebdo Attack, Nov. 2015

Evaluation: Data Collection
• Tool: http://apollo.cse.nd.edu/index.html
• Input: keywords/location
• RHV is integrated as an option for data analysis

Evaluation: Real-World Application
• Data Trace Statistics
• Hypothesis Set Generation:
  • 5 independent individuals serve as end users
  • Each individual generated 30 hypotheses for each dataset
  • Clean up the hypothesis set by removing redundant and non-conclusive ones
  • Manually collect ground truth labels for evaluation purposes

Evaluation: Performance Comparison (1/3)
• Paris Attack Data Trace (2015)
• Similar results are observed in the other two datasets

Evaluation: Performance Comparison (2/3)

Evaluation: Performance Comparison (3/3)
• Execution Time Comparison
• RHV (our approach) is among the fastest of the compared schemes across different datasets

Future Work
• Explore more comprehensive claim and hypothesis matching approaches
  • Consider a hierarchical structure from claims to hypotheses
• Explore logical relationships between hypotheses
• Validate the developed models on applications beyond Twitter

Conclusion
• This paper formulates a new hypothesis validation problem in social sensing
• A reliable hypothesis validation (RHV) framework is developed to address two technical challenges (claim-hypothesis matching and hypothesis correctness validation)
• The framework is evaluated using real-world social sensing data collected from Twitter feeds

Thank You!
Social Sensing Lab, University of Notre Dame
http://www3.nd.edu/~sslab/
dwang5@nd.edu