This paper explores the challenges and solutions for reliable hypothesis validation in social sensing applications, including hypothesis-claim matching, hypothesis validation, and truth discovery techniques. It presents an optimal hypothesis validation scheme that combines topic identification, hypothesis-claim matching, and claim truthfulness estimation. The proposed approach aims to provide reliable information for decision support in social sensing applications.
Towards Reliable Hypothesis Validation in Social Sensing Applications Dong Wang, Daniel Zhang, Chao Huang Department of Computer Science and Engineering University of Notre Dame SECON 18, Hong Kong, China
Sensing is Evolving • Platform: Sensors are increasingly used by everyday people (Smart Phone) • Application: Humans are getting into the Loop of Sensing • Social (Human-Centric) Sensing is Emerging! • Health Monitoring, Geotagging, Target Tracking, Environment Monitoring, Social Sensing, Smart House
Social Sensing • A set of applications where data are collected from human sources or devices on their behalf • Human + Cyber + Physical: An Emerging Paradigm of Cyber-Physical Systems with Human-in-the-loop • Examples: Twitter Mood Predicts Stock Market, 2011 • Japan Tsunami and Nuclear Event, 2011 • Help Pilgrims utilize schedule in Hajj, 2012 • FourSquare helps blind people navigate, 2012
Why Social Sensing? A Confluence of Three Trends • Sensors • Connectivity • Mass Dissemination Media • Smart Phone, Cars on Internet, Smart Meter, GPS, Cell-phones
Truth Discovery in Social Sensing • What to believe? Who to believe? • Sources: People, Smart Devices • Measurements (Claims): Text, Numeric data, Images • Reliable Information for Decision Support!
Related Work: Truth Discovery • Basic Model [1] (IPSN 12) • Recursive Model [2] (ICDCS 13) • Source Dependency [3, 4] (IPSN 14, 16) • Dynamic and Scalable Model [5] (ICDCS 17) • Reliable Hypothesis Validation [6] (SECON 18)
1. Dong Wang, Lance Kaplan, Hieu Le, and Tarek Abdelzaher. "On Truth Discovery in Social Sensing: A Maximum Likelihood Estimation Approach." IPSN 12, Beijing, China, April 2012.
2. Dong Wang, Tarek Abdelzaher, Lance Kaplan, and Charu C. Aggarwal. "Recursive Fact-finding: A Streaming Approach to Truth Estimation in Crowdsourcing Applications." ICDCS 13, Philadelphia, PA, July 2013.
3. Dong Wang, Tarek Abdelzaher, and Lance Kaplan. "Humans as Sensors: An Estimation Theoretic Perspective." IPSN 14, Berlin, Germany, April 2014.
4. Chao Huang, Dong Wang. "Topic-Aware Social Sensing with Arbitrary Source Dependency Graphs." IPSN 16, Vienna, Austria, April 2016.
5. Daniel Zhang, Chao Zhang, Dong Wang, Doug Thain, Xin Mu, Greg Madey, and Chao Huang. "Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework." ICDCS 17, Atlanta, GA, USA.
6. Dong Wang, Daniel Zhang, Chao Huang*. "Towards Reliable Hypothesis Validation in Social Sensing Applications." SECON 18, Hong Kong, June 2018.
Technical Challenges • Challenge 1: Hypothesis-Claim Matching • How to match the high-level hypotheses generated by end users to the relevant low-level claims generated by social sensors? • Challenge 2: Hypothesis Validation • How to reliably validate the truthfulness of the hypotheses from the estimated truthfulness of the claims?
Basic Definitions • Sources: • Claims: • Hypotheses: • Claim Truthfulness Vector: • Hypothesis Truthfulness Vector:
Basic Definitions • Source Claim Matrix: SC (M by N) • M: Number of sources (rows); N: Number of claims (columns) • Each entry indicates whether source Si reports claim Cj or does not report claim Cj
Basic Definitions • Claim Hypothesis Matrix: CH (N by K) • N: Number of claims (rows); K: Number of hypotheses (columns) • Each entry gives the degree of correlation between claim Cj and hypothesis Hk
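A minimal sketch of the two matrices defined above. It assumes SC is binary (1 when source Si reports claim Cj, 0 otherwise) and that CH entries lie in [0, 1]; the toy reports and correlation values are invented purely for illustration.

```python
# Hypothetical toy sizes: 3 sources, 4 claims, 2 hypotheses.
M, N, K = 3, 4, 2

# (source index, claim index) pairs meaning "source S_i reports claim C_j".
reports = [(0, 0), (0, 2), (1, 0), (1, 1), (2, 3)]

# Source Claim matrix SC (M by N).
# Assumption: binary entries, 1 = reported, 0 = not reported.
SC = [[0] * N for _ in range(M)]
for i, j in reports:
    SC[i][j] = 1

# Claim Hypothesis matrix CH (N by K).
# Assumption: each entry is a degree of correlation in [0, 1]
# between claim C_j (row) and hypothesis H_k (column).
CH = [
    [0.9, 0.0],
    [0.4, 0.1],
    [0.0, 0.8],
    [0.2, 0.7],
]

# Row sums of SC give the number of claims each source reported.
claims_per_source = [sum(row) for row in SC]
```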
Our Goal Output: Hypothesis Truthfulness Estimated Claim Truthfulness
Solution: Reliable Hypothesis Validation (RHV) 1. Topic Identification from Claims 2. Hypothesis Claim Matching 3. Optimal Hypothesis Validation
RHV: Topic Identification from Claims • Objective: • Identify important topics that provide clues to help end users generate relevant hypotheses • Approach: • Topic Modeling and Gibbs Sampling Algorithm • Output: • T topics associated with a list of words that are strongly correlated with each topic Claim distribution over topics Topic distribution over words
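The slide names topic modeling with Gibbs sampling but not the exact model. The sketch below assumes a standard LDA topic model fitted with collapsed Gibbs sampling; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy vocabulary are all illustrative assumptions. It returns the two outputs listed above: a claim distribution over topics (`theta`) and a topic distribution over words (`phi`).

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed model). docs: lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assign = []

    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assign[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimates of the two output distributions.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return theta, phi

# Hypothetical toy claims, encoded as indices into a made-up vocabulary.
vocab = ["shooting", "campus", "police", "riot", "curfew", "protest"]
claims = [[0, 1, 2], [0, 2, 1, 0], [3, 4, 5], [5, 3, 4, 4]]
theta, phi = lda_gibbs(claims, num_topics=2, vocab_size=len(vocab))
```

The highest-probability words in each row of `phi` would be the word list shown to end users for that topic.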
RHV: Hypothesis Claim Matching • Objective: • Match the hypothesis from end users to the most relevant claims that can be used to validate its correctness • Approach: • Compute the similarity between a hypothesis and each claim • Semantic Similarity (words) • Syntactic Similarity (order of words) • Overall Claim Hypothesis Similarity
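The similarity formulas themselves are not preserved in this transcript, so the following is only a sketch under stated assumptions: word overlap (Jaccard) stands in for semantic similarity, a normalized difference of word-position vectors stands in for syntactic (word order) similarity, and the overall score is a linear blend with a hypothetical weight `delta`.

```python
import math

def tokenize(text):
    # Simplistic whitespace tokenizer (assumption; no stemming or stop-word removal).
    return text.lower().split()

def semantic_similarity(a, b):
    # Stand-in for word-level semantic similarity: Jaccard overlap of word sets.
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def order_vector(tokens, joint):
    # Position (1-based) of each joint-vocabulary word in the sentence, 0 if absent.
    first_pos = {}
    for pos, word in enumerate(tokens, start=1):
        first_pos.setdefault(word, pos)
    return [first_pos.get(word, 0) for word in joint]

def syntactic_similarity(a, b):
    # Word-order similarity: 1 - ||ra - rb|| / ||ra + rb|| over position vectors.
    joint = sorted(set(a) | set(b))
    if not joint:
        return 0.0
    ra, rb = order_vector(a, joint), order_vector(b, joint)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(ra, rb)))
    return 1.0 - diff / total

def overall_similarity(hypothesis, claim, delta=0.8):
    # Linear blend of the two scores; delta is a hypothetical weight.
    a, b = tokenize(hypothesis), tokenize(claim)
    return delta * semantic_similarity(a, b) + (1 - delta) * syntactic_similarity(a, b)

same = overall_similarity("police arrested the suspect", "police arrested the suspect")
reordered = overall_similarity("police arrested the suspect", "the suspect arrested police")
unrelated = overall_similarity("police arrested suspect", "curfew lifted downtown")
```

With this blend, two sentences that share all words but differ in order score below an exact match, which is the role the syntactic term plays on the slide.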
RHV: Hypothesis Claim Matching • Approach: • Critical Claim Selection: • Maximize the relevance between claims and hypothesis • Minimize the dependency between claims • Multi-objective optimization with constraints • Solve the multi-objective optimization problem using a linear combination of the two objectives
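One way to picture critical claim selection is a greedy heuristic over the linearly combined objective. This is a sketch, not the optimization procedure from the paper: the weight `lam`, the `budget` constraint on the number of selected claims, and the greedy strategy are all assumptions, and the relevance and dependency values are invented.

```python
def select_critical_claims(relevance, dependency, budget, lam=0.5):
    """Greedily pick claims that are relevant to the hypothesis but not redundant.

    relevance[j]     : similarity between claim j and the hypothesis
    dependency[j][s] : dependency between claims j and s (symmetric)
    budget           : assumed constraint on how many claims may be selected
    lam              : hypothetical weight combining the two objectives
    """
    selected = []
    candidates = set(range(len(relevance)))

    def gain(j):
        # Marginal change of the combined objective if claim j is added.
        redundancy = sum(dependency[j][s] for s in selected)
        return lam * relevance[j] - (1 - lam) * redundancy

    while candidates and len(selected) < budget:
        best = max(sorted(candidates), key=gain)
        if gain(best) <= 0:
            break  # No remaining claim improves the combined objective.
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: claims 0 and 1 are both relevant but nearly duplicate each other.
relevance = [0.9, 0.85, 0.5, 0.1]
dependency = [
    [0.0, 0.95, 0.1, 0.0],
    [0.95, 0.0, 0.1, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
balanced = select_critical_claims(relevance, dependency, budget=2, lam=0.5)
relevance_only = select_critical_claims(relevance, dependency, budget=2, lam=1.0)
```

With `lam=0.5` the near-duplicate claim 1 is skipped in favor of the less relevant but independent claim 2; with `lam=1.0` dependency is ignored and the two most relevant claims are chosen.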
RHV: Optimal Hypothesis Validation • Objective: • Validate the truthfulness of hypotheses from the estimated truthfulness of the identified critical and relevant claims • Approach: • Claim Truthfulness Estimation • Truth Discovery Solutions • Optimal Hypothesis Validation • Reliable Hypothesis Validation Scheme
An Example of Truth Discovery Solutions: Expectation Maximization • Observed data X: Sensing Observations • Hidden Variable Z={z1, z2, …zN}: Correctness • Estimation parameter • Apply EM: Expectation Step (E-step), Maximization Step (M-step) • Find MLE of estimation parameter and values of hidden variables
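A compact sketch of an EM-based truth discovery loop, in the spirit of the maximum likelihood formulation cited as reference 1. The parameter names are assumptions made here: `a[i]` is the probability that source i reports a claim given the claim is true, `b[i]` the same given the claim is false, and `d` the prior that a claim is true. Initializing Z from vote fractions and clamping with `eps` are also choices of this sketch, and the plain products may underflow for very large numbers of sources.

```python
def em_truth_discovery(SC, iters=50, eps=1e-6):
    """SC: binary source-claim matrix (M by N). Returns (Z, a, b, d)."""
    M, N = len(SC), len(SC[0])

    def clamp(x):
        # Keep probabilities strictly inside (0, 1) to avoid degenerate products.
        return min(1 - eps, max(eps, x))

    # Initialization (assumption): fraction of sources reporting each claim.
    Z = [sum(SC[i][j] for i in range(M)) / M for j in range(N)]
    a, b, d = [], [], 0.5

    for _ in range(iters):
        # M-step: re-estimate source parameters from current claim beliefs.
        total_true = max(sum(Z), eps)
        total_false = max(N - sum(Z), eps)
        d = clamp(sum(Z) / N)
        a, b = [], []
        for i in range(M):
            reported = sum(SC[i])
            reported_true = sum(Z[j] for j in range(N) if SC[i][j])
            a.append(clamp(reported_true / total_true))
            b.append(clamp((reported - reported_true) / total_false))

        # E-step: posterior probability that each claim is true.
        for j in range(N):
            p_true, p_false = d, 1 - d
            for i in range(M):
                if SC[i][j]:
                    p_true *= a[i]
                    p_false *= b[i]
                else:
                    p_true *= 1 - a[i]
                    p_false *= 1 - b[i]
            Z[j] = p_true / (p_true + p_false)
    return Z, a, b, d

# Toy observations: 4 sources (rows) by 4 claims (columns).
SC = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
Z, a, b, d = em_truth_discovery(SC)
```

In this toy run the claim reported by every source ends up with a higher truthfulness estimate than the claims reported by a single source.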
RHV: Optimal Hypothesis Validation • Optimization Formulation • Approach: Weighted Mean Algorithm • CH Matrix
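The optimization formulation is not preserved in this transcript, so the following only sketches what a weighted mean over the CH matrix could look like: each hypothesis score is the average of the estimated claim truthfulness values, weighted by the claim-hypothesis correlation. The decision `threshold` of 0.5 and the handling of hypotheses with no correlated claims are assumptions of this sketch.

```python
def validate_hypotheses(claim_truth, CH, threshold=0.5):
    """Weighted mean of claim truthfulness per hypothesis.

    claim_truth : estimated truthfulness of each claim (length N)
    CH          : claim-hypothesis matrix (N by K) of correlation degrees
    threshold   : hypothetical cut-off for declaring a hypothesis true
    """
    N, K = len(CH), len(CH[0])
    scores = []
    for k in range(K):
        weight = sum(CH[j][k] for j in range(N))
        if weight == 0:
            scores.append(None)  # No correlated claims: cannot be validated.
        else:
            weighted = sum(CH[j][k] * claim_truth[j] for j in range(N))
            scores.append(weighted / weight)
    decisions = [None if s is None else s >= threshold for s in scores]
    return scores, decisions

# Toy input: 3 claims, 2 hypotheses.
claim_truth = [1.0, 0.0, 0.5]
CH = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
scores, decisions = validate_hypotheses(claim_truth, CH)
```

Here the first hypothesis is supported mainly by a claim estimated as true and the second mainly by a claim estimated as false, so their scores fall on opposite sides of the threshold.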
Evaluation: A Real World Application Unreliable and Noisy Tweets Unreliable and Hypothesis-ignorant Users Oregon Shooting, Oct. 2015 Baltimore Riots, April, 2015 Paris Charlie Hebdo Attack, Nov. 2015
Evaluation: Data Collection http://apollo.cse.nd.edu/index.html RHV is integrated as an option for data analysis Keywords/Location
Evaluation: Real-World Application Data Trace Statistics: • Hypothesis Set Generation: • 5 independent individuals serve as end users • Each individual generated 30 hypotheses for each dataset • Clean up the hypothesis set by removing redundant and non-conclusive ones • Manually collect ground truth labels for evaluation purposes
Evaluation: Performance Comparison (1/3) Paris Attack Data Trace (2015) • Similar results are observed in the other two datasets
Evaluation: Performance Comparison (3/3) Execution Time Comparison • RHV is among the fastest of the compared schemes across different datasets
Future Work • Explore more comprehensive claim and hypothesis matching approaches • Consider a hierarchical structure from claims to hypotheses • Explore logical relationships between hypotheses • Validate the developed models on applications beyond Twitter
Conclusion • This paper formulates a new hypothesis validation problem in social sensing • It develops a reliable hypothesis validation (RHV) framework to address two technical challenges (claim-hypothesis matching and hypothesis correctness validation) • The framework is evaluated using real-world social sensing data collected from Twitter feeds
Thank You! Social Sensing Lab University of Notre Dame http://www3.nd.edu/~sslab/ dwang5@nd.edu