Rumor Source Detection: A Group Testing Approach

Ding-Zhu Du, Department of Computer Science, University of Texas at Dallas

OUTLINE • Background • Rumor Source Detection Problem • Group Testing Approach

Social Networks • > 1.3 billion users • The 2nd largest “Country” in the world • More visitors than Google • > 800 million users • 2013, 400 million users, 40% yearly increase • 2009, 2 billion tweets per quarter • 2010, 4 billion tweets per quarter • 2011, 25 billion tweets per quarter • More than 6 billion images • Pinterest, with more traffic than Twitter and Google

A Trillion Dollar Opportunity • Online social networks have become a bridge connecting our daily physical life and the virtual web space • On2Off Commerce [1] [1] Online to Offline is a trillion dollar business http://techcrunch.com/2010/08/07/why-online2offline-commerce-is-a-trillion-dollar-opportunity/

Influence Propagation • Negative: “I hate Obama, the worst president ever” • Positive: “I love Obama”

What is Social Influence? Social influence occurs when one's opinions, emotions, or behaviors are affected by others, intentionally or unintentionally. [1] [1] http://en.wikipedia.org/wiki/Social_influence

Three Degrees of Influence • Six degrees of separation [1] • Three degrees of influence [2] • Each person may influence 150 persons [3] • In total, you are able to influence more than 1,000,000 ($150^3 \approx 3.4$ million) persons in the world! [1] S. Milgram. The Small World Problem. Psychology Today, 1967, Vol. 2, 60–67 [2] J.H. Fowler and N.A. Christakis. The Dynamic Spread of Happiness in a Large Social Network: Longitudinal Analysis Over 20 Years in the Framingham Heart Study. British Medical Journal 2008; 337: a2338 [3] R. Dunbar. Neocortex size as a constraint on group size in primates. Human Evolution, 1992, 20: 469–493.

When misinformation or rumor spreads in social networks, what will happen?

A false report that the president of Syria was dead hit Twitter and circulated quickly among the population, leading to a sharp, quick increase in the price of oil. http://news.yahoo.com/blogs/technology-blog/twitter-rumor-leads-sharp-increase-price-oil-173027289.html

In August 2012, thousands of people in Ghazni province left their houses in the middle of the night in panic after a rumor of an earthquake. http://www.pajhwok.com/en/2012/08/20/quake-rumour-sends-thousands-ghazni-streets

Control the spread of rumors

Motivation • Rumors spread through the network • We only see who received the rumor, not where they got it from • Can we locate the hidden rumor sources?

Problem Description • Given • Social network structure • Infection time of monitors • Goal • Select a subset of vertices with minimum cardinality such that the rumor source can be uniquely located. • Question • Which set of vertices should we select? • Applications • Epidemiology: Virus • Social Media: Rumor

Related Work • Shah and Zaman, 2010, 2011, 2012: • “Rumor Centrality”: single source, Susceptible-Infected (SI) model • Luo and Tay, 2012: • Multiple sources, Susceptible-Infected-Recovered (SIR) model • Zhu and Ying, 2013: • Single source estimation for SIR model • Seo et al., 2012; Karamchandani and Franceschetti, 2013; Luo and Tay, 2013; Zhu and Ying, 2014: • Partial observations

What is Group Testing? Part I

An Example There are 9 items with 1 defective. Please identify the defective item with the minimum number of tests. Each test on a subset of items can tell whether the subset contains the defective item or not.

3 tests are enough? Bisecting the candidate set: {1, ..., 9} → {1, ..., 5} → {4, 5}

No, sometimes you need 4 tests. Bisecting the candidate set: {1, ..., 9} → {1, ..., 5} → {1, 2, 3} → {1, 2}

Minimum # of Tests Goal: minimize the # of tests in the worst case. For bisecting on 9 items with 1 defective, the # of tests in the worst case is 4. For bisecting on n items with 1 defective, the worst case requires $\lceil \log_2 n \rceil$ tests.

This is the best you can do! There are n possible outcomes in total. Each test has two outcomes, so it divides a set of possible outcomes into two smaller subsets. Suppose k tests are enough. Then $2^k \ge n$, i.e., $k \ge \lceil \log_2 n \rceil$.
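
The bisecting strategy and the counting bound can be checked numerically. The following is a minimal sketch, not the presenter's code: `is_positive` is a hypothetical test oracle that reports whether a subset contains the defective item, and each test is applied to the larger half of the remaining candidates, which reproduces the 9-item walk-through.

```python
def find_defective(items, is_positive):
    """Locate the single defective item by repeated halving.

    `is_positive(subset)` is a hypothetical oracle: True iff the subset
    contains the defective item. Returns (defective_item, tests_used).
    """
    candidates = list(items)
    tests = 0
    while len(candidates) > 1:
        # Test the larger half, e.g. {1..5} out of {1..9}.
        half = candidates[: (len(candidates) + 1) // 2]
        tests += 1
        if is_positive(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], tests


def worst_case_tests(n):
    """Largest number of tests over all positions of the defective item."""
    items = list(range(1, n + 1))
    return max(find_defective(items, lambda s, d=d: d in s)[1] for d in items)


def ceil_log2(n):
    """Exact integer ceil(log2(n)) for n >= 1, avoiding floating point."""
    return (n - 1).bit_length()


if __name__ == "__main__":
    for n in (2, 9, 16, 100):
        print(n, worst_case_tests(n), ceil_log2(n))
```

For every n tried, the worst case of bisecting coincides with the lower bound $\lceil \log_2 n \rceil$, which is why no strategy can do better for a single defective item.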

Two Defective Items Need at least $\lceil \log_2 \binom{n}{2} \rceil$ tests. For some n, the lower bound can be reached. But for some n, the lower bound cannot be reached. This case is complicated.
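
The counting argument for one defective item carries over to d defectives: there are $\binom{n}{d}$ possible outcomes and each test splits them in two, so at least $\lceil \log_2 \binom{n}{d} \rceil$ tests are needed. A small sketch for evaluating this bound (the function name `info_lower_bound` is made up for illustration, and `math.comb` assumes Python 3.8 or later):

```python
import math


def info_lower_bound(n, d):
    """Information-theoretic lower bound on the number of tests
    needed to identify exactly d defectives among n items."""
    outcomes = math.comb(n, d)           # number of possible defective sets
    return (outcomes - 1).bit_length()   # exact ceil(log2(outcomes))


if __name__ == "__main__":
    for n in (9, 16, 100):
        print(n, info_lower_bound(n, 1), info_lower_bound(n, 2))
```

This only gives the lower bound; whether some algorithm actually reaches it for a given n is the hard part.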

3 or More Defective Items It is very hard to find an optimal testing algorithm!

Classical Group Testing Given n items, some of which are positive, identify all positive items using as few tests as possible. Each test is on a subset of items. The test outcome is positive iff the subset contains a positive item.

Idea of Group Testing (GT) [Figure: items grouped into pools; a pool containing a positive item (+) tests positive, and a pool of only negative items (_) tests negative]

History During World War II, David Rosenblatt first proposed the idea for testing syphilitic antigen. In 1943, Robert Dorfman published the first paper on Group Testing. Since 1943, many papers have been published and many applications discovered.

Group Testing • Sequential Group Testing • Nonadaptive Group Testing • Pooling Design (Biology)

Example 1 - Sequential Bisecting the candidate set, each test chosen after seeing the previous outcome: {1, ..., 9} → {1, ..., 5} → {4, 5}

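Example 2 – Non-adaptive • Arrange the 9 items in a 3 × 3 grid and test every row and every column as a pool • Rows: p1 = {1, 2, 3}, p2 = {4, 5, 6}, p3 = {7, 8, 9} • Columns: p4 = {1, 4, 7}, p5 = {2, 5, 8}, p6 = {3, 6, 9} • $O(\sqrt{n})$ tests for n items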
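
A minimal sketch of such a grid design for a single defective item, assuming n is a perfect square and that the column pools are the grid columns. The helper names `grid_pools` and `decode` are illustrative only. All pools are fixed in advance, so the tests can run in parallel; the defective item is the one lying in the positive row and the positive column.

```python
import math


def grid_pools(n):
    """Row pools followed by column pools for items 1..n on a square grid."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    rows = [set(range(r * side + 1, (r + 1) * side + 1)) for r in range(side)]
    cols = [set(range(c + 1, n + 1, side)) for c in range(side)]
    return rows + cols


def decode(pools, outcomes):
    """Recover the single defective item from the pool outcomes.

    `outcomes[i]` is True iff pool i tested positive. With exactly one
    defective, one row and one column are positive and they share one item.
    """
    positive = [pool for pool, hit in zip(pools, outcomes) if hit]
    if not positive:
        raise ValueError("no positive pool: no defective item present")
    (item,) = set.intersection(*positive)
    return item


if __name__ == "__main__":
    pools = grid_pools(9)               # 6 pools: p1..p3 rows, p4..p6 columns
    outcomes = [5 in pool for pool in pools]   # simulate defective item 5
    print(len(pools), decode(pools, outcomes))
```

The design uses $2\sqrt{n}$ pools, more than the $\lceil \log_2 n \rceil$ tests of the sequential scheme, but needs only one round of testing.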

Sequential and Non-adaptive Sequential GT needs fewer tests, but takes longer. Non-adaptive GT needs more tests, but takes less time. In molecular biology, non-adaptive GT is usually used. Why?

Los Alamos Labs in 1998 Faced 220,000 clones to screen. Testing them individually would need 220,000 tests. In practice, 376 tests were used. What is the technique?

Methodology for Rumor Source Detection • Definition (Set Resolving Set (SRS)). A node set K ⊆ V is an SRS if any two different detectable node sets A, B ⊆ V are distinguishable by K. • Two node sets A, B ⊆ V are distinguishable by K if there exist two nodes x, y ∈ K such that $r_A(x) - r_A(y) \ne r_B(x) - r_B(y)$ • $r_A(x)$: the time that node x received the rumor from A.

Influence Propagation Model • Rumor propagates from the sources to any vertex through shortest paths in the network. • As soon as a vertex receives the information, it sends the information to all its neighbors simultaneously, which takes one time unit. • Thus, the time that a rumor initiated at node u is received by node v is $r_u(v) = s(u) + d(u, v)$, where $s(u)$ is the time at which u starts the rumor and $d(u, v)$ is the shortest-path distance from u to v.
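
The propagation model can be sketched with a breadth-first search. This is an illustration under stated assumptions, not the presenter's implementation: the graph is an unweighted adjacency dict, the four-node path used in the demo is hypothetical, and for several sources a node is assumed to receive the rumor at the earliest of the per-source arrival times.

```python
from collections import deque


def hop_distances(graph, source):
    """Shortest-path hop counts d(source, v) for all reachable v (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def arrival_times(graph, start_times):
    """Earliest time each node receives the rumor.

    `start_times` maps each source u to its start time s(u). For a single
    source this is r_u(v) = s(u) + d(u, v); taking the minimum over
    several sources is an assumption of this sketch.
    """
    times = {}
    for u, s_u in start_times.items():
        for v, d in hop_distances(graph, u).items():
            t = s_u + d
            if v not in times or t < times[v]:
                times[v] = t
    return times


if __name__ == "__main__":
    # Hypothetical path graph a - b - c - d.
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(arrival_times(graph, {"a": 0}))
    print(arrival_times(graph, {"a": 0, "d": 1}))
```

Because the start times $s(u)$ are unknown to an observer, only differences between arrival times at monitor nodes carry information about where the rumor started.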

An Example of Set Resolving Set (SRS) {A,B,C} is an SRS. [Figure: graph on nodes A, B, C, D, E, F]

An Example of Set Resolving Set (SRS) {A,C} is not an SRS. [Figure: graph on nodes A, B, C, D, E, F]

Active Source & Inactive Source • Only an active source is detectable.

Theorem. For any graph, there exists an SRS.

Problem Definition • MULTI-RUMOR-SOURCE DETECTION problem (MRSD): find an SRS K with the smallest cardinality.

Greedy Algorithm & Its Approximation Ratio • Theorem. Algorithm 1 correctly computes an SRS with a provable approximation ratio of at most $(1 + r \ln n + \ln \log_2 \gamma)$. • $r$: upper bound for the number of sources • $\gamma$: maximum number of equivalence classes divided by one node-pair.

Thank you very much!
