This research aims to analyze the properties of conversations in online social networks and design a generative model to produce these properties. The study includes the analysis of threads in Yahoo! Groups, Usenet, and Twitter, and proposes a baseline model and an improved time-based model to generate thread structures and determine authorship.
Dynamics of Conversations • Ravi Kumar, Y! Research • Mohammad Mahdian, Y! Research • Mary McGlohon, CMU
Motivation • Online social networks are a major source people turn to for information. • Our goal: To understand the dynamics of information dissemination in social networks. • What are properties of conversations (threads)? • Can we design a generative model to produce these properties?
Our methods: Analyzing threads • Analyze threads in online groups: Yahoo! Groups, Usenet, and Twitter. • Take each thread and represent it as a subgraph. • Compute network measures on the set of subgraphs. [Figure: example thread on FakeKDD-mailing-list, subject “KDD11 venue”, with messages Alice: “KDD2011 in Fiji?”, Bob: “Great idea!”, Cal: “Too much sunshine”, Alice: “OK,…”, shown next to the same thread drawn as a reply tree with nodes “KDD 2011 in Fiji?”, “Great idea!”, “Too much sunshine”, “OK, you can stay home.”]
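A minimal sketch of the thread-to-subgraph step, assuming each message records which message it replies to. The reply links in the example (Bob and Cal reply to Alice's first post, Alice's second message replies to Cal) are an assumption made for illustration, as are all function names.

```python
from collections import Counter

# Hypothetical encoding: message id -> (author, id of the message it replies to).
# The reply links are assumed for illustration; the root has parent None.
thread = {
    0: ("Alice", None),  # "KDD2011 in Fiji?"
    1: ("Bob", 0),       # "Great idea!"
    2: ("Cal", 0),       # "Too much sunshine"
    3: ("Alice", 2),     # "OK,..."
}

def depth_of(msg_id, thread):
    """Number of reply edges between a message and the thread root."""
    depth = 0
    parent = thread[msg_id][1]
    while parent is not None:
        depth += 1
        parent = thread[parent][1]
    return depth

def thread_size(thread):
    """Number of messages in the thread."""
    return len(thread)

def thread_depth(thread):
    """Depth of the deepest message in the thread."""
    return max(depth_of(m, thread) for m in thread)

def reply_counts(thread):
    """Degree of each message, counted as its number of direct replies."""
    counts = Counter({m: 0 for m in thread})
    for _, parent in thread.values():
        if parent is not None:
            counts[parent] += 1
    return dict(counts)

def author_activity(thread):
    """Number of messages posted by each author."""
    return Counter(author for author, _ in thread.values())
```

Size, depth, per-message degree, and per-author activity are the quantities the observations that follow are stated in.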
Data • Yahoo! Groups • Public, moderated, active groups • 13,000 groups, 14.9 million messages • Usenet • 100 high-activity groups for 1 year • 22 million messages • Twitter • Sampled for one month • 69 million messages
Observation: Size vs Depth • Q: If more replies occur in the thread, where do they join? • A: The depth of threads grows sub-linearly, but super-logarithmically in the size of the thread.
Observation: Degree of threads • Q: Does the number of responses a message gets depend on its depth? • A: The degree distribution does change according to level in the thread.
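One way to check this kind of dependence is to group reply counts by depth and compare the resulting distributions. The sketch below assumes a thread stored as a parent list; the function name and the five-message example are hypothetical.

```python
from collections import defaultdict

def degrees_by_level(parents):
    """Group reply counts by depth.

    parents[i] is the index of the message that message i replies to,
    or None for the root. Parents are assumed to appear before children.
    """
    depth = []
    degree = [0] * len(parents)
    for p in parents:
        if p is None:
            depth.append(0)
        else:
            depth.append(depth[p] + 1)
            degree[p] += 1
    levels = defaultdict(list)
    for d, k in zip(depth, degree):
        levels[d].append(k)
    return dict(levels)

# Hypothetical thread: the root gets two replies, the first of which gets two more.
example_levels = degrees_by_level([None, 0, 0, 1, 1])
```

Pooling these per-level lists over many threads gives one degree distribution per level.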
Observation: Authorship • Q: As a thread increases in size, how many new authors join? • A: There is a power-law relationship between: • Size of thread and maximum activity from one author. • Size of thread and number of authors participating.
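A power-law relationship y = c·x^b appears as a straight line on log-log axes, so a rough estimate of the exponent b is the slope of a least-squares line through (log x, log y). The sketch below uses synthetic placeholder numbers, not values from the study, and `fit_power_law` is a hypothetical helper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + b*log(x); returns (c, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), slope

# Synthetic placeholder data (not from the study): authors = 2 * size^0.5
sizes = [1, 4, 16, 64, 256]
authors = [2 * s ** 0.5 for s in sizes]
c, b = fit_power_law(sizes, authors)
```

On real thread data the points would scatter around the line, and a log-log regression is only a first check rather than a rigorous test for a power law.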
Baseline model: Branching Process • Each node has some k children with probability distribution p. • Pros • Conceptually simple • Cons • Not generative • Will not produce behavior by levels • Will not have heavy-tail depth distribution • Does not take into account recency effects [Plot: log(p(k)) vs. log(k)]
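A minimal simulation of such a branching process might look as follows. The offspring distribution, the `max_nodes` safety cap, and the function name are assumptions of this sketch, not details from the slides.

```python
import random

def branching_thread(offspring_probs, rng, max_nodes=10_000):
    """Grow a tree in which each node independently draws its number of children.

    offspring_probs[k] is the probability of exactly k children.
    Returns a parent list: parents[i] is the parent of node i (None for the root).
    """
    ks = list(range(len(offspring_probs)))
    parents = [None]
    frontier = [0]  # nodes whose children have not been drawn yet
    while frontier and len(parents) < max_nodes:
        node = frontier.pop(0)
        k = rng.choices(ks, weights=offspring_probs)[0]
        for _ in range(k):
            if len(parents) >= max_nodes:
                break
            parents.append(node)
            frontier.append(len(parents) - 1)
    return parents

rng = random.Random(0)
# Assumed offspring distribution with mean 0.7, so threads die out with probability 1.
parents = branching_thread([0.5, 0.3, 0.2], rng)
```

Every node draws from the same distribution regardless of its depth or age, which is why this baseline cannot show level-dependent or recency effects.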
Time model • Thread grows in discrete time steps • At each time tick, the thread may stop • Otherwise, a message is added in reply to some existing node • The attachment point is chosen based on the degree $d_v$ and recency $r_v$ of the candidate parent node $v$ • $p(\text{child}, v) \propto \alpha\, d_v + \tau^{r_v}$, with $\alpha \ge 0$ and $0 \le \tau \le 1$ • Pros • Will have both preferential attachment and recency effects • Using only one will produce either “bushy” or “skinny” trees (stars/chains) [Figure, built up over three ticks, labeling each node with arrival time t, recency r, and degree d. Tick 1: root (t=1, r=0, d=0). Tick 2: root (t=1, r=1, d=1), new node (t=2, r=0, d=0). Tick 3: root (t=1, r=2, d=2), nodes (t=2, r=1, d=0) and (t=3, r=0, d=0).]
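The growth rule above can be sketched as a short simulation, with `alpha` and `tau` standing for the slide's two parameters. The constant per-tick stopping probability `stop_prob`, the `max_nodes` cap, and the function name are assumptions of this sketch; the slides only say that the thread may stop at each tick.

```python
import random

def time_model_thread(alpha, tau, stop_prob, rng, max_nodes=1000):
    """Grow one thread; returns parents[i] = parent of message i (None for root).

    Attachment weight of node v: alpha * d_v + tau ** r_v, where d_v is v's
    number of replies and r_v is the number of ticks since v was posted.
    stop_prob (a constant per-tick stopping chance) is an assumption.
    """
    parents = [None]  # the root is posted at tick 1
    born = [1]        # arrival tick of each message
    degree = [0]      # number of direct replies to each message
    tick = 1
    while len(parents) < max_nodes:
        tick += 1
        if rng.random() < stop_prob:
            break
        # Recency is measured just before the new message arrives,
        # so the most recent message has r = 0 and weight at least 1.
        weights = [
            alpha * degree[v] + tau ** (tick - 1 - born[v])
            for v in range(len(parents))
        ]
        v = rng.choices(range(len(parents)), weights=weights)[0]
        parents.append(v)
        born.append(tick)
        degree.append(0)
        degree[v] += 1
    return parents

rng = random.Random(0)
sample = time_model_thread(alpha=0.5, tau=0.6, stop_prob=0.1, rng=rng)
```

With `alpha = 0` and `tau = 0` only the most recent message has nonzero weight, so the sketch yields a pure chain; raising `alpha` shifts weight toward messages that already have many replies.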
Time model with identity • After generating the thread structure with the time model, assign author identities to nodes • Use a Pólya urn-like process • For each node, either pick a new author or pick an author from further up the chain (excluding the parent)
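One possible reading of this identity step is sketched below. The parameter `new_author_prob`, the uniform draw over ancestor authors, and the extra rule that a message never gets the same author as its parent are all assumptions of the sketch, not details confirmed by the slides.

```python
import random

def assign_authors(parents, new_author_prob, rng):
    """Return authors[i] for each message, given parents[i] (None for the root).

    Assumes parents appear before their children in the list.
    """
    authors = []
    next_id = 0
    for p in parents:
        candidates = []
        if p is not None:
            a = parents[p]  # start at the grandparent, skipping the parent itself
            while a is not None:
                if authors[a] != authors[p]:  # assumption: nobody replies to themselves
                    candidates.append(authors[a])
                a = parents[a]
        if candidates and rng.random() >= new_author_prob:
            # Drawing from the multiset of ancestor authors favours authors who
            # already posted often on this chain (the urn-like reinforcement).
            authors.append(rng.choice(candidates))
        else:
            authors.append(next_id)
            next_id += 1
    return authors

chain = [None, 0, 1, 2]
back_and_forth = assign_authors(chain, 0.0, random.Random(0))
all_new = assign_authors(chain, 1.0, random.Random(0))
```

On a four-message chain, `new_author_prob = 0` produces two authors alternating, while `new_author_prob = 1` gives every message a distinct author.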
Model: Size vs. depth • Simulation: [plot] • Data: [plot]
Model: Degree, author activity • Simulation: [plot] • Data: [plot]
Conclusion • We examined several properties of conversations in three large datasets • Showed that the thread structure and the degree of a node are interrelated • We proposed models to generate these properties • Baseline branching process model • An improved time-based model that also determines authorship