A Random Walk Approach to Modeling the Dynamics of the Blogosphere Alex X. Liu Dept. of Computer Science and Engineering Michigan State University Joint work with M. Zubair Shafiq
Background • Important niche of online social networks • Blogosphere consists of two networks • Blog network (Nodes = Blogs, Edges = Hyperlinks) • Post network (Nodes = Posts, Edges = Hyperlinks) [Figure: the blog network and the post network, which together form the blogosphere]
Motivation • Modeling the evolution dynamics of the blogosphere • How do blogs produce posts? • What are the underlying mechanisms? • Applications • Advertising • Forecasting • Studying the effect of probing for improving platform design
Problem Statement • Model • Generative model of individual bloggers in the blogosphere • Replicated for all bloggers • Allowed to execute over a given period of time • Requirements • Only use local mechanisms • Intuitive and realistic • Evaluation • Ground truth: properties of real-world blogosphere • Temporal properties, e.g. inter-posting time • Topological properties, e.g. degree distribution
Limitations of Prior Art • Only topological properties • Second space [Karandikar08ICWSM] • Ad hoc in nature • Multiple input parameters • Kronecker graphs [Leskovec07ICML] • Not specifically designed for the blogosphere • Only temporal properties • Randomized blogspace [Kumar03WWW] • Mostly focuses on the burstiness property • Both topological and temporal properties • Zero-crossing [Gotz09ICWSM] • No parameters, cannot control properties • Uses global properties, e.g. total number of in-links to a blog
Proposed Approach • Random walk process • Different variants of the random walk process • Emulate the topological and temporal characteristics of individual bloggers [Figure: a two-dimensional random walk and a one-dimensional random walk]
Flow Chart of the Proposed Model • Series of random walks for each blogger • Random walk 1 • Post at zero-crossing • Random walk 2 • Select new blogs to link (explore) • Random walk 3 • Select previously linked blogs to link (exploit) • Random walk 4 • Select post to link • Random walk 5 • Link to posts referred by the selected post
Proposed Model • Random walk 1 (one dimensional random walk) • Temporal dynamics of a blogger's posting behavior • Publish a new post at zero crossing • Reproduce burstiness, self-similarity in publishing behavior • Slope of entropy plot ≈ 0.7 < 1
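A minimal sketch of random walk 1, under stated assumptions: the walk is a simple symmetric walk with steps of +1 or -1, one step per time unit, and a "zero crossing" is taken to mean a return to position 0. The slides do not specify the step distribution or the time unit, and the function names `posting_times` and `inter_posting_times` are illustrative only.

```python
import random


def posting_times(n_steps, seed=None):
    """Return the time steps at which a symmetric +/-1 walk is back at zero.

    Each return to zero is treated as the moment the blogger publishes a post
    (an assumption about how "zero crossing" is defined).
    """
    rng = random.Random(seed)
    position = 0
    times = []
    for t in range(1, n_steps + 1):
        position += rng.choice((-1, 1))
        if position == 0:
            times.append(t)
    return times


def inter_posting_times(times):
    """Gaps between consecutive posting times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    times = posting_times(100_000, seed=42)
    gaps = inter_posting_times(times)
    print(len(times), "posts; longest gap:", max(gaps, default=0))
```

In such a walk, many returns happen in quick succession while a few excursions away from zero last very long, which is the qualitative burstiness the slide describes.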
Proposed Model • Random walk 2 (random walk on blog graph) • Select new blogs to link (Explore) • Starts at a randomly chosen node of the blog graph • Blog reached at the end of random walk is selected • Random walk 3 (random walk on blog graph) • Select previously linked blogs to link (Exploit) • Starts at the corresponding node of the blog graph • Blog reached at the end of random walk is selected
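Random walks 2 and 3 differ only in where they start, which the following sketch tries to make concrete. It assumes the blog graph is a dictionary mapping each blog to a list of blogs it links to, that the walk has a fixed length, and that it stops early at a blog with no out-links; none of these details are given in the slides, and `random_walk`, `explore`, and `exploit` are hypothetical names.

```python
import random


def random_walk(graph, start, length, rng):
    """Follow up to `length` out-links from `start`; stop early at a dead end."""
    node = start
    for _ in range(length):
        neighbors = graph.get(node, [])
        if not neighbors:
            break
        node = rng.choice(neighbors)
    return node


def explore(graph, length, rng):
    """Random walk 2: start at a randomly chosen blog of the blog graph."""
    start = rng.choice(sorted(graph))
    return random_walk(graph, start, length, rng)


def exploit(graph, blogger, length, rng):
    """Random walk 3: start at the blogger's own node of the blog graph."""
    return random_walk(graph, blogger, length, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    blog_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print("explore:", explore(blog_graph, 3, rng))
    print("exploit:", exploit(blog_graph, "a", 1, rng))
```

With a walk of length 1, `exploit` can only return a blog the blogger already links to; longer walks reach blogs further away in the link structure.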
Proposed Model • Random walk 4 (random walk on post graph) • To select a post from the selected blog • Posts are ordered in reverse-chronological order • Starts at the latest post • Post reached at the end of random walk is selected • Random walk 5 (random walk on post graph) • Blogger recursively refers to some out-links of the selected post (link expansion) • Starts at the post selected in random walk 4 • Post reached at the end of random walk is selected
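The two post-level walks can be sketched as follows. The slides do not state the step rule for random walk 4, so this sketch assumes that at each step the walk moves one post older with probability `p_older` and otherwise stays put. Random walk 5 is assumed to follow out-links of the selected post for a fixed number of steps. The names `select_post`, `expand_link`, and `p_older` are hypothetical.

```python
import random


def select_post(posts_newest_first, length, rng, p_older=0.5):
    """Random walk 4: start at the latest post and drift toward older posts.

    `posts_newest_first` is the selected blog's posts in reverse-chronological
    order. The step rule (move one post older with probability `p_older`)
    is an assumption, not taken from the slides.
    """
    index = 0
    for _ in range(length):
        if rng.random() < p_older and index < len(posts_newest_first) - 1:
            index += 1
    return posts_newest_first[index]


def expand_link(post_links, start_post, length, rng):
    """Random walk 5: follow out-links starting from the selected post."""
    post = start_post
    for _ in range(length):
        out_links = post_links.get(post, [])
        if not out_links:
            break
        post = rng.choice(out_links)
    return post


if __name__ == "__main__":
    rng = random.Random(1)
    posts = ["post-3", "post-2", "post-1"]  # newest first
    links = {"post-3": ["x-7"], "post-2": ["x-7", "y-2"], "x-7": ["y-2"]}
    chosen = select_post(posts, 2, rng)
    print(chosen, "->", expand_link(links, chosen, 2, rng))
```

Starting at the latest post biases selection toward recent posts, since older posts are only reached after several successful steps.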
Inter-posting times • Definition: time between two consecutive posts • The distribution of inter-posting times follows a power law • Implication: blogging activity is characterized by long periods of inactivity separated by short periods of activity
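One way to see why a zero-crossing rule can produce power-law inter-posting times: if random walk 1 is assumed to be a simple symmetric walk with unit steps (an assumption, since the slides do not give the step distribution), the time T between consecutive returns to zero has a classical closed form with a heavy tail. The symbols T and n are introduced here for illustration.

```latex
P(T = 2n) \;=\; \frac{1}{2n-1}\binom{2n}{n}\,2^{-2n}
\;\sim\; \frac{1}{2\sqrt{\pi}}\; n^{-3/2}
\qquad (n \to \infty)
```

Under that assumption the gaps decay as a power law with exponent 3/2: short gaps are the most likely, yet very long gaps remain far more probable than under an exponential distribution.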
Blog in-degree • Definition: count of in-linking blogs • The distribution of blog in-degree follows a power law • Implication: only a few blogs receive a large number of in-links and a majority of blogs remain unnoticed
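As an illustration of the definition, the sketch below counts, for each blog, the number of distinct other blogs linking to it, and then tabulates how many blogs have each in-degree. The edge-list input format, the exclusion of self-links, and the names `blog_in_degrees` and `degree_histogram` are assumptions made for this example.

```python
from collections import Counter


def blog_in_degrees(blog_links):
    """Count distinct in-linking blogs per blog.

    `blog_links` is an iterable of (source_blog, target_blog) pairs.
    Repeated links count once, and self-links are ignored (an assumption).
    """
    distinct = {(src, dst) for src, dst in blog_links if src != dst}
    return Counter(dst for _, dst in distinct)


def degree_histogram(in_degrees):
    """Map each in-degree value to the number of blogs that have it."""
    return Counter(in_degrees.values())


if __name__ == "__main__":
    links = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "a"), ("a", "a")]
    degrees = blog_in_degrees(links)
    print(dict(degrees))
    print(dict(degree_histogram(degrees)))
```

Plotting the histogram on log-log axes is the usual first check for a power law: an approximately straight line is consistent with one, although it does not prove it.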
Post in-degree • Definition: count of in-linking posts • The distribution of post in-degree follows a power law • Implication: only a few posts receive a large number of in-links and a majority of posts remain unnoticed
PageRank • Definition: PageRank, or eigenvector centrality, assigns an importance weight to every node in a network • The distribution of PageRank follows a power law • Implication: only a few posts are highly cited
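For readers unfamiliar with the measure, here is a minimal power-iteration sketch of PageRank on a small link graph. It uses the common damping factor of 0.85 and spreads the weight of nodes without out-links uniformly over all nodes; both are standard conventions rather than details from the slides, and the function name `pagerank` and its parameters are illustrative.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of out-links."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Weight held by nodes with no out-links is shared evenly by everyone.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    post_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(post_graph).items()):
        print(node, round(score, 3))
```

In this toy graph, post "c" is cited by two posts and ends up with the highest score, while "b", which nobody cites, keeps only the baseline weight.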
Future Work • Study the effect of varying the length of the random walk on structural properties of generated blogospheres • Structural properties: • Transitivity • Average clustering coefficient • Average shortest path length • Size of largest connected component
Conclusions • Propose a random walk based model to simulate the evolution of the blogosphere • Intuitive and simple • Simultaneously considers temporal and topological properties • Model works with local information and does not use global information • Experiments show that the properties of the evolved blogosphere follow those of the real-world blogosphere • Structural properties can be controlled by varying the length of the random walk