
Self-introduction

Self-introduction. Name: 鲍鹏 (Peng Bao). Research interests: popularity prediction, information diffusion, social networks, etc. Grade: third-year PhD student. Group: NASC (Network Analysis and Social Computing). Lab: Research Center of Web Data Science & Engineering.





Presentation Transcript


  1. Self-introduction • Name: 鲍鹏 (Peng Bao) • Research interests: Popularity Prediction, Information Diffusion, Social Networks, etc. • Grade: third-year PhD student • Group: NASC (Network Analysis and Social Computing) • Lab: Research Center of Web Data Science & Engineering • Doctoral supervisor: Prof. Xue-Qi Cheng

  2. Previous Work • Paper: Popularity Prediction in Microblogging Network: An Empirical Study • Authors: Peng Bao, Hua-Wei Shen, Junming Huang, Xue-Qi Cheng

  3. Outline • Background & Motivation • Problem definition • Related work • Preliminary study • Structural characteristics • Prediction & Results • Conclusions and Discussion

  4. Background • The BURST of SNS (social networking services) • Everyone is a member of the "We the Media" age! • Sina Weibo plays an increasingly important social role. • Opportunities and challenges • Special issues in Science and Nature • Computational social science [D. Lazer et al. Science 323, 721-724 (2009)]

  5. Challenges • An interesting and fundamental question: how can we track, understand, and predict the information flow on the network? • Predicting the long-term popularity of online content is very HARD! • Popularity is unequally distributed and is affected by many factors: • heavy interaction among users • the intrinsic interestingness of content • external influence from traditional media • the active periods of users

  6. Motivation • Popularity prediction is USEFUL! • From a technology view • Helps enterprises design cost-effective caching and content distribution systems • From a business view • Helps journalists, content providers, advertisers, and news recommender systems provide information services and design viral marketing strategies • From a sociology view • Reveals human collective behavior • Helps authorities monitor and guide public opinion • The increasing availability of data increases predictability!

  7. Problem definition • Popularity prediction: given a tweet and its retweet (forwarding) information observed before an indicating time t_i, we want to predict its popularity p(t_r) at a reference time t_r. • Indicating time t_i: the time at which we observe the information of a tweet. • Reference time t_r: the time at which we intend to predict the popularity of a tweet. • Popularity p(t): the number of times a tweet has been retweeted by time t.
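To make these definitions concrete, here is a minimal sketch. It assumes each tweet is represented as a list of retweet timestamps, measured in hours since the tweet was posted. The function names `popularity` and `split_at_indicating_time` are hypothetical and do not come from the original work.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def popularity(sorted_times: Sequence[float], t: float) -> int:
    """p(t): number of retweets with timestamp <= t.

    Assumes `sorted_times` is sorted ascending (hours since posting, an assumption).
    """
    return bisect_right(sorted_times, t)


def split_at_indicating_time(
    retweet_times: Sequence[float], t_i: float, t_r: float
) -> Tuple[List[float], int]:
    """Return the retweets observable at t_i and the prediction target p(t_r)."""
    ts = sorted(retweet_times)
    observed = ts[: popularity(ts, t_i)]  # information available at the indicating time
    return observed, popularity(ts, t_r)  # target popularity at the reference time


times = [0.5, 1.0, 2.0, 5.0]
print(split_at_indicating_time(times, t_i=1.0, t_r=4.0))  # ([0.5, 1.0], 3)
```

Here p(t) counts retweets cumulatively, so p(t_r) is never smaller than p(t_i) whenever t_r ≥ t_i.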

  8. Related work • Temporal correlation based [Szabo et al. CACM 2010] • Strong correlation between early and later log-transformed popularity • Linear regression • Visibility and interestingness based [Lerman et al. WWW 2010] • User behavior modeling • Estimates the interestingness of content
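The temporal-correlation approach can be sketched as a linear regression on log-transformed popularity. This is a generic ordinary-least-squares fit, assumed for illustration, and not the cited paper's exact estimator. Using ln(1 + p) so that tweets with zero retweets stay finite is also an assumption.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(p_early: Sequence[float], p_late: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of ln(1 + p(t_r)) = a + b * ln(1 + p(t_i)) over a set of tweets.

    Assumption: generic least squares on log1p counts, not the cited paper's estimator.
    """
    xs = [math.log1p(v) for v in p_early]  # log1p keeps zero-popularity tweets finite
    ys = [math.log1p(v) for v in p_late]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict_late(a: float, b: float, p_early: float) -> float:
    """Map an observed early popularity p(t_i) to a predicted p(t_r)."""
    return math.expm1(a + b * math.log1p(p_early))
```

Fitting in log space matches the slide's observation that the correlation holds between log popularities. The drawback is that back-transforming with `expm1` predicts something closer to a typical value than to the mean of p(t_r).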

  9. Related work (cont.) • Matrix factorization based [Cui et al. SIGIR 2011] • Estimates the latent factors of users and items • Feature based [Hong et al. WWW 2011] • Formulated as a classification problem • Logistic regression • Temporal pattern based [Matsubara et al. KDD 2012] • Periodicity • Avoids infinity • Power-law decay. Existing methods mainly focus on the quality of content, the interface of the social media site, and the collective behavior of users. We focus on the structural characteristics of the networks spanned by early adopters.

  10. Preliminary study • Popularity distribution: the popularity of tweets roughly follows a power-law distribution, i.e., it is very unequally distributed.
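One way to quantify the "roughly power-law" observation is to estimate the exponent by maximum likelihood. The sketch below uses the standard continuous estimator α = 1 + n / Σ ln(x / x_min). The cutoff `x_min` must be chosen separately, which is an assumption here. Because retweet counts are integers, a discrete approximation that replaces x_min with x_min − 1/2 inside the logarithm is often used instead.

```python
import math
from typing import Iterable


def powerlaw_alpha(values: Iterable[float], x_min: float) -> float:
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), x >= x_min.

    Assumption: x_min is supplied by the caller; only the tail x >= x_min is fitted.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need tail values strictly above x_min")
    return 1.0 + len(tail) / log_sum
```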

  11. Preliminary study • Lifespan of tweets: most tweets receive 80% of their final popularity within 24 hours and 90% within 48 hours. The lifespan of tweets follows a log-normal distribution.
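The 24-hour and 48-hour statements can be checked with a sketch like the following. It makes two assumptions: each tweet is a pair (post time, list of retweet times) in hours, and all retweets observed in the dataset count as the tweet's final popularity.

```python
from typing import List, Sequence, Tuple


def fraction_within(retweet_times: Sequence[float], post_time: float, window: float) -> float:
    """Share of a tweet's final popularity reached within `window` hours of posting.

    Assumption: final popularity = all retweets observed in the dataset.
    """
    if not retweet_times:
        return 0.0
    hits = sum(1 for t in retweet_times if t - post_time <= window)
    return hits / len(retweet_times)


def share_of_tweets_reaching(
    tweets: List[Tuple[float, Sequence[float]]], window: float, target: float
) -> float:
    """Fraction of (retweeted) tweets that reach `target` of their final popularity within `window`."""
    fractions = [fraction_within(ts, post, window) for post, ts in tweets if ts]
    return sum(1 for f in fractions if f >= target) / len(fractions)


tweets = [(0.0, [1.0, 2.0, 3.0, 30.0]), (0.0, [5.0, 6.0])]
print(share_of_tweets_reaching(tweets, window=24.0, target=0.8))  # 0.5
```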

  12. Preliminary study • Active period (case study: the “Wenzhou train collision”) • We should consider the variation in hourly activity cycles. • The daily variation has no obvious relationship with the weekly cycle and is event-related.

  13. Temporal correlation of logarithmic popularity • The correlation is weak, with large deviations: the Pearson correlation coefficient is 0.74. • It is therefore less reliable to predict the popularity of a tweet from its earlier popularity alone.
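A correlation like the one reported above can be computed as follows. The sketch assumes early and later popularity counts are given per tweet in matching order, and that log popularity is taken as ln(1 + p). The log1p choice is an assumption, since the slide does not say how zero counts were handled. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def log_popularity_correlation(p_early: Sequence[int], p_late: Sequence[int]) -> float:
    """Correlation between ln(1 + p(t_i)) and ln(1 + p(t_r)) across tweets (log1p is an assumption)."""
    return pearson([math.log1p(v) for v in p_early], [math.log1p(v) for v in p_late])
```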

  14. Structural characteristics • We explore the network consisting of early adopters • Link density: the ratio of the number of existing follow links among early adopters to the number of all possible links. • Diffusion depth: the length of the longest path from the submitter to any of them.
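Both measures can be sketched as shown below. The sketch makes several assumptions for illustration:
- The early adopters, including the submitter, are given as user IDs.
- Follow relationships are directed (follower, followee) pairs.
- The possible links are counted as n(n − 1) directed pairs.
- The diffusion is a retweet tree in which each adopter has exactly one parent.

```python
from typing import Dict, Iterable, List, Set, Tuple


def link_density(adopters: Iterable[str], follows: Iterable[Tuple[str, str]]) -> float:
    """Existing follow links among adopters / all possible directed links n(n-1).

    Assumption: follows are directed (follower, followee) pairs.
    """
    nodes: Set[str] = set(adopters)
    n = len(nodes)
    if n < 2:
        return 0.0
    links = {(u, v) for u, v in follows if u in nodes and v in nodes and u != v}
    return len(links) / (n * (n - 1))


def diffusion_depth(parent: Dict[str, str], submitter: str) -> int:
    """Longest path (in hops) from the submitter in a retweet tree.

    Assumption: parent[u] is the user u retweeted from, and every parent
    chain ends at the submitter without cycles.
    """
    depth = {submitter: 0}
    for start in parent:
        path: List[str] = []
        node = start
        while node not in depth:  # walk up until a node with known depth
            path.append(node)
            node = parent[node]
        d = depth[node]
        for v in reversed(path):  # assign depths back down the chain
            d += 1
            depth[v] = d
    return max(depth.values())
```

Caching depths means each adopter is walked at most once, so the depth computation is linear in the number of adopters.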

  15. Structural characteristics • Empirical findings: the structural characteristics provide strong evidence to help estimate the final popularity.

  16. Prediction and Results • Comparison approaches • Evaluation methods • Experimental results

  17. Conclusions • We empirically study structural characteristics, which can provide critical indicators of popularity • The prediction accuracy can be significantly improved by incorporating structural diversity as a factor • The conclusions match intuition • They provide INSIGHTS for further study

  18. Discussions • Accumulative effect • Temporal characteristics • Event-based prediction

  19. Ongoing Work • Accumulative Effect of Multiple Exposures in Information Diffusion on Social Networks

  20. Exposures and Adoptions • Exposure: a node's neighbor exposes the node to the contagion • Adoption: the node acts on the contagion (e.g., retweets it) • Time: t1 < t2 < t3 < … < tn [Figure: a node exposed at times t1, t2, t3, each time facing the question "rt?"]
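Counting a user's exposures before adoption can be sketched as follows. The sketch makes two assumptions for illustration. First, `followees` maps each user to the set of users they follow. Second, `retweet_time` maps each adopter to the time of their retweet. An exposure is recorded whenever a followee retweets.

```python
from typing import Dict, List, Set


def exposure_times(
    user: str, followees: Dict[str, Set[str]], retweet_time: Dict[str, float]
) -> List[float]:
    """Sorted times at which the user's followees retweeted (one exposure each).

    Assumption: followees[user] is the set of accounts the user follows.
    """
    return sorted(retweet_time[f] for f in followees.get(user, set()) if f in retweet_time)


def exposures_before_adoption(
    user: str, followees: Dict[str, Set[str]], retweet_time: Dict[str, float]
) -> int:
    """Exposures received before the user's own retweet (all exposures if none)."""
    times = exposure_times(user, followees, retweet_time)
    adopted_at = retweet_time.get(user)
    if adopted_at is None:
        return len(times)  # non-adopter: every exposure counts
    return sum(1 for t in times if t < adopted_at)
```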

  21. Problem definition • Exposure curve: the probability that a user retweets a tweet depends on the number of friends who have already retweeted it. • Dependence
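The exposure curve can be estimated from per-user exposure counts. The sketch below uses one common estimator. P(k) is the fraction of users who adopt directly after their k-th exposure, among users who received a k-th exposure before adopting. This choice of estimator and the (k, adopted) record format are assumptions for illustration, not necessarily the approach used in this ongoing work.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def exposure_curve(records: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Estimate P(k) = P(adopt right after the k-th exposure | not adopted before).

    Assumed record format (k, adopted): for adopters, k is the number of
    exposures before adopting; for non-adopters, k is the total number of exposures.
    """
    reached: Counter = Counter()  # users who got a k-th exposure while not yet adopted
    adopted_at: Counter = Counter()  # users who adopted right after their k-th exposure
    for k, adopted in records:
        for j in range(1, k + 1):
            reached[j] += 1
        if adopted and k >= 1:
            adopted_at[k] += 1
    return {j: adopted_at[j] / reached[j] for j in sorted(reached)}


print(exposure_curve([(1, True), (1, False), (2, True), (2, False)]))  # {1: 0.25, 2: 0.5}
```

Conditioning on "not yet adopted" keeps users who adopted early from inflating the denominators of later k.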

  22. Example Application • A marketing agency would like you to adopt/buy product X • They estimate the adoption curve • Should they expose you to X three times? • Or is it better to expose you to X, then Y, and then X again?

  23. What we are doing • Classify TWEETS by whether they • contain a URL • relate to an event • relate to multiple events • Deeper analysis of multiple exposures (ME) for different events • Classify USERS by • degree • active period • local clustering coefficient
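The local clustering coefficient of a user is the fraction of pairs of the user's neighbors that are themselves linked. The sketch below assumes the follow graph is treated as undirected and stored as adjacency sets. That treatment is an assumption, since the slide does not say how link direction is handled.

```python
from itertools import combinations
from typing import Dict, Set


def local_clustering(adj: Dict[str, Set[str]], v: str) -> float:
    """Local clustering coefficient of v in an undirected graph.

    Assumption: adj is symmetric (follow links treated as undirected).
    """
    nbrs = adj.get(v, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj.get(a, set()))
    return 2.0 * links / (k * (k - 1))
```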

  24. What we are doing (cont.) • Structural diversity among the sources of multiple exposures • For a fixed number of exposures, check • link density • number of connected components • Temporal effect • Temporal motifs. You will see the results soon!
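For a user with a fixed number of exposures, the sources are the friends who exposed them. The number of connected components among those sources can be sketched as follows. Treating follow links as undirected, and ignoring links to users outside the source set, are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_components(sources: Iterable[str], edges: Iterable[Tuple[str, str]]) -> int:
    """Number of connected components of the graph induced on `sources`.

    Assumption: follow links are treated as undirected; edges touching
    nodes outside `sources` are ignored.
    """
    nodes = set(sources)
    adj: Dict[str, Set[str]] = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    seen: Set[str] = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # unseen node: a new component, flood-fill it
        seen.add(start)
        stack: List[str] = [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components
```

With k exposure sources, one component means all k sources are connected to each other. k components means k structurally independent sources.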

  25. Closing Remarks This field is a WILD but FERTILE piece of mineral land. We have done MUCH! We know A LITTLE. We should do MORE…

  26. Acknowledgement • Thanks to all members of the NASC group (www.groupnasc.org) for helpful discussions and suggestions • Collaborators: Xue-Qi Cheng, Hua-Wei Shen, Junming Huang

  27. Thanks! Q&A • Email: baopeng@software.ict.ac.cn • Weibo: http://weibo.com/sparkfield
