Workshop on Social Computing IIT Kharagpur, Oct 5-6 2012. Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media *. Indrajit Bhattacharya Research Scientist IBM Research, Bangalore.
*Collaboration w/ Himabindu Lakkaraju & Chiranjib Bhattacharyya
Social Media Analysis: Motivation • Microblogs and social networks: Twitter, Facebook, MySpace • Understanding and analyzing topics & trends • Influences on users • Variety of stakeholders • Business • Government • Social scientists
Social Media Analysis: Challenges • Network and Influences on Users • User personality: Personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11] • Dynamic nature • Topics & user personalities evolve over time • Volume of data • Existing approaches fall short
Social Media Analysis: State of the Art • Content Analysis • Ramage ICWSM 2010, Hong SOMA 2010 • Variants of LDA • Inferring User Interests • Ahmed KDD 2011, Wen KDD 2010 • Individual features such as user activity or network • Patterns in Temporal Evolution • Yang et al. WSDM 2011
Bayesian Non-parametric Models • Choosing the number of components in a mixture model • Particularly severe problem for large data volumes such as social media data • Bayesian solution • Infinite-dimensional prior • Allows the number of mixture components to grow with data size • Existing models cannot capture the richness of social media data • Algorithms often not scalable
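To make "allows the number of mixture components to grow with data size" concrete, here is a minimal sketch of the standard CRP seating rule that the RelCRP builds on (it is not the RelCRP itself). The function names and the concentration parameter `alpha` are illustrative choices. The expected number of occupied tables after n customers is the sum of alpha/(alpha + i) for i from 0 to n-1, which grows roughly like alpha·ln n.

```python
import random
from typing import List


def crp_table_counts(n_customers: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat customers one at a time under a Chinese Restaurant Process.

    Customer i (0-indexed) joins existing table k with probability
    n_k / (i + alpha) and opens a new table with probability alpha / (i + alpha).
    Returns the number of customers at each table.
    """
    rng = random.Random(seed)
    tables: List[int] = []
    for i in range(n_customers):
        r = rng.uniform(0, i + alpha)  # total unnormalized mass is i + alpha
        cumulative = 0.0
        for k, n_k in enumerate(tables):
            cumulative += n_k
            if r < cumulative:
                tables[k] += 1
                break
        else:
            tables.append(1)  # r fell in the remaining alpha mass: open a new table
    return tables


def expected_num_tables(n_customers: int, alpha: float) -> float:
    """Exact expectation of the number of tables: sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n_customers))
```

For alpha = 1 and n = 100, 1,000 and 10,000, `expected_num_tables` gives about 5.2, 7.5 and 9.8, and single runs of `crp_table_counts` scatter around those values: new components keep appearing as data arrives, but ever more slowly.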
Talk Outline • Background: Chinese Restaurant Processes • CRP with multiple relationships: (RelCRP, MRelCRP) • Dynamic MRelCRP • Multi-threaded Online Inference Algorithm • Experimental Results
Influence of Geography (figure panels: China, India, UK)
Aggregating Influences • RelCRP is exchangeable like the CRP • Useful as a prior for an infinite mixture model • RelCRP captures the influence of one relation on posts • Multiple influences act simultaneously on each user • Aggregated influence pattern is user-specific • Different users are affected differently by the same combination of world-wide and geographic factors
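The slides do not spell out the aggregation rule. One plausible reading, sketched here purely as an assumption, is that a user picks among relations (e.g. world-wide, geographic, friends) with user-specific weights and then follows that relation's CRP, so the user's topic probabilities are a weighted mixture of per-relation CRP predictives. `aggregated_topic_probs`, `NEW_TOPIC` and the count bookkeeping are hypothetical names, not the authors' implementation.

```python
from typing import Dict

NEW_TOPIC = -1  # sentinel key for "open a new topic"


def aggregated_topic_probs(
    relation_counts: Dict[str, Dict[int, int]],
    user_weights: Dict[str, float],
    alpha: float,
) -> Dict[int, float]:
    """Mix per-relation CRP predictive probabilities with one user's weights.

    relation_counts[r][k]: posts visible through relation r that use topic k
    (hypothetical bookkeeping). user_weights[r]: this user's susceptibility to
    relation r, assumed non-negative and summing to 1.
    """
    probs: Dict[int, float] = {NEW_TOPIC: 0.0}
    for relation, weight in user_weights.items():
        counts = relation_counts.get(relation, {})
        denom = sum(counts.values()) + alpha  # CRP normalizer for this relation
        for topic, n_k in counts.items():
            probs[topic] = probs.get(topic, 0.0) + weight * n_k / denom
        probs[NEW_TOPIC] += weight * alpha / denom
    return probs
```

Because each per-relation predictive sums to one, the mixture does too whenever the weights do. Two users who see the same counts but hold different weights get different topic distributions, which is the user-specific aggregation described above.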
Evolving Patterns in Social Media • Number of Topics • Topics die and new ones are born • User Personalities • Susceptibility to influence by world-wide, geographic and friends’ preferences • Existing Topic Distributions • Words go out of fashion, new ones enter the vocabulary • Topic Characters • Popularity of a topic changes world-wide, in users’ preferences, and across sub-networks and geographies
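One common device in dynamic CRP-style models for "topics die and new ones are born" is to carry the previous epoch's topic counts forward as a shrunken prior. The sketch below does only that, assuming a hypothetical exponential `decay` factor and a `min_count` cutoff below which a topic is dropped; it is not taken from DMRelCRP, whose exact epoch-to-epoch transition is not given here.

```python
from typing import Dict


def carry_over_counts(
    prev_counts: Dict[int, float], decay: float, min_count: float
) -> Dict[int, float]:
    """Turn epoch t-1 topic counts into a decayed prior for epoch t.

    decay (in (0, 1]) and min_count are hypothetical knobs. Topics whose decayed
    count falls below min_count are dropped ("die"); topics are "born" later,
    when the CRP opens new tables while processing epoch t.
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    return {
        topic: decay * count
        for topic, count in prev_counts.items()
        if decay * count >= min_count
    }
```

For example, `carry_over_counts({0: 100.0, 1: 3.0}, decay=0.5, min_count=2.0)` keeps topic 0 with weight 50.0 and drops topic 1, whose decayed count of 1.5 falls below the cutoff.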
Online Algorithm • Traditional iterative framework does not scale to social media data • Sequential Monte Carlo methods [Canini AIStats ‘09] that rejuvenate some old labels are also infeasible • Online sampling [Banerjee SDM ‘07] does not revisit old labels at all, but uses an initial batch phase • Adapt it to the non-parametric setting
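A minimal sketch of the one-pass idea carried over to a non-parametric prior: each post is sampled once on arrival, from the CRP prior times a word likelihood, and its label is never revisited. The Dirichlet-multinomial word model, the symmetric `beta`, and all names are assumptions for illustration; the initial batch phase and the multi-relational prior are omitted.

```python
import math
import random
from collections import Counter
from typing import List


def online_crp_assign(
    posts: List[List[str]], vocab_size: int, alpha: float, beta: float, seed: int = 0
) -> List[int]:
    """Assign each arriving post to a topic exactly once (labels are never revisited).

    Prior: CRP, so an existing topic k is weighted by its size and a new topic by
    alpha. Likelihood (an assumption of this sketch): Dirichlet-multinomial over
    words with symmetric concentration beta and vocabulary size vocab_size.
    """
    rng = random.Random(seed)
    sizes: List[int] = []            # posts per topic
    word_counts: List[Counter] = []  # word counts per topic
    token_totals: List[int] = []     # tokens per topic
    labels: List[int] = []

    def log_predictive(counts: Counter, total: int, words: List[str]) -> float:
        # Sequential Dirichlet-multinomial predictive of the post's tokens.
        seen: Counter = Counter()
        score = 0.0
        for j, w in enumerate(words):
            score += math.log(counts[w] + seen[w] + beta)
            score -= math.log(total + j + vocab_size * beta)
            seen[w] += 1
        return score

    for words in posts:
        log_scores = [
            math.log(sizes[k]) + log_predictive(word_counts[k], token_totals[k], words)
            for k in range(len(sizes))
        ]
        log_scores.append(math.log(alpha) + log_predictive(Counter(), 0, words))
        top = max(log_scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - top) for s in log_scores]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):  # the last slot means: open a new topic
            sizes.append(0)
            word_counts.append(Counter())
            token_totals.append(0)
        sizes[k] += 1
        word_counts[k].update(words)
        token_totals[k] += len(words)
        labels.append(k)
    return labels
```

The CRP normalizer (posts seen so far plus alpha) is identical for every candidate topic, so the sketch drops it before sampling. Cost per post is linear in the current number of topics, with no pass over earlier posts.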
Multi-threaded Implementation • Sequential online implementation does not scale • Iterative Gibbs sampling algorithms parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10] • Our algorithm is parallel, online and non-parametric • Explicit consolidation by master thread at the end of each iteration • Only new topics consolidated
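The end-of-iteration consolidation can be sketched as follows, under stated assumptions: workers label their shard of new posts against a frozen snapshot of the global counts and use negative ids for topics they open locally, and a master step then folds everything back. Only the new topics need id reconciliation; counts for existing topics are simply summed. The label convention, `run_iteration`, `consolidate` and `toy_worker` are hypothetical, and this sketch does not merge duplicate new topics across threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical convention: labels >= 0 are existing global topic ids,
# labels < 0 are topics the worker opened locally during this iteration.
Worker = Callable[[Sequence[Any], Dict[int, int]], List[int]]


def consolidate(global_counts: Dict[int, int], shard_labels: List[List[int]]) -> List[List[int]]:
    """Master step: mutate global_counts and return labels rewritten to global ids."""
    next_id = max(global_counts, default=-1) + 1
    resolved: List[List[int]] = []
    for labels in shard_labels:
        local_to_global: Dict[int, int] = {}  # this shard's new topics -> fresh ids
        out: List[int] = []
        for label in labels:
            if label < 0:
                if label not in local_to_global:
                    local_to_global[label] = next_id
                    next_id += 1
                label = local_to_global[label]
            global_counts[label] = global_counts.get(label, 0) + 1
            out.append(label)
        resolved.append(out)
    return resolved


def run_iteration(
    shards: List[Sequence[Any]], global_counts: Dict[int, int], worker: Worker, max_workers: int = 4
) -> List[List[int]]:
    snapshot = dict(global_counts)  # workers read a frozen copy, never the live counts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        shard_labels = list(pool.map(lambda shard: worker(shard, snapshot), shards))
    return consolidate(global_counts, shard_labels)


def toy_worker(shard: Sequence[int], snapshot: Dict[int, int]) -> List[int]:
    """Toy stand-in for a sampler: a post equal to a known topic id joins it,
    any other value opens one local new topic per distinct value."""
    local: Dict[int, int] = {}
    return [p if p in snapshot else local.setdefault(p, -(len(local) + 1)) for p in shard]
```

In the toy call `run_iteration([[0, 7, 7], [1, 7, 9]], {0: 5, 1: 2}, toy_worker, max_workers=2)`, the value 7 opens a new topic in both shards and ends up as two separate global topics (2 and 3), which is exactly the case a smarter consolidation would try to merge. In CPython, threads mainly illustrate the structure here; CPU-bound sampling would normally use processes.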
Datasets and Baselines • Twitter: 360 million tweets (Jun-Dec 2009) • Facebook: 300,000 posts (public profiles, 3 months) • Baselines: • Latent Dirichlet Allocation (LDA) [Hong SOMA 2010] • Labeled LDA (L-LDA): hashtags as topics [Ramage ICWSM 2010] • Timeline: dynamic non-parametric topic model [Ahmed UAI 2010]
1 Model Goodness • Perplexity: Ability to generalize to unseen data • Both network and dynamics are important for modeling social media data
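For reference, held-out perplexity is conventionally the exponential of the negative average per-token log-likelihood, exp(-Σ_d log p(w_d) / Σ_d N_d), so lower is better. A small helper, assuming per-post natural-log likelihoods are already available from whichever model is being evaluated:

```python
import math
from typing import Sequence


def perplexity(log_likelihoods: Sequence[float], token_counts: Sequence[int]) -> float:
    """exp(-sum_d log p(w_d) / sum_d N_d) over held-out posts d.

    log_likelihoods[d]: natural-log likelihood of post d's words under the model.
    token_counts[d]: number of word tokens N_d in post d.
    """
    if len(log_likelihoods) != len(token_counts):
        raise ValueError("need one token count per post")
    total_tokens = sum(token_counts)
    if total_tokens <= 0:
        raise ValueError("no held-out tokens")
    return math.exp(-sum(log_likelihoods) / total_tokens)
```

As a sanity check, a model that assigns probability 0.25 to every token has perplexity 4: it is as uncertain as a uniform choice among 4 words.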
2 Quality of Discovered Topics • Label assigned to each post indicating category • Distribution over words indicating semantics • Clustering posts using topic labels • Prediction using topic labels • Predicting post authorship & user commenting activity • Major event detection
2A Post Clustering using Topics • Use hashtags as gold standard (for Twitter) • 16K posts: #NIPS2009, #ICML2009, #bollywood, etc. • DMRelCRP performs close to L-LDA without using hashtags • DMRelCRP produces ‘finer-grained’ clusters
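The clustering metric is not named here. As one standard choice, purity credits each cluster with the count of its most frequent gold hashtag; the sketch below assumes purity and is not necessarily the measure used. Note that purity rewards finer-grained clusters (singleton clusters score 1.0), which is why it is usually reported alongside a measure such as NMI.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def purity(cluster_labels: Sequence[Hashable], gold_labels: Sequence[Hashable]) -> float:
    """Fraction of posts whose gold label (e.g. hashtag) is the majority label of their cluster."""
    if len(cluster_labels) != len(gold_labels) or len(cluster_labels) == 0:
        raise ValueError("need equally long, non-empty label sequences")
    members: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, gold in zip(cluster_labels, gold_labels):
        members[cluster][gold] += 1
    # Each cluster contributes the size of its largest gold class.
    majority_total = sum(c.most_common(1)[0][1] for c in members.values())
    return majority_total / len(cluster_labels)
```

For instance, two clusters holding (#NIPS2009, #NIPS2009) and (#ICML2009, #ICML2009, #NIPS2009) have purity 4/5 = 0.8.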
2B Prediction Using Topics • Authorship: Given post and user, predict if author • Commenting activity: Given post and (non-author) user, predict if user comments on that post • DMRelCRP topics lead to more accurate prediction
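How topic labels are turned into authorship predictions is not described here. A deliberately simple, hypothetical scheme: build a user's topic profile from the topic labels of their past posts, and predict authorship when that profile puts at least some `threshold` of mass on the new post's topic. The same score could be computed from a candidate commenter's history for the commenting task. The names and the threshold rule are illustrative only.

```python
from collections import Counter
from typing import Dict, Sequence


def topic_profile(past_topics: Sequence[int]) -> Dict[int, float]:
    """Normalized histogram of the topic labels of a user's earlier posts."""
    counts = Counter(past_topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}


def predict_author(post_topic: int, past_topics: Sequence[int], threshold: float) -> bool:
    """Hypothetical rule: predict 'author' when the user's profile gives the
    post's topic at least `threshold` probability mass."""
    return topic_profile(past_topics).get(post_topic, 0.0) >= threshold
```

Under this rule, better topics help only insofar as they separate users' habitual subjects; a real classifier would combine such scores with other features.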
3A Global Personality Trends (figure; annotated events: FIFA WC, Michael Jackson’s death, Google Wave)
3B Geo-specific Personality Trends • Personality trends very similar in UK and US • Geographic influences high at different epochs
3B Geo-specific Personality Trends • India: world-wide and geographic influences weaker • China: world-wide influence weak, geographic influence strong; stable pattern