Predictively Modeling Social Text
William W. Cohen
Machine Learning Dept. and Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Joint work with: Amr Ahmed, Andrew Arnold, Ramnath Balasubramanyan, Frank Lin, Matt Hurst (MSFT), Ramesh Nallapati, Noah Smith, Eric Xing, Tae Yano
Newswire Text vs. Social Media Text
Newswire text (formal):
• Primary purpose: inform the "typical reader" about recent events
• Broad audience: explicitly establish shared context with the reader
• Ambiguity often avoided
Social media text (informal):
• Many purposes: entertain, connect, persuade, …
• Narrow audience: friends and colleagues; shared context already established
• Many statements are ambiguous out of social context
Newswire Text vs. Social Media Text
Newswire text, goals of analysis:
• Extract information about events from text
• "Understanding" text requires understanding the "typical reader": conventions for communicating with him/her, prior knowledge, background, …
Social media text, goals of analysis:
• Very diverse
• Evaluation is difficult, and requires revisiting often as goals evolve
• Often "understanding" social text requires understanding a community
Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Preliminary results & tradeoffs • Discussion of results & challenges
Introduction to Topic Models
• Multinomial Naïve Bayes
• [Plate diagram: class variable C (here "football") generates words W1, W2, W3, …, WN from β; example text: "The Pittsburgh Steelers won". The box (plate) is shorthand for M repetitions of the structure.]
Introduction to Topic Models
• Multinomial Naïve Bayes
• [Plate diagram: class variable C (here "politics") generates words W1, W2, W3, …, WN from β; example text: "The Pittsburgh mayor stated".]
Introduction to Topic Models
• Naïve Bayes model: compact representation
• [Plate diagram: the expanded network over C and W1, …, WN, repeated for M documents, is drawn compactly as a single W node inside nested plates of size N and M, with word-distribution parameter β.]
Introduction to Topic Models
• Multinomial Naïve Bayes
• For each document d = 1, …, M
  • Generate Cd ~ Mult(· | π)
  • For each position n = 1, …, Nd
    • Generate wn ~ Mult(· | β, Cd)
• Example, for document d = 1:
  • Generate Cd ~ Mult(· | π) = 'football'
  • For each position n = 1, …, Nd = 67:
    • Generate w1 ~ Mult(· | β, Cd) = 'the'
    • Generate w2 = 'Pittsburgh'
    • Generate w3 = 'Steelers'
    • …
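To make the generative story concrete, here is a minimal sketch in Python (not from the slides); the class prior pi, the per-class word distributions beta, and the tiny vocabulary are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "pittsburgh", "steelers", "won", "mayor", "stated"]
classes = ["football", "politics"]

pi = np.array([0.5, 0.5])                      # class prior (illustrative)
beta = np.array([                              # per-class word distributions (illustrative)
    [0.3, 0.2, 0.2, 0.2, 0.05, 0.05],          # football
    [0.3, 0.2, 0.05, 0.05, 0.2, 0.2],          # politics
])

def generate_document(n_words):
    """Generative story of multinomial naive Bayes: pick a class, then draw each word i.i.d."""
    c = rng.choice(len(classes), p=pi)                  # C_d ~ Mult(. | pi)
    words = rng.choice(vocab, size=n_words, p=beta[c])  # w_n ~ Mult(. | beta, C_d)
    return classes[c], list(words)

print(generate_document(5))
```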
Introduction to Topic Models
• Multinomial Naïve Bayes
• In the graphs:
  • shaded circles are known values
  • parents of variable W are the inputs to the function used in generating W
• Goal: given the known values, estimate the rest, usually to maximize the probability of the observations, e.g. $\arg\max_{\pi,\beta} \prod_d P(C_d, w_{d,1}, \dots, w_{d,N_d} \mid \pi, \beta)$
Introduction to Topic Models
• Mixture model: unsupervised naïve Bayes model
• Joint probability of words and classes: $P(C_d = c, w_{d,1:N_d}) = \pi_c \prod_{n=1}^{N_d} \beta_{c, w_{d,n}}$
• But classes are not visible, so we model the marginal $P(w_{d,1:N_d}) = \sum_c \pi_c \prod_{n=1}^{N_d} \beta_{c, w_{d,n}}$
Introduction to Topic Models
• Learning for naïve Bayes:
  • Take logs: the optimization problem is convex (the log-likelihood is linear in the log parameters) and easy to solve for any parameter
• Learning for the mixture model:
  • Many local maxima (at least one for each permutation of classes)
  • Expectation-maximization (EM) is the most common method
Introduction to Topic Models
• Mixture model: EM solution
• E-step: $\gamma_{dc} \equiv P(C_d = c \mid w_d) \propto \pi_c \prod_{n=1}^{N_d} \beta_{c, w_{d,n}}$
• M-step: $\pi_c \propto \sum_d \gamma_{dc}$, $\beta_{c,w} \propto \sum_d \gamma_{dc}\, n_{d,w}$ (where $n_{d,w}$ is the count of word $w$ in document $d$)
Introduction to Topic Models
• Mixture model: EM solution
• E-step: estimate the expected values of the unknown variables ("soft classification")
• M-step: maximize over the parameter values given this guess; usually this means re-learning the parameter values from the "soft classifications"
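A compact sketch of this E-step/M-step loop for a mixture of multinomials, assuming documents are given as a document-by-vocabulary count matrix; the variable names (gamma for the soft classifications, pi, beta) and the small smoothing constant are my own choices, not values from the slides.

```python
import numpy as np

def em_mixture_multinomial(X, K, n_iter=50, smooth=1e-2, seed=0):
    """EM for a mixture of multinomials (unsupervised naive Bayes).
    X: (D, V) matrix of word counts per document; K: number of mixture components."""
    rng = np.random.default_rng(seed)
    D, V = X.shape
    pi = np.full(K, 1.0 / K)                     # mixing weights
    beta = rng.dirichlet(np.ones(V), size=K)     # per-component word distributions

    for _ in range(n_iter):
        # E-step: soft classification, gamma[d, k] = P(C_d = k | w_d)
        log_post = np.log(pi) + X @ np.log(beta).T          # (D, K) unnormalized log posteriors
        log_post -= log_post.max(axis=1, keepdims=True)
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the soft classifications
        pi = gamma.sum(axis=0) / D
        beta = gamma.T @ X + smooth                          # expected counts, lightly smoothed
        beta /= beta.sum(axis=1, keepdims=True)

    return pi, beta, gamma
```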
Introduction to Topic Models
• Probabilistic Latent Semantic Analysis (PLSA) model
• Generative story:
  • Select document d ~ Mult(· | π)
  • For each position n = 1, …, Nd
    • generate zn ~ Mult(· | θd)   (θd = topic distribution for document d)
    • generate wn ~ Mult(· | βzn)
• Mixture model: each document is generated by a single (unknown) multinomial distribution over words; the corpus is "mixed" by π
• PLSA model: each word is generated by a single unknown multinomial distribution over words; each document is mixed by θd
Introduction to Topic Models
• Latent Dirichlet Allocation (Blei, Ng & Jordan, JMLR 2003)
Introduction to Topic Models
• Latent Dirichlet Allocation (LDA)
• For each document d = 1, …, M
  • Generate θd ~ Dir(· | α)
  • For each position n = 1, …, Nd
    • generate zn ~ Mult(· | θd)
    • generate wn ~ Mult(· | βzn)
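The same generative story as a short Python sketch (differing from PLSA only in that θd is drawn from a Dirichlet prior); the vocabulary size, topic count, hyperparameter α, and Poisson document lengths are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, M = 1000, 20, 5          # vocabulary size, topics, documents (illustrative)
alpha = np.full(K, 0.1)        # Dirichlet hyperparameter (illustrative)
beta = rng.dirichlet(np.ones(V), size=K)    # topic-word distributions (here drawn at random)

docs = []
for d in range(M):
    theta_d = rng.dirichlet(alpha)          # theta_d ~ Dir(alpha): per-document topic mixture
    N_d = rng.poisson(100)                  # document length (not part of the model proper)
    z = rng.choice(K, size=N_d, p=theta_d)  # z_n ~ Mult(. | theta_d)
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_n ~ Mult(. | beta_{z_n})
    docs.append(w)
```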
Introduction to Topic Models • LDA’s view of a document
Introduction to Topic Models • LDA topics
Introduction to Topic Models • Latent Dirichlet Allocation • Overcomes some technical issues with PLSA • PLSA only estimates mixing parameters for training docs • Parameter learning is more complicated: • Gibbs Sampling: easy to program, often slow • Variational EM
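A minimal sketch of the collapsed Gibbs sampler mentioned above, following the standard Griffiths & Steyvers (2004)-style update; the hyperparameters alpha and eta and all function and variable names are assumptions for illustration, not from the slides.

```python
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of word-id arrays."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))        # topic counts per document
    n_kv = np.zeros((K, V))                # word counts per topic
    n_k = np.zeros(K)                      # total words per topic
    z = [rng.integers(K, size=len(d)) for d in docs]   # random initial assignments

    for d, doc in enumerate(docs):         # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                # remove current assignment from counts
                n_dk[d, k] -= 1; n_kv[k, w] -= 1; n_k[k] -= 1
                # P(z_i = k | rest) is proportional to (n_dk + alpha) * (n_kv + eta) / (n_k + V*eta)
                p = (n_dk[d] + alpha) * (n_kv[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                # add new assignment back to counts
                n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kv
```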
Introduction to Topic Models
• Perplexity comparison of various models
• [Plot: held-out perplexity for the unigram model, mixture model, PLSA, and LDA; lower is better.]
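For reference, the held-out perplexity plotted in such comparisons is defined as follows (standard definition; the notation for test documents and lengths is mine):

```latex
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\!\left(-\,\frac{\sum_{d=1}^{M}\log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right)
```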
Introduction to Topic Models
• Prediction accuracy for classification using topic models as features (higher is better)
Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Preliminary results & tradeoffs • Discussion of results & challenges
Hyperlink modeling using LinkLDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
• For each document d = 1, …, M
  • Generate θd ~ Dir(· | α)
  • For each position n = 1, …, Nd
    • generate zn ~ Mult(· | θd)
    • generate wn ~ Mult(· | βzn)
  • For each citation j = 1, …, Ld
    • generate zj ~ Mult(· | θd)
    • generate cj ~ Mult(· | γzj)
• Learning using variational EM
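A sketch of the LinkLDA generative story in the same style as the LDA sketch above: citations are drawn from per-topic distributions over cited documents (γ), using the same per-document topic mixture θd as the words. All sizes and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, C, M = 1000, 20, 500, 5      # vocab size, topics, candidate cited docs, citing docs
alpha = np.full(K, 0.1)
beta = rng.dirichlet(np.ones(V), size=K)    # topic -> word distributions
gamma = rng.dirichlet(np.ones(C), size=K)   # topic -> cited-document distributions

for d in range(M):
    theta_d = rng.dirichlet(alpha)                          # theta_d ~ Dir(alpha)
    # words of the citing document
    z_w = rng.choice(K, size=rng.poisson(100), p=theta_d)   # z_n ~ Mult(. | theta_d)
    words = [rng.choice(V, p=beta[k]) for k in z_w]         # w_n ~ Mult(. | beta_{z_n})
    # citations of the citing document, drawn from the same topic mixture
    z_c = rng.choice(K, size=rng.poisson(5), p=theta_d)     # z_j ~ Mult(. | theta_d)
    cites = [rng.choice(C, p=gamma[k]) for k in z_c]        # c_j ~ Mult(. | gamma_{z_j})
```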
Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]
Newswire Text vs. Social Media Text
Newswire text, goals of analysis:
• Extract information about events from text
• "Understanding" text requires understanding the "typical reader": conventions for communicating with him/her, prior knowledge, background, …
Social media text, goals of analysis:
• Very diverse
• Evaluation is difficult, and requires revisiting often as goals evolve
• Often "understanding" social text requires understanding a community
Science as a testbed for social text: an open community which we understand
Models of hypertext for blogs [ICWSM 2008]
Joint work with Amr Ahmed, Ramesh Nallapati, Eric Xing
Link-PLSA-LDA
• LinkLDA model for citing documents
• Variant of PLSA model for cited documents
• Topics are shared between citing and cited documents
• Links depend on the topics of the two documents
Experiments
• 8.4M blog postings in the Nielsen BuzzMetrics corpus
  • Collected over three weeks in summer 2005
• Selected all postings with >= 2 inlinks or >= 2 outlinks
  • 2248 citing documents (2+ outlinks), 1777 cited documents (2+ inlinks)
  • Only 68 documents appear in both sets; these are duplicated
• Fit model using variational EM
Topics in blogs
• Model can answer questions like: which blogs are most likely to be cited when discussing topic z?
Topics in blogs
• Model can be evaluated by predicting which links an author will include in an article
• [Plot: link-prediction performance for Link-LDA vs. Link-PLSA-LDA; lower is better.]
Another model: Pairwise Link-LDA
• LDA for both cited and citing documents
• Generate a link indicator for every pair of documents (vs. generating pairs of docs)
• Link depends on the mixing components (θ's)
• A stochastic block model
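A rough sketch of the pairwise link-generation step: each pair of documents gets a binary link indicator that depends on their topic mixtures through a topic-by-topic block matrix, as in a stochastic block model. The matrix eta and the exact parameterization here are my own simplification, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 20
eta = rng.beta(1, 10, size=(K, K))    # block matrix: link probability for each topic pair

def link_indicator(theta_i, theta_j):
    """Draw a binary link indicator for a document pair from their topic mixtures."""
    z_i = rng.choice(K, p=theta_i)    # topic drawn from the first document's mixture
    z_j = rng.choice(K, p=theta_j)    # topic drawn from the second document's mixture
    return rng.random() < eta[z_i, z_j]
```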
Pairwise Link-LDA supports new inferences… …but doesn’t perform better on link prediction
Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • Observation: these models can be used for many purposes… • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Discussion of results & challenges
Predicting Response to Political Blog Posts with Topic Models [NAACL ’09]
Joint work with Tae Yano and Noah Smith
Political blogs and comments
• Posts are often coupled with comment sections
• Comment style is casual, creative, less carefully edited
Political blogs and comments
• Most of the text associated with large "A-list" community blogs is comments
  • 5-20x as many words in comments as in posts for the 5 sites considered in Yano et al.
• A large part of socially-created commentary in the blogosphere is comments, not blog-to-blog hyperlinks
• Comments do not just echo the post
Modeling political blogs
Our political blog model: CommentLDA
• z, z′ = topic; w = word (in post); w′ = word (in comments); u = user
• D = # of documents; N = # of words in post; M = # of words in comments
Modeling political blogs
Our proposed political blog model: CommentLDA
• The left-hand side of the model is vanilla LDA
• D = # of documents; N = # of words in post; M = # of words in comments
Modeling political blogs
Our proposed political blog model: CommentLDA
• The right-hand side captures the generation of reactions separately from the post body
• The two chambers share the same topic mixture
• Two separate sets of word distributions
• D = # of documents; N = # of words in post; M = # of words in comments
Modeling political blogs
Our proposed political blog model: CommentLDA
• User IDs of the commenters are treated as part of the comment text: they are generated along with the words in the comment section
• D = # of documents; N = # of words in post; M = # of words in comments
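A hedged sketch of the CommentLDA generative story as described above: post words, comment words, and commenter user IDs are all generated from one shared per-document topic mixture, with separate word distributions for the two chambers and a per-topic distribution over users. The sizes, the counting scheme pairing each comment word with a user, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, U, K = 5000, 300, 15         # vocabulary, users, topics (illustrative)
alpha = np.full(K, 0.1)
beta_post = rng.dirichlet(np.ones(V), size=K)    # topic -> post-word distribution
beta_cmnt = rng.dirichlet(np.ones(V), size=K)    # topic -> comment-word distribution (separate set)
beta_user = rng.dirichlet(np.ones(U), size=K)    # topic -> commenter distribution

def generate_blog_post(n_post=200, n_cmnt=600):
    theta = rng.dirichlet(alpha)                              # shared topic mixture for both "chambers"
    z = rng.choice(K, size=n_post, p=theta)
    post_words = [rng.choice(V, p=beta_post[k]) for k in z]
    z2 = rng.choice(K, size=n_cmnt, p=theta)
    cmnt_words = [rng.choice(V, p=beta_cmnt[k]) for k in z2]
    cmnt_users = [rng.choice(U, p=beta_user[k]) for k in z2]  # commenter IDs generated alongside words
    return post_words, cmnt_words, cmnt_users
```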
Modeling political blogs
Another model we tried: CommentLDA with the words from the comment section taken out
• Structurally equivalent to LinkLDA from (Erosheva et al., 2004)
• This model is agnostic to the words in the comment section
• D = # of documents; N = # of words in post; M = # of words in comments
Comment prediction (MY)
• LinkLDA and CommentLDA consistently outperform baseline models
• Neither consistently outperforms the other
• [Chart: user prediction, precision at top 10; from left to right: Link LDA (-v, -r, -c), Comment LDA (-v, -r, -c), baselines (Freq, NB); reported values include 16.92%, 20.54%, and 32.06%.]
From Episodes to Sagas: Temporally Clustering News via Social-Media Commentary [current work]
Joint work with Ramnath Balasubramanyan, Frank Lin, Matthew Hurst, Noah Smith
Motivation • News-related blogosphere is driven by recency • Some recent news is better understood based on context of sequence of related stories • Some readers have this context – some don’t • To reconstruct the context, reconstruct the sequence of related stories (“saga”) • Similar to retrospective event detection • First efforts: • Find related stories • Cluster by time • Evaluation: agreement with human annotators
Clustering results on Democratic-primary-related documents
• k-walks (more later)
• SpeCluster + time: mixture of multinomials + model for "general" text + timestamp drawn from a Gaussian
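A minimal sketch of the "SpeCluster + time" idea under my own assumptions: each cluster has a multinomial over words and a Gaussian over timestamps, documents also draw some words from a shared "general text" multinomial, and a document is scored against a cluster by combining the three. The mixing weight lam and all function and field names are illustrative, not the system's actual implementation.

```python
import numpy as np

def cluster_score(word_counts, timestamp, cluster, general_beta, lam=0.8):
    """Log-score of a document under one cluster: words from a blend of the cluster's
    multinomial and a shared 'general text' multinomial, timestamp from the cluster's Gaussian."""
    mixed = lam * cluster["beta"] + (1 - lam) * general_beta   # cluster-specific + general text
    log_p_words = word_counts @ np.log(mixed)
    mu, sigma = cluster["mu"], cluster["sigma"]                # Gaussian over timestamps
    log_p_time = -0.5 * ((timestamp - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return np.log(cluster["weight"]) + log_p_words + log_p_time

def assign(word_counts, timestamp, clusters, general_beta):
    """Hard-assign a document to the highest-scoring cluster."""
    scores = [cluster_score(word_counts, timestamp, c, general_beta) for c in clusters]
    return int(np.argmax(scores))
```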