1 / 73

Predictively Modeling Social Text

Predictively Modeling Social Text. William W. Cohen Machine Learning Dept. and Language Technologies Institute School of Computer Science Carnegie Mellon University

zarita
Download Presentation

Predictively Modeling Social Text

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Predictively Modeling Social Text William W. Cohen Machine Learning Dept. and Language Technologies Institute School of Computer Science Carnegie Mellon University Joint work with: Amr Ahmed, Andrew Arnold, Ramnath Balasubramanyan, Frank Lin, Matt Hurst (MSFT), Ramesh Nallapati, Noah Smith, Eric Xing, Tae Yano

  2. Formal Primary purpose: Inform “typical reader” about recent events Broad audience: Explicitly establish shared context with reader Ambiguity often avoided Informal Many purposes: Entertain, connect, persuade… Narrow audience: Friends and colleagues Shared context already established Many statements are ambiguous out of social context NewswireText Social MediaText

  3. Goals of analysis: Extract information about events from text “Understanding” text requires understanding “typical reader” conventions for communicating with him/her Prior knowledge, background, … Goals of analysis: Very diverse Evaluation is difficult And requires revisiting often as goals evolve Often “understanding” social text requires understanding a community NewswireText Social MediaText

  4. Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Preliminary results & tradeoffs • Discussion of results & challenges

  5. Introduction to Topic Models • Multinomial Naïve Bayes   C football ….. WN ….. W1 W2 W3 The Pittsburgh Steelers won M b b Box is shorthand for many repetitions of the structure….

  6. Introduction to Topic Models • Multinomial Naïve Bayes   C politics ….. WN ….. W1 W2 W3 The Pittsburgh mayor stated M b b

  7. Introduction to Topic Models • Naïve Bayes Model: Compact representation   C C ….. WN W1 W2 W3 W M N b M b

  8. Introduction to Topic Models • Multinomial Naïve Bayes • For each document d = 1,, M • Generate Cd ~ Mult( ¢ | ) • For each position n = 1,, Nd • Generate wn ~ Mult(¢|,Cd)  C • For document d = 1 • Generate Cd ~ Mult( ¢ | ) = ‘football’ • For each position n = 1,, Nd=67 • Generate w1 ~ Mult(¢|,Cd) = ‘the’ • Generate w2= ‘Pittsburgh’ • Generate w3= ‘Steelers’ • …. ….. WN W1 W2 W3 M b

  9. Introduction to Topic Models • Multinomial Naïve Bayes  • In the graphs: • shaded circles are known values • parents of variable W are the inputs to the function used in generating W. • Goal: given known values, estimate the rest, usually to maximize the probability of the observations: C ….. WN W1 W2 W3 M b

  10. Introduction to Topic Models • Mixture model: unsupervised naïve Bayes model • Joint probability of words and classes: • But classes are not visible:  C Z W N M b

  11. Introduction to Topic Models • Learning for naïve Bayes: • Take logs, the function is convex, linear and easy to optimize for any parameter • Learning for mixture model: • Many local maxima (at least one for each permutation of classes) • Expectation/maximization is most common method

  12. Introduction to Topic Models • Mixture model: EM solution E-step: M-step:

  13. Introduction to Topic Models • Mixture model: EM solution E-step: Estimate the expected values of the unknown variables (“soft classification”) M-step: Maximize the values of the parameters subject to this guess—usually, this is learning the parameter values given the “soft classifications”

  14. Introduction to Topic Models

  15. Introduction to Topic Models • Probabilistic Latent Semantic Analysis Model d d • Select document d ~ Mult() • For each position n = 1,, Nd • generate zn ~ Mult( ¢ | d) • generate wn ~ Mult( ¢ | zn)  Topic distribution z • Mixture model: • each document is generated by a single (unknown) multinomial distribution of words, the corpus is “mixed” by  • PLSA model: • each word is generated by a single unknown multinomial distribution of words, each document is mixed by d w N M 

  16. Introduction to Topic Models JMLR, 2003

  17. Introduction to Topic Models • Latent Dirichlet Allocation  • For each document d = 1,,M • Generate d ~ Dir(¢ | ) • For each position n = 1,, Nd • generate zn ~ Mult( ¢ | d) • generate wn ~ Mult( ¢ | zn) a z w N M 

  18. Introduction to Topic Models • LDA’s view of a document

  19. Introduction to Topic Models • LDA topics

  20. Introduction to Topic Models • Latent Dirichlet Allocation • Overcomes some technical issues with PLSA • PLSA only estimates mixing parameters for training docs • Parameter learning is more complicated: • Gibbs Sampling: easy to program, often slow • Variational EM

  21. Introduction to Topic Models • Perplexity comparison of various models Unigram Mixture model PLSA Lower is better LDA

  22. Introduction to Topic Models • Prediction accuracy for classification using learning with topic-models as features Higher is better

  23. Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Preliminary results & tradeoffs • Discussion of results & challenges

  24. Hyperlink modeling using LDA

  25. Hyperlink modeling using LinkLDA[Erosheva, Fienberg, Lafferty, PNAS, 2004] a  • For each document d = 1,,M • Generate d ~ Dir(¢ | ) • For each position n = 1,, Nd • generate zn ~ Mult( ¢ | d) • generate wn ~ Mult( ¢ | zn) • For each citation j = 1,, Ld • generate zj ~ Mult( . | d) • generate cj ~ Mult( . | zj) z z w c N L M  g Learning using variational EM

  26. Hyperlink modeling using LDA[Erosheva, Fienberg, Lafferty, PNAS, 2004]

  27. Goals of analysis: Extract information about events from text “Understanding” text requires understanding “typical reader” conventions for communicating with him/her Prior knowledge, background, … Goals of analysis: Very diverse Evaluation is difficult And requires revisiting often as goals evolve Often “understanding” social text requires understanding a community NewswireText Social MediaText Science as a testbed for social text: an open community which we understand

  28. Amr Ahmed Eric Xing Models of hypertext for blogs [ICWSM 2008] Ramesh Nallapati me

  29. LinkLDA model for citing documents Variant of PLSA model for cited documents Topics are shared between citing, cited Links depend on topics in two documents Link-PLSA-LDA

  30. Experiments • 8.4M blog postings in Nielsen/Buzzmetrics corpus • Collected over three weeks summer 2005 • Selected all postings with >=2 inlinks or >=2 outlinks • 2248 citing (2+ outlinks), 1777 cited documents (2+ inlinks) • Only 68 in both sets, which are duplicated • Fit model using variational EM

  31. Topics in blogs Model can answer questions like: which blogs are most likely to be cited when discussing topic z?

  32. Topics in blogs Model can be evaluated by predicting which links an author will include in a an article Link-LDA Link-PLDA-LDA Lower is better

  33. a   z  z z z w c w N N  Another model: Pairwise Link-LDA • LDA for both cited and citing documents • Generate an indicator for every pair of docs • Vs. generating pairs of docs • Link depends on the mixing components (’s) • stochastic block model

  34. Pairwise Link-LDA supports new inferences… …but doesn’t perform better on link prediction

  35. Outline • Tools for analysis of text • Probabilistic models for text, communities, and time • Mixture models and LDA models for text • LDA extensions to model hyperlink structure • Observation: these models can be used for many purposes… • LDA extensions to model time • Alternative framework based on graph analysis to model time & community • Discussion of results & challenges

  36. Predicting Response to Political Blog Posts with Topic Models [NAACL ’09] Noah Smith Tae Yano

  37. Political blogs and and comments Posts are often coupled with commentsections Comment style is casual, creative, less carefully edited 39

  38. Political blogs and comments • Most of the text associated with large “A-list” community blogs is comments • 5-20x as many words in comments as in text for the 5 sites considered in Yano et al. • A large part of socially-created commentary in the blogosphere is comments. • Not blog  blog hyperlinks • Comments do not just echo the post

  39. Modeling political blogs Our political blog model: CommentLDA z, z` = topic w = word (in post) w`= word (in comments) u = user D = # of documents; N = # of words in post; M = # of words in comments

  40. CommentLDA Modeling political blogs Our proposed political blog model: LHS is vanilla LDA D = # of documents; N = # of words in post; M = # of words in comments

  41. CommentLDA Modeling political blogs RHS to capture the generation of reaction separately from the post body Our proposed political blog model: Two chambers share the same topic-mixture Two separate sets of word distributions D = # of documents; N = # of words in post; M = # of words in comments

  42. CommentLDA Modeling political blogs Our proposed political blog model: User IDs of the commenters as a part of comment text generate the words in the comment section D = # of documents; N = # of words in post; M = # of words in comments

  43. CommentLDA Modeling political blogs Another model we tried: Took out the words from the comment section! The model is structurally equivalent to the LinkLDA from (Erosheva et al., 2004) This is a model agnostic to the words in the comment section! D = # of documents; N = # of words in post; M = # of words in comments

  44. Topic discovery - Matthew Yglesias (MY) site 46

  45. Topic discovery - Matthew Yglesias (MY) site 47

  46. Topic discovery - Matthew Yglesias (MY) site 48

  47. Comment prediction (MY) • LinkLDA and CommentLDA consistently outperform baseline models • Neither consistently outperforms the other. 20.54 % Comment LDA (R) (RS) (CB) 16.92 % 32.06 % Link LDA (R) Link LDA (C) user prediction:Precision at top 10 From left to right: Link LDA(-v, -r,-c) Cmnt LDA (-v, -r, -c), Baseline (Freq, NB) 49

  48. From Episodes to Sagas: Temporally Clustering News Via Social-Media Commentary [current work] Noah Smith Frank Lin Matthew Hurst Ramnath Balasubramanyan

  49. Motivation • News-related blogosphere is driven by recency • Some recent news is better understood based on context of sequence of related stories • Some readers have this context – some don’t • To reconstruct the context, reconstruct the sequence of related stories (“saga”) • Similar to retrospective event detection • First efforts: • Find related stories • Cluster by time • Evaluation: agreement with human annotators

  50. Clustering results on Democratic-primary-related documents k-walks (more later) SpeCluster + time: Mixture of multinomials + model for “general” text + timestamp from Gaussian

More Related