Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms



  1. Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms. Amr Ahmed, Thesis Defense

  2. The Infosphere

  3. The Infosphere: news sources, social media, research publications. [Figure: the same example item, "President Obama had an accident while playing a basketball match," appearing across all three source types.]

  4. The Infosphere. [Figure: the example story reaches users whose interests span soccer, car deals, and fashion; the system must track it with online inference.]

  5. Thesis Question. [Figure: news sources, social media, and research publications.]

  6. How to model users and content? [Figure: the example story flows from news sources, social media, and research publications, via online inference, to users interested in soccer, car deals, and fashion.]

  7. Questions • What do we mean by content? • What characterizes users and content?

  8. Research publications: arXiv, PubMed Central, conference proceedings, journal transactions. Social media and news: Yahoo! News, CNN, Google News, BBC, and blogs such as Red State and Daily Kos.

  9. Multi-faceted nature and temporal dynamics. [Figure: topic trends over time in Physics, Biology, and CS; paired quotes showing the facets of a single event: "BP wasn't prepared for an oil spill at such depths" after the drill explosion versus BP's "We will make this right," and "Choice is a fundamental, constitutional right" versus "Ban abortion with a Constitutional amendment."]

  10. What Characterizes Users? • Long-term interests • Baseball • Graphical models • Music • Short-term interests • Buying a car • Getting a new camera • Spurious interests • What is the buzz about the oil spill?

  11. Thesis Question • How to build a structured representation of users and content • Temporal dynamics • How ideas/events evolve over time • How user interests change over time • Structural correspondence • How ideas are addressed across modalities and communities • How to learn user interests from multimodal sources

  12. Thesis Approach • Models • Probabilistic graphical models • Topic models and Non-parametric Bayes • Principled, expressive and modular • Algorithms • Distributed • To deal with large-scale datasets • Online • To update the representation with new data

  13. Outline • Background • Mixed-membership Models • Recurrent Chinese Restaurant Process • Modeling Temporal Dynamics • News • Research publications • User intents • Modeling multi-faceted Content • Ideological Perspective

  14. What is a Good Model for Documents? • Clustering • Mixture of unigrams model • How to specify a model? • Generative process • Assume some hidden variables • Use them to generate documents • Inference • Invert the process • Given documents → hidden variables. [Plate diagram: π → c_i → w_i, with mixture parameters φ_k, k = 1..K.]

  15. Mixture of Unigrams. [Plate diagram: cluster proportions π over components φ_1…φ_K; each document w_i gets a single cluster c_i.] Generative process • For document w_i • Sample c_i ~ Multi(π) • Sample w_i ~ Multi(φ_{c_i}) When is this a good model for documents? • When documents are single-topic • Not true in our settings
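A minimal sketch of this generative process in Python (numpy); the sizes K, V, and the document length are illustrative assumptions, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, DOC_LEN = 3, 1000, 50               # assumed sizes, for illustration only
pi = rng.dirichlet(np.ones(K))            # cluster proportions pi
phi = rng.dirichlet(np.ones(V), size=K)   # one unigram distribution per cluster

def sample_document():
    """Mixture of unigrams: one cluster per document, every word drawn from it."""
    c = rng.choice(K, p=pi)                        # c_i ~ Multi(pi)
    words = rng.choice(V, size=DOC_LEN, p=phi[c])  # w_i ~ Multi(phi_{c_i})
    return c, words
```

Because a single c is drawn per document, every word shares one topic, which is exactly the single-topic assumption the slide flags as unrealistic.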

  16. Topic Models: What Do We Need to Model? • Q: What is it about? • A: Mainly MT, with syntax, some learning. Example abstract ("A Hierarchical Phrase-Based Model for Statistical Machine Translation"): "We present a statistical phrase-based translation model that uses hierarchical phrases (phrases that contain sub-phrases). The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system." [Figure: mixing proportions 0.6 MT, 0.3 syntax, 0.1 learning; topics as unigram distributions over the vocabulary: MT (source, target, SMT, alignment, score, BLEU), Syntax (parse tree, noun phrase, grammar, CFG), Learning (likelihood, EM, hidden parameters, estimation, argmax).]

  17. Mixed-Membership Models. Generative process • For each document d • Sample θ_d ~ Prior • For each word w in d • Sample z ~ Multi(θ_d) • Sample w ~ Multi(φ_z). [Plate diagram: prior → θ_d → z → w, with topics φ_1…φ_K shared across D documents of N words each; the MT abstract shown with its per-document topic proportions.]
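For contrast, a sketch of the mixed-membership version, where each word draws its own topic; again, K, V, ALPHA, and the document length are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, DOC_LEN, ALPHA = 3, 1000, 50, 0.5   # assumed sizes and prior strength
phi = rng.dirichlet(np.ones(V), size=K)   # K topics: unigrams over the vocabulary

def sample_document():
    """Mixed membership: a per-document theta, a per-word topic z."""
    theta = rng.dirichlet(ALPHA * np.ones(K))   # theta_d ~ Dirichlet prior
    z = rng.choice(K, size=DOC_LEN, p=theta)    # z ~ Multi(theta_d), one per word
    words = np.array([rng.choice(V, p=phi[zi]) for zi in z])  # w ~ Multi(phi_z)
    return theta, z, words
```

The only change from the mixture of unigrams is moving the topic draw inside the word loop, which is what lets a single document mix MT, syntax, and learning in the proportions shown above.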

  18. Outline • Background • Mixed-membership Models • Recurrent Chinese Restaurant Process • Modeling Temporal Dynamics • Research publications • News • User intents • Modeling multi-faceted Content • Ideological Perspective

  19. Chinese Restaurant Process (CRP) • Allows the number of mixtures to grow with the data • Also called non-parametric • Means the number of effective parameters grows with the data • Still has hyper-parameters that control the rate of growth • α: how fast is a new cluster/mixture born? • G_0: prior over mixture-component parameters

  20. The Chinese Restaurant Process. [Figure: tables with dishes φ_1, φ_2, φ_3.] Generative process • For data point x_i • Choose an existing table j with probability ∝ N_j and sample x_i ~ f(φ_j) • Choose a new table K+1 with probability ∝ α • Sample φ_{K+1} ~ G_0 and sample x_i ~ f(φ_{K+1}). The rich-get-richer effect. CANNOT handle sequential data.
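A minimal sketch of CRP seating in Python, with α as the only hyper-parameter (draws of φ from G_0 are elided since only table assignments are shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def crp_seating(n_customers, alpha=1.0):
    """Seat customers sequentially; returns one table index per customer."""
    counts = []                       # N_j: number of customers at table j
    assignments = []
    for _ in range(n_customers):
        # Existing table j with prob ~ N_j; a new table with prob ~ alpha.
        weights = np.array(counts + [alpha], dtype=float)
        j = rng.choice(len(weights), p=weights / weights.sum())
        if j == len(counts):
            counts.append(1)          # new table born: phi_{K+1} ~ G_0
        else:
            counts[j] += 1            # rich-get-richer: popular tables grow
        assignments.append(j)
    return assignments
```

The seating distribution is exchangeable, so the order of the data carries no information, which is precisely why the plain CRP cannot handle sequential data.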

  21. Recurrent CRP (RCRP) [Ahmed and Xing 2008] • Adapts the number of mixture components over time • Mixture components can die out • New mixture components can be born at any time • Retained mixture components' parameters evolve according to Markovian dynamics

  22. Recurrent CRP (RCRP) • Three equivalent constructions (see [Ahmed & Xing 2008]): • The infinite limit of a fixed-dimensional dynamic model • The recurrent Chinese restaurant process • Time-dependent random measures

  23. The Recurrent Chinese Restaurant Process • The restaurant operates in epochs • The restaurant is closed at the end of each epoch • The state of the restaurant at epoch t depends on that at epoch t-1 • Can be extended to higher-order dependencies.

  24. The Recurrent Chinese Restaurant Process. [Figure, T=1: tables with dishes φ_{1,1}, φ_{2,1}, φ_{3,1}; the dish eaten at table 3 at epoch 1 is the parameter of cluster 3 at epoch 1.] Generative process • Customers at time T=1 are seated as before: • Choose an existing table j with probability ∝ N_{j,1} and sample x_i ~ f(φ_{j,1}) • Choose a new table K+1 with probability ∝ α • Sample φ_{K+1,1} ~ G_0 and sample x_i ~ f(φ_{K+1,1})

  25. [Animation, T=1 → T=2: epoch 1 ends with counts N_{1,1}=2, N_{2,1}=3, N_{3,1}=1; its tables φ_{1,1}, φ_{2,1}, φ_{3,1} are carried into epoch T=2.]

  26. [Animation: a customer arriving at T=2 sees the inherited tables, weighted by their epoch-1 counts.]

  27. [Animation continued.]

  28. [Animation continued.]

  29. [Animation, T=2: the first customer sits at inherited table 1, and its parameter evolves: sample φ_{1,2} ~ P(·|φ_{1,1}).]

  30. [Animation: subsequent customers at T=2 are seated the same way, and so on.]

  31. [Animation, end of epoch 2: table 3 received no customers, so its cluster has died out; a newly born cluster φ_{4,2} appears alongside the evolved φ_{1,2} and φ_{2,2}.]

  32. [Animation, T=2 → T=3: epoch 2 closes with counts N_{1,2}=2, N_{2,2}=2, N_{4,2}=1, which in turn become the prior for epoch T=3.]

  33. RCRP • Can be extended to model higher-order dependencies • Can decay dependencies over time • The pseudo-count for table k at time t is

N'_{k,t} = \sum_{w=1}^{W} e^{-w/\lambda} N_{k,t-w}

where W is the history size, N_{k,t-w} is the number of customers sitting at table k at epoch t-w, and λ is the decay factor.

  34. [Figure, T=3: the prior weight of table 2 is its decayed history, N'_{2,3} = \sum_{w=1}^{W} e^{-w/\lambda} N_{2,3-w}.]

  35. RCRP • Can be extended to model higher-order dependencies • Can decay dependencies over time • The pseudo-count for table k at time t is N'_{k,t} as above • (W, λ, α) can generate interesting clustering configurations
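A sketch of the decayed pseudo-counts and the resulting seating weights at epoch t; the representation of `history` and the hyper-parameter values are illustrative assumptions:

```python
import numpy as np

def pseudo_counts(history, W, lam):
    """N'_{k,t} = sum_{w=1..W} exp(-w/lam) * N_{k,t-w}.

    `history` is a list of per-epoch count vectors (most recent last),
    each padded to the current number of tables.
    """
    prior = np.zeros(len(history[-1]))
    for w in range(1, min(W, len(history)) + 1):
        prior += np.exp(-w / lam) * np.asarray(history[-w])
    return prior

def seating_weights(history, current_counts, W, lam, alpha):
    """Unnormalized seating probabilities at epoch t: decayed history plus
    this epoch's counts for each table, and alpha for opening a new table."""
    w = pseudo_counts(history, W, lam) + np.asarray(current_counts, dtype=float)
    return np.append(w, alpha)
```

Setting W = T with a very large λ lets every past epoch count fully, while W = 0 zeroes the prior and decouples the epochs, the two extremes on the next slide.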

  36. TDPM Generative Power • W = T, λ = ∞: recovers the DPM (power-law curve) • W = 4, λ = 0.4: TDPM • W = 0, λ arbitrary: independent DPMs at each epoch

  37. Outline • Background • Mixed-membership Models • Recurrent Chinese Restaurant Process • Modeling Temporal Dynamics • News • User intents • Research publications • Modeling multi-faceted Content • Ideological Perspective

  38. Modeling Temporal Dynamics. [Diagram: RCRP at the center, driving three applications: infinite storylines from streaming text (online scalable inference), evolution of research ideas, and dynamic user interests (online distributed inference).]

  39. Outline • Background • Mixed-membership Models • Recurrent Chinese Restaurant Process • Modeling Temporal Dynamics • News • User intents • Research publications • Modeling multi-faceted Content • Ideological Perspective

  40. Understanding the News • Clustering • Group similar articles together • Classification • High-level topics like sports and politics • Analysis • How a story develops over time • Who are the main entities • Challenges • Large scale and online • Almost one document per second

  41. A Unified Model • Jointly solves the three main tasks: • Clustering • Classification • Analysis • Building blocks • A topic model • High-level concepts (unsupervised classification) • Dynamic clustering (RCRP) • Discover tightly-focused concepts • Named entities • Story developments

  42. Dynamic Clustering • Recurrent Chinese restaurant process (RCRP) • Discovers time-sensitive stories. Generative process • For each document w_d at time t • Sample its story s_d from the RCRP prior (the stories' trend plus the prior at time t) • Sample w_d ~ Multinomial(β_{s_d})

  43. Infinite Dynamic Cluster-Topic Hybrid. [Figure: high-level topics: Politics (government, minister, authorities, opposition, officials, leaders, group), Accidents (police, attack, run, man, group, arrested, move), Sports (games, won, team, final, season, league, held). Story clusters: UEFA-soccer (champions, goal, coach, striker, midfield, penalty; entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon) and Tax-Bill (tax, billion, cut, plan, budget, economy; entities: Bush, Senate, Fleischer, White House, Republican).]

  44. Infinite Dynamic Cluster-Topic Hybrid. [Figure: same topics as above, with a third story cluster, Border-Tension (nuclear, border, dialogue, diplomatic, militant, insurgency, missile; entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee), alongside UEFA-soccer and Tax-Bill.]

  45.–46. The Graphical Model. [Diagram: stories capture tightly-focused concepts; topics capture high-level concepts.]

  47. The Graphical Model • Each story has: • A distribution over words • A distribution over topics • A distribution over named entities

  48. The Graphical Model • A document's mixing vector is sampled from its story prior • Words inside a document can come either from the global topics or from the story-specific topic

  49.–50. The Generative Process. [Slides show the full generative process of the hybrid model.]
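A minimal sketch of how one document might be generated under this hybrid model, assuming its story has already been drawn from the RCRP; the sizes, the story-word share `p_story`, and all priors below are illustrative assumptions standing in for the learned quantities on slides 43-48:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, E = 10, 5000, 500                   # assumed: topics, word vocab, entity vocab
phi = rng.dirichlet(np.ones(V), size=K)   # global topic-word distributions

def make_story():
    """Each story carries distributions over words, topics, and named entities."""
    return {
        "beta":    rng.dirichlet(np.ones(V)),   # story-specific word distribution
        "a":       np.full(K, 0.5),             # story prior over topics (assumed)
        "omega":   rng.dirichlet(np.ones(E)),   # named-entity distribution
        "p_story": 0.3,                         # share of story-specific words (assumed)
    }

def sample_document(story, n_words=50, n_entities=5):
    theta = rng.dirichlet(story["a"])     # document mixing vector from story prior
    words = []
    for _ in range(n_words):
        if rng.random() < story["p_story"]:
            words.append(int(rng.choice(V, p=story["beta"])))     # story-specific word
        else:
            k = int(rng.choice(K, p=theta))                       # global topic word
            words.append(int(rng.choice(V, p=phi[k])))
    entities = [int(rng.choice(E, p=story["omega"])) for _ in range(n_entities)]
    return words, entities
```

The story-level distributions themselves evolve across epochs under the RCRP dynamics described earlier, which is what lets a story such as Tax-Bill keep its identity while its vocabulary drifts.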
