Modeling Users and Content: Structured Probabilistic Representation and Scalable Online Inference Algorithms
Amr Ahmed, Thesis Proposal
This thesis is about document collections: they are everywhere, and they cover many domains.
• Research publications: ArXiv, PubMed Central, conference proceedings, journal transactions
• News: Yahoo! News, CNN, Google News, BBC
• Social media and blogs: Red State, Daily Kos
Two recurring themes, illustrated on the slide:
• Temporal Dynamics: how topics in research fields (physics, biology, CS) and events in a news story evolve over time, e.g., the BP oil spill: drill explosion; "BP wasn't prepared for an oil spill at such depths"; BP: "We will make this right."
• Structural Correspondence: how the same issue is framed across communities, e.g., "Choice is a fundamental, constitutional right" vs. "Ban abortion with a Constitutional amendment."
Thesis Question • How to build a structured representation of document collections that reveals • Temporal Dynamics • How ideas/events evolve over time • Structural Correspondence • How ideas are addressed across modalities and communities
Thesis Approach • Models • Probabilistic graphical models • Topic models and Non-parametric Bayes • Principled, expressive and modular • Algorithms • Distributed • To deal with large-scale datasets • Online • To update the representation with new data
Outline • Background • Temporal Dynamics • Timelines for research publications • Storylines from news streams • User interest-lines • Structural Correspondence • Across modalities • Across ideologies
What is a Good Model for Documents?
• Clustering
  • Mixture of unigrams model
• How to specify a model?
  • Generative process
    • Assume some hidden variables
    • Use them to generate documents
  • Inference
    • Invert the process
    • Given documents, recover the hidden variables
Mixture of Unigrams
Generative process:
• For each document w_i
  • Sample c_i ~ Multi(π)
  • Sample each word of w_i ~ Multi(φ_{c_i})
Is this a good model for documents? When is it?
• Only when documents are single-topic
• Not true in our settings
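To make the generative process concrete, here is a minimal Python sketch (not the thesis code) that forward-samples documents from a mixture of unigrams; the vocabulary size, number of components, and mixing proportions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 6                      # toy vocabulary size (illustrative)
K = 2                      # number of mixture components (topics)
pi = np.array([0.7, 0.3])  # mixing proportions pi (illustrative)
# One unigram distribution phi_c per component; each row sums to 1
phi = rng.dirichlet(np.ones(V), size=K)

def sample_document(n_words=20):
    """Mixture of unigrams: one topic per document, all words drawn from it."""
    c = rng.choice(K, p=pi)                        # c_i ~ Multi(pi)
    words = rng.choice(V, size=n_words, p=phi[c])  # each word ~ Multi(phi_c)
    return c, words

for _ in range(3):
    c, words = sample_document()
    print(f"topic={c}, words={words.tolist()}")
```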
What Do We Need to Model?
• Example document: "A Hierarchical Phrase-Based Model for Statistical Machine Translation". Abstract: "We present a statistical phrase-based translation model that uses hierarchical phrases (phrases that contain sub-phrases). The model is formally a synchronous context-free grammar but is learned from a bitext without any syntactic information. Thus it can be seen as a shift to the formal machinery of syntax-based translation systems without any linguistic commitment. In our experiments using BLEU as a metric, the hierarchical phrase-based model achieves a relative improvement of 7.5% over Pharaoh, a state-of-the-art phrase-based system."
• Q: What is it about? A: Mainly MT, with syntax, some learning.
• The slide annotates the abstract with three topics (MT: source, target, SMT, alignment, score, BLEU; syntax: parse tree, noun phrase, grammar, CFG; learning: likelihood, EM, hidden parameters, estimation, argmax) and their mixing proportions (0.6, 0.3, 0.1).
• Topic models represent a document by its mixing proportions over topics, where each topic is a unigram distribution over the vocabulary.
Mixed-Membership Models
Generative process:
• For each document d
  • Sample θ_d ~ Prior
  • For each word w in d
    • Sample z ~ Multi(θ_d)
    • Sample w ~ Multi(φ_z)
Each document now mixes several topics, as in the machine translation abstract on the previous slide.
Topic Models
• Prior over the topic vector
  • Latent Dirichlet Allocation (LDA)
  • Correlated priors (CTM)
  • Hierarchical priors
• Topics
  • Unigrams, bigrams, etc.
• Document structure
  • Bag of words
  • Multi-modal
  • Side information
Outline • Background • Temporal Dynamics • Timelines for research publications • Storylines from news streams • User interest-lines • Structural Correspondence • Across modalities • Across ideologies
Problem Statement
Given research papers from 1900 to 2009 (physics, biology, CS), discover the topics:
• Potentially infinite number of topics
• With time-varying trends
• And time-varying distributions
• And variable durations
  • Topics can die
  • New topics can be born
The Big Picture
A grid of models along two axes, time and model dimension:
• LDA (static, fixed number of topics)
• Dynamic clustering and Dynamic LDA (adding time)
• HDPM (adding an infinite model dimension)
• Infinite Dynamic Topic Models (combining both)
LDA: The Generative Process
• For each document d
  • Sample θ_d ~ Dirichlet(α)
  • For each word w in d
    • Sample z ~ Multi(θ_d)
    • Sample w ~ Multi(φ_z)
Checklist:
• Do topics' trends evolve over time? No
• Do topics' distributions evolve over time? No
• Does the number of topics grow with the data? No
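For comparison with the mixture of unigrams, a minimal Python sketch of the LDA generative process; the toy dimensions and Dirichlet hyperparameters are illustrative assumptions, not values from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 8, 3, 4          # toy vocabulary, topics, documents (illustrative)
alpha = 0.5 * np.ones(K)   # Dirichlet prior on per-document topic proportions
beta = 0.1 * np.ones(V)    # Dirichlet prior on topic-word distributions
phi = rng.dirichlet(beta, size=K)          # topics phi_1..phi_K

docs = []
for d in range(D):
    theta = rng.dirichlet(alpha)           # theta_d ~ Dirichlet(alpha)
    words = []
    for _ in range(15):                    # 15 words per toy document
        z = rng.choice(K, p=theta)         # z ~ Multi(theta_d)
        words.append(rng.choice(V, p=phi[z]))  # w ~ Multi(phi_z)
    docs.append(words)

print(docs[0])
```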
The Big Picture (recap): LDA; Dynamic clustering and Dynamic LDA along the time axis; HDPM along the model-dimension axis; Infinite Dynamic Topic Models combining both.
Dynamic LDA: The Generative Process (epoch 1)
• For each document d
  • Sample θ_d ~ Normal(α_1, λI)
  • For each word w in d
    • Sample z ~ Multi(L(θ_d))
    • Sample w ~ Multi(L(φ_{z,1}))
The Gaussian (rather than Dirichlet) parameterization is necessary to evolve trends over time. L is the logistic (softmax) transformation that maps natural parameters to the simplex: L(x)_w = exp(x_w) / Σ_v exp(x_v).
Dynamic LDA: The Generative Process (evolving to epoch t)
• α_t ~ Normal(α_{t-1}, σ)
• φ_{k,t} ~ Normal(φ_{k,t-1}, ρ)
• For each document d at epoch t
  • Sample θ_d ~ Normal(α_t, λI)
  • For each word w_{d,i} in d
    • Sample z_{d,i} ~ Multi(L(θ_d))
    • Sample w_{d,i} ~ Multi(L(φ_{z_{d,i},t}))
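A hedged Python sketch of this dynamic generative process: topic trends α_t and topic parameters φ_{k,t} follow Gaussian random walks in natural-parameter space and are mapped to the simplex with the logistic (softmax) transformation L. All dimensions and variances are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, T = 8, 3, 5                # toy vocabulary, topics, epochs (illustrative)
sigma, rho, lam = 0.1, 0.1, 0.5  # random-walk and document-level variances (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

alpha = np.zeros((T, K))         # trend natural parameters alpha_t
phi = np.zeros((T, K, V))        # topic natural parameters phi_{k,t}
for t in range(1, T):
    alpha[t] = rng.normal(alpha[t - 1], sigma)   # alpha_t ~ N(alpha_{t-1}, sigma)
    phi[t] = rng.normal(phi[t - 1], rho)         # phi_{k,t} ~ N(phi_{k,t-1}, rho)

def sample_document(t, n_words=15):
    theta = rng.normal(alpha[t], lam)            # theta_d ~ N(alpha_t, lambda I)
    topic_probs = softmax(theta)                 # L(theta_d)
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=topic_probs)                  # z ~ Multi(L(theta_d))
        words.append(rng.choice(V, p=softmax(phi[t, z]))) # w ~ Multi(L(phi_{z,t}))
    return words

print(sample_document(t=2))
```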
Dynamic LDA: The Generative Process (plate diagram unrolled over epochs 1..T, with per-epoch trends α_t and topics φ_{k,t})
Checklist:
• Do topics' trends evolve over time? Yes
• Do topics' distributions evolve over time? Yes
• Does the number of topics grow with the data? No
The Big Picture (recap): LDA; Dynamic clustering and Dynamic LDA along the time axis; HDPM along the model-dimension axis; Infinite Dynamic Topic Models combining both.
The Chinese Restaurant Franchise Process
• The HDP mixture model (HDPM) automatically determines the number of topics in LDA
• We will focus on the Chinese restaurant franchise construction
  • A set of restaurants that share a global menu
• Metaphor
  • Restaurant = document
  • Customer = word
  • Dish = topic
  • Global menu = set of topics
The Chinese Restaurant Franchise Process
Global menu of dishes φ_1, ..., φ_4, shared by all restaurants:
• m_k: number of tables (across all restaurants) serving dish (topic) k
• φ_k: word distribution for topic k (e.g., φ_4 is the distribution for topic 4)
Within each restaurant, customers sitting at the same table share the dish served at that table.
The Chinese Restaurant Franchise Process: Generative Process
For each customer (word) w arriving at a restaurant (e.g., restaurant 3):
• Choose an existing table j with probability ∝ N_j (the number of customers already seated there), then emit w ~ Multi(L(φ_k)), where φ_k is the dish served at table j
• Choose a new table with probability ∝ α, then sample a dish for the new table from the global menu:
  • An existing dish k with probability ∝ m_k (the number of tables serving it)
  • A new dish with probability ∝ γ, drawn from the base measure: φ_new ~ H
In the illustrated sequence, one customer joins an existing table serving φ_3 and emits w ~ Multi(L(φ_3)); a later customer opens a new table, draws a brand-new dish φ_5 ~ H, and emits w ~ Multi(L(φ_5)).
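A minimal Python sketch of the seating process just described, tracking only table and dish counts (no word emission); the concentration parameters α and γ and the number of customers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 1.0, 1.0      # concentration parameters (illustrative)

dish_tables = []             # m_k: number of tables (over all restaurants) serving dish k

def seat_customer(restaurant):
    """restaurant: list of [table_size, dish_id]; returns the dish the customer eats."""
    table_sizes = [n for n, _ in restaurant]
    weights = np.array(table_sizes + [alpha], dtype=float)
    j = rng.choice(len(weights), p=weights / weights.sum())
    if j < len(restaurant):                       # existing table, prob ∝ N_j
        restaurant[j][0] += 1
        return restaurant[j][1]
    # new table, prob ∝ alpha: sample its dish from the global menu
    menu_weights = np.array(dish_tables + [gamma], dtype=float)
    k = rng.choice(len(menu_weights), p=menu_weights / menu_weights.sum())
    if k == len(dish_tables):                     # new dish, prob ∝ gamma (phi_new ~ H)
        dish_tables.append(0)
    dish_tables[k] += 1
    restaurant.append([1, k])
    return k

restaurants = [[], [], []]                        # three documents
for r in restaurants:
    for _ in range(20):
        seat_customer(r)
print("tables per dish:", dish_tables)
```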
The Chinese Restaurant Franchise Process
Checklist:
• Do topics' trends evolve over time? No
• Do topics' distributions evolve over time? No
• Does the number of topics grow with the data? Yes
The Big Picture (recap): LDA; Dynamic clustering and Dynamic LDA along the time axis; HDPM along the model-dimension axis; Infinite Dynamic Topic Models combining both.
Recurrent Chinese Restaurant Franchise Process
Topics at the end of epoch 1:
• The height m_{k,1} represents topic k's popularity (number of tables serving it)
• φ_{k,1} represents topic k's word distribution
Documents in epoch 1 are generated as in the static process.
Observations:
• Topics that are popular at epoch 1 are likely to be popular at epoch 2
• φ_{k,2} is likely to evolve smoothly from φ_{k,1}
The global menu at epoch 2 is therefore initialized with pseudo counts obtained by multiplying the epoch-1 counts by a decay factor.
Recurrent Chinese Restaurant Franchise Process
At epoch 2, inherited dishes are not instantiated until they are actually served: a dish that is inherited but not yet used keeps only its pseudo count, while a dish served for the first time at epoch 2 draws a new distribution that evolves from the previous epoch, e.g., φ_{3,2} ~ Normal(φ_{3,1}, ρ).
Recurrent Chinese Restaurant Franchise Process: Generative Process (epoch 2)
For each customer (word) w in a restaurant at epoch 2:
• [as in the static case] Choose an existing table j with probability ∝ N_j
• Choose a new table with probability ∝ α, then sample a dish for the new table:
  • A dish inherited from epoch 1 and already used at epoch 2, with probability ∝ m'_{k,2} + m_{k,2}
  • An inherited dish not yet used at epoch 2, with probability ∝ m'_{k,2}; its first use draws φ_{k,2} ~ Normal(φ_{k,1}, ρ) (e.g., φ_{1,2} ~ Normal(φ_{1,1}, ρ) in the illustration)
  • A brand-new dish with probability ∝ γ, drawn from the base measure: φ_new ~ H (e.g., φ_{6,2} ~ H)
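A hedged Python sketch of the dish choice at epoch t in the recurrent process: inherited pseudo counts m'_{k,t} add to the current-epoch counts m_{k,t}, an inherited dish used for the first time draws parameters that evolve from the previous epoch, and a brand-new dish comes from the base measure. The Gaussian base measure H = N(0, I) and all numeric values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, rho, V = 1.0, 0.1, 8        # concentration, evolution variance, vocab size (illustrative)

def choose_dish(pseudo, counts, phi_prev, phi_curr):
    """pseudo:   m'_{k,t}, inherited (decayed) table counts from earlier epochs
       counts:   m_{k,t}, table counts accumulated so far in epoch t
       phi_prev / phi_curr: dish parameters at epochs t-1 and t (None if not yet drawn)."""
    K = len(pseudo)
    weights = np.array([pseudo[k] + counts[k] for k in range(K)] + [gamma])
    k = rng.choice(K + 1, p=weights / weights.sum())
    if k == K:                                       # brand-new dish: phi_new ~ H
        phi_prev.append(None)
        phi_curr.append(rng.normal(0.0, 1.0, size=V))
        pseudo.append(0.0)
        counts.append(0)
    elif phi_curr[k] is None:                        # inherited but not yet used at epoch t
        phi_curr[k] = rng.normal(phi_prev[k], rho)   # phi_{k,t} ~ N(phi_{k,t-1}, rho)
    counts[k] += 1
    return k

# toy epoch-t state: two inherited dishes, neither used yet this epoch
pseudo, counts = [3.0, 1.0], [0, 0]
phi_prev = [rng.normal(0, 1, V), rng.normal(0, 1, V)]
phi_curr = [None, None]
for _ in range(10):
    choose_dish(pseudo, counts, phi_prev, phi_curr)
print("counts m_{k,t}:", counts)
```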
Recurrent Chinese Restaurant Franchise Process
Moving from epoch 2 to epoch 3, topics whose counts decay to zero die out, while newly born topics enter the global menu.
Checklist:
• Do topics' trends evolve over time? Yes
• Do topics' distributions evolve over time? Yes
• Does the number of topics grow with the data? Yes
Recurrent Chinese Restaurant Franchise Process
• We have just described a first-order RCRF process, in which pseudo counts come only from the previous epoch
• In a general D-order process, pseudo counts are aggregated over the previous D epochs (see the reconstruction below)
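The slide's equation for the D-order pseudo counts did not survive extraction; a plausible reconstruction, assuming the exponential-decay kernel commonly used with the recurrent Chinese restaurant process (with λ playing the role of the decay factor mentioned earlier), is

m'_{k,t} = Σ_{δ=1}^{D} exp(-δ/λ) · m_{k,t-δ},

so that a dish at epoch t is chosen with probability ∝ m'_{k,t} + m_{k,t}.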
Inference
• Gibbs sampling
  • Sample a table for each word
  • Sample a topic (dish) for each table
  • Sample the topic parameters over time
  • Sample hyperparameters
• How to deal with non-conjugacy
  • Algorithm 8 of Neal (1998) combined with Metropolis-Hastings
• Efficiency
  • The Markov blanket contains only the previous and following D epochs
Sampling a Topic for a Table
• The conditional probability of a dish for a table factors into three parts: the past (inherited pseudo counts), the emission (likelihood of the words at the table), and the future (the effect of the choice on later epochs)
• Non-conjugacy: for the new-dish option, auxiliary dish parameters are drawn from the prior H = N(0, σI), each carrying weight γ/M (the slide uses M = 3 auxiliary dishes, hence weight γ/3)
• Efficiency: the future terms are pre-computed and updated incrementally as the sampler sweeps over tables
Sampling Topic Parameters
• v | φ ~ Mult(Logistic(φ))
• This is a linear state-space model with a non-Gaussian emission
• Use a Laplace approximation inside the forward-backward algorithm
• Use the resulting distribution as a proposal
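For concreteness, a brief sketch of the per-topic state-space model implied by the slides (the notation v_t for the per-epoch word counts assigned to a topic is an assumption):

φ_{k,t} | φ_{k,t-1} ~ Normal(φ_{k,t-1}, ρI),   v_t | φ_{k,t} ~ Mult(L(φ_{k,t})),   with L(φ)_w = exp(φ_w) / Σ_v exp(φ_v).

Because the multinomial-logistic emission is non-Gaussian, each emission term is replaced by a Gaussian centered at its mode (the Laplace approximation), forward-backward smoothing is run on the resulting linear-Gaussian chain, and the smoothed Gaussian is used as a Metropolis-Hastings proposal for φ_{k,1:T}.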
Experiments
• Simulated data
  • 20 epochs with 100 data points in each epoch
• Timeline of the NIPS conference
  • 13 years
  • 1,740 documents
  • 950 words per document
  • Vocabulary of ~3,500 words
Simulation Experiment: sample documents (figure).
Simulation Experiment: ground truth vs. recovered topics (figure).
Timeline of NIPS topics, 1987 to 1996 (figure): SOM, ICA, boosting, speech, reinforcement learning, memory, neuroscience, Bayesian methods, kernels, mixtures, neural networks, generalization, classification, clustering methods, control, probabilistic models, image processing.