Topic modeling
Mark Steyvers
Department of Cognitive Sciences, University of California, Irvine
Some topics we can discuss
• Introduction to LDA: basic topic model
• Preliminary work on therapy transcripts
• Extensions to LDA
  • Conditional topic models (for predicting behavioral codes)
  • Various topic models for word order
  • Topic models incorporating parse trees
  • Topic models for dialogue
  • Topic models incorporating speech information
Automatic and unsupervised extraction of semantic themes from large text collections. Example collections:
• Pennsylvania Gazette (1728-1800): 80,000 articles
• Enron: 250,000 emails
• NYT: 330,000 articles
• NSF/NIH: 100,000 grants
• AOL queries: 20,000,000 queries from 650,000 users
• Medline: 16 million articles
Model Input
• Matrix of counts: number of times words (rows) occur in documents (columns)

            Doc1   Doc2   Doc3   …
  PIZZA       34      0      3   …
  PASTA       12      0      2   …
  ITALIAN      0     19      6   …
  FOOD         0     16      1   …
  …            …      …      …   …

• Note:
  • word order is lost: "bag of words" approach
  • some function words are deleted: "the", "a", "in"
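A minimal sketch of building such a word-document count matrix, assuming scikit-learn is available; the toy documents and the particular stop-word list are illustrative, not part of the original example.

```python
# Hedged sketch: build a bag-of-words count matrix with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["pizza pasta italian food pizza",
        "italian food the menu",
        "pizza pasta italian food in the oven"]

# stop_words="english" drops function words such as "the" and "in";
# word order is discarded, leaving only counts (bag of words).
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)          # documents x words (sparse counts)
print(vectorizer.get_feature_names_out())
print(X.toarray().T)                        # transposed to words x documents, as in the table above
```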
Basic Assumptions
• Each topic is a distribution over words
• Each document is a mixture of topics
• Each word in a document originates from a single topic
Document = mixture of topics
Example topics (top words):
• auto, car, parts, cars, used, ford, honda, truck, toyota
• party, store, wedding, birthday, jewelry, ideas, cards, cake, gifts
• webmd, cymbalta, xanax, gout, vicodin, effexor, prednisone, lexapro, ambien
• hannah, montana, zac, efron, disney, high, school, musical, miley, cyrus, hilary, duff
One document might mix two topics (e.g., 80% of one and 20% of another); another document might use a single topic (100%).
Generative Process
• For each document d, choose a mixture of topics: θ(d) ~ Dirichlet(α)
• For each word in the document:
  • sample a topic z ∈ {1..T} from the mixture: z ~ Multinomial(θ(d))
  • sample a word from that topic: w ~ Multinomial(φ(z)), where each topic φ ~ Dirichlet(β)
(Plate notation: Nd word tokens per document, D documents, T topics.)
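A minimal sketch of this generative process, assuming illustrative settings (vocabulary size, numbers of documents and topics, and hyperparameter values are all made up for the example):

```python
# Hedged sketch: sample documents from the LDA generative process.
import numpy as np

rng = np.random.default_rng(0)
V, T, D, Nd = 8, 2, 4, 20          # vocab size, topics, documents, words per document
alpha, beta = 0.5, 0.1

phi = rng.dirichlet(np.full(V, beta), size=T)       # each topic: a distribution over words
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))        # this document's mixture of topics
    z = rng.choice(T, p=theta, size=Nd)             # a topic for each word token
    words = np.array([rng.choice(V, p=phi[t]) for t in z])
    docs.append(words)
print(docs[0])                                      # word ids of the first document
```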
Prior Distributions
• Dirichlet priors encourage sparsity on topic mixtures and topics:
  • θ ~ Dirichlet(α) over topics (Topic 1, Topic 2, Topic 3 simplex)
  • φ ~ Dirichlet(β) over words (Word 1, Word 2, Word 3 simplex)
[Figure: samples on the probability simplex; darker colors indicate lower probability]
Statistical Inference
• Three sets of latent variables:
  • document-topic distributions θ
  • topic-word distributions φ
  • topic assignments z
• Estimate the posterior distribution over topic assignments P( z | w )
  • we "collapse" over (integrate out) the topic mixtures and word mixtures
  • we can later infer θ and φ from the topic assignments
• Use approximate methods: Markov chain Monte Carlo (MCMC) with Gibbs sampling
Toy Example: Artificial Dataset
• Two topics, 16 documents
• Can we recover the original topics and topic mixtures from this data?
Initialization: assign word tokens randomly to topics: (●=topic 1; ○=topic 2 )
Gibbs Sampling
Probability that word token i is assigned to topic t:

P(z_i = t | z_{-i}, w) ∝ (C^{WT}_{w_i,t} + β) / (Σ_w C^{WT}_{w,t} + Wβ) × (C^{DT}_{d_i,t} + α) / (Σ_{t'} C^{DT}_{d_i,t'} + Tα)

where C^{WT}_{w,t} is the count of word w assigned to topic t, C^{DT}_{d,t} is the count of topic t assigned to document d (both excluding the current token i), and W is the vocabulary size.
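A hedged sketch of this sampling equation: given the two count matrices (with the current token already decremented), compute the probability of each topic for word token i. The counts and hyperparameter values are illustrative.

```python
# Hedged sketch: the collapsed Gibbs sampling probabilities for one word token.
import numpy as np

def topic_probs(w, d, C_wt, C_dt, alpha, beta):
    """P(z_i = t | z_-i, w) for t = 1..T, normalized over topics."""
    V = C_wt.shape[0]
    word_term = (C_wt[w, :] + beta) / (C_wt.sum(axis=0) + V * beta)                 # how much each topic likes word w
    doc_term = (C_dt[d, :] + alpha) / (C_dt[d, :].sum() + C_dt.shape[1] * alpha)    # how much document d uses each topic
    p = word_term * doc_term
    return p / p.sum()

# toy counts: 5 words x 2 topics, 3 documents x 2 topics
C_wt = np.array([[3, 0], [2, 1], [0, 4], [1, 1], [0, 2]], dtype=float)
C_dt = np.array([[4, 1], [1, 3], [1, 4]], dtype=float)
print(topic_probs(w=1, d=0, C_wt=C_wt, C_dt=C_dt, alpha=0.5, beta=0.01))
```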
After 1 iteration • Apply sampling equation to each word token: (●=topic 1; ○=topic 2 )
After 4 iterations (●=topic 1; ○=topic 2 )
After 8 iterations (●=topic 1; ○=topic 2 )
After 32 iterations (●=topic 1; ○=topic 2 )
Summary of Algorithm
INPUT: word-document counts (word order is irrelevant)
OUTPUT:
• topic assignments to each word: P( zi )
• likely words in each topic: P( w | z )
• likely topics in each document ("gist"): P( z | d )
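A hedged sketch of obtaining these outputs with the gensim library (assumed available). Note that gensim's LdaModel estimates the model with variational Bayes by default rather than the Gibbs sampler described above, but the outputs are the same kinds of distributions; the toy documents are illustrative.

```python
# Hedged sketch: fit a topic model and read off P(w|z) and P(z|d) with gensim.
from gensim import corpora, models

docs = [["pizza", "pasta", "italian", "food"],
        ["italian", "food", "pizza"],
        ["wedding", "party", "cake", "gifts"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]          # word-document counts

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      random_state=0, passes=50)

# Likely words in each topic: P(w | z)
for t in range(2):
    print(lda.show_topic(t, topn=5))

# Likely topics in each document: P(z | d); per_word_topics=True also returns
# per-word topic information for the document.
print(lda.get_document_topics(corpus[0], per_word_topics=True))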
Example topics from TASA, an educational corpus
• 37K documents, 26K word vocabulary
• 300 topics, for example:
[Figure: example topics with their most probable words]
Three documents with the word "play" (numbers & colors indicate topic assignments)
LSA vs. the Topic Model
• LSA: C = U D V^T, where C is the word × document co-occurrence matrix, U is words × dims, D is dims × dims, and V^T is dims × documents
• Topic model: C = Φ Θ, where C is the normalized co-occurrence matrix, Φ (words × topics) contains the mixture components (topics), and Θ (topics × documents) contains the mixture weights
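A hedged sketch contrasting the two factorizations on a small random count matrix, assuming scikit-learn is available (documents are rows here, as scikit-learn expects; scikit-learn's LDA uses variational inference, and all sizes are illustrative):

```python
# Hedged sketch: SVD-style factorization (LSA) vs. probabilistic factorization (topic model).
import numpy as np
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(16, 40))       # 16 documents, 40-word vocabulary

# LSA: truncated SVD of the count matrix, C ~ U D V^T (components may be negative)
svd = TruncatedSVD(n_components=2, random_state=0)
doc_dims = svd.fit_transform(X)             # documents x dims
print(svd.components_.shape)                # dims x words

# Topic model: nonnegative probabilistic factorization, C ~ Theta Phi
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)                # documents x topics, rows sum to 1
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # topics x words, rows sum to 1
print(theta.shape, phi.shape)
```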
Documents as Topic Mixtures: a Geometric Interpretation
[Figure: the word simplex with axes P(word1), P(word2), P(word3), each between 0 and 1 and P(word1) + P(word2) + P(word3) = 1; topic 1 and topic 2 are points on the simplex, and each observed document lies between them as a mixture of the two topics]
Defining documents
• Can define "document" in multiple ways:
  • all words within a therapy session
  • all words from a particular speaker within a session
• Clearly we need to extend the topic model to dialogue….
Positive/Negative Topic Usage by Changes in Satisfaction
This graph shows that couples whose satisfaction decreases over the course of therapy use relatively negative language, while those who leave therapy with increased satisfaction exhibit more positive language.
Topics used by Satisfied/Unsatisfied Couples
Topic 38: talk, divorce, problem, house, along, separate, separation, talking, agree, example
Dissatisfied couples talk relatively more often about separation and divorce.
Affect Dynamics
• Analyze the short-term dynamics of affect usage: do unhappy couples follow up negative language with negative language more often than happy couples? In other words, are unhappy couples involved in a negative feedback loop?
• Calculated:
  • P( z2 = + | z1 = + )
  • P( z2 = + | z1 = - )
  • P( z2 = - | z1 = + )
  • P( z2 = - | z1 = - )
• E.g., P( z2 = - | z1 = + ) is the probability that, after a positive word, the next non-neutral word will be a negative word.
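A hedged sketch of this calculation: given a sequence of non-neutral affect assignments ('+' or '-') from a session, estimate the four conditional probabilities. The example sequence is made up.

```python
# Hedged sketch: estimate P(z2 | z1) from consecutive non-neutral affect assignments.
from collections import Counter

affect = ['+', '-', '-', '+', '-', '-', '-', '+', '+', '-', '-', '+', '-', '-']

pairs = Counter(zip(affect, affect[1:]))             # counts of consecutive (z1, z2) pairs
for z1 in ['+', '-']:
    total = sum(pairs[(z1, z2)] for z2 in ['+', '-'])
    for z2 in ['+', '-']:
        print(f"P(z2={z2} | z1={z1}) = {pairs[(z1, z2)] / total:.2f}")
```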
Markov Chain Illustration
[Figure: base rates and transition probabilities between positive (+) and negative (-) affect states, shown as Markov chains for four groups: Normal Controls, Positive Change, Little Change, and Negative Change]
Extensions
• Multi-label document classification
  • conditional topic models
• Topic models and word order
  • n-grams/collocations
  • hidden Markov models
• Some potential model developments:
  • topic models incorporating parse trees
  • topic models for dialogue
  • topic models incorporating speech information
Conditional Topic Models
• Assume there is a topic associated with each label/behavioral code.
• The model is only allowed to assign words to labels that are associated with the document.
• The model can therefore learn the distribution of words associated with each label/behavioral code.
Topics associated with Behavioral Codes
[Figure: documents tagged with behavioral codes (e.g., Vulnerability = yes/no, Hard Expression = yes/no); each word token in a document must be assigned to one of the topics for the codes active in that document, and topic weights are estimated per document]
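A hedged sketch of the key constraint, in the spirit of such conditional (labeled) topic models rather than the exact model in the talk: one topic per behavioral code, and each word token may only be assigned to the codes that are active for its document. The data and hyperparameters are illustrative.

```python
# Hedged sketch: collapsed Gibbs sampling where each document's words may only
# be assigned to the topics (codes) listed for that document.
import numpy as np

rng = np.random.default_rng(0)
V, T = 6, 3                                    # vocabulary size, topics (= behavioral codes)
alpha, beta = 0.5, 0.01

docs = [[0, 1, 1, 2], [3, 4, 4, 5], [0, 3, 2, 5]]       # word ids per document
doc_labels = [[0], [1], [0, 1, 2]]                       # codes active in each document

# count matrices and a random, label-respecting initialization
C_wt = np.zeros((V, T))
C_dt = np.zeros((len(docs), T))
z = [[rng.choice(doc_labels[d]) for _ in doc] for d, doc in enumerate(docs)]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        C_wt[w, z[d][i]] += 1
        C_dt[d, z[d][i]] += 1

for _ in range(200):                           # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        allowed = doc_labels[d]                # the conditional constraint
        for i, w in enumerate(doc):
            t_old = z[d][i]
            C_wt[w, t_old] -= 1
            C_dt[d, t_old] -= 1
            p = ((C_wt[w, allowed] + beta) / (C_wt[:, allowed].sum(axis=0) + V * beta)
                 * (C_dt[d, allowed] + alpha))
            t_new = allowed[rng.choice(len(allowed), p=p / p.sum())]
            z[d][i] = t_new
            C_wt[w, t_new] += 1
            C_dt[d, t_new] += 1

print(np.round(C_wt / C_wt.sum(axis=0).clip(min=1), 2))   # learned word distribution per code
```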
Hidden Markov Topics Model (Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
• Syntactic dependencies: short-range dependencies
• Semantic dependencies: long-range dependencies
• A chain of hidden syntactic states s1, s2, s3, s4, … generates the words w1, w2, w3, w4, … from an HMM; one designated semantic state instead generates words from the topic model, with topic assignments z1, z2, z3, z4, … drawn from the document's topic mixture θ.
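A minimal generative sketch of this idea under simplified assumptions: a small HMM over syntactic classes, where one designated "semantic" state emits words from the document's topic mixture instead of a class-specific distribution. Vocabulary, sizes, and parameters are illustrative.

```python
# Hedged sketch: HMM over syntactic classes with one semantic state that uses the topic model.
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["the", "of", "is", "network", "neuron", "image", "learning"])
V, T, S = len(vocab), 2, 3                        # vocab size, topics, HMM states (state 0 = semantic)

phi = rng.dirichlet(np.full(V, 0.1), size=T)      # topic-word distributions
hmm_emit = rng.dirichlet(np.full(V, 0.1), size=S) # word distributions for syntactic states
trans = rng.dirichlet(np.full(S, 1.0), size=S)    # HMM state transitions

theta = rng.dirichlet(np.full(T, 0.5))            # topic mixture for one document
state, words = 0, []
for _ in range(10):
    state = rng.choice(S, p=trans[state])
    if state == 0:                                # semantic state: use the topic model
        z = rng.choice(T, p=theta)
        words.append(vocab[rng.choice(V, p=phi[z])])
    else:                                         # syntactic state: use the HMM emission
        words.append(vocab[rng.choice(V, p=hmm_emit[state])])
print(" ".join(words))
```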
NIPS Semantics (example topics):
• KERNEL SUPPORT VECTOR SVM KERNELS # SPACE FUNCTION MACHINES SET
• NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS # OUTPUTS
• IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS # PIXEL VISUAL
• EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
• MEMBRANE SYNAPTIC CELL * CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
• DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
• STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL *
NIPS Syntax (example syntactic classes):
• IN WITH FOR ON FROM AT USING INTO OVER WITHIN
• # * I X T N - C F P
• IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
• SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
• HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
• MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
• USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
Random sentence generation (LANGUAGE topic):
• [S] RESEARCHERS GIVE THE SPEECH
• [S] THE SOUND FEEL NO LISTENERS
• [S] WHICH WAS TO BE MEANING
• [S] HER VOCABULARIES STOPPED WORDS
• [S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Collocation Topic Model (example topics)
• Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS
• Wall Street Firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS
• Stock Market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN 10 500_STOCK_INDEX
• Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP
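The collocation topic model learns multi-word terms jointly with the topics. A simpler, commonly used approximation (not the model shown above) is to merge frequent collocations in preprocessing and then run ordinary LDA; a hedged sketch with gensim's Phrases, assuming that library is available and using illustrative sentences and thresholds:

```python
# Hedged sketch: merge collocations (e.g. "wall_street") before running plain LDA.
from gensim.models.phrases import Phrases, Phraser
from gensim import corpora, models

sentences = [["wall", "street", "analysts", "upgraded", "the", "stock"],
             ["the", "new", "york", "region", "reacted", "to", "the", "attacks"],
             ["wall", "street", "firms", "filed", "for", "bankruptcy", "protection"]]

bigrams = Phraser(Phrases(sentences, min_count=1, threshold=1))  # low thresholds only for this toy data
merged = [bigrams[s] for s in sentences]                         # tokens such as "wall_street"

dictionary = corpora.Dictionary(merged)
corpus = [dictionary.doc2bow(d) for d in merged]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
print(lda.show_topic(0, topn=5))
```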
Using parse trees / POS taggers?
[Figure: parse trees (S → NP VP) for "You complete me" and "I complete you"]
Topic Segmentation Model
• Purver, Kording, Griffiths, & Tenenbaum (2006). Unsupervised topic modeling for multi-party spoken discourse. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.
• Automatically segments multi-party discourse into topically coherent segments
• Outperforms standard HMMs
• Model does not incorporate speaker information or speaker turns; the goal is simply to segment a long stream of words into segments
At each utterance, there is a probability of changing θ, the topic mixture. If no change is indicated, words are drawn from the same mixture of topics. If there is a change, the topic mixture is resampled from the Dirichlet prior.
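A minimal generative sketch of this segmentation idea under simplified assumptions: before each utterance, with some switching probability the topic mixture θ is resampled from its Dirichlet prior; otherwise the previous θ is reused. Vocabulary, sizes, and parameters are illustrative.

```python
# Hedged sketch: utterances drawn from a topic mixture that occasionally switches.
import numpy as np

rng = np.random.default_rng(0)
V, T = 8, 3                                   # vocabulary size, number of topics
phi = rng.dirichlet(np.full(V, 0.1), size=T)  # topic-word distributions
alpha, p_switch = np.full(T, 0.5), 0.2

theta = rng.dirichlet(alpha)
utterances, boundaries = [], []
for u in range(12):
    if u > 0 and rng.random() < p_switch:     # topic shift: start a new segment
        theta = rng.dirichlet(alpha)
        boundaries.append(u)
    z = rng.choice(T, p=theta, size=6)        # a topic for each word in the utterance
    words = [rng.choice(V, p=phi[t]) for t in z]
    utterances.append(words)
print("segment boundaries before utterances:", boundaries)
```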
Latent Dialogue Structure Model (Ding et al., NIPS workshop, 2009)
• Designed for modeling sequences of messages on discussion forums
• Models the relationships between messages within documents: a message might relate to any previous message within a dialogue
• Does not incorporate speaker-specific variables
Learning User Intentions in Spoken Dialogue Systems (Chinaei et al., ICAART, 2009)
• Applies the HTMM model (Gruber et al., 2007) to dialogue
• Assumes that within each talk-turn, words are drawn from the same topic z (not a mixture!). At the start of a new talk-turn, there is some probability ψ of sampling a new topic z from the mixture θ.
Other ideas
• Can we enhance topic models with non-verbal speech information?
• Each topic would be a distribution over words as well as over voicing information (f0, timing, etc.)
[Plate diagram: D documents, Nd word tokens per document, T topics, each token paired with a non-verbal feature]
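A hedged sketch of this idea under illustrative assumptions (the feature means, spreads, and sizes below are made up): each topic has both a word distribution and a Gaussian over a non-verbal feature (here f0), so every token generates a (word, f0) pair from its topic.

```python
# Hedged sketch: topics that emit a word and a non-verbal feature per token.
import numpy as np

rng = np.random.default_rng(0)
V, T = 10, 3
phi = rng.dirichlet(np.full(V, 0.1), size=T)         # per-topic word distributions
f0_mean = np.array([110.0, 180.0, 240.0])            # per-topic f0 means (Hz), illustrative
f0_sd = np.array([15.0, 20.0, 25.0])

theta = rng.dirichlet(np.full(T, 0.5))               # topic mixture for one document
tokens = []
for _ in range(8):
    z = rng.choice(T, p=theta)
    word = rng.choice(V, p=phi[z])
    f0 = rng.normal(f0_mean[z], f0_sd[z])            # non-verbal feature for this token
    tokens.append((z, word, round(f0, 1)))
print(tokens)
```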