Transfer Learning for Enhancing Information Flow in Organizations and Social Networks
Chris Pal, Xuerui Wang & Andrew McCallum
University of Massachusetts, Amherst
Summary
• New Topic Models, Start Simple & Build
 - Compare with related model structures
 - Precision vs. Recall on 20 Newsgroups
• Add Authors + Discriminative Methods
 - Predict NIPS Authors & Email Recipients
• Authors + Recipients & Creating DARTs
 - Transfer Learning in Social Networks
 - Experiments with Enron Email
New Continuous Topic Models
• Undirected (Random Field) Joint Model
• Conditionally log-Normal Topics
• Conditionally Multinomial Words
[Figure: plate notation for the model, contrasted with LDA; Nt topics, Nw words]
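As a rough sketch of what "Conditionally log-Normal Topics" and "Conditionally Multinomial Words" mean for this undirected joint model (the notation below is an illustrative assumption, not copied from the paper), the two conditionals might look like:

```latex
% Assumed notation: t is the N_t-dimensional topic vector, w_i the i-th word,
% n(w) the vector of word counts, W a topic-word weight matrix, b and c biases.
p(w_i = v \mid \mathbf{t}) \propto \exp\!\left(b_v + \mathbf{t}^\top W_{\cdot v}\right)
  \quad \text{(conditionally multinomial words)}
\\[4pt]
\mathbf{t} \mid \mathbf{w} \sim \mathrm{logNormal}\!\left(\mathbf{c} + W\,\mathbf{n}(\mathbf{w}),\, \Sigma\right)
  \quad \text{(conditionally log-normal topics)}
```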
Further Contrast: MCA, PCA, RAP
• Multinomial Component Analysis (MCA)
• Principal Component Analysis (PCA)
• Rate Adapting Poisson (RAP) Model
[Figure: plate diagrams for MCA, PCA and RAP; labels include Nz unobserved Gaussian variables, Nb binary topics, Nv Poisson counts for each word in the vocabulary, Nx observed Gaussian variables of fixed dimension, and Nw draws from a discrete distribution (the words in a doc)]
Our Model (MCA) vs. TFIDF vs. RAP
MRR  Method
.45  Our Model
.37  TFIDF
.33  RAP
• Precision vs. Recall on 20 Newsgroups, 100 word vocabulary
• 20 dimensional hidden topic space
• Cosine Distance Comparisons (.9, .1 Train, Test Split)
• Compared with TFIDF and Rate Adapting Poisson (RAP) Model
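A minimal sketch of the cosine-distance comparison behind these numbers (illustrative code, not the authors' implementation; function and variable names are assumptions): embed documents into the 20-dimensional hidden topic space, rank training documents for each test document by cosine similarity, and score precision/recall against the newsgroup labels.

```python
import numpy as np

def cosine_rank(test_vecs, train_vecs):
    """Rank training documents for each test document by cosine similarity
    in the hidden topic space (rows = documents, cols = topic dimensions)."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sims = normalize(test_vecs) @ normalize(train_vecs).T   # (n_test, n_train)
    return np.argsort(-sims, axis=1)                         # best match first

def precision_recall_at_k(ranked, test_labels, train_labels, k):
    """Precision/recall at k; a retrieved doc counts as relevant if it
    shares the query document's newsgroup label."""
    precisions, recalls = [], []
    for i, order in enumerate(ranked):
        relevant = (train_labels == test_labels[i])
        hits = relevant[order[:k]].sum()
        precisions.append(hits / k)
        recalls.append(hits / max(relevant.sum(), 1))
    return float(np.mean(precisions)), float(np.mean(recalls))
```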
20 Newsgroups
• 10,000 word vocab. (highest MI with class)
• 18,796 documents
• Downcased, no stopwords, Porter stemmed
• comp., rec., sci., .forsale, .politics, .religion

NIPS
• 13,649 word vocab.
• 1,740 papers
• Downcased, no stopwords, no stemming
• 13 years of NIPS proceedings, 1987-1999
Discriminative Training, MCL and a Richer Model
[Figure: plate diagrams contrasting Maximum Likelihood, Discriminative and 'Multi-conditional' training of the topic model (Nt topics, Nw words), and discriminative training of a richer model that also includes authors and year alongside the Nw words]
The Main Equations
• The conditionals for Gibbs sampling
• Optimize the marginal likelihood or the marginal conditionals
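The equations themselves appeared as images on the slide; a hedged reconstruction of the two training objectives referred to here (marginal likelihood of the joint model vs. marginal conditionals, with notation assumed for illustration) is:

```latex
% d indexes documents, w_d the words, a_d the attributes (authors, recipients, year);
% the continuous topic vector t is integrated out in both objectives.
\mathcal{L}_{\mathrm{joint}}(\theta) = \sum_d \log \int p_\theta(\mathbf{w}_d, \mathbf{a}_d, \mathbf{t})\, d\mathbf{t}
\\[4pt]
\mathcal{L}_{\mathrm{disc}}(\theta)  = \sum_d \log p_\theta(\mathbf{a}_d \mid \mathbf{w}_d)
```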
NIPS Topics, Multi-conditional Learning
Optimize an objective based on the product of the conditional probabilities of each word given all the others.
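In symbols, a hedged sketch of that multi-conditional objective (notation assumed for illustration, not copied from the paper):

```latex
% w_{d,i} is the i-th word of document d and w_{d,-i} all of its other words;
% the topic vector is marginalized inside each conditional.
\mathcal{L}_{\mathrm{MC}}(\theta) = \sum_d \sum_{i} \log p_\theta\!\left(w_{d,i} \mid \mathbf{w}_{d,-i}\right)
```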
Predicting NIPS Authors
• Comparing Models, Mean Reciprocal Rank (MRR)
• Cosine Distance Comparisons (.9, .1 Train, Test Split)
MRR  Method
.88  Discriminative
.46  Joint
.25  Joint, Words only
Academic Email
• 4,643 emails
• 190 recipients
• 8,693 word vocabulary
• Downcased, no stopwords, no stemming

Mean Reciprocal Rank (MRR) Evaluation
• Reciprocal of the rank at which the first relevant response was returned
• Method 1: Use the cosine of all previous sent email; obtain authors from the ordered closest matches
• Method 2: Use the model to make predictions; obtain the ordered list from the probability distribution
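A minimal sketch of the MRR computation described above (illustrative code, not from the paper): for each test email, take the ranked list of predicted recipients, score the reciprocal rank of the first correct one, and average over queries.

```python
def mean_reciprocal_rank(ranked_predictions, true_sets):
    """ranked_predictions: list of ranked candidate lists (best first);
    true_sets: list of sets of correct answers, one per query."""
    rr = []
    for ranking, truth in zip(ranked_predictions, true_sets):
        score = 0.0
        for rank, candidate in enumerate(ranking, start=1):
            if candidate in truth:
                score = 1.0 / rank
                break
        rr.append(score)
    return sum(rr) / len(rr)

# Example: first relevant recipient at ranks 1 and 3 -> MRR = (1 + 1/3) / 2
print(mean_reciprocal_rank([["alice", "bob"], ["carol", "dave", "alice"]],
                           [{"alice"}, {"alice"}]))
```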
Predicting Email Recipients
• Comparing Models, Mean Reciprocal Rank (MRR)
• Cosine Distance Comparisons (.9, .1 Train, Test Split)
• 20 dimensional hidden 'topic' space
MRR  Method
.60  Discriminative
.30  Joint
.21  Joint, Words only
Summary of Results so Far • Richer model with authors included helps • Discriminative optimization helps a lot
Undirected, Continuous Author Recipient Topic Models
• A continuous topic model
• Author recipient topic model
• Plated version of same model
[Figure: plate notation; Nt topics over Nw words, with author, Nr recipients and Nw words in the full model]
Enron Email
• 150 employees
• 250,000 emails
• Avg. of 1,400 sent emails [200 – 4,800]
• Experiments with .9, .1 test-train split
• Use model to make predictions & the cosine method
• Explore two types of transfer learning:
 1. Shared hidden variables
 2. Group and local models & coupled parameters
1. Transfer Using Shared Topics
• Use the model with the shared latent space for predictions
MRR  Method
.68  Transfer DART
.62  TFIDF
Discriminative Author Recipient Topic (DART) Model
[Figure: the directed, discrete ART model alongside the undirected, continuous-topic DART model]
Transfer Learning with DARTs
1. Train DART on the organization's entire email
2. Adapt DART to a given user's email
3. Major advantage for new users
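As a hedged illustration of step 2 (this is a generic parameter-transfer sketch, not the authors' estimator; the function names and coupling term are assumptions), a per-user model can be warm-started from the organization-wide parameters and then updated on that user's comparatively small amount of email, with an L2 tie keeping it close to the group model:

```python
import numpy as np

def adapt_user_params(org_params, user_grad_fn, coupling=1.0, lr=0.1, steps=100):
    """Start from organization-wide parameters and adapt them to one user.

    org_params:   np.ndarray of group-level parameters (e.g. topic weights)
    user_grad_fn: function returning the gradient of the user's data
                  log-likelihood at the current parameters (assumed supplied)
    coupling:     strength of the L2 tie back to the group parameters
    """
    user_params = org_params.copy()                     # warm start = transfer
    for _ in range(steps):
        grad = user_grad_fn(user_params)                # fit the user's own email
        grad -= coupling * (user_params - org_params)   # stay near the group model
        user_params += lr * grad
    return user_params
```

A brand-new user with little or no email stays close to `org_params`, which is the intuition behind the "major advantage for new users" in step 3.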
2. Transfer Parameters & Adapt
• Topics with Transfer vs. No Transfer
Transfer Parameters & Adapt
• 200 Topic Models, Transfer vs. No Transfer
Summary, Conclusions & Discussion
• New, rich topic models for text & attributes
• Discriminative methods: dramatic increase in task performance
• Two types of transfer learning
 - Each leverages social / organizational networks
• Dramatic benefit for a new model/user
• Question: Can similar users be identified for more sophisticated transfer?
• Practical Issues: information sharing, etc.