Multiple Domain User Personalization. Deepak Agarwal Yahoo! Research. Yucheng Low Carnegie Mellon University. Alexander J. Smola Yahoo! Research. Information Flood. Personalization. Golf Reader. Tech. Reader. Can we provide personalization to new users?. One Domain Cold-Start.
Multiple Domain User Personalization Deepak Agarwal Yahoo! Research Yucheng Low Carnegie Mellon University Alexander J. Smola Yahoo! Research
Personalization Golf Reader Tech. Reader Can we provide personalization to new users?
One Domain Cold-Start Movies User 1 User 2 Impossible when you have only one domain. Best you can do is to have a good baseline.
Multiple Domains Cold Start Music Movies News Possible when you have many domains.
Personalization across all domains • Combine tokens from all spaces, ignoring the source domain: Golf, Tiger, Music, Song • Expand the token space to include the source domain: Golf:1, Tiger:1, Music:2, Song:2 Example user: reads golf news, watches MTV. Either token set is passed to your favorite personalization algorithm.
Personalization across all domains • Combine tokens from all spaces, ignoring the source domain: Golf, Tiger, Music, Song • Expand the token space to include the source domain: Golf:1, Tiger:1, Music:2, Song:2 Example user: reads golf news, watches MTV. Either token set is passed to your favorite personalization algorithm. Problems: • Domains with more observations will swamp out all other domains. • What is a good personalization algorithm that will work for all domains?
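The two token strategies on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `events` layout, and the domain names `"news"` and `"music"` are assumptions made for the example; only the tokens and the `token:domain` format come from the slide.

```python
from collections import Counter


def pooled_tokens(events):
    """Strategy 1: merge tokens from all domains, dropping the source domain."""
    return Counter(token for _domain, tokens in events for token in tokens)


def domain_tagged_tokens(events, domain_ids):
    """Strategy 2: expand the token space so each token carries its domain id."""
    return Counter(
        f"{token}:{domain_ids[domain]}"
        for domain, tokens in events
        for token in tokens
    )


# Hypothetical user from the slide: reads golf news, watches MTV.
events = [
    ("news", ["Golf", "Tiger"]),
    ("music", ["Music", "Song"]),
]
domain_ids = {"news": 1, "music": 2}  # assumed numbering, matching Golf:1 / Music:2

pooled = pooled_tokens(events)
tagged = domain_tagged_tokens(events, domain_ids)
print(sorted(pooled))
print(sorted(tagged))
```

In both strategies a single bag of tokens is handed to one algorithm, so a domain with far more observations dominates the counts.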
Solution Meta-Profile • Isolates each domain: Prevents larger domains from swamping out smaller domains. User Meta Profile User Music Profile User News Profile Personalized Music Personalized News
Solution Meta-Profile • Extensible: domains can be added/removed easily User Meta Profile User Music Profile User News Profile User Movie Profile
Latent Dirichlet Allocation Topic 2 Topic 1 Topic 3 Basketball NBA, hoop Train 3-point Machine, Learning, Neural, Network, Train Golf, Tiger, Woods, Club, Green, Hole-in-one Document Michael I. Jordan trains a Neural Network to play golf Topic 1 Topic 2 3 2 Topic 3 Network Golf
Latent Dirichlet Allocation A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic N Document
Latent Dirichlet Allocation A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic Sample From: N Document Document
Latent Dirichlet Allocation A document is a bag of words. A topic is a mixture of words. • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic N Document Topic 1: Basketball, Michael, Jordan Topic 2: Golf, Tiger, Woods, Club, Green Topic 3: Machine, Learning, Neural
Latent Dirichlet Allocation A document is a bag of words. A topic is a mixture of words. Words which make up each topic • Each document has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic N Document Topics which make up each document
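The generative story above (draw a topic mixture per document, then a topic and a word for each position) can be sketched in a few lines. This is only an illustration of the standard LDA generative process: the uniform word probabilities inside each topic and the Dirichlet parameter `alpha` are assumptions for the example; the topic word lists are taken from the earlier slide.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Sample a document: a topic mixture, then (topic, word) for each position."""
    theta = sample_dirichlet(alpha, rng)  # this document's mixture over topics
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # draw a topic
        words = list(topics[z])
        weights = [topics[z][w] for w in words]
        w = rng.choices(words, weights=weights)[0]  # draw a word from that topic
        doc.append((z, w))
    return theta, doc


def uniform_topic(words):
    """Assumed for illustration: every word in a topic is equally likely."""
    return {w: 1.0 / len(words) for w in words}


topics = [
    uniform_topic(["Basketball", "Michael", "Jordan"]),
    uniform_topic(["Golf", "Tiger", "Woods", "Club", "Green"]),
    uniform_topic(["Machine", "Learning", "Neural"]),
]

rng = random.Random(0)
theta, doc = generate_document(topics, alpha=[0.5, 0.5, 0.5], n_words=8, rng=rng)
print(theta)
print(doc)
```

Inference runs this story in reverse: given only the words, it recovers the topics which make up each document and the words which make up each topic.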
Single Domain Personalization A user’s interaction with a domain is a bag of words. A topic is a mixture of words. Words which make up each topic • Each user has a mixture over topics • For each word in each document: • Draw a topic • Draw a word from the topic N User Topics each user is interested in
Multiple Domain Personalization A user’s interaction with a domain is a bag of words. A topic is a mixture of words. User’s prior interest in a domain is N User u’s interaction with domain d User Each user has a meta-profile: Each domain has a latent matrix:
Solution Meta-Profile User Meta Profile User Music Profile User Movie Profile User News Profile
Users Music Topic->word table News Topic->word table Movies Topic->word table
Gibbs Sampling N User u’s interaction with domain p LDA
Gibbs Sampling 1: Sample N User u’s interaction with domain p Sample using LDA Sampler Hold Constant Hold Constant
Gibbs Sampling 1: Sample 2: Sample N User u’s interaction with domain p Hold Constant Langevin Diffusion Hold Constant Sample
Gibbs Sampling 1: Sample 2: Sample 3: Optimize N User u’s interaction with domain p LBFGS Hold Constant Optimize Hold Constant
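Step 2 names Langevin diffusion as the sampler. The slides do not give the target density, so the sketch below shows only the generic unadjusted Langevin update, x ← x + (ε/2)·∇log p(x) + √ε·noise, on a standard normal target chosen purely as a stand-in. The function names and step size are assumptions for illustration.

```python
import math
import random


def langevin_step(x, grad_log_p, step_size, rng):
    """One unadjusted Langevin update: drift along the gradient plus Gaussian noise."""
    grad = grad_log_p(x)
    return [
        xi + 0.5 * step_size * gi + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        for xi, gi in zip(x, grad)
    ]


def grad_log_standard_normal(x):
    """Gradient of log N(0, I); a stand-in target, not the model's posterior."""
    return [-xi for xi in x]


rng = random.Random(0)
x = [3.0, -3.0]  # deliberately poor starting point
samples = []
for i in range(20000):
    x = langevin_step(x, grad_log_standard_normal, step_size=0.1, rng=rng)
    if i >= 1000:  # discard burn-in
        samples.append(x[0])

mean = sum(samples) / len(samples)
print(mean)
```

Each update needs only a gradient of the log density, which suits a continuous quantity sampled while the other variables are held constant.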
Experiments @ Yahoo! • 2-domain dataset: Frontpage and News clicks of 5.6 million users. Frontpage/News: article text for each click. • 3-domain dataset: Frontpage, News and MyYahoo clicks of 5.6 million users. MyYahoo: only has article IDs for each click, with no text, so its tokens are not semantically meaningful. All user information was anonymized.
Test Protocol • Hold out a proportion of the users who see more than one domain. • Hide one of those domains and try to predict its words. • The prediction metric is cosine similarity. • The baseline is “mean prediction”.
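The evaluation metric can be sketched as below. The slides do not define "mean prediction" precisely; here it is assumed to mean predicting, for every held-out user, the average word-count vector of the training users in the hidden domain. The vectors are made-up toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_prediction(train_vectors):
    """Assumed baseline: the per-word average over all training users."""
    n = len(train_vectors)
    return [sum(column) / n for column in zip(*train_vectors)]


# Toy word-count vectors over a 3-word vocabulary (illustrative only).
train = [[2, 0, 1], [0, 2, 1]]
hidden_truth = [3, 0, 1]

baseline = mean_prediction(train)
print(cosine_similarity(baseline, hidden_truth))
```

A personalized model is useful on this metric only if its predicted word vector scores a higher cosine similarity with the hidden words than this user-independent baseline.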
Implementation • Distributed implementation in C++ using Memcached for communication. • Alex Smola, Shravan Narayanamurthy, “An Architecture for Parallel Topic Models”, VLDB 2010 • Distributed LBFGS line search: • Implement standard MPI-like primitives in Memcached: • Broadcast • Reduce • Barrier • Takes 2-3 days for 500 iterations on 30 machines
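How broadcast, reduce, and barrier might be layered on a key-value store can be sketched as follows. This is not the authors' C++ implementation: `KeyValueStore` is a hypothetical in-process stand-in for a Memcached-like store (only `set`, `get`, and an atomic `incr` are assumed), threads stand in for machines, and all key names are invented for the example.

```python
import threading
import time


class KeyValueStore:
    """Hypothetical in-process stand-in for a shared store; not a real Memcached client."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def incr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]


def barrier(store, name, n_workers):
    """Single-use barrier: bump a shared counter, then poll until everyone arrived."""
    store.incr(f"barrier:{name}")
    while store.get(f"barrier:{name}") < n_workers:
        time.sleep(0.001)


def broadcast(store, name, rank, value=None, root=0):
    """Root publishes a (non-None) value under a key; the others poll for it."""
    key = f"bcast:{name}"
    if rank == root:
        store.set(key, value)
        return value
    while store.get(key) is None:
        time.sleep(0.001)
    return store.get(key)


def reduce_sum(store, name, rank, value, n_workers, root=0):
    """Each worker writes its partial value; after a barrier the root sums them."""
    store.set(f"reduce:{name}:{rank}", value)
    barrier(store, f"reduce:{name}", n_workers)
    if rank == root:
        return sum(store.get(f"reduce:{name}:{r}") for r in range(n_workers))
    return None


def worker(store, rank, n_workers, results):
    # Root proposes a step size; every worker evaluates its share of the objective.
    step = broadcast(store, "step", rank, value=0.5 if rank == 0 else None)
    partial = step * (rank + 1)  # toy per-worker contribution
    total = reduce_sum(store, "objective", rank, partial, n_workers)
    results[rank] = step
    if rank == 0:
        results["total"] = total


store = KeyValueStore()
results = {}
n_workers = 4
threads = [
    threading.Thread(target=worker, args=(store, r, n_workers, results))
    for r in range(n_workers)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["total"])
```

In a line search of this shape, the root would broadcast a candidate step, each machine would evaluate the objective on its own users, and the reduce would combine those partial values.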
Frontpage -> News Science Celebrity bacteria, fight, super, struggling, developed, doctors, resistant, lethal, virtually, drugs, antibiotic, competitors, chad, film, movie, movies, films, director, story, avatar, james, time, hollywood, big, make, hes, star, sandra, oscar, oscars, red, carpet, bullock, golden, gown, bullocks, nominee, bestactress, sparkles, stunning, vienna, bachelor, jake, pavelka, giraldi, finale, show, stars, dancing, love, season, time, abc, Entertainment Science Fiction
News -> Frontpage Politics Devices health, care, bill, obama, president, rep, house, republican, senate, news, sen, democrats, fox, congress, reform drafts, player, nfl, scouts, team, riskiest, peril, bryant, dez, pick, talented, nba, james, news, iphone, apple, app, apps, ipod, google, store, apples, android, mac, mobile, touch, ipad, device, phone, college, year, earn, years, 000, bestpaid, average, 129, colleges, graduates, ten, alums, schools, actor, likes, home, bank, facing, ceo, gomez, eviction, rosalina, bought, cleaning, foreclosed, client, janitor, offices, surprising, video, captured, inside, mountain, terrorist, observers, impresses, alqaidas, complexity, base, features, hideout, size, special, secret, struck College
Extension User Meta Profile User Music Profile User News Profile User Movie Profile Latent Dirichlet Allocation Latent Dirichlet Allocation Latent Dirichlet Allocation
Extension • Flexible: Allows a different algorithm for each domain User Meta Profile User Music Profile User News Profile User Movie Profile fLDA Matrix Factorization Linear Model
It Is How You Use It Use the Meta Profile for Initialization. User Meta Profile User Music Profile Personalized with Algorithm X
It Is How You Use It Periodically Update the Meta Profile and Domain Latent Matrix User Meta Profile User Music Profile Personalized with Algorithm X
Conclusion • A generic, extensible model for combining domain personalization schemes. • A scalable inference procedure that extends to millions of users. • Demonstrates strong predictive performance on a large real-world dataset.