Learning User Preferences Jason Rennie MIT CSAIL jrennie@gmail.com Advisor: Tommi Jaakkola
Information Extraction • Informal Communication: e-mail, mailing lists, bulletin boards • Issues: • Context switching • Abbreviations & shortened forms • Variable punctuation, formatting, grammar
Thesis Advertisement: Outline • Thesis is not an end-to-end IE system • We address some IE problems: • Identifying & Resolving Named Entities • Tracking Context • Learning User Preferences
Identifying Named Entities • “Rialto is now open until 11pm” • Facts/opinions are usually about a named entity • Tools typically rely on punctuation, capitalization, formatting, grammar • We developed a criterion to identify topic-oriented words using occurrence statistics [Rennie & Jaakkola, SIGIR 2005]
Resolving Named Entities • “They’re now open until 11pm” • What does “they” refer to? • Clustering • Group noun phrases that co-refer • McCallum & Wellner (2005) • Excellent for proper nouns • Our contribution: better modeling of non-proper nouns (incl. pronouns)
Tracking Context • “The Swordfish was fabulous” • Indirect comment on restaurant. • Restaurant identified by context. • Use word statistics to find topic switches • Contribution: new sentence clustering algorithm
Learning User Preferences • Examples: • “I loved Rialto last night.” • “Overall, Oleana was worth the money” • “Radius wasn’t bad, but wasn’t great” • “Om was purely pretentious” • Issues: • Translate text to partial ordering or rating • Predict unobserved ratings
Preference Problems • Single User w/ Item Features • Multi-user, no features • Aka Collaborative Filtering
Single User, Item Features [Figure: a table of feature values (Price, French?, New American?, Ethnic?, Formality, Location, Capacity) for restaurants (10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rndzvous); a vector of user weights combines with the feature values to give preference scores, which determine the ratings]
Single User, Item Features [Figure: the same setup with the user weights and preference scores unknown (shown as '?'); the task is to learn them from the observed ratings and the item features]
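A minimal sketch of the single-user, item-features setup, using made-up feature values and weights (the numbers below are illustrative, not the thesis data): each item is a feature vector, the user is a weight vector, and the preference score is their inner product.

import numpy as np

# Hypothetical feature matrix: rows = restaurants, columns = (price, French?, ethnic?, capacity)
X = np.array([[30., 1., 0., 60.],
              [90., 0., 1., 40.],
              [20., 0., 1., 80.]])

w = np.array([-0.1, 2.0, 1.0, 0.05])    # hypothetical user weight vector

scores = X @ w                          # one preference score per restaurant
print(scores)                           # -> [ 2. -6.  3.]

Ratings are then obtained by thresholding the preference scores, as the rating-classification slides below describe.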
Many Users, No Features [Figure: a users × items ratings matrix with many missing entries; the item features and the per-user weights are both unknown, and the preference scores are their product]
Collaborative Filtering [Figure: a users × items matrix of ratings, partially filled in] • Possible goals: • Predict missing entries • Cluster users or items • Applications: • Movies, Books • Genetic Interaction • Network routing • Sports performance
Outline • Single User, Features • Loss functions, Convexity, Large Margin • Loss function for Ratings • Many Users, No Features • Feature Selection, Rank, SVD • Regularization: tie together multiple tasks • Optimization: scale to large problems • Extensions
This Talk: Contributions • Implementation and systematic evaluation of loss functions for Single User prediction. • Scaling Multi-user regularization to large (thousands of users/items) problems • Analysis of optimization • Extensions • Hybrid: features + multiple users • Observation model & multiple ratings
Rating Classification • n ordered classes • Learn weight vector w and thresholds θ1, …, θn−1 [Figure: items projected onto w are separated into rating classes 1, 2, 3 by thresholds θ1, θ2]
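A small sketch of how a learned score and thresholds yield a rating (the threshold values are hypothetical): the predicted class is one plus the number of thresholds the score exceeds.

import numpy as np

def predict_rating(score, thresholds):
    # Rating = 1 + number of thresholds theta_k that the score lies above
    return 1 + int(np.sum(score > np.asarray(thresholds)))

thresholds = [-1.0, 1.0]                  # hypothetical theta_1 < theta_2 for 3 classes
print(predict_rating(-2.3, thresholds))   # -> 1
print(predict_rating(0.4, thresholds))    # -> 2
print(predict_rating(1.7, thresholds))    # -> 3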
Loss Functions [Figure: loss as a function of the margin agreement for the 0-1, Hinge, Smooth Hinge, Modified Least Squares, and Logistic losses]
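For reference, a sketch of these margin losses as functions of the margin z; the smooth-hinge and modified-least-squares forms below are one common parameterization and may differ slightly from the exact curves on the slide.

import numpy as np

def zero_one(z):      return (np.asarray(z) <= 0).astype(float)
def hinge(z):         return np.maximum(0.0, 1.0 - np.asarray(z))
def mod_least_sq(z):  return np.maximum(0.0, 1.0 - np.asarray(z)) ** 2
def logistic(z):      return np.log1p(np.exp(-np.asarray(z)))

def smooth_hinge(z):
    # Quadratically smoothed hinge: linear for z <= 0, quadratic on (0, 1), zero for z >= 1
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z, np.where(z < 1, 0.5 * (1 - z) ** 2, 0.0))

z = np.linspace(-2, 2, 9)
for f in (zero_one, hinge, smooth_hinge, mod_least_sq, logistic):
    print(f.__name__, np.round(f(z), 3))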
Convexity • Convex function: every local minimum is a global minimum • A set is convex if every line segment between two of its points stays within the set
Convexity of Loss Functions • 0-1 loss is not convex • Local minima, sensitive to small changes • Convex Bound • Large margin solution with regularization • Stronger guarantees
Proportional Odds • McCullagh introduced the original rating model • Linear interaction: weights & features • Thresholds • Maximum likelihood [McCullagh, 1980] [Figure: items projected onto w are divided into rating classes by thresholds θ1, θ2]
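A sketch of the proportional-odds model for intuition (the threshold values are illustrative): the cumulative probability of a rating at or below class j is a logistic function of θj minus the score.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def proportional_odds_probs(score, thresholds):
    # P(y <= j) = sigmoid(theta_j - score); class probabilities are successive differences
    cdf = np.concatenate(([0.0], sigmoid(np.asarray(thresholds) - score), [1.0]))
    return np.diff(cdf)

print(proportional_odds_probs(0.5, thresholds=[-1.0, 1.0]))   # probabilities for 3 ordered classes

Maximum-likelihood fitting then adjusts the weights and thresholds to maximize the probability of the observed ratings.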
Immediate-Thresholds [Shashua & Levin, 2003] [Figure: loss as a function of the score for true ratings 1–5; only the two thresholds immediately adjacent to the true rating are penalized]
Some Errors are Better than Others [Figure: a user's true ratings compared against the predictions of two systems that make different kinds of errors]
Not a Bound on Absolute Difference [Figure: the immediate-thresholds loss does not bound the absolute difference between the predicted and true ratings]
All-Thresholds Loss [Srebro, Rennie & Jaakkola, NIPS 2004] [Figure: loss as a function of the score for true ratings 1–5; every threshold contributes a penalty, so the loss grows with the distance from the true rating]
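A minimal sketch of an all-thresholds loss (the hinge penalty and threshold values are illustrative choices): thresholds below the true rating should sit below the score, thresholds at or above it should sit above the score, and every violation is penalized.

import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def all_thresholds_loss(score, y, thresholds, margin_loss=hinge):
    # Sum a margin penalty over every threshold, not just the ones adjacent to y
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        if k < y:
            loss += margin_loss(score - theta)   # threshold should lie below the score
        else:
            loss += margin_loss(theta - score)   # threshold should lie above the score
    return float(loss)

thresholds = [-2.0, -0.5, 0.5, 2.0]              # hypothetical thresholds for 5 rating levels
print(all_thresholds_loss(score=1.2, y=4, thresholds=thresholds))   # -> 0.5

Because more thresholds are violated the further the score drifts from the true rating, large rating errors cost more than small ones, matching the "some errors are better than others" intuition.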
Experiments [Table: test error for the different loss functions; least-squares baseline: 1.3368] [Rennie & Srebro, IJCAI 2005]
Many Users, No Features [Figure: a users × items ratings matrix with missing entries; item features, user weights, and preference scores are all unknown]
Background: Lp-norms • L0: # non-zero entries: ||<0,2,0,3,4>||0 = 3 • L1: absolute value sum: ||<2,-2,1>||1 = 5 • L2: Euclidean length: ||<1,-1>||2 = √2 • General: ||v||p = (Σi |vi|p)1/p
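These norms can be checked directly (a small illustrative snippet):

import numpy as np

v = np.array([0., 2., 0., 3., 4.])
print(np.count_nonzero(v))                 # L0: number of non-zero entries = 3
print(np.linalg.norm([2., -2., 1.], 1))    # L1: 5.0
print(np.linalg.norm([1., -1.], 2))        # L2: sqrt(2) ≈ 1.414
print((np.abs(v) ** 3).sum() ** (1 / 3))   # general Lp formula with p = 3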
Background: Feature Selection • Objective: Loss + Regularization [Figure: L1 vs. squared L2 regularization penalties]
Singular Value Decomposition • X=USV’ • U,V: orthogonal (rotation) • S: diagonal, non-negative • Eigenvalues of XX’=USV’VSU’=USSU’ are squared singular values of X • Rank = ||s||0 • SVD: used to obtain least-squares low-rank approximation
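A quick sketch of the SVD facts on this slide (random data for illustration): the rank is the number of non-zero singular values, and truncating the SVD gives the least-squares low-rank approximation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
print(np.count_nonzero(s > 1e-10))                 # rank = ||s||_0

k = 2                                              # keep the k largest singular values
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation in the
print(np.linalg.norm(X - X_k, 'fro'))              # least-squares (Frobenius) sense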
Low Rank Matrix Factorization [Figure: the ratings matrix Y is approximated by X = UV', where U and V' have rank k] • Sum-Squared Loss, Fully Observed Y: use SVD to find the global optimum • Classification Error Loss, Partially Observed Y: non-convex, no explicit solution
Low-Rank: Non-Convex Set [Figure: a line segment between two rank-1 matrices can pass through rank-2 matrices, so the set of low-rank matrices is not convex]
Trace Norm Regularization • Trace norm: sum of singular values [Fazel et al., 2001]
Many Users, No Features [Figure: ratings Y come from preference scores X = UV', where U holds the user weights and V' the item features; most entries of Y are unobserved]
Max Margin Matrix Factorization • Objective: All-Thresholds Loss + λ · Trace Norm of X • Convex function of X and the thresholds • Trace norm regularization encourages low rank in X [Srebro, Rennie & Jaakkola, NIPS 2004]
Properties of the Trace Norm • ||X||Σ = min over X = UV' of ||U||Fro ||V||Fro = min over X = UV' of ½(||U||Fro² + ||V||Fro²) • The factorization U√S, V√S (from the SVD X = USV') minimizes both quantities
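A quick numerical check of this property (random data; a sketch, not the thesis code): the trace norm equals ½(||U||Fro² + ||V||Fro²) for the factorization built from the SVD.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
trace_norm = s.sum()                               # sum of singular values

A = U @ np.diag(np.sqrt(s))                        # U * sqrt(S)
B = Vt.T @ np.diag(np.sqrt(s))                     # V * sqrt(S)
print(np.allclose(X, A @ B.T))                     # still a factorization of X
print(trace_norm,
      0.5 * (np.linalg.norm(A, 'fro')**2 + np.linalg.norm(B, 'fro')**2))   # the two values match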
Factorized Optimization • Factorized Objective (tight bound): replace the trace norm with (λ/2)(||U||Fro² + ||V||Fro²) for X = UV' • Gradient descent: O(n³) per round • Stationary points, but no local minima [Rennie & Srebro, ICML 2005]
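A highly simplified sketch of the factorized gradient approach, using a squared-error loss on the observed entries in place of the all-thresholds loss and plain gradient descent (the sizes, step size, and regularization weight are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k, lam, lr = 30, 20, 3, 0.1, 0.005

Y = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # synthetic ratings 1..5
observed = rng.random((n_users, n_items)) < 0.5                 # mask of observed entries

U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))

for _ in range(1000):
    R = (U @ V.T - Y) * observed        # residual on observed ratings only
    grad_U = R @ V + lam * U            # gradient of loss + (lam/2)(||U||^2 + ||V||^2)
    grad_V = R.T @ U + lam * V
    U -= lr * grad_U
    V -= lr * grad_V

R = (U @ V.T - Y) * observed
print(0.5 * np.sum(R**2) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2)))   # final objective value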
Collaborative Prediction Results [Table: MMMF compared against URP and Attitude on rating-prediction benchmarks] [URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]
Extensions • Multi-user + Features • Observation model • Predict which restaurants a user will rate, and • The rating she will make • Multiple ratings per user/restaurant • E.g. Food, Service and Décor ratings • SVD Parameterization
Multi-User + Features • Feature parameters (V): some are fixed, some are learned • Learn weights (U) for all features • Fixed part of V does not affect regularization [Figure: V' partitioned into fixed-feature and learned-feature columns]
Observation Model • Common assumption: ratings observed at random • Restaurant selection: • Geography, popularity, price, food style • Remove bias: model observation process
Observation Model • Model as binary classification • Add binary classification loss • Tie together rating and observation models: X = U_X V' and W = U_W V' share the same V
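A small sketch of the shared-parameterization idea (the logistic link for the observation probabilities is an illustrative choice, not necessarily the thesis model): the rating model and the observation model use separate user weights but the same item matrix V.

import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, k = 10, 8, 3

V   = rng.normal(size=(n_items, k))      # item parameters shared by both models
U_X = rng.normal(size=(n_users, k))      # user weights for the rating model
U_W = rng.normal(size=(n_users, k))      # user weights for the observation model

X = U_X @ V.T                            # preference scores -> ratings via thresholds
W = U_W @ V.T                            # scores for "will this user rate this item?"
P_observe = 1.0 / (1.0 + np.exp(-W))     # hypothetical logistic link to observation probability
print(P_observe.shape)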
Multiple Ratings • Users may provide multiple ratings: • Service, Décor, Food • Add in loss functions • Stack parameter matrices for regularization
SVD Parameterization • Too many parameters: for any invertible A, (UA)(A⁻¹V') = X is another factorization of X • Alternate: U, S, V • U, V orthogonal, S diagonal • Advantages: • Not over-parameterized • Exact objective (not a bound) • No stationary points
Summary • Loss function for ratings • Regularization for multiple users • Scaled MMMF to large problems (e.g. > 1000x1000) • Trace norm: widely applicable • Extensions Code: http://people.csail.mit.edu/jrennie/matlab
Thanks! • Helen, for supporting me for 7.5 years! • Tommi Jaakkola, for answering all my questions and directing me to the “end”! • Mike Collins and Tommy Poggio for add’l guidance. • Nati Srebro & John Barnett for endless valuable discussions and ideas. • Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, & Patrycja Missiuro & all past members of Tommi’s reading group for paper discussions, conference trips and feedback on my talks. • Many, many others who have helped me along the way!
Low-Rank Optimization [Figure: an objective over matrices with its unconstrained minimum, the low-rank minimum, and a low-rank local minimum marked; restricting to low-rank matrices introduces local minima]