Learning User Preferences Jason Rennie MIT CSAIL jrennie@gmail.com Advisor: Tommi Jaakkola
Information Extraction • Informal Communication: e-mail, mailing lists, bulletin boards • Issues: • Context switching • Abbreviations & shortened forms • Variable punctuation, formatting, grammar
Thesis Advertisement: Outline • Thesis is not an end-to-end IE system • We address some IE problems: • Identifying & Resolving Named Entities • Tracking Context • Learning User Preferences
Identifying Named Entities • “Rialto is now open until 11pm” • Facts/opinions are usually about a named entity • Tools typically rely on punctuation, capitalization, formatting, grammar • We developed a criterion to identify topic-oriented words using occurrence statistics [Rennie & Jaakkola, SIGIR 2005]
Resolving Named Entities • “They’re now open until 11pm” • What does “they” refer to? • Clustering • Group noun phrases that co-refer • McCallum & Wellner (2005) • Excellent for proper nouns • Our contribution: better modeling of non-proper nouns (incl. pronouns)
Tracking Context • “The Swordfish was fabulous” • Indirect comment on restaurant. • Restaurant identified by context. • Use word statistics to find topic switches • Contribution: new sentence clustering algorithm
Learning User Preferences • Examples: • “I loved Rialto last night.” • “Overall, Oleana was worth the money” • “Radius wasn’t bad, but wasn’t great” • “Om was purely pretentious” • Issues: • Translate text to partial ordering or rating • Predict unobserved ratings
Preference Problems • Single User w/ Item Features • Multi-user, no features • Aka Collaborative Filtering
Single User, Item Features [Figure: a table of feature values (Price, French?, New American?, Ethnic?, Formality, Location, Capacity) for restaurants (10 Tables, #9 Park, Lumiere, Tanjore, Chennai, Rndzvous); a vector of user weights combines with the feature values to give preference scores, which determine the ratings]
Single User, Item Features [Figure: the same setup with the user weights and preference scores unknown (shown as '?'); the task is to learn them from the observed ratings and the item features]
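A minimal sketch of the single-user, item-features setup, using made-up feature values and weights (the numbers below are illustrative, not the thesis data): each item is a feature vector, the user is a weight vector, and the preference score is their inner product.

import numpy as np

# Hypothetical feature matrix: rows = restaurants, columns = (price, French?, ethnic?, capacity)
X = np.array([[30., 1., 0., 60.],
              [90., 0., 1., 40.],
              [20., 0., 1., 80.]])

w = np.array([-0.1, 2.0, 1.0, 0.05])    # hypothetical user weight vector

scores = X @ w                          # one preference score per restaurant
print(scores)                           # -> [ 2. -6.  3.]

Ratings are then obtained by thresholding the preference scores, as the rating-classification slides below describe.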
Many Users, No Features [Figure: a users × items ratings matrix with many missing entries; the item features and the per-user weights are both unknown, and the preference scores are their product]
Collaborative Filtering [Figure: a users × items matrix of ratings, partially filled in] • Possible goals: • Predict missing entries • Cluster users or items • Applications: • Movies, Books • Genetic Interaction • Network routing • Sports performance
Outline • Single User, Features • Loss functions, Convexity, Large Margin • Loss function for Ratings • Many Users, No Features • Feature Selection, Rank, SVD • Regularization: tie together multiple tasks • Optimization: scale to large problems • Extensions
This Talk: Contributions • Implementation and systematic evaluation of loss functions for Single User prediction. • Scaling Multi-user regularization to large (thousands of users/items) problems • Analysis of optimization • Extensions • Hybrid: features + multiple users • Observation model & multiple ratings
Rating Classification • n ordered classes • Learn weight vector w and thresholds θ1, …, θn−1 [Figure: items projected onto w are separated into rating classes 1, 2, 3 by thresholds θ1, θ2]
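A small sketch of how a learned score and thresholds yield a rating (the threshold values are hypothetical): the predicted class is one plus the number of thresholds the score exceeds.

import numpy as np

def predict_rating(score, thresholds):
    # Rating = 1 + number of thresholds theta_k that the score lies above
    return 1 + int(np.sum(score > np.asarray(thresholds)))

thresholds = [-1.0, 1.0]                  # hypothetical theta_1 < theta_2 for 3 classes
print(predict_rating(-2.3, thresholds))   # -> 1
print(predict_rating(0.4, thresholds))    # -> 2
print(predict_rating(1.7, thresholds))    # -> 3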
Loss Functions [Figure: loss as a function of the margin agreement for the 0-1, Hinge, Smooth Hinge, Modified Least Squares, and Logistic losses]
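For reference, a sketch of these margin losses as functions of the margin z; the smooth-hinge and modified-least-squares forms below are one common parameterization and may differ slightly from the exact curves on the slide.

import numpy as np

def zero_one(z):      return (np.asarray(z) <= 0).astype(float)
def hinge(z):         return np.maximum(0.0, 1.0 - np.asarray(z))
def mod_least_sq(z):  return np.maximum(0.0, 1.0 - np.asarray(z)) ** 2
def logistic(z):      return np.log1p(np.exp(-np.asarray(z)))

def smooth_hinge(z):
    # Quadratically smoothed hinge: linear for z <= 0, quadratic on (0, 1), zero for z >= 1
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z, np.where(z < 1, 0.5 * (1 - z) ** 2, 0.0))

z = np.linspace(-2, 2, 9)
for f in (zero_one, hinge, smooth_hinge, mod_least_sq, logistic):
    print(f.__name__, np.round(f(z), 3))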
Convexity • Convex function: every local minimum is a global minimum • A set is convex if every line segment between two of its points stays within the set
Convexity of Loss Functions • 0-1 loss is not convex • Local minima, sensitive to small changes • Convex Bound • Large margin solution with regularization • Stronger guarantees
Proportional Odds • McCullagh introduced the original rating model • Linear interaction: weights & features • Thresholds • Maximum likelihood [McCullagh, 1980] [Figure: items projected onto w are divided into rating classes by thresholds θ1, θ2]
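A sketch of the proportional-odds model for intuition (the threshold values are illustrative): the cumulative probability of a rating at or below class j is a logistic function of θj minus the score.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def proportional_odds_probs(score, thresholds):
    # P(y <= j) = sigmoid(theta_j - score); class probabilities are successive differences
    cdf = np.concatenate(([0.0], sigmoid(np.asarray(thresholds) - score), [1.0]))
    return np.diff(cdf)

print(proportional_odds_probs(0.5, thresholds=[-1.0, 1.0]))   # probabilities for 3 ordered classes

Maximum-likelihood fitting then adjusts the weights and thresholds to maximize the probability of the observed ratings.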
Immediate-Thresholds [Shashua & Levin, 2003] [Figure: loss as a function of the score for true ratings 1–5; only the two thresholds immediately adjacent to the true rating are penalized]
Some Errors are Better than Others [Figure: a user's true ratings compared against the predictions of two systems that make different kinds of errors]
Not a Bound on Absolute Difference [Figure: the immediate-thresholds loss does not bound the absolute difference between the predicted and true ratings]
All-Thresholds Loss [Srebro, Rennie & Jaakkola, NIPS 2004] [Figure: loss as a function of the score for true ratings 1–5; every threshold contributes a penalty, so the loss grows with the distance from the true rating]
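A minimal sketch of an all-thresholds loss (the hinge penalty and threshold values are illustrative choices): thresholds below the true rating should sit below the score, thresholds at or above it should sit above the score, and every violation is penalized.

import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def all_thresholds_loss(score, y, thresholds, margin_loss=hinge):
    # Sum a margin penalty over every threshold, not just the ones adjacent to y
    loss = 0.0
    for k, theta in enumerate(thresholds, start=1):
        if k < y:
            loss += margin_loss(score - theta)   # threshold should lie below the score
        else:
            loss += margin_loss(theta - score)   # threshold should lie above the score
    return float(loss)

thresholds = [-2.0, -0.5, 0.5, 2.0]              # hypothetical thresholds for 5 rating levels
print(all_thresholds_loss(score=1.2, y=4, thresholds=thresholds))   # -> 0.5

Because more thresholds are violated the further the score drifts from the true rating, large rating errors cost more than small ones, matching the "some errors are better than others" intuition.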
Experiments [Table: test error for the different loss functions; least-squares baseline: 1.3368] [Rennie & Srebro, IJCAI 2005]
Many Users, No Features [Figure: a users × items ratings matrix with missing entries; item features, user weights, and preference scores are all unknown]
Background: Lp-norms • L0: # non-zero entries: ||<0,2,0,3,4>||0 = 3 • L1: absolute value sum: ||<2,-2,1>||1 = 5 • L2: Euclidean length: ||<1,-1>||2 = √2 • General: ||v||p = (Σi |vi|p)1/p
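These norms can be checked directly (a small illustrative snippet):

import numpy as np

v = np.array([0., 2., 0., 3., 4.])
print(np.count_nonzero(v))                 # L0: number of non-zero entries = 3
print(np.linalg.norm([2., -2., 1.], 1))    # L1: 5.0
print(np.linalg.norm([1., -1.], 2))        # L2: sqrt(2) ≈ 1.414
print((np.abs(v) ** 3).sum() ** (1 / 3))   # general Lp formula with p = 3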
Background: Feature Selection • Objective: Loss + Regularization [Figure: L1 vs. squared L2 regularization penalties]
Singular Value Decomposition • X=USV’ • U,V: orthogonal (rotation) • S: diagonal, non-negative • Eigenvalues of XX’=USV’VSU’=USSU’ are squared singular values of X • Rank = ||s||0 • SVD: used to obtain least-squares low-rank approximation
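A quick sketch of the SVD facts on this slide (random data for illustration): the rank is the number of non-zero singular values, and truncating the SVD gives the least-squares low-rank approximation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) Vt
print(np.count_nonzero(s > 1e-10))                 # rank = ||s||_0

k = 2                                              # keep the k largest singular values
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation in the
print(np.linalg.norm(X - X_k, 'fro'))              # least-squares (Frobenius) sense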
Low Rank Matrix Factorization [Figure: the ratings matrix Y is approximated by X = UV', where U and V' have rank k] • Sum-Squared Loss, Fully Observed Y: use SVD to find the global optimum • Classification Error Loss, Partially Observed Y: non-convex, no explicit solution
Low-Rank: Non-Convex Set [Figure: a line segment between two rank-1 matrices can pass through rank-2 matrices, so the set of low-rank matrices is not convex]
Trace Norm Regularization • Trace norm: sum of singular values [Fazel et al., 2001]
Many Users, No Features [Figure: ratings Y come from preference scores X = UV', where U holds the user weights and V' the item features; most entries of Y are unobserved]
Max Margin Matrix Factorization • Objective: All-Thresholds Loss + λ · Trace Norm of X • Convex function of X and the thresholds • Trace norm regularization encourages low rank in X [Srebro, Rennie & Jaakkola, NIPS 2004]
Properties of the Trace Norm • ||X||Σ = min over X = UV' of ||U||Fro ||V||Fro = min over X = UV' of ½(||U||Fro² + ||V||Fro²) • The factorization U√S, V√S (from the SVD X = USV') minimizes both quantities
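A quick numerical check of this property (random data; a sketch, not the thesis code): the trace norm equals ½(||U||Fro² + ||V||Fro²) for the factorization built from the SVD.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
trace_norm = s.sum()                               # sum of singular values

A = U @ np.diag(np.sqrt(s))                        # U * sqrt(S)
B = Vt.T @ np.diag(np.sqrt(s))                     # V * sqrt(S)
print(np.allclose(X, A @ B.T))                     # still a factorization of X
print(trace_norm,
      0.5 * (np.linalg.norm(A, 'fro')**2 + np.linalg.norm(B, 'fro')**2))   # the two values match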
Factorized Optimization • Factorized Objective (tight bound): replace the trace norm with (λ/2)(||U||Fro² + ||V||Fro²) for X = UV' • Gradient descent: O(n³) per round • Stationary points, but no local minima [Rennie & Srebro, ICML 2005]
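A highly simplified sketch of the factorized gradient approach, using a squared-error loss on the observed entries in place of the all-thresholds loss and plain gradient descent (the sizes, step size, and regularization weight are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k, lam, lr = 30, 20, 3, 0.1, 0.005

Y = rng.integers(1, 6, size=(n_users, n_items)).astype(float)   # synthetic ratings 1..5
observed = rng.random((n_users, n_items)) < 0.5                 # mask of observed entries

U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))

for _ in range(1000):
    R = (U @ V.T - Y) * observed        # residual on observed ratings only
    grad_U = R @ V + lam * U            # gradient of loss + (lam/2)(||U||^2 + ||V||^2)
    grad_V = R.T @ U + lam * V
    U -= lr * grad_U
    V -= lr * grad_V

R = (U @ V.T - Y) * observed
print(0.5 * np.sum(R**2) + 0.5 * lam * (np.sum(U**2) + np.sum(V**2)))   # final objective value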
Collaborative Prediction Results [Table: MMMF compared against URP and Attitude on rating-prediction benchmarks] [URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]
Extensions • Multi-user + Features • Observation model • Predict which restaurants a user will rate, and • The rating she will make • Multiple ratings per user/restaurant • E.g. Food, Service and Décor ratings • SVD Parameterization
Multi-User + Features • Feature parameters (V): some are fixed, some are learned • Learn weights (U) for all features • Fixed part of V does not affect regularization [Figure: V' partitioned into fixed-feature and learned-feature columns]
Observation Model • Common assumption: ratings observed at random • Restaurant selection: • Geography, popularity, price, food style • Remove bias: model observation process
Observation Model • Model as binary classification • Add binary classification loss • Tie together rating and observation models: X = U_X V' and W = U_W V' share the same V
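A small sketch of the shared-parameterization idea (the logistic link for the observation probabilities is an illustrative choice, not necessarily the thesis model): the rating model and the observation model use separate user weights but the same item matrix V.

import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, k = 10, 8, 3

V   = rng.normal(size=(n_items, k))      # item parameters shared by both models
U_X = rng.normal(size=(n_users, k))      # user weights for the rating model
U_W = rng.normal(size=(n_users, k))      # user weights for the observation model

X = U_X @ V.T                            # preference scores -> ratings via thresholds
W = U_W @ V.T                            # scores for "will this user rate this item?"
P_observe = 1.0 / (1.0 + np.exp(-W))     # hypothetical logistic link to observation probability
print(P_observe.shape)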
Multiple Ratings • Users may provide multiple ratings: • Service, Décor, Food • Add in loss functions • Stack parameter matrices for regularization
SVD Parameterization • Too many parameters: for any invertible A, (UA)(A⁻¹V') = X is another factorization of X • Alternate: U, S, V • U, V orthogonal, S diagonal • Advantages: • Not over-parameterized • Exact objective (not a bound) • No stationary points
Summary • Loss function for ratings • Regularization for multiple users • Scaled MMMF to large problems (e.g. > 1000x1000) • Trace norm: widely applicable • Extensions Code: http://people.csail.mit.edu/jrennie/matlab
Thanks! • Helen, for supporting me for 7.5 years! • Tommi Jaakkola, for answering all my questions and directing me to the “end”! • Mike Collins and Tommy Poggio for add’l guidance. • Nati Srebro & John Barnett for endless valuable discussions and ideas. • Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, & Patrycja Missiuro & all past members of Tommi’s reading group for paper discussions, conference trips and feedback on my talks. • Many, many others who have helped me along the way!
Low-Rank Optimization [Figure: an objective over matrices with its unconstrained minimum, the low-rank minimum, and a low-rank local minimum marked; restricting to low-rank matrices introduces local minima]