
Learning User Preferences


Presentation Transcript


  1. Learning User Preferences Jason Rennie MIT CSAIL jrennie@gmail.com Advisor: Tommi Jaakkola

  2. Information Extraction • Informal Communication: e-mail, mailing lists, bulletin boards • Issues: • Context switching • Abbreviations & shortened forms • Variable punctuation, formatting, grammar

  3. Thesis Advertisement: Outline • The thesis is not an end-to-end IE system • We address some IE problems: • Identifying & Resolving Named Entities • Tracking Context • Learning User Preferences

  4. Identifying Named Entities • “Rialto is now open until 11pm” • Facts/opinions are usually about a named entity • Tools typically rely on punctuation, capitalization, formatting, grammar • We developed a criterion to identify topic-oriented words using occurrence statistics [Rennie & Jaakkola, SIGIR 2005]

  5. Resolving Named Entities • “They’re now open until 11pm” • What does “they” refer to? • Clustering • Group noun phrases that co-refer • McCallum & Wellner (2005) • Excellent for proper nouns • Our contribution: better modeling of non-proper nouns (incl. pronouns)

  6. Tracking Context • “The Swordfish was fabulous” • Indirect comment on restaurant. • Restaurant identified by context. • Use word statistics to find topic switches • Contribution: new sentence clustering algorithm

  7. Learning User Preferences • Examples: • “I loved Rialto last night.” • “Overall, Oleana was worth the money” • “Radius wasn’t bad, but wasn’t great” • “Om was purely pretentious” • Issues: • Translate text to partial ordering or rating • Predict unobserved ratings

  8. Preference Problems • Single User w/ Item Features • Multi-user, no features • Aka Collaborative Filtering

  9. Single User, Item Features [Slide figure: a table of feature values (Price, Capacity, # Tables, French?, New American?, Ethnic?, Formality, Location) for restaurants such as #9 Park, Lumiere, Tanjore, Chennai, and Rndzvous, together with a vector of user weights, the resulting preference scores, and the observed ratings]

  10. Single User, Item Features → Preference Scores [Slide figure: the same feature table with question marks marking the unknown quantities to be inferred from the observed ratings]
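
To make the single-user setup concrete: each restaurant is described by a feature vector, the user by a weight vector, and the preference score is their inner product. Below is a minimal NumPy sketch; the feature values, weights, and restaurant assignments are made up for illustration and are not the numbers from the slide.

```python
import numpy as np

# Hypothetical feature matrix: one row per restaurant, one column per feature
# (price, capacity, # tables, French?, new American?, ethnic?, formality, location).
X = np.array([
    [30.0, 90, 4, 1, 0, 0, 2, 1],   # e.g. "#9 Park"
    [80.0, 40, 2, 0, 1, 0, 3, 2],   # e.g. "Lumiere"
    [20.0, 60, 3, 0, 0, 1, 1, 3],   # e.g. "Tanjore"
])

# Hypothetical user weight vector, one weight per feature.
w = np.array([-0.1, 0.05, 0.0, 2.0, 1.0, -4.0, 0.5, -0.5])

# Each item's preference score is a linear function of its features.
scores = X @ w
print(scores)
```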

  11. Many Users, No Features [Slide figure: a partially observed ratings matrix (users × items) alongside the unknown feature, weight, and preference-score matrices; no item features are given]

  12. Collaborative Filtering [Slide figure: a users × items matrix of partially observed ratings] • Possible goals: • Predict missing entries • Cluster users or items • Applications: • Movies, Books • Genetic Interaction • Network routing • Sports performance

  13. Outline • Single User, Features • Loss functions, Convexity, Large Margin • Loss function for Ratings • Many Users, No Features • Feature Selection, Rank, SVD • Regularization: tie together multiple tasks • Optimization: scale to large problems • Extensions

  14. This Talk: Contributions • Implementation and systematic evaluation of loss functions for Single User prediction. • Scaling Multi-user regularization to large (thousands of users/items) problems • Analysis of optimization • Extensions • Hybrid: features + multiple users • Observation model & multiple ratings

  15. Rating Classification • n ordered classes • Learn a weight vector w and thresholds θ1, θ2, … [Slide figure: rated items projected onto w, with the thresholds separating the rating classes]
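
The weight vector produces a real-valued score; the learned thresholds carve the real line into n ordered segments, one per rating. A minimal sketch of the decision rule, assuming NumPy (the threshold values below are hypothetical):

```python
import numpy as np

def score_to_rating(score, thresholds):
    """Map a real-valued preference score to one of n ordered classes.

    `thresholds` holds the n-1 ordered boundaries (theta_1 < ... < theta_{n-1});
    the predicted rating is 1 plus the number of thresholds the score exceeds.
    """
    thresholds = np.asarray(thresholds)
    return int(np.sum(score > thresholds)) + 1

# Example with n = 4 classes and hypothetical thresholds.
thetas = [-1.0, 0.5, 2.0]
print(score_to_rating(-3.0, thetas))  # 1
print(score_to_rating(0.0, thetas))   # 2
print(score_to_rating(3.5, thetas))   # 4
```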

  16. Loss Functions [Slide figure: the 0-1, Hinge, Smooth Hinge, Modified Least Squares, and Logistic losses plotted as functions of the margin agreement]
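
The losses named on the slide can all be written as functions of the margin agreement z = y·f(x). The sketch below uses the standard definitions (and the smoothed hinge of Rennie & Srebro, 2005); it is an illustration in NumPy, not the thesis code.

```python
import numpy as np

# Margin-based losses as a function of the agreement z = y * f(x).

def zero_one(z):
    return (z <= 0).astype(float)

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def smooth_hinge(z):
    # Zero above 1, quadratic between 0 and 1, linear below 0.
    z = np.asarray(z, dtype=float)
    loss = np.zeros_like(z)
    mid = (z > 0) & (z < 1)
    loss[mid] = 0.5 * (1.0 - z[mid]) ** 2
    loss[z <= 0] = 0.5 - z[z <= 0]
    return loss

def modified_least_squares(z):
    return np.maximum(0.0, 1.0 - z) ** 2

def logistic(z):
    return np.log1p(np.exp(-z))

z = np.linspace(-2, 2, 5)
for name, fn in [("0-1", zero_one), ("hinge", hinge), ("smooth hinge", smooth_hinge),
                 ("mod. least squares", modified_least_squares), ("logistic", logistic)]:
    print(name, fn(z))
```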

  17. Convexity • A convex function has no local minima (other than the global minimum) • A set is convex if every line segment between two of its points lies within the set

  18. Convexity of Loss Functions • 0-1 loss is not convex • Local minima, sensitive to small changes • Convex Bound • Large margin solution with regularization • Stronger guarantees

  19. Proportional Odds • McCullagh introduced the original rating model • Linear interaction between weights and features • Thresholds • Fit by maximum likelihood [McCullagh, 1980] [Slide figure: rated items projected onto w, with thresholds θ1, θ2 separating the rating classes]
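
As a rough sketch of McCullagh's model: the probability that a rating is at most j is a logistic function of θ_j − w·x, and maximum likelihood fits w and the thresholds. The NumPy code below computes the negative log-likelihood under that assumption; the data and parameter values are illustrative only.

```python
import numpy as np

def proportional_odds_nll(w, thetas, X, y):
    """Negative log-likelihood of a proportional-odds rating model.

    Assumes P(rating <= j | x) = sigmoid(theta_j - w.x) with increasing
    thresholds `thetas` (length n-1) and ratings `y` in {1, ..., n}.
    """
    scores = X @ w
    inner = 1.0 / (1.0 + np.exp(-(np.asarray(thetas)[None, :] - scores[:, None])))
    # Pad so that P(rating <= 0) = 0 and P(rating <= n) = 1.
    cdf = np.hstack([np.zeros((len(y), 1)), inner, np.ones((len(y), 1))])
    probs = cdf[np.arange(len(y)), y] - cdf[np.arange(len(y)), y - 1]
    return -np.sum(np.log(probs))

# Tiny illustrative call: 3 items, 2 features, ratings on a 1-4 scale.
X = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])
y = np.array([1, 2, 4])
print(proportional_odds_nll(np.array([0.5, -0.2]), [-1.0, 0.0, 1.0], X, y))
```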

  20. Immediate-Thresholds [Shashua & Levin, 2003] [Slide figure: the immediate-thresholds loss plotted against the preference score, with rating classes 1-5 marked along the axis]

  21. Some Errors are Better than Others [Slide figure: a user's true ratings compared with predictions from two systems, System 1 and System 2]

  22. Not a Bound on Absolute Difference [Slide figure: rating classes 1-5 with a score placement showing that the immediate-thresholds loss need not bound the absolute difference between predicted and true rating]

  23. All-Thresholds Loss [Srebro, Rennie & Jaakkola, NIPS 2004] [Slide figure: the all-thresholds loss plotted against the preference score, with rating classes 1-5 marked along the axis]
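
The difference between the two constructions can be seen directly in code. The sketch below uses the hinge as the per-threshold loss (any of the margin losses above could be substituted): immediate-thresholds only charges the boundaries adjacent to the true rating, while all-thresholds charges every misordered boundary, so its value grows with the distance between predicted and true rating. Threshold values are hypothetical.

```python
import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def immediate_thresholds_loss(score, rating, thetas):
    """Penalize only the thresholds immediately adjacent to the true rating."""
    thetas = np.asarray(thetas)
    n = len(thetas) + 1
    loss = 0.0
    if rating > 1:                       # score must lie above the lower boundary
        loss += hinge(score - thetas[rating - 2])
    if rating < n:                       # score must lie below the upper boundary
        loss += hinge(thetas[rating - 1] - score)
    return loss

def all_thresholds_loss(score, rating, thetas):
    """Sum threshold violations over *all* thresholds, not just adjacent ones,
    so the loss grows with the distance between predicted and true rating."""
    thetas = np.asarray(thetas)
    loss = 0.0
    for j, theta in enumerate(thetas, start=1):
        if j < rating:                   # score should exceed theta_j
            loss += hinge(score - theta)
        else:                            # score should fall below theta_j
            loss += hinge(theta - score)
    return loss

thetas = [-1.5, -0.5, 0.5, 1.5]          # 5 rating classes
print(immediate_thresholds_loss(-2.0, 5, thetas))   # only one adjacent violation
print(all_thresholds_loss(-2.0, 5, thetas))         # violates every threshold
```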

  24. Experiments [Rennie & Srebro, IJCAI 2005] [Slide figure: results table comparing the loss functions; the Least Squares baseline scores 1.3368]

  25. Many Users, No Features [Slide figure: the partially observed ratings matrix with unknown feature, weight, and preference-score matrices, repeated from slide 11]

  26. Background: Lp-norms • L0: number of non-zero entries: ||<0,2,0,3,4>||_0 = 3 • L1: absolute-value sum: ||<2,-2,1>||_1 = 5 • L2: Euclidean length: ||<1,-1>||_2 = √2 • General: ||v||_p = (Σ_i |v_i|^p)^(1/p)
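
The same definitions in NumPy, using the slide's example vectors:

```python
import numpy as np

l0 = np.count_nonzero([0, 2, 0, 3, 4])                    # 3 non-zero entries
l1 = np.sum(np.abs([2, -2, 1]))                           # 5
l2 = np.sqrt(np.sum(np.square([1.0, -1.0])))              # sqrt(2)
lp = lambda v, p: np.sum(np.abs(v) ** p) ** (1.0 / p)     # (sum_i |v_i|^p)^(1/p)

print(l0, l1, l2, lp(np.array([1.0, -1.0, 2.0]), 3))
```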

  27. Background: Feature Selection • Objective: Loss + Regularization [Slide figure: L1 and squared L2 regularization penalties compared]

  28. Singular Value Decomposition • X = USV' • U, V: orthogonal (rotations) • S: diagonal, non-negative • The eigenvalues of XX' = USV'VSU' = US²U' are the squared singular values of X • Rank = ||s||_0 • The SVD is used to obtain the least-squares low-rank approximation
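
A short NumPy illustration of these facts; the matrix is random and nothing here is specific to the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # rank <= 4

U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(X, U @ np.diag(s) @ Vt)       # X = U S V'

rank = np.sum(s > 1e-10)                         # rank = ||s||_0 (up to round-off)

# Best rank-k approximation in the least-squares (Frobenius) sense: keep the
# k largest singular values and drop the rest.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(rank, np.linalg.norm(X - X_k))
```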

  29. Low-Rank Matrix Factorization [Slide figure: Y ≈ X = UV', with U of width k and V' of height k for a rank-k factorization] • Sum-squared loss, fully observed Y: use the SVD to find the global optimum • Classification-error loss, partially observed Y: non-convex, no explicit solution

  30. Low-Rank: Non-Convex Set [Slide figure: two rank-1 matrices whose combination has rank 2, illustrating that the set of low-rank matrices is not convex]

  31. Trace Norm Regularization • Trace norm: the sum of the singular values [Fazel et al., 2001]
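
For reference, the trace norm (also called the nuclear norm) can be computed directly from the singular values; a small NumPy sketch:

```python
import numpy as np

def trace_norm(X):
    """Trace (nuclear) norm: the sum of the singular values of X."""
    return np.sum(np.linalg.svd(X, compute_uv=False))

X = np.array([[3.0, 0.0], [0.0, -2.0]])
print(trace_norm(X))   # 5.0: the singular values are 3 and 2
```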

  32. Many Users, No Features [Slide figure: the ratings matrix Y and preference-score matrix X written as X = UV', with U playing the role of user weights and V of item features]

  33. Max-Margin Matrix Factorization • Objective: all-thresholds loss plus trace-norm regularization • A convex function of X and the thresholds θ • Encourages low rank in X [Srebro, Rennie & Jaakkola, NIPS 2004]

  34. Properties of the Trace Norm • ||X||_Σ = min over factorizations X = UV' of ||U||_F ||V||_F = min over X = UV' of ½(||U||_F² + ||V||_F²) • With the SVD X = USV', the factorization U√S, V√S minimizes both quantities

  35. Factorized Optimization • Factorized objective (a tight bound on the trace-norm objective): the loss on X = UV' plus (λ/2)(||U||_F² + ||V||_F²) • Gradient descent: O(n³) per round • Stationary points exist, but no non-global local minima [Rennie & Srebro, ICML 2005]
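
A minimal sketch of the factorized approach, assuming NumPy. For brevity it uses squared error on the observed entries instead of the all-thresholds loss with learned thresholds, and plain gradient descent instead of conjugate gradients, so it only illustrates the structure of the objective, not the thesis implementation.

```python
import numpy as np

def factorized_objective(U, V, Y, observed, lam):
    """Factorized objective with X = U V'.

    (||U||_F^2 + ||V||_F^2) / 2 bounds the trace norm of X, so this upper-bounds
    the trace-norm-regularized objective.  Squared error on observed entries
    stands in for the all-thresholds loss used in the actual model.
    """
    err = (U @ V.T - Y) * observed
    loss = 0.5 * np.sum(err ** 2)
    reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return loss + reg

def gradient_step(U, V, Y, observed, lam, step=0.01):
    """One step of plain gradient descent on the factorized objective."""
    err = (U @ V.T - Y) * observed
    grad_U = err @ V + lam * U
    grad_V = err.T @ U + lam * V
    return U - step * grad_U, V - step * grad_V

# Tiny synthetic example: 5 users x 4 items, rank-2 factors, partial observations.
rng = np.random.default_rng(0)
Y = rng.integers(1, 6, size=(5, 4)).astype(float)
observed = rng.random((5, 4)) < 0.6
U, V = rng.standard_normal((5, 2)), rng.standard_normal((4, 2))
for _ in range(200):
    U, V = gradient_step(U, V, Y, observed, lam=0.1)
print(factorized_objective(U, V, Y, observed, lam=0.1))
```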

  36. Collaborative Prediction Results [Slide figure: results comparing MMMF against the URP and Attitude baselines] [URP & Attitude: Marlin, 2004] [MMMF: Rennie & Srebro, 2005]

  37. Extensions • Multi-user + Features • Observation model • Predict which restaurants a user will rate, and • The rating she will make • Multiple ratings per user/restaurant • E.g. Food, Service and Décor ratings • SVD Parameterization

  38. Multi-User + Features [Slide figure: V' split into fixed-feature and learned-feature columns] • Feature parameters (V): some are fixed, some are learned • Learn weights (U) for all features • The fixed part of V does not affect regularization

  39. Observation Model • Common assumption: ratings observed at random • Restaurant selection: • Geography, popularity, price, food style • Remove bias: model observation process

  40. Observation Model • Model observation as binary classification • Add a binary classification loss • Tie together the rating and observation models: X = U_X V', W = U_W V' (shared item factors V)
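
One way to realize this coupling, sketched under assumptions: the rating scores X = U_X V' and the observation scores W = U_W V' share the item factors V, the observed/not-observed indicator enters through a logistic loss, and the two losses are summed. The squared-error stand-in for the rating loss and the Frobenius regularizer are illustrative simplifications, not the thesis formulation.

```python
import numpy as np

def joint_objective(U_X, U_W, V, Y, observed, lam):
    """Tie the rating model X = U_X V' and observation model W = U_W V'
    through a shared item-factor matrix V (a simplified sketch)."""
    X = U_X @ V.T                       # predicted preference scores
    W = U_W @ V.T                       # "will this entry be rated?" scores
    obs = observed.astype(float)

    rating_loss = 0.5 * np.sum(((X - Y) * obs) ** 2)
    # Logistic loss with +/-1 labels: log(1 + exp(-label * score)).
    labels = 2.0 * obs - 1.0
    observation_loss = np.sum(np.log1p(np.exp(-labels * W)))
    reg = 0.5 * lam * (np.sum(U_X ** 2) + np.sum(U_W ** 2) + np.sum(V ** 2))
    return rating_loss + observation_loss + reg
```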

  41. Multiple Ratings • Users may provide multiple ratings: • Service, Décor, Food • Add in loss functions • Stack parameter matrices for regularization

  42. SVD Parameterization • Too many parameters: for any invertible A, (UA)(A⁻¹V') = X is another factorization of X • Alternative: parameterize as U, S, V with U, V orthogonal and S diagonal • Advantages: • Not over-parameterized • Exact objective (not a bound) • No stationary points

  43. Summary • Loss function for ratings • Regularization for multiple users • Scaled MMMF to large problems (e.g. > 1000x1000) • Trace norm: widely applicable • Extensions Code: http://people.csail.mit.edu/jrennie/matlab

  44. Thanks! • Helen, for supporting me for 7.5 years! • Tommi Jaakkola, for answering all my questions and directing me to the “end”! • Mike Collins and Tommy Poggio for add’l guidance. • Nati Srebro & John Barnett for endless valuable discussions and ideas. • Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, & Patrycja Missiuro & all past members of Tommi’s reading group for paper discussions, conference trips and feedback on my talks. • Many, many others who have helped me along the way!

  45. Low-Rank Optimization [Slide figure: an objective over low-rank matrices, marking the unconstrained minimum, the low-rank minimum, and a low-rank local minimum]
