Collaborative Filtering with Temporal Dynamics

Presentation Transcript


  1. Collaborative Filtering with Temporal Dynamics Yehuda Koren

  2. Recommender systems: “We Know What You Ought To Be Watching This Summer”

  3. Collaborative filtering • Recommend items based on past transactions of users • Specific data characteristics are irrelevant • Domain-free • Can identify elusive aspects • Two popular approaches: • Matrix factorization • Neighborhood

  4. Movie rating data: training data and test data

  5. Achievable RMSEs on the Netflix data (scale from erroneous to accurate) • Global average: 1.1296 • User average: 1.0651 • Movie average: 1.0533 • Cinematch: 0.9514 (baseline) • Static neighborhood: 0.9002 • Static factorization: 0.8911 • Leader: 0.8558 (10.05% improvement) • Inherent noise: ???? • Annotations along the scale: “Find better items”, “Personalization”, “Algorithmics”, “Time effects”
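
The figures on this slide are root mean squared errors, and the quoted 10.05% is the leader's improvement relative to the Cinematch baseline. A minimal sketch of both calculations follows; the function names are illustrative and not taken from the presentation.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Cinematch (0.9514) versus the leader (0.8558), both from the slide.
print(f"{relative_improvement(0.9514, 0.8558):.2%}")  # prints 10.05%
```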

  6. Something Happened in Early 2004…

  7. Are movies getting better with time?

  8. Multiple sources of temporal dynamics • Item-side effects: • Product perception and popularity are constantly changing • Seasonal patterns influence items’ popularity • User-side effects: • Customers continually redefine their taste • Transient, short-term bias; anchoring • Drifting rating scale • Change of rater within household

  9. Temporal dynamics - challenges • Multiple sources: Both items and users are changing over time • Multiple targets: Each user/item forms a unique time series → Scarce data per target • Inter-related targets: Signal needs to be shared among users (the foundation of collaborative filtering) → Cannot isolate multiple problems → Common “concept drift” methodologies won’t hold. E.g., underweighting older instances is unappealing

  10. Basic matrix factorization model [Figure: the user-item rating matrix is approximated (~) by the product of an item-factor matrix and a user-factor matrix; a rank-3 SVD approximation]

  11. Estimate unknown ratings as inner-products of factors [Figure: the same rank-3 approximation, with one unknown rating marked “?”]

  12. Estimate unknown ratings as inner-products of factors [Figure: the same rank-3 approximation, with one unknown rating marked “?”]

  13. Estimate unknown ratings as inner-products of factors [Figure: the same rank-3 approximation; the unknown rating is estimated as 2.4]

  14. Matrix factorization model • Properties: • SVD isn’t defined when entries are unknown → use specialized methods • Can easily overfit, sensitive to regularization • Need to separate main effects…
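
The slide does not say which "specialized methods" it has in mind. One common choice, assumed here, is stochastic gradient descent over only the known ratings with an L2 penalty to limit overfitting. The sketch below uses hypothetical toy data and hyperparameters (`rank`, `lr`, `reg`, `epochs`) chosen purely for illustration.

```python
import math
import random


def dot(a, b):
    """Inner product of two equally long factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def train(ratings, n_users, n_items, rank=3, lr=0.02, reg=0.05,
          epochs=500, seed=0):
    """Fit user factors p and item factors q on known ratings only."""
    rng = random.Random(seed)
    p = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(p[u], q[i])
            for k in range(rank):
                pu, qi = p[u][k], q[i][k]
                # Gradient step on squared error plus L2 regularization.
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q


def rmse(ratings, p, q):
    """Training RMSE of the factor model on the given ratings."""
    sq = sum((r - dot(p[u], q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Hypothetical toy data: (user, item, rating); unlisted pairs are unknown.
RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 5), (3, 2, 2)]

p, q = train(RATINGS, n_users=4, n_items=3)
# Estimate an unknown rating (user 1, item 1) as an inner product.
estimate = dot(p[1], q[1])
```

Because the loop visits only observed entries, missing ratings never need to be imputed, which is the property a plain SVD lacks.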

  15. Baseline predictors • Mean rating: 3.7 stars • The Sixth Sense is 0.5 stars above avg • Joe rates 0.2 stars below avg • Baseline prediction: Joe will rate The Sixth Sense 4 stars (3.7 + 0.5 − 0.2 = 4.0) • No user-item interaction

  16. Factor model correction • Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale • Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars
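
The two slides above can be sketched as a baseline plus an interaction term. The mean and the two biases come from the slides; the factor vectors are hypothetical values chosen so that the interaction equals the 0.5-star correction implied by moving from 4 to 4.5 stars.

```python
def baseline(mu, b_item, b_user):
    """Baseline prediction: global mean plus item and user biases."""
    return mu + b_item + b_user


def predict(mu, b_item, b_user, q_item, p_user):
    """Baseline plus the user-item interaction (inner product of factors)."""
    interaction = sum(q * p for q, p in zip(q_item, p_user))
    return baseline(mu, b_item, b_user) + interaction


MU, B_SIXTH_SENSE, B_JOE = 3.7, 0.5, -0.2   # values from the slides
Q_SIXTH_SENSE = [1.0, 0.0, 0.0]             # hypothetical item factors
P_JOE = [0.5, 0.0, 0.0]                     # hypothetical user factors

print(f"{baseline(MU, B_SIXTH_SENSE, B_JOE):.1f}")  # prints 4.0
print(f"{predict(MU, B_SIXTH_SENSE, B_JOE, Q_SIXTH_SENSE, P_JOE):.1f}")  # prints 4.5
```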

  17. Matrix factorization with biases • Baseline predictors: μ (global average), b_u (bias of u), b_i (bias of i) • User-item interaction: p_u (user u’s factors), q_i (item i’s factors) • Minimization problem with regularization (formula not preserved in the transcript)
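
A plausible reconstruction of the prediction rule and minimization problem for this slide, built from the symbols it defines and assuming the standard biased matrix factorization form. The set K of known (user, item) ratings and the single regularization constant λ are notation introduced here; the original slide may have used different regularization terms.

```latex
\[
  \hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u
\]
\[
  \min_{b_*,\, q_*,\, p_*} \sum_{(u,i) \in K}
    \left( r_{ui} - \mu - b_u - b_i - q_i^{\top} p_u \right)^2
    + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
\]
```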

  18. Addressing temporal dynamics • Factor model conveniently allows separately treating different aspects • We observe changes in: • Rating scale of individual users (baseline predictors) • Popularity of individual items (baseline predictors) • User preferences (user factors)

  19. Parameterizing the model • Use functional forms: b_u(t) = f(u,t), b_i(t) = g(i,t), p_u(t) = h(u,t) • Need to find adequate f(), g(), h() • General guidelines: • Items show slower temporal changes • Users exhibit frequent and sudden changes • Factors, p_u(t), are expensive to model • Gain flexibility by heavily parameterizing the functions
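
The slide leaves f() and g() unspecified. The sketch below shows one possible parameterization that follows the stated guidelines: a coarse, binned offset for items (slow change), and for users a smooth drift term plus a per-day offset (frequent and sudden change). The bin width, the drift exponent `beta`, and all names are assumptions made for illustration, not values from the presentation.

```python
import math


def item_bias(b_i, bin_offsets, t, bin_days=70):
    """g(i, t): static item bias plus an offset for the time bin containing day t."""
    return b_i + bin_offsets.get(t // bin_days, 0.0)


def dev(t, t_mean, beta=0.4):
    """Signed, sub-linear distance of day t from the user's mean rating day."""
    d = t - t_mean
    return math.copysign(abs(d) ** beta, d)


def user_bias(b_u, alpha_u, day_offsets, t, t_mean):
    """f(u, t): static bias + gradual drift + a sudden single-day effect."""
    return b_u + alpha_u * dev(t, t_mean) + day_offsets.get(t, 0.0)


# Hypothetical user: rates 0.2 below average, drifts upward slowly,
# and was unusually generous on day 120.
print(user_bias(-0.2, 0.01, {120: 0.3}, t=120, t_mean=100))
```

Items get few parameters (one per bin), while users get many (one per rating day), which matches the slide's advice to gain flexibility by heavily parameterizing the functions.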

  20. Achievable RMSEs on the Netflix data (scale from erroneous to accurate) • Global average: 1.1296 • User average: 1.0651 • Movie average: 1.0533 • Cinematch: 0.9514 (baseline) • Static neighborhood: 0.9002 • Static factorization: 0.8911 • Dynamic factorization: 0.8794 • Grand Prize: 0.8563 (10% improvement) • Inherent noise: ???? • Annotations along the scale: “Find better items”, “Personalization”, “Algorithmics”, “Time effects”

  21. Neighborhood-based CF • Earliest and most common collaborative filtering method • Derive unknown ratings from those of “similar” items (item-item variant)

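  22. Neighborhood modeling • Use item-item weights (w_ij) to relate items • Annotated parts of the prediction formula (formula not preserved in the transcript): the rating of user u for item i that needs to be estimated; the baseline predictor; the deviation from the baseline estimate for item j; the weight from j to i; the set of items rated by u • The weights are constants learned from the data through optimization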

  23. Optimizing the model • Minimize the squared error function (formula not preserved in the transcript)
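
The formulas for the neighborhood model and its objective did not survive transcription. A plausible reconstruction from the slide annotations follows, where b_ui denotes the baseline predictor, R(u) the set of items rated by u, and w_ij the weight from j to i. The set K of known ratings, the regularization constant λ, and the absence of any normalization of the sum are assumptions; the original slides may differ on these points.

```latex
\[
  \hat{r}_{ui} = b_{ui} + \sum_{j \in R(u)} \left( r_{uj} - b_{uj} \right) w_{ij}
\]
\[
  \min_{w_*} \sum_{(u,i) \in K}
    \Bigl( r_{ui} - b_{ui} - \sum_{j \in R(u)} \left( r_{uj} - b_{uj} \right) w_{ij} \Bigr)^2
    + \lambda \sum_{j \in R(u)} w_{ij}^2
\]
```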

  24. Making the model time-aware • A popular scheme is instance weighting: decay the significance of outdated events within the cost function (via a time-decay factor) • Don’t do this!

  25. Why isn’t instance weighting suitable? • Not enough data per user: need to exploit all signal, including old signal • The learnt parameters (w_ij) represent time-invariant item-item relations, which can also be deduced from older actions • Two items are related when users rated them similarly within a short timeframe, even if this happened long ago • How to do it right?

  26. Time-aware neighborhood model • Decay item-item relations based on time distance • User-specific decay rate, controlled by β_u • All past user behavior is equally considered, through the cost function (formula not preserved in the transcript)
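
A minimal sketch of the idea on this slide: the decay is applied inside the prediction, to each item-item relation, rather than to training instances in the cost function. The exponential form exp(−β_u·|t − t_j|) is an assumption, since the slide's formula is missing, and normalization of the sum is left out for brevity. All names and numbers are hypothetical.

```python
import math


def predict_time_aware(baseline_ui, history, weights, item, t, beta_u):
    """Predict user u's rating of `item` on day t.

    history: list of (j, r_uj, b_uj, t_j) for items the user already rated.
    weights: dict mapping (item, j) to the learned item-item weight w_ij.
    beta_u:  user-specific decay rate; 0 gives the static neighborhood model.
    """
    total = 0.0
    for j, r_uj, b_uj, t_j in history:
        # Relations to items rated long before (or after) day t count for less.
        decay = math.exp(-beta_u * abs(t - t_j))
        total += decay * (r_uj - b_uj) * weights.get((item, j), 0.0)
    return baseline_ui + total


# Hypothetical user history: item "a" rated on day 100, item "b" on day 0.
HISTORY = [("a", 5.0, 4.0, 100), ("b", 2.0, 3.0, 0)]
WEIGHTS = {("x", "a"): 0.4, ("x", "b"): 0.2}

static = predict_time_aware(3.5, HISTORY, WEIGHTS, "x", t=100, beta_u=0.0)
decayed = predict_time_aware(3.5, HISTORY, WEIGHTS, "x", t=100, beta_u=0.1)
```

With `beta_u = 0.1` the old rating of item "b" barely influences the prediction, while the weights themselves stay time-invariant and can still be learned from the full history.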

  27. Temporal neighborhood model delivers same relative RMSE improvement (0.0117) as temporal factor model (!) • Achievable RMSEs (scale from erroneous to accurate) • Global average: 1.1296 • User average: 1.0651 • Movie average: 1.0533 • Cinematch: 0.9514 (baseline) • Static neighborhood: 0.9002 • Static factorization: 0.8911 • Dynamic neighborhood: 0.8885 • Dynamic factorization: 0.8794 • Grand Prize: 0.8563 (10% improvement) • Inherent noise: ???? • Annotations along the scale: “Find better items”, “Personalization”, “Algorithmics”, “Time effects”
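
The 0.0117 figure can be checked directly from the RMSE values listed on this slide. The short sketch below does the subtraction for both model families; the function name is illustrative.

```python
def rmse_gain(static_rmse, dynamic_rmse):
    """Absolute RMSE reduction from adding temporal modeling, to 4 decimals."""
    return round(static_rmse - dynamic_rmse, 4)


neighborhood_gain = rmse_gain(0.9002, 0.8885)  # static vs. dynamic neighborhood
factor_gain = rmse_gain(0.8911, 0.8794)        # static vs. dynamic factorization

print(neighborhood_gain, factor_gain)  # prints 0.0117 0.0117
```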

  28. Lessons • Modeling temporal effects significantly improves recommender accuracy • Allow multiple time-drifting patterns across users and items • Integrate all users within a single model to allow crucial cross-user collaboration • Model user behavior along the full history; do not over-emphasize recent actions • Separate long-term values, while excluding transient fluctuations from the model • Sudden, single-day effects are significant • Modeling past temporal fluctuations helps in predicting future behavior, even though we do not extrapolate future temporal dynamics

  29. Yehuda Koren Yahoo! Research yehuda@yahoo-inc.com