Contextual Recommendation in Multi-User Devices
Raz Nissim, Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee
Recommendation in Personal Devices and Accounts
Challenge: Recommendations in Shared Accounts and Devices • “I am a 34 yo man who enjoys action and sci-fi movies. This is what my children have done to my netflix account”
Our Focus: Recommendations for Smart TVs • Smart TVs can track what is being watched on them • Main problems: • Inferring who consumed each item in the past • Inferring who is currently requesting the recommendations • “Who” can be a subset of users
Solution: Using Context • Previous work: time of day as context
This Work: Contextual Personalized Recommendations • WatchItNext problem: it is 8:30pm and “House of Cards” is on. What should we recommend to be watched next on this device? • Implicit assumption: there’s a good chance that whoever is in front of the set now will remain there • Technically, think of an HMM where the hidden state corresponds to who is watching the set, and states don’t change too often
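The HMM intuition can be made concrete with a toy forward filter. This is an illustration of the intuition only, not the method evaluated in these slides: the two viewers, the genres, and every probability below are hypothetical, and the “sticky” transition matrix encodes the assumption that the audience rarely changes between consecutive shows.

```python
# Toy HMM: hidden state = who is watching; observation = genre of the show on screen.
# All names and probabilities are hypothetical, chosen to illustrate sticky states.

STATES = ["parent", "child"]

# Sticky transitions: a 90% chance that the same viewer keeps watching.
TRANS = {
    "parent": {"parent": 0.9, "child": 0.1},
    "child": {"parent": 0.1, "child": 0.9},
}

# Emission probabilities P(genre | viewer).
EMIT = {
    "parent": {"drama": 0.6, "news": 0.3, "kids": 0.1},
    "child": {"drama": 0.1, "news": 0.1, "kids": 0.8},
}


def filter_viewer(observations, prior=None):
    """Forward algorithm: return P(current viewer | genres observed so far)."""
    belief = dict(prior) if prior else {s: 1.0 / len(STATES) for s in STATES}
    for step, genre in enumerate(observations):
        if step > 0:
            # Predict: propagate the belief through the transition model.
            belief = {s: sum(belief[r] * TRANS[r][s] for r in STATES) for s in STATES}
        # Update: weight by the likelihood of the observed genre, then normalize.
        belief = {s: belief[s] * EMIT[s][genre] for s in STATES}
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}
    return belief
```

After two dramas in a row, `filter_viewer(["drama", "drama"])` puts roughly 0.96 of the probability on "parent", so a next-show recommender would lean toward that viewer’s tastes.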
WatchItNext Inputs and Output • Inputs include the available programs, a.k.a. the “line-up” • Output: ranked recommendations
Recommendation Settings: Exploratory and Habitual • One typically doesn’t buy the same book twice, nor do people typically read the same news story twice • But people listen to the songs they like over and over again, and watch movies they like multiple times as well • In the TV setting, people regularly watch series and sports events • Habitual setting: all line-up items are eligible for recommendation to a device • Exploratory setting: only items that were not previously watched on the device are eligible for recommendation
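The two settings differ only in which line-up items are candidates. A minimal sketch, assuming the device’s viewing history is available as a set of program IDs (the function name `eligible_items` is hypothetical):

```python
from typing import Iterable, List, Set


def eligible_items(lineup: Iterable[str], watched_on_device: Set[str], setting: str) -> List[str]:
    """Return the line-up items eligible for recommendation to one device.

    "habitual": every line-up item is eligible.
    "exploratory": only items never watched on this device are eligible.
    """
    if setting == "habitual":
        return list(lineup)
    if setting == "exploratory":
        return [item for item in lineup if item not in watched_on_device]
    raise ValueError(f"unknown setting: {setting!r}")
```

For a device that has watched “House of Cards”, the exploratory setting drops that series from the candidates while the habitual setting keeps it.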
Contextual Recommendations in a Different Context • How can contextualized and personalized recommendations be served together? [Diagram: Popular, Contextual, and Personalized recommendations]
Collaborative Filtering • A fundamental principle in recommender systems • Taps similarities in patterns of consumption/enjoyment of items by users • Recommends to a user what users with detected similar tastes have consumed/enjoyed
Collaborative Filtering – Mathematical Abstraction • Consider a consumption matrix R of users and items, with |U| rows (users) and |I| columns (items) • $r_{u,i} = 1$ whenever person u consumed item i • In other cases, $r_{u,i}$ might be person u’s rating of item i • The matrix R is typically very sparse • …and often very large • Real-life task: top-k recommendation • Predict which yet-to-be-consumed items the user would most enjoy • Related task on ratings data: matrix completion • Predict users’ ratings for items they have yet to rate, i.e., “complete” the missing values
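Because R is sparse, it is natural to store only its nonzero entries. A minimal neighborhood-style sketch of top-k recommendation on binary consumption data; Jaccard similarity and the summed-similarity score are illustrative choices of ours, not part of the slides, and all names are hypothetical:

```python
import heapq
from typing import Dict, List, Set

# Sparse binary consumption matrix R: user -> set of consumed items (r_{u,i} = 1).
Consumption = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two users' consumption sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_user_based(R: Consumption, user: str, k: int) -> List[str]:
    """Recommend up to k yet-to-be-consumed items, scoring each item by the
    summed similarity of the other users who consumed it."""
    mine = R[user]
    scores: Dict[str, float] = {}
    for other, items in R.items():
        if other == user:
            continue
        sim = jaccard(mine, items)
        if sim == 0.0:
            continue
        for item in items - mine:  # only items the user has not consumed yet
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item name for determinism.
    best = heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in best]
```

With R = {"alice": {"A", "B"}, "bob": {"A", "B", "C"}, "carol": {"A", "D"}}, Alice’s top-2 list is ["C", "D"]: Bob is more similar to her than Carol is.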
Collaborative Filtering – Matrix Factorization • Latent factor models (LFM): • Map both users and items to some f-dimensional space $\mathbb{R}^f$, i.e., produce f-dimensional vectors $v_u$ and $w_i$ for each user u and item i • Define rating estimates as inner products: $q_{ui} = \langle v_u, w_i \rangle$ • Main problem: finding a mapping of users and items to the latent factor space that produces “good” estimates • $R \approx V W$, where R is |U| × |I|, V is |U| × f, and W is f × |I| • Closely related to dimensionality reduction techniques of the ratings matrix R (e.g., Singular Value Decomposition)
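Once V and W are learned, scoring is just an inner product. A minimal sketch of top-k recommendation from an already trained LFM; the toy 2-dimensional vectors are hypothetical, and training itself is out of scope here:

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def predict(v_u: Vector, w_i: Vector) -> float:
    """Rating estimate q_ui = <v_u, w_i> (inner product in the latent space R^f)."""
    return sum(a * b for a, b in zip(v_u, w_i))


def lfm_top_k(v_u: Vector, item_vectors: Dict[str, Vector], consumed: Set[str], k: int) -> List[str]:
    """Top-k recommendation: score every yet-to-be-consumed item and keep the k best."""
    scores = {i: predict(v_u, w_i) for i, w_i in item_vectors.items() if i not in consumed}
    # Sort by descending score, breaking ties by item name.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]
```

For example, with `v_u = [1.0, 0.0]` and items `{"action": [0.9, 0.1], "kids": [0.1, 0.9], "scifi": [0.8, -0.2]}`, a user who already watched "action" gets `["scifi", "kids"]`.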
LFMs Rise to Fame: Netflix Prize • Used extensively by the Challenge winners, “BellKor’s Pragmatic Chaos” (2006-2009)
Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan 2003] • Originally devised as a generative model of documents in a corpus, where documents are represented as bags-of-words • k is a parameter representing the number of “topics” in the corpus • V is a |D| × k stochastic matrix: $V[d,t] = P(\text{topic}_t \mid \text{document}_d)$, t = 1,…,k • U is a k × |W| stochastic matrix: $U[t,w] = P(\text{word}_w \mid \text{topic}_t)$, t = 1,…,k • L is a vector holding the documents’ lengths (#words per document) • The |D| × |W| document-word count matrix is approximated by $\mathrm{diag}(L)\, V U$
Latent Dirichlet Allocation (cont.) • In our case: given a parameter k and the collection of devices (=documents) and their viewing history (=bags of shows), output: • k “profiles”, where each profile is a distribution over items • An association of each device with a distribution over the profiles • Profiles, hopefully, will represent viewing preferences such as: • “Kids shows” • “Cooking reality and home improvement” • “News and Late Night” • “History and Science” • “Redneck reality: fishing & hunting shows, MMA” • A-priori probability of an item being watched on a device: $\mathrm{Score}(\text{item} \mid \text{device}) = \sum_{\text{profile}=1}^{k} P(\text{item} \mid \text{profile}) \times P(\text{profile} \mid \text{device})$
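The a-priori score is a mixture over profiles. A minimal sketch of that sum, assuming LDA has already produced P(profile | device) as a list and P(item | profile) as one dictionary per profile (the function name and the toy profiles are hypothetical):

```python
from typing import Dict, List


def lda_score(item: str, p_profile_given_device: List[float],
              p_item_given_profile: List[Dict[str, float]]) -> float:
    """Score(item | device) = sum over profiles of P(item | profile) * P(profile | device)."""
    return sum(
        p_items.get(item, 0.0) * p_prof
        for p_prof, p_items in zip(p_profile_given_device, p_item_given_profile)
    )
```

With two toy profiles, `[{"Cartoon": 0.7, "Puppets": 0.3}, {"Evening News": 0.6, "Late Night": 0.4}]`, and a device that is 25% “kids” and 75% “news”, the score of "Late Night" is 0.4 × 0.75 = 0.3, and the scores over all items sum to 1.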
Contextualizing Recommendations: Three Main Approaches • Contextual pre-filtering: use context to restrict the data to be modeled • Contextual post-filtering: use context to filter or weight the recommendations produced by conventional models • Contextual modeling: context information is incorporated into the model itself • Typically requires denser data, due to many more parameters • Computationally intensive • E.g., Tensor Factorization [Karatzoglou et al., 2010]
Main Contribution: “3-Way” Technique • Learn a standard matrix factorization model (LFM/LDA) • When recommending to a device d currently watching context item c, score each target item t as follows: $S(t \text{ follows } c \mid d) = \sum_{j=1}^{k} v_d(j)\, w_c(j)\, w_t(j)$ • With LFM, an additive shift must be applied to all vectors to eliminate negative values • The result, “Sequential LFM/LDA”, is a personalized contextual recommender • The score is high for targets that agree with both the context and the device • Again, there is no need to model context or change the learning algorithm: learn as usual, and apply the change only when scoring
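The 3-way score needs nothing beyond the learned vectors. A minimal sketch under stated assumptions: the vectors are plain lists, and the LFM shift adds one constant (minus the smallest entry) to every entry, which is our hypothetical choice since the slides only say that an additive shift is applied.

```python
from typing import List, Sequence


def three_way_score(v_d: Sequence[float], w_c: Sequence[float], w_t: Sequence[float]) -> float:
    """S(t follows c | d) = sum_j v_d(j) * w_c(j) * w_t(j)."""
    return sum(a * b * c for a, b, c in zip(v_d, w_c, w_t))


def shift_nonnegative(vectors: List[List[float]]) -> List[List[float]]:
    """Add one constant to every entry of every vector so that no entry is negative.
    The choice of constant (minus the smallest entry) is an assumption."""
    smallest = min(x for vec in vectors for x in vec)
    shift = -smallest if smallest < 0 else 0.0
    return [[x + shift for x in vec] for vec in vectors]
```

On a device split evenly between a “kids” and a “news” profile (`[0.5, 0.5]`), a kids-leaning target `[0.8, 0.2]` and a news-leaning target `[0.1, 0.9]` both get the plain inner-product score 0.5. With a kids-show context `[0.9, 0.1]`, the 3-way score becomes 0.37 for the kids target and 0.09 for the news target, which is how the context narrows the recommendation.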
Data: Historical Viewing Logs • Triplets of the form (device ID, program ID, timestamp) • We don’t know who was watching the device at that time • Actually, we don’t know whether anyone was watching at all • [Figure: “Is anyone watching?” viewing events on a device plotted over time]
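To use sequential context, the raw triplets have to be turned into (context, next program) pairs per device. A minimal sketch of one way to do this; the function name and the optional `max_gap` cutoff are hypothetical additions, the latter motivated by the uncertainty about whether anyone is watching:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triplet = Tuple[str, str, int]  # (device ID, program ID, timestamp in seconds)


def sequential_events(log: List[Triplet],
                      max_gap: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Pair each viewing event with the previous event on the same device,
    yielding (device, context program, next program) tuples.
    max_gap (hypothetical) drops pairs whose timestamps are too far apart,
    since a long gap may mean nobody was watching in between."""
    by_device: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for device, program, ts in log:
        by_device[device].append((ts, program))
    events: List[Tuple[str, str, str]] = []
    for device, views in by_device.items():
        views.sort()  # chronological order per device
        for (t_prev, prev), (t_next, nxt) in zip(views, views[1:]):
            if max_gap is not None and t_next - t_prev > max_gap:
                continue
            events.append((device, prev, nxt))
    return events
```

A device that watched A, B, and C in that order yields the events (A → B) and (B → C); a device with a single viewing event yields none.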
Data by the Numbers • Training data: three months’ worth of viewership data • Test data: derived from one month of viewership data • * Items are movies, sports events, or series, not individual episodes
Metric: Average Rank Percentile (ARP) • Note: with large line-ups, ARP is practically equivalent to average AUC • Rank Percentile properties: • Ranges in (0,1] • Higher is better • Random scores ~0.5 in large line-ups • [Figure: a ranked line-up of four programs with rank percentiles 1.0, 0.75, 0.50, and 0.25; the score is the RP of the program watched next]
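The four-program example (RP = 1.0, 0.75, 0.50, 0.25) is consistent with RP = (n − r + 1) / n for the item at 1-based rank r in a line-up of n items; that formula is our reading of the example, not a definition quoted from the slides. A minimal sketch:

```python
from typing import List, Sequence, Tuple


def rank_percentile(rank: int, lineup_size: int) -> float:
    """RP of an item at 1-based position `rank` in a ranked line-up of `lineup_size` items."""
    return (lineup_size - rank + 1) / lineup_size


def average_rank_percentile(events: List[Tuple[Sequence[str], str]]) -> float:
    """ARP over test events; each event is (ranked line-up, program actually watched next)."""
    rps = [rank_percentile(list(ranked).index(watched) + 1, len(ranked))
           for ranked, watched in events]
    return sum(rps) / len(rps)
```

The formula stays in (0,1], equals 1.0 for a perfect top pick, and averages (n + 1) / (2n) under random ranking, which approaches 0.5 for large line-ups, matching the properties listed on the slide.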
Baselines * Only applicable to habitual recommendations
Contextual Personalized Recommenders • SequentialLDA [LFM]: 3-way element-wise product of the device vector, the context item vector, and the target item vector, summed over the latent dimensions • TemporalLDA [LFM]: the regular LDA/LFM score, multiplied by Temporal Popularity • TempSeqLDA [LFM]: the 3-way score, multiplied by Temporal Popularity • All LDA/LFM models are 80-dimensional
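The three recommenders differ only in how the final score is assembled. A minimal sketch with 2-dimensional toy vectors instead of 80; the vectors are hypothetical, and Temporal Popularity is taken as a precomputed number per item, since its exact definition is not given here:

```python
from typing import Callable, Dict, List, Sequence

Vec = Sequence[float]


def lda_score(v_d: Vec, w_t: Vec) -> float:
    """Regular (non-contextual) score: sum_j v_d(j) * w_t(j)."""
    return sum(a * b for a, b in zip(v_d, w_t))


def sequential_score(v_d: Vec, w_c: Vec, w_t: Vec) -> float:
    """3-way score: sum_j v_d(j) * w_c(j) * w_t(j)."""
    return sum(a * b * c for a, b, c in zip(v_d, w_c, w_t))


def rank(lineup: List[str], score: Callable[[str], float]) -> List[str]:
    """Order the line-up by descending score (ties broken by name)."""
    return sorted(lineup, key=lambda t: (-score(t), t))


# Hypothetical toy inputs: a device mixing two profiles, and a kids-show context.
device = [0.5, 0.5]
context = [0.9, 0.1]
items: Dict[str, Vec] = {"Puppets": [0.8, 0.2], "News": [0.1, 0.9]}
temporal_popularity = {"Puppets": 0.2, "News": 0.6}  # assumed precomputed
lineup = list(items)

sequential_lda = rank(lineup, lambda t: sequential_score(device, context, items[t]))
temporal_lda = rank(lineup, lambda t: lda_score(device, items[t]) * temporal_popularity[t])
temp_seq_lda = rank(lineup, lambda t: sequential_score(device, context, items[t]) * temporal_popularity[t])
```

In this toy case, TemporalLDA ranks "News" first because it is more popular right now, while TempSeqLDA keeps "Puppets" on top because the kids-show context outweighs the popularity gap.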
Results (1) Sequential Context Matters • Degradation when using a random item as context indicates that the correct context item reflects the current viewing session, and implicitly the current watchers of the device
Results (2) Sequential Context Matters • Device Entropy: the entropy of p(topic | device) as computed by LDA on the training data; high values correspond to diverse distributions
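Device entropy can be computed directly from a device’s topic distribution. A minimal sketch; the natural logarithm is an assumed choice of base (changing the base only rescales the values), and the function name is hypothetical:

```python
import math
from typing import Sequence


def device_entropy(p_topic_given_device: Sequence[float]) -> float:
    """Shannon entropy of p(topic | device), using the natural log (assumed base).
    Zero-probability topics contribute nothing, by the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in p_topic_given_device if p > 0)
```

A device concentrated on a single topic has entropy 0, while one spread uniformly over four topics has entropy log 4, the maximum for k = 4; shared, multi-user devices are expected to sit toward the high end.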
Conclusions • Multi-user or shared devices pose challenging recommendation problems • TV recommendations are characterized by two use cases: habitual and exploratory • Sequential context helps: it “narrows” the topical variety of the program to be watched next on the device • Intuitively, context serves to implicitly disambiguate the current user or users of the device • The 3-Way technique is an effective way to incorporate sequential context, with no impact on learning • Future: explore applications of Hidden Topic Markov Models [Gruber, Rosen-Zvi, Weiss 2007]
Thank You – Questions? rlempel [at] yahoo-inc [dot] com