
Breaking out of the Box of Recommendations: From Items to Packages


Presentation Transcript


  1. Breaking out of the Box of Recommendations: From Items to Packages. M. Xie 1, L. Lakshmanan 1, P. Wood 2 (1 Univ. of British Columbia, 2 Univ. of London), RecSys '10. 2011. 01. 14. Summarized and presented by Sang-il Song, IDS Lab., Seoul National University

  2. Introduction • Classical recommendation systems provide recommendations consisting of a single item • Several applications can benefit from a system capable of recommending a package of items • Trip planning • Recommending tweeters to follow • There may be a notion of compatibility among the items in a set, modeled in the form of constraints • No more than 3 museums in a package • The total distance covered in visiting all POIs in a package should be ≤ 10 km

  3. YourTour

  4. Contents • Composite Recommendation • Related Works • System Architectures • Problem Statements • 0/1 knapsack problem • Algorithms • InsOpt-CR • Greedy-CR • Experiments • Conclusions / Discussion

  5. Composite Recommendation • Each item has • A value (rating or score) • A cost • A maximum total cost (budget) is given • Find a set of items with the highest total value whose total cost stays within the budget • Assumptions • Items can be accessed in non-increasing order of value • There are information sources providing the cost associated with each item • The number of items is very large, and access to these ratings can be relatively expensive
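To make this problem statement concrete, here is a minimal (hypothetical) data model in Python: an item carries a value and a cost, a package is a collection of items, and feasibility means the package's total cost stays within the budget B. The names Item, package_value, package_cost and is_feasible are illustrative, not from the paper, and are reused by the sketches on later slides.

from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    value: int    # rating or score (assumed integral for the DP sketch later on)
    cost: float   # e.g. running time of a movie, price or distance of a POI

def package_value(package):
    # total value of a candidate package
    return sum(t.value for t in package)

def package_cost(package):
    # total cost of a candidate package
    return sum(t.cost for t in package)

def is_feasible(package, budget):
    # a package is feasible if its total cost does not exceed the budget B
    return package_cost(package) <= budget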

  6. Related Works • CARD (RecSys '08), FlexRecs (SIGMOD '09) • Comprehensive frameworks • Users can specify their recommendation preferences using relational query languages extended with additional features or operators • A. Angel et al. (EDBT '09) • Finding packages of entities • CourseRank (RecSys '09) • Provides course recommendations to students • Based on the ratings given to courses by past students and subject to the constraints of degree requirements

  7. System Architecture Figure 1. System Architecture • External Cost Source • Provides the cost of a given item • Can be a local database or a web service • Compatibility Checker • Checks whether a package satisfies compatibility constraints

  8. Top-k Composite Recommendation • Find the top-k packages P1,…,Pk such that • Each Pi is feasible (the total cost of Pi ≤ the budget B) • P1,…,Pk are the k highest-value feasible packages: v(P) ≤ v(Pi) for every i and every feasible package P ∉ {P1,…,Pk} • Top-1 composite recommendation problem (CompRec) • A variation of the 0/1 knapsack problem • Items can be accessed in non-increasing order of value • Background information can be a histogram collected from the external cost source, or something as simple as a minimum item cost cmin

  9. 0/1 Knapsack Problem • Maximize Σi v(ti) xi subject to Σi c(ti) xi ≤ B, with xi ∈ {0, 1} • The problem is NP-complete

  10. Notations • S = {t1,…,tn} : set of items • SSi,v : subset of {t1,…,ti} whose total value is exactly v and whose total cost is minimized • i ∈ {1, …, n} • v ∈ {1, …, n·v(t1)} • C(i,v) : the cost of SSi,v • It can be calculated by the recursion C(i,v) = min{ C(i-1, v), C(i-1, v - v(ti)) + c(ti) } if v(ti) ≤ v, and C(i,v) = C(i-1, v) otherwise, with C(0,0) = 0 and C(0,v) = ∞ for v > 0
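A minimal sketch of the pseudo-polynomial dynamic program behind this recursion, reusing the hypothetical Item model from above: C[v] holds the minimum cost of reaching total value exactly v with the items processed so far, and the best feasible package is read off as the largest v whose minimum cost fits the budget. Item names are assumed unique; exact_knapsack is an illustrative name, not the paper's.

import math

def exact_knapsack(items, budget):
    # pseudo-polynomial DP over the C(i, v) table: C[v] = min cost of a
    # subset of the items processed so far whose total value is exactly v
    v_max = sum(t.value for t in items)
    C = [math.inf] * (v_max + 1)
    C[0] = 0.0
    chosen = [frozenset() for _ in range(v_max + 1)]   # item names reaching value v at min cost
    for t in items:                                    # plays the role of index i
        for v in range(v_max, t.value - 1, -1):        # descending: each item used at most once
            if C[v - t.value] + t.cost < C[v]:
                C[v] = C[v - t.value] + t.cost
                chosen[v] = chosen[v - t.value] | {t.name}
    # best feasible package = largest exactly-reachable value whose min cost fits the budget
    best_v = max(v for v in range(v_max + 1) if C[v] <= budget)
    return [t for t in items if t.name in chosen[best_v]]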

  11. Algorithm 1: MaxValBound • The value V* returned by MaxValBound is an upper bound on the value of the optimal solution (Lemma 1)
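The transcript does not reproduce the algorithm listing itself, so the following is only a rough sketch of one way such an upper bound could be computed from the accessed items and the background knowledge cmin; it is not necessarily the paper's exact MaxValBound. The assumptions: items arrive in non-increasing value order, so every unseen item has value at most the last accessed value and cost at least cmin, and a fractional packing over the accessed items plus such hypothetical "best possible" unseen items can only overestimate the true optimum.

def max_val_bound(accessed, budget, c_min):
    # accessed: Items seen so far, in non-increasing value order
    v_last = accessed[-1].value                       # every unseen item has value <= v_last
    pool = [(t.value, t.cost) for t in accessed]
    pool += [(v_last, c_min)] * int(budget / c_min)   # hypothetical best-case unseen items
    pool.sort(key=lambda vc: vc[0] / vc[1], reverse=True)   # by value density
    bound, remaining = 0.0, float(budget)
    for value, cost in pool:
        if remaining <= 0:
            break
        frac = min(1.0, remaining / cost)             # fractional (relaxed) packing
        bound += frac * value
        remaining -= frac * cost
    return bound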

  12. Algorithm 2: InsOpt-CR • One item is retrieved from the source at each iteration of the algorithm (line 3) • A pseudo-polynomial exact algorithm computes an optimal solution R0 over the items accessed so far (line 5) • If v(R0) ≥ ½ V*, the algorithm terminates (lines 7-8) • Since V* ≥ v(OPT) and v(R0) ≥ ½ V*, it follows that v(R0) ≥ ½ v(OPT)
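Putting the pieces together, here is a sketch of the access loop as described on this slide, reusing the hypothetical helpers above (exact_knapsack, max_val_bound, package_value). It follows the slide's description rather than the paper's exact pseudocode; the line numbers in the comments refer to the algorithm listing on the original slide.

def insopt_cr(item_stream, budget, c_min):
    accessed = []
    for item in item_stream:                       # line 3: retrieve one item per iteration,
        accessed.append(item)                      #         in non-increasing value order
        r0 = exact_knapsack(accessed, budget)      # line 5: exact pseudo-polynomial solution
        v_star = max_val_bound(accessed, budget, c_min)
        if package_value(r0) >= 0.5 * v_star:      # lines 7-8: stop at half the upper bound;
            return r0                              # V* >= v(OPT) implies v(R0) >= v(OPT)/2
    return exact_knapsack(accessed, budget)        # source exhausted: best over all items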

  13. Example • Budget B = 199, cmin = 0.5, vmin = 1 • The values and costs of the items are as shown on the slide (table not reproduced) • After accessing the first 101 items, S = {t1,…,t101} • R0 = {t1} ⋃ {t3,…,t101} • v(R0) = 200 ≥ ½ · 398 = ½ · V*

  14. Instance Optimality • This means that any other 2-approximation algorithm that can only access items in non-increasing order of their value must access at least as many items as our algorithm • Definition • Let 𝓐 be a class of algorithms and 𝓘 be a class of problem instances • Given a non-negative cost measure cost(A, I) of running algorithm A ∈ 𝓐 on instance I ∈ 𝓘 • An algorithm A is instance optimal over 𝓐 and 𝓘 • If for every A′ ∈ 𝓐 and every I ∈ 𝓘, • cost(A, I) ≤ c · cost(A′, I) + c′ for constants c and c′ • InsOpt-CR is instance optimal over 𝓐 and 𝓘 with an optimality ratio of one

  15. Greedy Algorithms • Instance-optimal algorithms rely on an exact algorithm for the knapsack problem, which may lead to high computational cost • Greedy-CR instead uses a cheaper greedy heuristic (a sketch follows below) • Greedy-CR is not instance optimal
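For illustration, here is the textbook greedy 2-approximation for 0/1 knapsack: pack items by value density, then return the better of the greedy pack and the single most valuable feasible item. This is assumed to be the kind of heuristic Greedy-CR substitutes for the exact DP inside the access loop; the paper's exact greedy rule may differ.

def greedy_knapsack(items, budget):
    # greedily pack by value density (value / cost)
    pack, cost = [], 0.0
    for t in sorted(items, key=lambda t: t.value / t.cost, reverse=True):
        if cost + t.cost <= budget:
            pack.append(t)
            cost += t.cost
    # the classic 2-approximation also considers the best single feasible item
    feasible_singles = [t for t in items if t.cost <= budget]
    best_single = max(feasible_singles, key=lambda t: t.value, default=None)
    if best_single is not None and best_single.value > package_value(pack):
        return [best_single]
    return pack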

  16. Example • Budget B = 199, cmin = 0.5, vmin = 1 • The values and costs of the items are as shown on the slide (table not reproduced) • After accessing the first 101 items, S = {t1,…,t101} • RG = {t1} • v(RG) = 101 < ½ · 398 = ½ · V* • Greedy-CR therefore continues accessing new items, and it accesses another 98 items before it stops

  17. Top-k Composite Recommendation • Extends the top-1 composite recommendation • Apply Lawler's procedure to InsOpt-CR • Lawler's procedure • A general technique for finding the top-k answers to an optimization problem • Step 1. Compute the optimal solution x = <x1, …, xn> • Step 2. Fix the values of x1, …, xs, then create (n-s) sub-problems by constraining the remaining variables as follows, where xj(k) denotes the value of xj in the k-th solution found: (1) xs+1 = 1 - xs+1(k); (2) xs+1 = xs+1(k), xs+2 = 1 - xs+2(k); …; (n-s) xs+1 = xs+1(k), xs+2 = xs+2(k), …, xn = 1 - xn(k)

  18. Lawler's procedure • (Diagram: the optimal solution is split, by fixing variables, into Problem 1, Problem 2, …, Problem (n-s)) • Computational complexity: O(k·n·c(n)) • c(n): the cost of computing a single optimization problem
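Below is a sketch of Lawler's branching as described on the two slides above, again reusing the hypothetical helpers. The function solve(items, budget, forced_in, forced_out) is an assumed wrapper that returns the best feasible package containing every item in forced_in and none in forced_out (for example, the exact_knapsack DP run on the remaining items with a correspondingly reduced budget), or None if no such package exists. Each popped solution is branched into sub-problems that each flip one more decision, so no package is generated twice.

import heapq
import itertools

def lawler_topk(items, budget, k, solve):
    counter = itertools.count()        # tie-breaker so the heap never compares packages
    heap, results = [], []
    first = solve(items, budget, frozenset(), frozenset())
    if first is None:
        return []
    heapq.heappush(heap, (-package_value(first), next(counter), first, frozenset(), frozenset()))
    while heap and len(results) < k:
        _, _, pkg, forced_in, forced_out = heapq.heappop(heap)
        results.append(pkg)
        prefix_in, prefix_out = set(forced_in), set(forced_out)
        free = [t for t in items if t not in forced_in and t not in forced_out]
        for t in free:
            # sub-problem: earlier free items fixed as in pkg, item t flipped
            if t in pkg:
                sub_in, sub_out = frozenset(prefix_in), frozenset(prefix_out | {t})
            else:
                sub_in, sub_out = frozenset(prefix_in | {t}), frozenset(prefix_out)
            sub = solve(items, budget, sub_in, sub_out)
            if sub is not None:
                heapq.heappush(heap, (-package_value(sub), next(counter), sub, sub_in, sub_out))
            # for the remaining sub-problems, fix t to its value in pkg
            (prefix_in if t in pkg else prefix_out).add(t)
    return results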

  19. Boolean Compatibility Constraints • If a package fails the compatibility check, discard it and search for the next candidate package • The modified InsOpt-CR-Topk algorithm is still instance optimal
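A minimal sketch of that filtering step: candidate packages are assumed to arrive in non-increasing value order (for example from the lawler_topk sketch above), and is_compatible is a hypothetical boolean predicate such as "no more than 3 museums in the package".

def topk_compatible(candidate_packages, k, is_compatible):
    results = []
    for pkg in candidate_packages:     # candidates in non-increasing value order
        if is_compatible(pkg):         # keep only packages passing the compatibility check
            results.append(pkg)
            if len(results) == k:      # stop once k compatible packages are found
                break
    return results

# example predicate (illustrative only): at most 3 museums in a package
def at_most_three_museums(pkg):
    return sum(1 for t in pkg if "museum" in t.name.lower()) <= 3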

  20. Experiments • The goals of the experiments • Evaluate the relative quality of InsOpt-CR and Greedy-CR compared to the optimal algorithm • Evaluate the relative efficiency of the algorithms with respect to the number of items accessed and the actual running time • Datasets • MovieLens (ratings for movies) • Cost: running time of a movie • TripAdvisor (ratings for POIs) • Cost: number of reviews • The more popular a POI is, the more likely it is to be crowded or to have expensive tickets • Synthetic datasets (correlated and uncorrelated)

  21. Quality of Recommendation Packages • Table 1. Quality Comparison for Different Composite Recommendation Algorithms • The approximation algorithms do indeed return top-k composite packages whose values are guaranteed to be 2-approximations of the optimal • The approximation algorithms often recommend packages with high average value

  22. Normalized Discounted Cumulative Gain (NDCG) • A measure of the effectiveness of web search / ranking algorithms • Uses a graded relevance scale of documents • Assumptions • Highly relevant documents are more useful when appearing earlier in a search engine result list (i.e., at higher ranks) • Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than irrelevant documents
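A short sketch of one common formulation of DCG/NDCG (the exact gain and discount used in the paper's evaluation may differ): the discounted gain of a ranking is divided by the discounted gain of the ideal ordering, so a perfect ranking scores 1.0.

import math

def dcg(relevances):
    # Jarvelin & Kekalainen style discounting: rel_1 + sum_{i >= 2} rel_i / log2(i)
    return sum(rel if i == 1 else rel / math.log2(i)
               for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances, all_relevances):
    # normalize by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg(sorted(all_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# toy usage: relevance grades of a returned list vs. the full graded pool
print(ndcg([3, 2, 1, 0, 0], [3, 3, 2, 1, 0]))   # below 1.0: one highly relevant doc is missing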

  23. NDCG Example • (Slide shows a worked example comparing two result lists against the optimal ranking; one list is missing D3 and D5, the other is missing D1, and the scores shown are 1.11 and 1.13) • Result 1 is closer to optimal

  24. Quality of Recommended Packages • Figure 2. NDCG Scores for Top-k Packages • The greedy algorithm achieves overall top-k package quality very similar to that of the instance-optimal algorithm • The difference in NDCG score between the two approximation algorithms is very small

  25. Efficiency Study • Figure 3. (a)-(d) Running Time for Different Datasets; (e)-(h) Access Cost for Different Datasets • Greedy-CR-Topk has excellent performance in terms of both running time and access cost, except on the correlated synthetic dataset

  26. Conclusions • Recommending packages consisting of sets of items • Generating top-k packages that are • Compatible • Under a cost budget • Two 2-approximation algorithms • InsOpt-CR-Topk (instance optimal) • Greedy-CR-Topk (faster) • Experiments show that the two proposed algorithms • Produce high-quality packages • Are fast and practical

  27. Discussion • Contributions • Composite recommendation modeling with a budget • Approximation algorithms with proofs • Good quality and fast • Issues • Is their cost model useful in practice? • The cost model is too simple and idealized • The proposed algorithms appear to be variations of standard knapsack solutions • The choice of cost measures in the experiments is somewhat odd • No comparison with other algorithms • Baseline: worst case

  28. Thank you
