Cross-Selling with Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee
Motivation • Question: A user has already bought some products; what other products should we recommend to that user? • Collaborative Filtering (CF) automates the "circle of advisors".
Collaborative Filtering "..people collaborate to help one another perform filtering by recording their reactions..." (Tapestry) • Finds users whose taste is similar to yours and uses them to make recommendations. • Complementary to IR/IF: IR/IF finds similar documents, CF finds similar users.
Example • Which movie would Sammy watch next? • Ratings 1-5 • If we just use the average rating of the other users who voted on these movies, we get: Matrix = 3; Titanic = 14/4 = 3.5 • Recommend Titanic! • But is this reasonable?
Types of Collaborative Filtering Algorithms • Collaborative Filters • Statistical Collaborative Filters • Probabilistic Collaborative Filters [PHL00] • Bayesian Filters [BP99][BHK98] • Association Rules [Agrawal, Han] • Open Problems • Sparsity, First Rater, Scalability
Statistical Collaborative Filters • Users annotate items with numeric ratings. • Users who rate items “similarly” become mutual advisors. • Recommendation computed by taking a weighted aggregate of advisor ratings.
Basic Idea • Nearest Neighbor Algorithm • Given a user a and an item i • First, find the users most similar to a • Let these be Y • Second, find how these users (Y) rated i • Then, calculate a predicted rating of a on i based on some average over all these users Y • How do we calculate the similarity and the average?
Statistical Filters • GroupLens [Resnick et al 94, MIT] • Filters UseNet News postings • Similarity: Pearson correlation • Prediction: Weighted deviation from mean
Pearson Correlation • Weight between users a and u • Compute a similarity matrix between users • Use the Pearson correlation, which ranges over [-1, 1] • Let the items be all items that both users have rated (formula reconstructed below)
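The slide's formula image is not reproduced in this transcription; below is the standard Pearson weight it refers to, reconstructed from the definitions above, where r_{a,i} is user a's rating of item i, \bar{r}_a is a's mean rating, and I is the set of items both users rated:

```latex
w_{a,u} = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}
               {\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2}\;\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}}
```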
Prediction Generation • Predicts how much a user a likes an item i • Generate predictions using the weighted deviation from the mean:

p_{a,i} = \bar{r}_a + \frac{\sum_{u \in Y} w_{a,u}\,(r_{u,i} - \bar{r}_u)}{\sum_{u \in Y} |w_{a,u}|}    (1)

where the normalizer in the denominator is the sum of all weights.
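A minimal Python sketch of the two steps above (Pearson weighting plus Equation 1). The `ratings` dictionary and all names are illustrative toy data, not the slides' table:

```python
import math

# ratings[user][item] = rating on a 1-5 scale (toy data, illustrative only)
ratings = {
    "alice": {"Matrix": 5, "Speed": 3},
    "bob":   {"Matrix": 5, "Speed": 4, "Titanic": 4},
    "carol": {"Matrix": 2, "Speed": 4, "Titanic": 5},
}

def mean(u):
    """Mean rating of user u over everything u has rated."""
    vals = list(ratings[u].values())
    return sum(vals) / len(vals)

def pearson(a, u):
    """Pearson weight w_{a,u}, computed over the items both users rated."""
    common = set(ratings[a]) & set(ratings[u])
    if not common:
        return 0.0
    ma, mu = mean(a), mean(u)
    num = sum((ratings[a][i] - ma) * (ratings[u][i] - mu) for i in common)
    da = math.sqrt(sum((ratings[a][i] - ma) ** 2 for i in common))
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    return num / (da * du) if da and du else 0.0

def predict(a, item):
    """Equation (1): a's mean plus the weighted deviation of the advisors
    from their own means, normalized by the sum of the weights."""
    advisors = [u for u in ratings if u != a and item in ratings[u]]
    num = sum(pearson(a, u) * (ratings[u][item] - mean(u)) for u in advisors)
    den = sum(abs(pearson(a, u)) for u in advisors)
    return mean(a) + num / den if den else mean(a)

print(round(predict("alice", "Titanic"), 2))  # ~3.2 on this toy data
```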
Error Estimation • Mean Absolute Error (MAE) for user a • Standard Deviation of the errors
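The error formulas are not reproduced in this transcription; the usual definitions, assuming the errors are the absolute prediction errors, with m the number of items user a rated and p_{a,i} the prediction from Equation (1), are:

```latex
\mathrm{MAE}_a = \frac{1}{m}\sum_{i=1}^{m} \left| p_{a,i} - r_{a,i} \right|,
\qquad
\sigma_a = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\left|p_{a,i} - r_{a,i}\right| - \mathrm{MAE}_a\right)^2}
```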
Example Correlation (between users)

            Sammy   Dylan   Mathew
  Sammy      1       1      -0.87
  Dylan      1       1       0.21
  Mathew    -0.87    0.21    1

= 0.83
Statistical Collaborative Filters • Ringo [Shardanand and Maes 95, MIT] • Recommends music albums • Each user buys certain music artists' CDs • Base case: weighted average • Predictions: • Mean squared difference: first compute the dissimilarity between pairs of users, then find all users Y with dissimilarity less than a threshold L, and compute the weighted average of these users' ratings • Pearson correlation (Equation 1) • Constrained Pearson correlation (Equation 1 with a weighted average over similar users only, corr > L)
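A sketch of Ringo's mean-squared-difference approach, reusing the toy `ratings` dictionary from the sketch above. The threshold value and the 1/(1+d) weighting are assumptions, since the slide does not specify them:

```python
def msd(a, u):
    """Mean squared difference dissimilarity between users a and u
    over the items both have rated (lower = more similar)."""
    common = set(ratings[a]) & set(ratings[u])
    if not common:
        return float("inf")
    return sum((ratings[a][i] - ratings[u][i]) ** 2 for i in common) / len(common)

def msd_predict(a, item, L=2.0):
    """Weighted average of ratings from users whose dissimilarity is below L;
    weight 1/(1+d) is one plausible choice, not specified on the slide."""
    advisors = [u for u in ratings
                if u != a and item in ratings[u] and msd(a, u) < L]
    if not advisors:
        return None
    weights = [1.0 / (1.0 + msd(a, u)) for u in advisors]
    return sum(w * ratings[u][item] for w, u in zip(weights, advisors)) / sum(weights)
```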
Open Problems in CF • “Sparsity Problem” • CFs have poor accuracy and coverage in comparison to population averages at low rating density [GSK+99]. • “First Rater Problem” • The first person to rate an item receives no benefit. CF depends upon altruism. [AZ97]
Open Problems in CF • "Scalability Problem" • CF is computationally expensive. The fastest published algorithms (nearest-neighbor) are O(n²). • Any indexing method for speeding this up? • Has received relatively little attention. • References in CF: • http://www.cs.sfu.ca/CC/470/qyang/lectures/cfref.htm
References • P. Domingos and M. Richardson. Mining the Network Value of Customers. In Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining (pp. 57-66). San Francisco, CA: ACM Press, 2001.
Motivation • Network value is ignored by direct marketing. • Examples: [diagram: customers marked "Marketed" vs. "Affected (under the network effect)", each with low or high expected profit]
Some Successful Cases • Hotmail • Grew from 0 to 12 million users in 18 months • Each email included a promotional URL for the service • ICQ • Expanded quickly • Users became addicted soon after it first appeared • They depended on it to contact friends
Introduction • Incorporate the network value when maximizing the expected profit. • Social networks: modeled as a Markov random field • Probability to buy = desirability of the item + influence from others • Goal: maximize the expected profit
Focus • Making practical use of network value in recommendation • Although the algorithm may be used in other applications, the focus is NOT a generic algorithm
Assumptions • A customer's buying decision can be affected by other customers' ratings • Market to people who are inclined to see the film • One will not continue to use the system if one does not find its recommendations useful (natural elimination assumption)
Modeling • View the market as a social network • Model the social network as a Markov random field • What is a Markov random field? • A set of random variables [e.g., P(x,y,z)] whose joint distribution is defined over a neighborhood graph • Each variable's outcome depends on its neighbors'.
Variable definitions • X = {X1, …, Xn}: a set of n potential customers; Xi = 1 (buy), Xi = 0 (not buy) • Xk (known values), Xu (unknown values) • Ni = {Xi,1, …, Xi,n}: the neighbors of Xi • Y = {Y1, …, Ym}: a set of attributes describing the product • M = {M1, …, Mn}: a marketing action for each customer
Example (choice of Y) • Using EachMovie as an example: • Xi: whether person i saw the movie • Y: the movie genre • Ri: person i's rating of the movie • Here Y is the movie genre; different problems can use different Y.
Goal of modeling • Find the marketing action M for each customer that achieves the best profit. • The profit measure is ELP (expected lift in profit): ELPi(Xk, Y, M) = r1 P(Xi=1 | Xk, Y, fi1(M)) - r0 P(Xi=1 | Xk, Y, fi0(M)) - c • r1: revenue with the marketing action • r0: revenue without the marketing action • c: cost of the marketing action; fi1(M) / fi0(M): M with Mi set to 1 / 0
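A minimal Python sketch of the ELP machinery that the three searches below build on. Here `prob_buy` is a stub standing in for the Markov-random-field inference P(Xi=1 | Xk, Y, M), the revenue/cost constants are made up, and reading ELP(Xk, Y, M) as a total lift over all customers is an assumption:

```python
R1, R0, COST = 1.10, 1.00, 0.05   # revenues with/without the action, action cost (made-up)

def prob_buy(i, M):
    """Placeholder for P(Xi=1 | Xk, Y, M): in the paper this comes from
    inference in the Markov random field over the social network."""
    raise NotImplementedError

def set_action(M, i, value):
    """f_i^value(M): a copy of plan M with customer i's action set to value."""
    M2 = list(M)
    M2[i] = value
    return M2

def total_elp(M):
    """Total expected lift in profit of plan M over doing nothing
    (one plausible reading of the slides' ELP(Xk, Y, M))."""
    n = len(M)
    nothing = [0] * n
    revenue = sum((R1 if M[i] else R0) * prob_buy(i, M) for i in range(n))
    baseline = sum(R0 * prob_buy(i, nothing) for i in range(n))
    return revenue - baseline - COST * sum(M)
```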
Three search algorithms • Single pass • Greedy search • Hill-climbing search
Scenario • Customers {A, B, C, D} [diagram: customers A, B, C, D with links labeled "Discount + suggest" and "Discount / suggest"] • A: will buy the product only if someone suggests it and offers a discount (and M=1) • C, D: will buy the product if someone suggests it or offers a discount (M=1) • B: will never buy the product • The best plan (see the hill-climbing example below): M=1 for A and C
Single pass • For each i, set Mi=1 if ELP(Xk, Y, fi1(M0)) > 0, and set Mi=0 otherwise. • Advantage: fast, one pass only • Disadvantage: marketing actions chosen for later customers may affect earlier customers, but these effects are ignored (a sketch follows)
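Continuing the sketch above (names are illustrative), the single-pass search:

```python
def single_pass(n):
    """Market to i iff switching Mi on (alone, on top of M0 = all zeros)
    yields positive expected lift; one sweep, no feedback between choices."""
    M0 = [0] * n
    return [1 if total_elp(set_action(M0, i, 1)) > 0 else 0 for i in range(n)]
```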
Single Pass Example (customers A, B, C, D, indexed 0-3) • M = {0,0,0,0}; ELP(Xk,Y,f01(M0)) <= 0 • M = {0,0,0,0}; ELP(Xk,Y,f11(M0)) <= 0 • M = {0,0,0,0}; ELP(Xk,Y,f21(M0)) > 0 • M = {0,0,1,0}; ELP(Xk,Y,f31(M0)) > 0 • M = {0,0,1,1} • Done: single pass markets to C and D
Greedy Algorithm • Set M = M0. • Loop through the Mi's, setting each Mi to 1 if ELP(Xk, Y, fi1(M)) > ELP(Xk, Y, M). • Continue until there are no changes in M. • Advantage: later changes to the Mi's feed back into earlier decisions. • Disadvantage: much more computation time; several scans are needed. (Sketch below.)
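The greedy search, continuing the same sketch:

```python
def greedy(n):
    """Sweep repeatedly, turning Mi on whenever that improves the total ELP;
    stop when a full sweep changes nothing."""
    M = [0] * n
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if M[i] == 0 and total_elp(set_action(M, i, 1)) > total_elp(M):
                M[i] = 1
                changed = True
    return M
```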
Greedy Example (customers A, B, C, D) • M0 = {0,0,0,0} before any pass • M = {0,0,1,1} after the first pass • M = {1,0,1,1} after the second pass • M = {1,0,1,1} Done
Hill-climbing search • Set M = M0. Set Mi1 = 1, where i1 = argmaxi {ELP(Xk, Y, fi1(M))}. • Repeat: • Let i = argmaxi {ELP(Xk, Y, fi1(M))} given the current M • Set Mi = 1 • Until no setting of Mi = 1 gives a larger ELP. • Advantage: the best M is calculated, since the best single Mi is selected at each step. • Disadvantage: the most expensive algorithm. (Sketch below.)
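The hill-climbing search, continuing the same sketch:

```python
def hill_climb(n):
    """At each step, turn on the single action with the largest ELP gain;
    stop when no remaining flip improves the total."""
    M = [0] * n
    while True:
        gains = {i: total_elp(set_action(M, i, 1)) - total_elp(M)
                 for i in range(n) if M[i] == 0}
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] <= 0:
            return M
        M[best] = 1
```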
Hill Climbing Example (customers A, B, C, D) • M = {0,0,0,0} before any pass • M = {0,0,1,0} after the first pass • M = {1,0,1,0} after the second pass • M = {1,0,1,0} Done: the best plan markets to A and C
Who Are the Neighbors? • Mine the social network by using collaborative filtering (CFinSC). • Use the Pearson correlation coefficient to calculate similarity. • The CFinSC results define the social network. • ELP and M can then be computed from the social network.
Who are the neighbors? • Calculate the weight of every customer by the following equation (reconstructed below):
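The equation image is missing from this transcription. Since the previous slide names the Pearson correlation coefficient, the weight is presumably the Pearson weight between customers i and j over the items k they both rated (notation assumed, consistent with the earlier formula):

```latex
w_{ij} = \frac{\sum_{k} (R_{ik} - \bar{R}_i)(R_{jk} - \bar{R}_j)}
              {\sqrt{\sum_{k} (R_{ik} - \bar{R}_i)^2}\;\sqrt{\sum_{k} (R_{jk} - \bar{R}_j)^2}}
```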
Neighbors' Ratings for a Product • Calculate the neighbor's rating by the following equation (reconstructed below). • If the neighbor did not rate the item, Rjk is set to the mean of Rj.
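This equation image is also missing; a weighted-deviation estimate in the spirit of Equation (1) is one plausible reconstruction (an assumption, not confirmed by the slides):

```latex
\hat{R}_{ik} = \bar{R}_i + \frac{\sum_{j \in N_i} w_{ij}\,(R_{jk} - \bar{R}_j)}{\sum_{j \in N_i} |w_{ij}|}
```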
Estimating the Probabilities • P(Xi): estimated from the items rated by user i • P(Yk|Xi): obtained by counting the occurrences of each value of Yk with each value of Xi • P(Mi|Xi): select users at random, apply the marketing action to them, and record its effect (if data are not available, use prior knowledge to judge)
Preprocessing • Zero-mean the ratings • Prune people whose ratings cover too few movies (threshold: 10) • Require a non-zero standard deviation in ratings • Penalize the Pearson correlation coefficient when two users rate very few movies in common • Remove all movies viewed by < 1% of the people (a sketch of these steps follows)
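A sketch of these preprocessing steps on a user-item ratings dict like the earlier one. The thresholds mirror the slide; the correlation-penalty cutoff and the shrinkage scheme are assumptions:

```python
MIN_MOVIES = 10       # prune users who rated fewer movies than this
MIN_AUDIENCE = 0.01   # remove movies seen by < 1% of people
PENALTY_CUTOFF = 5    # penalize correlations built on few common movies (assumed value)

def preprocess(ratings):
    n_users = len(ratings)
    # Count how many users saw each movie, then drop rare movies.
    seen = {}
    for user, rs in ratings.items():
        for movie in rs:
            seen[movie] = seen.get(movie, 0) + 1
    keep = {m for m, c in seen.items() if c / n_users >= MIN_AUDIENCE}

    out = {}
    for user, rs in ratings.items():
        rs = {m: r for m, r in rs.items() if m in keep}
        if len(rs) < MIN_MOVIES:
            continue                      # ratings cover too few movies
        mu = sum(rs.values()) / len(rs)
        if all(r == mu for r in rs.values()):
            continue                      # zero standard deviation
        out[user] = {m: r - mu for m, r in rs.items()}   # zero-mean
    return out

def penalized_pearson(corr, n_common):
    """Shrink a correlation built on few common movies (one common scheme)."""
    return corr * min(n_common, PENALTY_CUTOFF) / PENALTY_CUTOFF
```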
Experiment Setup • Data: EachMovie • Trainset & testset split temporally [timeline: ratings from 1/96 to 9/96 form the trainset (old), ratings from 9/96 to 9/97 form the testset (new); the movies were released between 1/96 and 12/96]
Experiment Setup (cont.) • Target: the 3 methods of searching for an optimized marketing action vs. the baseline (direct marketing)
Experiment Results [results quoted directly from the paper; figure not reproduced]
Experiment Results (cont.) • The proposed algorithms are much better than direct marketing • Hill-climbing > (slightly) greedy >> single-pass >> direct • The higher α is, the better the results!
Item Selection by "Hub-Authority" Profit Ranking (ACM KDD 2002) Ke Wang, Ming-Yen Thomas Su, Simon Fraser University
Ranking in an Inter-related World • Web pages • Social networks • Cross-selling
Item Ranking with the Cross-selling Effect • What are the most profitable items? [diagram: items with individual profits ranging from $0.5 to $15, linked by cross-selling edges with confidences from 30% to 100%]
The Hub/Authority Modeling • Hub i: "introductory" for sales of other items j (i -> j). • Authority j: "necessary" for sales of other items i (i -> j). • Solution: model the mutual reinforcement of hub and authority weights through the links. • Challenges: incorporate the individual profits of items and the strength of links, and ensure that the hub/authority weights converge.
Selecting the Most Profitable Items • Size-constrained selection • Given a size s, find the s items that produce the most profit as a whole • Solution: select the s items at the top of the ranking • Cost-constrained selection • Given the cost of selecting each item, find a collection of items that produces the most profit as a whole • Solution: the same as above when costs are uniform (see the sketch below)
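A sketch of both selection modes. The estimated profits would come from the hub/authority ranking; reading the "optimal cutoff" as the point where an item's marginal profit no longer exceeds its selection cost is an assumption based on the plot on the next slide:

```python
def select_items(profits, costs, size=None):
    """Rank items by estimated profit; take the top `size` (size-constrained),
    or stop at the cutoff where marginal profit <= cost (cost-constrained).
    All inputs are illustrative: profits/costs map item -> dollars."""
    ranked = sorted(profits, key=profits.get, reverse=True)
    if size is not None:
        return ranked[:size]
    selected = []
    for item in ranked:
        if profits[item] <= costs[item]:   # assumed optimal-cutoff rule
            break
        selected.append(item)
    return selected
```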
Solution to cost-constrained selection [plot: estimated profit and selection cost vs. the number of items selected, with the optimal cutoff marked]
Web Page Ranking Algorithm: HITS (Hyperlink-Induced Topic Search) • Mutually reinforcing relationship • Hub weight: h(i) = Σ a(j), over all pages j such that i has a link to j • Authority weight: a(i) = Σ h(j), over all pages j that have a link to i • a and h converge if normalized before each iteration (see the sketch below)
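A minimal sketch of the HITS iteration described above; the link graph and iteration count are illustrative:

```python
def hits(links, iters=50):
    """HITS power iteration. links[i] is the set of pages i points to.
    Both weight vectors are normalized each round so they converge."""
    pages = set(links) | {j for out in links.values() for j in out}
    h = {p: 1.0 for p in pages}
    a = {p: 1.0 for p in pages}
    for _ in range(iters):
        # a(i) = sum of h(j) over pages j linking to i
        a = {p: sum(h[j] for j in pages if p in links.get(j, ())) for p in pages}
        # h(i) = sum of a(j) over pages j that i links to
        h = {p: sum(a[j] for j in links.get(p, ())) for p in pages}
        for w in (a, h):                   # normalize to unit length
            norm = sum(v * v for v in w.values()) ** 0.5 or 1.0
            for p in w:
                w[p] /= norm
    return h, a

# Example on a tiny hypothetical link graph:
h, a = hits({"x": {"y", "z"}, "y": {"z"}, "z": {"x"}})
```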