E-Commerce
Outline • Introduction • Customer Data on the Web • Automated Recommender Systems • Networks and Recommendations • Web Path Analysis for Purchase Prediction
Introduction • Some Motivating Questions • Can we design algorithms to help recommend new products to visitors based on their browsing behavior? • Can we better understand factors influencing how customers make purchases on a website? • Can we predict in real time who will make purchases based on their observed navigation patterns?
Customer Data on the Web • Data can be collected on the client side, on the server side, and anywhere in between • Goal: determine who is purchasing which products • Tracking customer data • Web logs, e-commerce logs, cookies, explicit login • These data are then used to provide personalized content to site users to: • Assist customers in locating their target selections • “Encourage” customers to make certain selections
Automated Recommender Systems • Problem framed in two ways • Users ‘vote’ for pages/items (binary) • Users rank pages/items (multivalued) • Results are captured in a generally sparse matrix (users x items); a sketch of such a matrix follows • Complication: missing votes are not missing at random, because users tend not to vote on items they do not like (Breese et al. 1998) • Ignored by most recommender systems
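As a minimal sketch of the data structure these slides describe, here is one way a sparse users x items vote matrix might be represented in Python with scipy.sparse; the votes are hypothetical, and binary here (a multivalued ranking scheme would simply use a wider vote scale):

```python
from scipy.sparse import csr_matrix

# Hypothetical votes: (user, item, vote). An absent entry means
# "no vote", which is *not* the same as a negative vote.
votes = [(0, 2, 1), (0, 5, 1), (1, 2, 1), (2, 0, 1), (2, 5, 1)]

users, items, vals = zip(*votes)
n_users, n_items = 3, 6

# Sparse users x items matrix; only observed votes are stored.
V = csr_matrix((vals, (users, items)), shape=(n_users, n_items))
print(V.toarray())
```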
Evaluating Recommender Systems • Cautions in data interpretation • Users may purchase items regardless of recommendations • Users may also avoid purchases they might have made based on recommendations • Approaches to recommender algorithms • Nearest-neighbor • Model-based collaborative filtering • Others?
Nearest-Neighbor Collaborative Filtering • Basic principle: use the user’s vote history to predict future votes/recommendations • Find the users most similar to the target user in the training matrix and fill in the target user’s missing vote values based on these “nearest neighbors” • A typical normalized prediction scheme (shown below): predict the vote for item ‘j’ as a weighted sum over other users’ votes, weighted toward users whose past votes resemble those of the target user ‘a’
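The formula itself did not survive in these slides; the standard version of this scheme, following Breese et al. (1998), is:

```latex
\hat{v}_{a,j} \;=\; \bar{v}_a \;+\; \kappa \sum_{i=1}^{n} w(a,i)\,\bigl(v_{i,j} - \bar{v}_i\bigr)
```

Here v̄_a is user a’s mean vote, w(a,i) is the similarity weight between users a and i, and κ is a normalizing constant (e.g. 1/Σᵢ|w(a,i)|), so each neighbor contributes its mean-adjusted vote on item j.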
Nearest-Neighbor Collaborative Filtering • Another challenge: defining the weights • What is the optimal weight calculation to use? • Requires fine-tuning of the weighting algorithm for the particular data set (a Pearson-correlation sketch follows this list) • What do we do when the target user has not voted enough to provide a reliable set of nearest neighbors? • One approach: use default votes (popular items) to populate the matrix for items that neither the target user nor the nearest neighbors have voted on • A different approach: model-based prediction using Dirichlet priors to smooth the votes (see chapter 7) • Other factors include relative vote counts for all items between users, thresholding, and clustering (see Sarwar et al. 2000)
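One common (though not uniquely optimal) choice of weight is the Pearson correlation between two users’ votes over the items they have both voted on. A minimal sketch, with function name and data of my own invention:

```python
import numpy as np

def pearson_weight(votes_a, votes_b):
    """Pearson correlation over co-voted items.

    votes_a, votes_b: dicts mapping item id -> vote value.
    Returns 0.0 when there is too little overlap to correlate.
    """
    common = sorted(set(votes_a) & set(votes_b))
    if len(common) < 2:
        return 0.0
    x = np.array([votes_a[i] for i in common], dtype=float)
    y = np.array([votes_b[i] for i in common], dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # constant votes -> correlation undefined
    return float(np.corrcoef(x, y)[0, 1])

# Example: two users who co-voted on items 1, 2, and 3.
print(pearson_weight({1: 5, 2: 3, 3: 4}, {1: 4, 2: 2, 3: 5, 9: 1}))
```

In practice this is where the fine-tuning mentioned above happens: thresholding on the number of co-voted items, down-weighting users with tiny overlaps, and so on.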
Nearest-Neighbor Collaborative Filtering • Structure-based recommendations • Recommendations based on similarities between items with positive votes (as opposed to the votes of other users) • The structure of item dependencies is modeled through dimensionality reduction via singular value decomposition (SVD), a.k.a. latent semantic indexing (see chapter 4) • Approximate the set of row-vector votes as a linear combination of basis column-vectors • i.e., find the basis columns that minimize the least-squares difference between the approximated rows and their true values (a sketch follows) • Perform nearest-neighbor calculations in the reduced space to project predictions for all items
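A minimal sketch of the SVD step, assuming missing votes have already been filled in (e.g. with default votes); the vote matrix and the rank k are illustrative choices, not prescribed by the slides:

```python
import numpy as np

# Dense users x items vote matrix (hypothetical, missing votes pre-filled).
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)

U, s, Vt = np.linalg.svd(V, full_matrices=False)
k = 2  # number of latent dimensions to keep

# Best rank-k approximation in the least-squares sense.
V_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Nearest-neighbor calculations can now run on the k-dim user vectors.
user_factors = U[:, :k] * s[:k]
print(np.round(V_k, 2))
```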
Model-Based Collaborative Filtering • Recommendations based on a model of the relationships between items, learned from historical voting patterns in the training set • Generally better performance than nearest-neighbor analysis • Joint distribution modeling • Uses a single model as the basis for all predictions • Conditional distribution modeling • A separate model for each item, predicting its future vote from the votes on each of the other items
Model-Based Collaborative Filtering • Joint distribution modeling: a practical approach • Model the joint distribution as a finite mixture of simpler distributions (see the formula below) • Additional simplification is achieved by assuming that, within a component, each vote is independent of the others • Limitation: assumes that each user is described by a single one of the ‘K’ mixture components • Hofmann and Puzicha (1999) propose a workaround in which each row of votes can mix up to ‘K’ components, rather than being drawn from a single component
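Under these assumptions the joint distribution over a user’s votes v = (v_1, …, v_M) takes the standard finite-mixture form (my reconstruction; the slide’s original formula is missing):

```latex
p(v_1,\dots,v_M) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{M} p(v_j \mid \theta_{kj})
```

where the π_k are mixture weights and the product over items reflects the within-component independence assumption.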
Model-Based Collaborative Filtering • Another limitation: all predictions are based on the (static) training set • Conditional distribution modeling • Better results by creating a model for each item conditioned on the others, rather than using a single joint density model • Decision trees (Heckerman et al. 2000) • Greedy approach to approximate the tree structure • Predictions are made for each item not purchased or visited (a per-item sketch follows) • Performance • Accuracy nearly equal to Bayesian networks • Offline memory usage significantly less than Bayesian networks • Offline computation time better than Bayesian networks
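A minimal sketch of the conditional-modeling structure, using one scikit-learn decision tree per item. Heckerman et al. (2000) used their own greedy tree learner, so this illustrates the per-item architecture, not their implementation; the vote matrix is hypothetical:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical binary vote matrix: rows = users, columns = items.
V = np.array([[1, 0, 1, 1],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])

models = []
for j in range(V.shape[1]):
    X = np.delete(V, j, axis=1)   # votes on all *other* items
    y = V[:, j]                   # vote on item j
    models.append(DecisionTreeClassifier(max_depth=3).fit(X, y))

# Predict a new user's probability of voting for item 0,
# given their votes on items 1..3.
new_user = np.array([[0, 1, 1]])
print(models[0].predict_proba(new_user))
```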
Model-Based Combining of Votes and Content • Combine content-specific information with other information (e.g. structure, votes) • Useful for determining item similarity (Mooney and Roy 2000) and for creating user models • Useful when there is no vote history • Implementation (Popescul et al. 2001) • Extension of Hofmann and Puzicha (1999) • The joint density is modeled by assuming a hidden latent variable ‘z’ that makes users ‘u’, documents ‘d’, and words ‘w’ conditionally independent, i.e. P(u, d, w) = Σz P(z) P(u|z) P(d|z) P(w|z)
Model-Based Combining of Votes and Content • The hidden variable represents the multiple (hidden) topics of a document • The conditional probabilities involving the hidden variable are estimated using EM (see the updates below) • Sparsity still remains a problem for content-based modeling
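For this three-way aspect model the EM updates take the following standard form (a reconstruction, since the slides do not reproduce them; n(u,d,w) is the observed count of user-document-word triples):

```latex
\begin{aligned}
\text{E-step:}\quad & P(z \mid u,d,w) \;\propto\; P(z)\,P(u \mid z)\,P(d \mid z)\,P(w \mid z) \\
\text{M-step:}\quad & P(u \mid z) \;\propto\; \sum_{d,w} n(u,d,w)\,P(z \mid u,d,w)
\end{aligned}
```

with analogous M-step updates for P(d|z), P(w|z), and P(z).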
Challenges • Noisy data • The same user may use multiple IP addresses/logins • Different users may share the same IP address/login • Privacy • Users may block or delete cookies • Changing user habits • Previous history may not accurately predict present purchase selections • Requires continuous updating as user activity accumulates
Networks & Recommendation • Word-of-Mouth • Needs little explicit advertising • Products are recommended to friends, family, co-workers, etc. • This is the primary form of advertising behind the growth of Google
Email Product Recommendation • Hotmail • Very little direct advertising in the beginning • Launched in July 1996 • 20,000 subscribers after a month • 100,000 subscribers after 3 months • 1,000,000 subscribers after 6 months • 12,000,000 subscribers after 18 months • By April 2002 Hotmail had 110 million subscribers
Email Product Recommendation • What was Hotmail’s primary form of advertising? • Small link to the sign up page at the bottom of every email sent by a subscriber • ‘Spreading Activation’ • Implicit recommendation
Spreading Activation • Network effects • Even if only a small fraction of the people who receive the message subscribe (~0.1%), the service can spread rapidly • This can be contrasted with the current practice of SPAM • SPAM is not sent by friends, family, or co-workers • No implicit recommendation • SPAM is often viewed as not providing a good service
Modeling Spreading Activation • Diffusion model • Montgomery (2002) • Applied diffusion models from the marketing literature, Bass (1969), to the Hotmail phenomenon • Similar word-of-mouth networks were used to model the adoption of consumer durables such as refrigerators and televisions • We want to predict how many individuals k(t) have adopted the product by time t, out of a population of N potential adopters
Modeling Spreading Activation • Diffusion model • Two ways individuals come to subscribe • Direct advertising • At time t, N – k(t) individuals have not yet subscribed • A fraction α ≥ 0 of these individuals subscribe (per unit time) due to direct advertising • Word-of-mouth • At time t, there are k(t)(N – k(t)) possible connections between subscribers and non-subscribers • A fraction β ≥ 0 of these connections (per unit time) cause a non-subscriber to subscribe
Modeling Spreading Activation • Combining these two effects gives the differential equation: dk/dt = α[N – k(t)] + β k(t)[N – k(t)] = [α + β k(t)][N – k(t)] • Solving with k(0) = 0 gives the standard Bass-model adoption curve: k(t) = N(1 – e^(–(α+βN)t)) / (1 + (βN/α) e^(–(α+βN)t)) • A quick numerical check of this solution follows
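A short numerical sketch of the model, comparing simple Euler integration of the differential equation with the closed-form curve; the parameter values are purely illustrative, not fitted to the Hotmail data:

```python
import numpy as np

N, alpha, beta = 1_000_000, 1e-4, 1e-9  # illustrative parameters only
T, dt = 500.0, 0.1

# Euler integration of dk/dt = (alpha + beta*k) * (N - k), k(0) = 0.
k = 0.0
for _ in range(int(T / dt)):
    k += (alpha + beta * k) * (N - k) * dt

# Closed-form solution at time T.
g = alpha + beta * N
k_closed = N * (1 - np.exp(-g * T)) / (1 + (beta * N / alpha) * np.exp(-g * T))

print(f"Euler: {k:,.0f}  closed form: {k_closed:,.0f}")
```

The two values should agree closely for a small step size, and both trace the familiar S-shaped adoption curve: slow early growth driven by α, an explosive word-of-mouth phase driven by β, and saturation as k(t) approaches N.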
Modeling Spreading Activation • Diffusion model • This does not completely model what actually occurred • However, it is simple and provides a lot of useful information • Other work • Domingos & Richardson (2001): Markov random field model • Daley & Gani (1999): various deterministic and stochastic models
Purchase Prediction • We want to predict whether or not a shopper will make a purchase • We know demographics • We know page view patterns • Can we accurately predict if the user will make a purchase or not?
Purchase Prediction • Li et al. (2002) • Studied 1160 shoppers at www.barnesandnoble.com between April 1 and April 30, 2002 • The data were collected client-side, so the researchers knew exactly which pages were displayed to each user • They also knew the demographics (predominantly well-educated and affluent)
Purchase Prediction • Li et al. (2002) • There were 14,512 page views, which they divided into 1659 sessions • Page views per session: • Mean: 8.75 • Median: 5 • Standard deviation: 16.4 • Min: 1 • Max: 570 • 7% of sessions contained a purchase
Purchase Prediction • Li et al. (2002) • Divided the pages into 8 classes • Home (H), main page • Account (A), account information pages • List (L), pages with lists of items • Product (P), page with a single item • Information (I), informational pages (shipping etc.) • Shopping cart (S) • Order (O), indicates a completed order • Entry or Exit (E), entering or leaving the site
Purchase Prediction • Li et al. (2002) • Each session was represented by a string of the form: I H H I I L I I E • A session containing an O is considered to have resulted in a purchase • The average length of a session with a purchase was 34.5 page views; without a purchase, only 6.8
Purchase Prediction • Markov transition matrix between the eight page classes (matrix not reproduced here) • Estimated for sessions with no purchase • A sketch of how such a matrix is estimated from session strings follows
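Estimating a first-order Markov transition matrix from session strings is a straightforward counting exercise; a minimal sketch with made-up sessions (the frequencies are hypothetical, not those reported by Li et al.):

```python
import numpy as np

STATES = "HALPISOE"  # the eight page classes from the previous slide
idx = {s: i for i, s in enumerate(STATES)}

# Hypothetical sessions; real data would come from the parsed logs.
sessions = ["EIHHIILIIE", "EHLPPSOE", "EHLLPE"]

counts = np.zeros((len(STATES), len(STATES)))
for seq in sessions:
    for a, b in zip(seq, seq[1:]):          # consecutive page-view pairs
        counts[idx[a], idx[b]] += 1

# Row-normalize to get transition probabilities (guard against empty rows).
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts),
              where=row_sums > 0)
print(np.round(P, 2))
```

Fitting one matrix to purchase sessions and another to non-purchase sessions, as Li et al. did, then lets the two chains be compared as a session unfolds.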
Purchase Prediction • Li et al. (2002) • They fit several models to these data • Tested on predicting the next page and predicting a purchase • The best models were 64% accurate at predicting the next page • After 2 page views, the best models achieved a 12% true-positive rate with a 5.3% false-positive rate • After 6 page views: 13.1% true positives and 2.9% false positives