
Recommendation Engines: an Introduction

This article provides a brief history of recommendation engines and explains how they work. It covers collaborative filtering and cosine similarity, walks through examples of their application, and discusses challenges such as cold start and data sparsity.


Presentation Transcript


  1. Recommendation Engines: an Introduction Tom Rampley

  2. A Brief History of Recommendation Engines

  3. What Does a Recommender Do?

  4. Collaborative Filtering

  5. Cosine Similarity Example • Let's walk through an example of a simple collaborative filtering algorithm, namely cosine similarity • Cosine similarity can be used to find similar items or similar individuals; in this case, we'll be trying to identify individuals with similar taste • Imagine individual ratings on a set of items as a [user, item] matrix. You can then treat each individual's ratings as an N-dimensional vector of ratings on items: {r1, r2, …, rN} • The similarity of two rating vectors A and B can be computed as the cosine of the angle between them: cos(A, B) = (A · B) / (‖A‖ ‖B‖) • The closer the cosine is to 1, the more alike the two individuals' ratings are
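
A minimal Python sketch of this similarity computation; the two rating vectors are made-up examples, not data from the slides:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors: (a . b) / (|a| * |b|)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings (1-5) by two users on the same five items
alice = [5, 3, 4, 4, 2]
bob = [4, 3, 5, 3, 1]
print(cosine_similarity(alice, bob))  # close to 1.0 -> similar taste
```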

  6. Cosine Similarity Example Continued • Let's say we have the following matrix of users and ratings of TV shows: • And we encounter a new user, James, who has only seen and rated 5 of these 7 shows: • Of the two remaining shows, which one should we recommend to James?

  7. Cosine Similarity Example Continued • To find out, we'll see who James is most similar to among the folks who have rated all the shows, by calculating the cosine similarity between their vectors over the 5 shows each individual has in common with James: • It seems that Mary is the closest to James in terms of show ratings among the group. Of the two remaining shows, The Wire and Twin Peaks, Mary slightly preferred Twin Peaks, so that is what we recommend to James
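
As a rough illustration of this recommendation step, here is a Python sketch. The names Mary, James, The Wire, and Twin Peaks come from the slides, but the rating numbers, the other show names, and the second rater (Frank) are invented, since the original table is not reproduced in the transcript:

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical ratings on seven shows; only "The Wire" and "Twin Peaks"
# are named in the slides, the rest of the data is made up.
shows = ["Show A", "Show B", "Show C", "Show D", "Show E", "The Wire", "Twin Peaks"]
ratings = {                      # raters who have seen all seven shows
    "Mary":  [5, 4, 2, 3, 1, 3, 4],
    "Frank": [2, 5, 4, 1, 3, 5, 2],
}
james = {"Show A": 4, "Show B": 5, "Show C": 1, "Show D": 3, "Show E": 2}

seen = [s for s in shows if s in james]          # the 5 shows James has rated
james_vec = [james[s] for s in seen]

# Find the rater whose ratings on those 5 shows are most similar to James's
most_similar = max(
    ratings,
    key=lambda u: cosine(james_vec, [ratings[u][shows.index(s)] for s in seen]),
)

# Recommend whichever unseen show that rater scored higher
unseen = [s for s in shows if s not in james]
recommendation = max(unseen, key=lambda s: ratings[most_similar][shows.index(s)])
print(most_similar, "->", recommendation)        # Mary -> Twin Peaks
```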

  8. Collaborative Filtering Continued

  9. Adding ROI to the Equation: an Example with Naïve Bayes

  10. Naïve Bayes

  11. Naïve Bayes Continued • How does the NB algorithm generate class probabilities, and how can we use the algorithmic output to maximize expected payoff? • Let’s say we want to figure out which of two products to recommend to a customer • Each product generates a different amount of profit for our firm per unit sold • We know the target customer’s past purchasing behavior, and we know the past purchasing behavior of twelve other customers who have bought one of the two potential recommendation products • Let’s represent our knowledge as a series of matrices and vectors

  12. Naïve Bayes Continued

  13. Naïve Bayes Continued • NB uses (independent) probabilities of events to generate class probabilities • Using Bayes' theorem (and ignoring the scaling constant), the probability that a customer with past purchase history α (a vector of past purchases) buys item θj is: P(α1, …, αi | θj) P(θj) • Here P(θj) is the frequency with which the item appears in the training data, and P(α1, …, αi | θj) is Π P(αi | θj), the product over all i items in the training data • That P(α1, …, αi | θj) P(θj) = Π P(αi | θj) P(θj) depends on the assumption of conditional independence between past purchases
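
This class-score computation can be sketched as a small Bernoulli-style naive Bayes in Python; the purchase-history matrix below is invented, and Laplace smoothing is added (not mentioned on the slides) to avoid zero probabilities:

```python
import numpy as np

# Hypothetical training data: each row is a past customer's binary
# purchase-history vector alpha; y records which of the two boats they bought.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 1],
              [1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1])          # 0 = boat A, 1 = boat B

def class_scores(x_new, X, y, smooth=1.0):
    """Unnormalized P(alpha_1..alpha_i | theta_j) * P(theta_j) per class,
    assuming conditional independence of the alpha_i given theta_j."""
    scores = {}
    for c in np.unique(y):
        Xc = X[y == c]
        prior = len(Xc) / len(X)                              # P(theta_j)
        p = (Xc.sum(axis=0) + smooth) / (len(Xc) + 2 * smooth)
        likelihood = np.prod(np.where(x_new == 1, p, 1 - p))  # product of P(alpha_i | theta_j)
        scores[int(c)] = prior * likelihood
    return scores

print(class_scores(np.array([1, 0, 1]), X, y))
```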

  14. Naïve Bayes Continued • In our example, we can calculate the following probabilities:

  15. Naïve Bayes Continued • Now that we can calculate P(α1, …, αi | θj) P(θj) for all instances, let's figure out the most likely boat purchase for Eric: • These probabilities may seem very low, but recall that we left out the scaling constant in Bayes' theorem, since we're only interested in the relative probabilities of the two outcomes
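
To fold profit back in (the ROI idea from slide 9), one sketch is to weight each item's relative probability by its per-unit profit and recommend the item with the highest expected payoff; the numbers here are placeholders, not the slide's actual figures:

```python
# Unnormalized naive Bayes scores for the two boats and profit per unit sold.
# Both sets of numbers are illustrative only.
scores  = {"boat A": 1.2e-3, "boat B": 8.0e-4}
profits = {"boat A": 500.0,  "boat B": 1200.0}

total = sum(scores.values())                 # restores the ignored scaling constant
expected_payoff = {item: (scores[item] / total) * profits[item] for item in scores}

recommendation = max(expected_payoff, key=expected_payoff.get)
print(expected_payoff, "->", recommendation)  # boat B wins despite a lower probability
```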

  16. Naïve Bayes Continued

  17. Challenges

  18. Dealing With Cold Start

  19. Dealing With Data Sparsity

  20. Dealing With Data Sparsity • Techniques like principal component analysis (PCA) and singular value decomposition (SVD) allow for the creation of low-rank approximations to sparse matrices with relatively little loss of information
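
A minimal sketch of the SVD idea with NumPy; the ratings matrix is made up, with 0 standing in for missing ratings:

```python
import numpy as np

# Made-up user x item ratings matrix; 0 marks an unrated item.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4],
              [0, 1, 5, 4]], dtype=float)

# Truncated SVD: keep only the top-k singular values/vectors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The low-rank reconstruction gives estimates for the empty cells,
# which can be used to rank unseen items for each user.
print(np.round(R_approx, 2))
```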

  21. Dealing With Sheep of Varying Darkness • To a large extent, these cases are unavoidable • Feedback on recommended items after purchase, as well as the purchase rate of recommended items, can be used to learn even very idiosyncratic preferences, but this takes longer than it does for a typical user • Grey and black sheep are doubly troublesome because their odd tendencies can also weaken the engine's ability to make recommendations to the broad population of white sheep

  22. References • A good survey of recommendation techniques • Matrix factorization for use in recommenders • Article on the BellKor solution to the Netflix challenge • Article on Amazon's recommendation engine
