Recommender Systems
Customization • Customization is one of the more attractive features of electronic commerce. • Creating a different product for every user, suited to his/her tastes. • Once thought to be a novelty, now essential. • Provides a way for online providers to compete with brick-and-mortar competitors. • Possible to serve niche markets. • Bezos: “If I have two million customers on the Web, then I should have two million stores on the Web” • (how dated is that?)
How can personalization help? • Turn browsers into buyers • People may go to Amazon without a specific purchase in mind. • Showing them something they want can spur a purchase. • Cross-sales • Customers who have bought a product are shown related products. • Encourages loyalty • Amazon is interested in becoming an e-commerce portal. This means that they would like to respond to all your online purchasing needs.
Examples • Amazon • Featured Recommendations: tailored to past views/purchases. • People who bought this: compares customers • Alerts: sends you email when stuff you like is on sale. • Customer reviews • ListMania • Allows users to add their own reviews of products. • Customers can find other reviews by a given user.
Examples • Netflix • You rate movies and others are suggested based on these ratings. • You are compared to other users. • Reel.com • Movie Matches – you enter a movie, and it suggests similar movies. • Compares movies to movies.
Examples • CiteSeer • Recommends papers based on what a paper cites, textual similarity, and what cites it. • Launch • Lets you customize your own “radio station”. • You get a customized MP3 stream.
Types of recommendations • Population-based • For example, the most popular news articles, or searches, or downloads. • Useful for sites that frequently add content. • No user tracking needed. • Netflix: Movers on the top 100 • Reflects movies that have been popular overall.
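As a concrete illustration (a minimal Python sketch; the interaction log and item names are made up), a population-based recommender just counts events per item and returns the overall top N, with no per-user tracking:

```python
from collections import Counter

# Hypothetical interaction log: one (user, item) entry per view/download/rental.
events = [
    ("u1", "article_A"), ("u2", "article_A"), ("u3", "article_B"),
    ("u4", "article_A"), ("u2", "article_C"), ("u5", "article_B"),
]

def top_n(events, n=2):
    """Most popular items overall -- no individual user profiles needed."""
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(n)]

print(top_n(events))  # ['article_A', 'article_B']
```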
Types of recommendations • Item-to-item • Content-based • One item is recommended based on the user’s indication that they like another item. • If you like Lord of the Rings, you’ll like Legend. • Netflix: 1-5 star rating. • Estimates how much you’ll like a movie based on your past ratings.
Types of Recommendations • Challenges with item-to-item: • Getting users to tell you what they like • There are both financial and time reasons not to. • Getting enough data to make “novel” predictions. • What users really want are recommendations for things they’re not aware of.
Types of recommendations • Item-to-item • Most effective when you have metadata that lets you automatically relate items. • Genre, actors, director, etc. • Also best when decoupled from payment • Users should have an incentive to rate items truthfully.
Types of recommendations • User-based • “Users who bought X like Y.” • Each user is represented by a vector indicating his ratings for each product. • Users with a small distance between each other are similar. • Find a similar user and recommend things they like that you haven’t rated. • Netflix: “Users who liked …”
Types of recommendations • User-based • Advantages: • Users don’t need to rate much. • No info about products needed. • Easy to implement • Disadvantages • Pushes users “toward the middle” – products with more ratings carry more weight. • How to deal with new products? • Many products and few users -> lots of things don’t get recommended.
Types of Recommendations • Manual/free-form • Users write reviews for a product, which are attached to the product. • Advantages: • Natural language, explanations for pros/cons, users get to participate. • Disadvantages: • Few ‘neutral’ recommendations, difficult to automate. • Netflix: Member Reviews, Critic Reviews
Potential Applications • Placing a product in space • “The product you’re looking at is like …” • Configuring display • Choosing what to show or emphasize based on preferences. • Personalized discounts/coupons • Grocery stores do this. • Clustering users • Determining the tastes of your consumers.
Details: How RS work • Content-based systems try to learn a model of each user’s preferences. • This is a function that, for each user, maps an item to an indication of how much the user likes it. • Might be yes/no or probabilistic.
How RS work • A common model-learner is a naïve Bayes classifier. • An item is represented as a feature vector. • Web pages: list/bag of possible words • Movies: list of possible actors, directors, etc. • This vector is large, so common features are filtered out. (the, an, etc) • Useful for unstructured data such as text
Naïve Bayes Classifier • Maps from an input vector to a probability of liking. • Naïve: assumes inputs are independent of each other. • Probability that an item belongs to class i, given a set of attributes: • P(Ci | A1=v1 & A2=v2 … An=vn) • If all A's are independent, this is proportional to: • P(Ci) × Πj P(Aj=vj | Ci) • (this is easy to compute) • Pick the C with the highest probability.
Training a Naïve Bayes Classifier • How do we know P(Aj = vj | Ci)? • The user labels data for us (says what she likes). • For each class, we compute the fraction of times that Aj = vj.
Example • Two classes (yes, no) • Three documents, each of which has four words. • D1: {cat, dog, fly, cow} -> yes • D2: {crow, straw, fly, zebra} -> no • D3: {cat, dog, zoom, flex} -> yes • Number of unique words in ‘yes’: 6 • Number of unique words in ‘no’: 4 • Total # of unique words: 9
Example • P(cat | yes): 2/6 • P(cat | no): 0/4 • P(yes | {cat, zoom, fly, dog}) = 2/6 * 1/6 * 1/6 * 2/6 ≈ 0.003 • P(no | {cat, zoom, fly, dog}) = e * e * 1/4 * e ≈ 0.00025 • (a small epsilon e stands in for words never seen in a class; this helps us deal with sparse data)
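The arithmetic above can be reproduced with a short script. This is a sketch only: it follows the slides' convention of dividing word counts by the number of unique words per class, ignores the class prior P(Ci) just as the slide's products do, and assumes epsilon = 0.1 (which matches the ~0.00025 figure).

```python
from collections import Counter

# Training documents from the example, labelled yes/no.
docs = [
    ({"cat", "dog", "fly", "cow"}, "yes"),
    ({"crow", "straw", "fly", "zebra"}, "no"),
    ({"cat", "dog", "zoom", "flex"}, "yes"),
]

EPSILON = 0.1  # assumed value for the slide's "e" (words unseen in a class)

def train(docs):
    """Per class: word occurrence counts and the number of unique words."""
    word_counts = {}
    for words, label in docs:
        word_counts.setdefault(label, Counter()).update(words)
    vocab_size = {label: len(counts) for label, counts in word_counts.items()}
    return word_counts, vocab_size

def score(item, label, word_counts, vocab_size):
    """Product of P(word | class); epsilon stands in for unseen words.
    (Like the slide arithmetic, this ignores the class prior P(Ci).)"""
    p = 1.0
    for word in item:
        count = word_counts[label][word]
        p *= count / vocab_size[label] if count else EPSILON
    return p

word_counts, vocab_size = train(docs)
query = {"cat", "zoom", "fly", "dog"}
for label in ("yes", "no"):
    print(label, score(query, label, word_counts, vocab_size))
# yes ~0.0031, no 0.00025 -> classify as "yes"
```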
Rule-learning algorithms • If data is structured, rules can be learned for classification • Director=kubrick && star=mcdowell -> like • Title=“police academy*” -> not like • These rules can be stored efficiently as a decision tree • Tests at each node. • Fast, easy to learn, can handle noise
Decision Trees • Example tree for the rules above: • Title = "Police Academy"? yes -> not like; no -> next test • Director = kubrick? yes -> next test; no -> … • Star = mcdowell? yes -> like; no -> …
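As a sketch of how the tree above could be applied in code (the movie fields and the default branch are assumptions, not from the slides):

```python
def predict(movie):
    """Hand-built decision tree mirroring the rules on the previous slides.
    'movie' is a dict such as {"title": ..., "director": ..., "star": ...}."""
    if movie.get("title", "").lower().startswith("police academy"):
        return "not like"
    if movie.get("director") == "kubrick" and movie.get("star") == "mcdowell":
        return "like"
    return "unknown"  # the "..." branches on the slide: further tests would go here

print(predict({"title": "A Clockwork Orange",
               "director": "kubrick", "star": "mcdowell"}))  # like
print(predict({"title": "Police Academy 4"}))                # not like
```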
Other model-learning approaches • TFIDF • Produces similar results to Naïve Bayes • Neural Net • Learns a nonlinear function mapping features to classes. • More powerful, but results can be hard to interpret.
Comparing users to users • Often, it’s easier to compare users to other users. • Less data needed • No knowledge of items required. • Typical approach involves nearest-neighbor classification.
Nearest-neighbor classification • We create a feature vector for each user containing an element for each ratable item. • To compare two users, we compute the Euclidean distance between the ‘filled-in’ elements of their feature vectors: • sqrt(Σi (uji - uki)^2) • To recommend, find a similar user, then find things that user rated highly.
Example • Say our domain consists of four movies: • Police Academy • Clockwork Orange • Lord of the Rings • Titanic • We represent this as a four-tuple: • <r1, r2, r3, r4>
Example • We currently have three users in the system • u1: <10, 3, 9, -> • u2: <-, 9, 6, 2> • u3: <1, 7, -, 3> • A new user, u4, comes in: • <9, -, -, -> • u4 is most similar to u1, so we would recommend they see Lord of the Rings and avoid Clockwork Orange.
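A minimal Python sketch of this example (None marks a missing rating; the distance is computed over items both users have rated, and the rate-7-or-higher / 3-or-lower thresholds are assumptions, not from the slides):

```python
from math import sqrt

movies = ["Police Academy", "Clockwork Orange", "Lord of the Rings", "Titanic"]
users = {
    "u1": [10, 3, 9, None],
    "u2": [None, 9, 6, 2],
    "u3": [1, 7, None, 3],
}
u4 = [9, None, None, None]

def distance(a, b):
    """Euclidean distance over the 'filled-in' elements both users share."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sqrt(sum((x - y) ** 2 for x, y in pairs)) if pairs else float("inf")

# u1 is nearest (|9 - 10| = 1); u3 is farther (|9 - 1| = 8); u2 shares no ratings.
nearest = min(users, key=lambda name: distance(users[name], u4))

# Recommend items the neighbour rated highly that u4 has not rated yet.
for movie, their_rating, my_rating in zip(movies, users[nearest], u4):
    if my_rating is None and their_rating is not None and their_rating >= 7:
        print("recommend:", movie)   # Lord of the Rings
    elif my_rating is None and their_rating is not None and their_rating <= 3:
        print("avoid:", movie)       # Clockwork Orange
```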
Personal and Ethical Issues • How to get users to reveal their preferences? • How to get users to rate all products equally (not just ones they love or hate) • Users may be reluctant to give away personal data. • Users may be upset by “preferential” treatment.
Summary • Recommender systems allow online retailers to customize their sites to meet consumer tastes. • Aid browsing, suggest related items. • Personalization is one of e-commerce's advantages compared to brick-and-mortar stores. • Challenges: obtaining and mining data, making intelligent and novel recommendations, ethics. • Recommendations can be made by comparing users to users or items to items. • These approaches trade off the data needed against the detail of the recommendation.