Collaborative Filtering - Pooja Hegde
The Problem : OVERLOAD • Too much stuff!!!! • Too many books! • Too many journals! • Too many movies! • Too much content!
Solution : Recommender Systems • Recommender systems are a personalized information filtering technology. • They decide whether a user will like a certain item (the prediction problem) or identify a set of N items that will be of interest to a certain user (the top-N recommendation problem). • They have become very powerful tools in domains such as e-commerce, digital libraries and knowledge management.
Collaborative Filtering (CF) • It is the most successful and widely used technique for building recommender systems. • It builds a database of consumers' preferences for products. • A new customer, Neo, is matched against the database to discover neighbors, i.e. existing customers with similar tastes. • Products that the neighbors like are then recommended to Neo, as he will probably also like them.
Types of CF algorithms Model-based CF algorithms : • Develop a descriptive model from the database and use it to make predictions for an active user. • The model is built with ML techniques such as Bayesian networks, clustering and rule-based approaches. Memory-based CF algorithms : • Use statistical techniques to find a subset of users (neighbors) for each active user. • Different algorithms are then used to combine the preferences of the neighbors to produce a prediction or top-N recommendation for the active user.
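A memory-based prediction can be sketched as follows. The slides do not name a specific statistic or combination rule, so this example assumes two common choices: Pearson correlation to pick neighbors, and a similarity-weighted average of each neighbor's deviation from their own mean rating. The `ratings` data and all function names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"i1": 5, "i2": 3, "i3": 4},
    "ben": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "cat": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
    "dan": {"i1": 5, "i2": 3, "i3": 5, "i4": 4},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0

def predict(active, item, ratings, k=2):
    """Predict the active user's rating for item from the k nearest neighbors."""
    a = ratings[active]
    # Only users who actually rated the item can contribute.
    candidates = [
        (pearson(a, r), u) for u, r in ratings.items() if u != active and item in r
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    a_mean = mean(a.values())
    # Weight each neighbor's deviation from their own mean by similarity.
    num = sum(w * (ratings[u][item] - mean(ratings[u].values())) for w, u in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    return a_mean + num / den if den else a_mean

print(round(predict("ann", "i4", ratings), 2))
```

Here ben and dan correlate positively with ann while cat correlates negatively, so with `k=2` only ben and dan shape the prediction for `i4`.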
k-Nearest Neighbor CF algorithm • The k-nearest neighbor, or user-based, collaborative filtering algorithm is the most successful technology for building recommender systems. • It is extensively used in many commercial recommender systems. • It is known to be superior to model-based algorithms in terms of accuracy, and it recommends items outside the user's usual content range. • Since it operates over the entire user database to make predictions, it rapidly incorporates the most up-to-date information.
Limitations of k-Nearest Neighbor CF systems : Sparsity and Early Rater Problem Sparsity : • There is only a limited amount of historical information for each user and for each item. • As a result, CF-based recommender systems cannot accurately compute the neighborhood and identify the items to recommend, leading to poor recommendations. Early Rater (new user) problem : • When new users come along, the system knows nothing about them. • The system must acquire some information about the new user in order to make personalized predictions.
Sparsity and Early Rater Problem : Proposed Solutions • To address the sparsity problem, a variety of statistical techniques, such as general clustering, singular value decomposition and factor analysis, have been examined for reducing the dimensionality of the database. • A hybrid approach incorporates non-CF techniques into a CF system: semi-intelligent content-based filtering agents, called filterbots, are added to a ratings-based CF system. • Filterbots rate every item in the system according to their algorithmic analysis of the item's content, thereby increasing the density of the dataset. • Hence they make sure that every item in the system has many ratings, which helps users find the items they are most interested in.
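The filterbot idea can be illustrated with a toy sketch. Everything here is an assumption for the sake of the example: the data, the `length_bot` rule (rating by word count) and the `density` helper are hypothetical, and a real filterbot would use a more meaningful content analysis.

```python
# Hypothetical sparse ratings: item -> {rater: rating}; most users rate few items.
ratings = {
    "article_a": {"u1": 4},
    "article_b": {},
    "article_c": {"u2": 2, "u3": 5},
}

# Hypothetical item content that a bot can analyse (here just raw text).
content = {
    "article_a": "short note",
    "article_b": "a much longer article with considerably more detail in it",
    "article_c": "medium length article text",
}

def density(ratings, n_raters):
    """Fraction of the item x rater matrix that actually holds a rating."""
    filled = sum(len(item_ratings) for item_ratings in ratings.values())
    return filled / (len(ratings) * n_raters)

def length_bot(text):
    """Toy filterbot: maps word count to a 1-5 rating (illustrative rule only)."""
    words = len(text.split())
    return max(1, min(5, 1 + words // 3))

before = density(ratings, n_raters=3)   # three human raters: u1, u2, u3

# The bot behaves like one extra user who rates every single item.
for item, text in content.items():
    ratings[item]["length_bot"] = length_bot(text)

after = density(ratings, n_raters=4)    # the bot counts as a fourth rater
print(before, after)
```

Because the bot rates every item, no item is left without ratings (`article_b` had none before), and the overall density of the matrix rises.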
Limitations of k-Nearest Neighbor CF systems : Scalability and Quality of Recommendations Problem Scalability : • The bottleneck in the k-NN CF algorithm is the search for neighbors among a large population of potential neighbors. • With millions of users and items, existing CF-based recommender systems suffer serious scalability problems. Quality of Recommendations : • Consumers need recommendations they can trust to help them find products they will like. • The most important errors to avoid are false positives (products that are recommended even though the consumer does not like them), since these errors lead to angry consumers. • In some ways these two challenges are in conflict: the less time an algorithm spends searching for neighbors, the more scalable it will be, but the worse its quality.
Scalability & Quality of Recommendations Problem : Proposed Solutions • One way of reducing the complexity of the computations is to cluster the users and then either limit the nearest-neighbor search to the users that belong to the nearest cluster or use the cluster centroids to derive the recommendations. • Although these approaches can significantly speed up the recommendation engine, they tend to decrease the quality of the recommendations. • An alternative approach is to build recommendation models that first explore the relationships between items rather than between users. Recommendations are computed by finding items that are similar to other items the user has liked. • This class of approaches is known as item-based recommendation algorithms. • They have been shown to produce recommendations comparable to those of traditional, neighborhood-based CF recommender systems while requiring less online computation, because the relationships between items are relatively static.
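The split between offline and online work in an item-based recommender can be sketched as below. This is an illustration under assumptions: the `likes` data is made up, cosine similarity over co-liking users is one common choice rather than something the slides specify, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical data: user -> set of items the user liked.
likes = {
    "u1": {"book_a", "book_b"},
    "u2": {"book_a", "book_b", "book_c"},
    "u3": {"book_b", "book_c"},
    "u4": {"book_d"},
}

def build_item_model(likes):
    """Offline step: item-item cosine similarity from co-liking users."""
    users_by_item = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            users_by_item[item].add(user)
    model = defaultdict(dict)
    for i, users_i in users_by_item.items():
        for j, users_j in users_by_item.items():
            if i == j:
                continue
            overlap = len(users_i & users_j)
            if overlap:
                model[i][j] = overlap / sqrt(len(users_i) * len(users_j))
    return model

def recommend(user_items, model, n=2):
    """Online step: score unseen items by similarity to items the user liked."""
    scores = defaultdict(float)
    for liked in user_items:
        for other, sim in model.get(liked, {}).items():
            if other not in user_items:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:n]

model = build_item_model(likes)          # precomputed, refreshed occasionally
print(recommend({"book_a"}, model))      # ['book_b', 'book_c']
```

The expensive pairwise comparison happens once in `build_item_model`; serving a recommendation only touches the similarity lists of the items the user already liked, which is why the online cost stays small.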
Conclusions • The explosive growth of the World Wide Web and the emergence of e-commerce have led to the development of recommender systems. • These systems make use of relationships found between people, an approach that fits nicely with the idea that humans are fundamentally social creatures. • One successful recommender system technology is collaborative filtering. CF systems, especially k-Nearest Neighbor, are achieving widespread success on the web. • The basic idea of k-NN CF systems is to suggest new items, or predict the utility of a certain item for a particular user, based on the user's previous likings and the opinions of other like-minded users.
Conclusions • Despite the popularity of user-based CF recommender systems, they have a number of limitations related to scalability, quality of recommendations, sparsity and the early-rater problem. • New recommender system technologies have been developed that quickly produce high-quality recommendations, even for very large-scale problems, and that address the sparsity and startup problems.