Improving Recommendation Lists Through Topic Diversification. Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen. WWW '05. Presenter: 謝順宏. Outline • Introduction • On collaborative filtering • Evaluation metrics • Topic diversification • Empirical analysis • Related work • Conclusion
Introduction • Goal: recommendation lists should reflect the user’s complete spectrum of interests. • Doing so improves user satisfaction. • In practice, many recommendations in a list seem “similar” with respect to content. • Traditionally, recommender system projects have focused on optimizing accuracy, using metrics such as precision/recall or mean absolute error.
Introduction • Topic diversification • Intra-list similarity metric. • Accuracy versus satisfaction. • “accuracy does not tell the whole story”
On collaborative filtering (CF) • Collaborative filtering (CF) is still the most commonly adopted technique for building academic and commercial recommender systems. • Its basic idea is to make recommendations based on the ratings that users have assigned to products.
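User-based Collaborative Filtering • A set of users A = {a_1, ..., a_n} • A set of products B = {b_1, ..., b_m} • A partial rating function r_i for each user a_i, defined only for the products that user has rated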
User-based Collaborative Filtering • Two major steps: • Neighborhood formation (similarity measures: Pearson correlation, cosine distance) • Rating prediction
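The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout, the function names `pearson` and `predict`, the neighborhood size `k`, and the mean-centered weighted average used for prediction are all assumptions made for the example.

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson correlation over the products both users have rated."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[p] for p in common) / len(common)
    mean_b = sum(rb[p] for p in common) / len(common)
    num = sum((ra[p] - mean_a) * (rb[p] - mean_b) for p in common)
    den_a = sqrt(sum((ra[p] - mean_a) ** 2 for p in common))
    den_b = sqrt(sum((rb[p] - mean_b) ** 2 for p in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(ratings, user, product, k=2):
    """Step 1: pick the k most similar users who rated `product`.
    Step 2: predict with a mean-centered, similarity-weighted average."""
    target = ratings[user]
    mean_u = sum(target.values()) / len(target)
    scored = sorted(
        ((pearson(target, r), other)
         for other, r in ratings.items()
         if other != user and product in r),
        reverse=True,
    )[:k]
    # Keep only positively correlated neighbors (an assumption of this sketch).
    neighbors = [(s, other) for s, other in scored if s > 0]
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return mean_u  # no usable neighbors: fall back to the user's mean
    offset = 0.0
    for s, other in neighbors:
        mean_o = sum(ratings[other].values()) / len(ratings[other])
        offset += s * (ratings[other][product] - mean_o)
    return mean_u + offset / norm

# Hypothetical ratings on a [-1, +1] scale.
ratings = {
    "alice": {"b1": 1.0, "b2": 0.5, "b3": -0.5},
    "bob": {"b1": 0.8, "b2": 0.4, "b3": -0.6, "b4": 0.9},
    "carol": {"b1": -1.0, "b2": -0.5, "b3": 0.5, "b4": -0.8},
}
print(predict(ratings, "alice", "b4"))
```

In this toy data, carol's tastes are the exact opposite of alice's, so only bob contributes to the prediction for "b4".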
Item-based Collaborative Filtering • Unlike user-based CF, similarity values c are computed between items rather than between users.
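To illustrate the item-based view, the sketch below transposes user ratings into per-item vectors and compares items with cosine similarity. The helper names `item_vectors` and `cosine` and the toy data are assumptions for the example; the slide does not say which similarity measure was used for items.

```python
from math import sqrt

def item_vectors(ratings):
    """Turn {user: {product: rating}} into {product: {user: rating}}."""
    items = {}
    for user, user_ratings in ratings.items():
        for product, value in user_ratings.items():
            items.setdefault(product, {})[user] = value
    return items

def cosine(va, vb):
    """Cosine similarity between two sparse item vectors."""
    common = set(va) & set(vb)
    num = sum(va[u] * vb[u] for u in common)
    den = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return num / den if den else 0.0

# Hypothetical ratings: items x and y are rated identically, z by a different user.
ratings = {
    "u1": {"x": 1.0, "y": 1.0},
    "u2": {"x": 0.5, "y": 0.5},
    "u3": {"z": 1.0},
}
items = item_vectors(ratings)
print(cosine(items["x"], items["y"]), cosine(items["x"], items["z"]))
```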
Evaluation metrics • Accuracy metrics: predictive accuracy metrics, decision support metrics • Beyond accuracy: coverage, novelty and serendipity, intra-list similarity
Accuracy Metrics • Predictive accuracy metrics: mean absolute error (MAE), mean squared error (MSE) • Decision support metrics: recall, precision
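For reference, the four metrics named on this slide can be written down directly from their standard definitions. The function names and argument layout below are assumptions chosen for the example.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(predicted, actual):
    """Mean squared error between predicted and actual ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def precision_recall(recommended, relevant):
    """Precision: share of recommended items that are relevant.
    Recall: share of relevant items that were recommended."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

print(mae([1, 2, 3], [1, 3, 5]))
print(precision_recall(["a", "b", "c", "d"], {"b", "d", "e"}))
```

MAE and MSE need predicted rating values, whereas precision and recall only need the recommended list and the set of items the user actually liked.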
Beyond Accuracy • Coverage: the percentage of elements in the problem domain for which predictions can be made. • Novelty and serendipity: these metrics measure the “non-obviousness” of the recommendations made, avoiding “cherry-picking”.
Intra-List Similarity (ILS) • Measures the similarity between the products within a recommendation list; higher scores indicate lower diversity.
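The formula on this slide did not survive extraction. The sketch below assumes ILS is the sum of pairwise similarities over all unordered pairs of items in the list. The similarity function is pluggable; Jaccard overlap of topic sets is used here purely as a stand-in and is not the taxonomy-based metric from the talk. All names and data are hypothetical.

```python
from itertools import combinations

def jaccard(topics_a, topics_b):
    """Stand-in similarity: overlap of two topic sets (assumption of this sketch)."""
    union = topics_a | topics_b
    return len(topics_a & topics_b) / len(union) if union else 0.0

def intra_list_similarity(items, topics, sim=jaccard):
    """Sum of pairwise similarities over all unordered item pairs in the list."""
    return sum(sim(topics[a], topics[b]) for a, b in combinations(items, 2))

# Hypothetical topic assignments.
topics = {"b1": {"scifi"}, "b2": {"scifi"}, "b3": {"cooking"}}
print(intra_list_similarity(["b1", "b2"], topics))  # same topic
print(intra_list_similarity(["b1", "b3"], topics))  # different topics
```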
Topic Diversification • “Law of Diminishing Marginal Returns” • Suppose you are offered your favorite drink. Let p1 denote the price you are willing to pay for it. If you are then offered a second glass of the same drink, the amount p2 you are inclined to spend will be lower, i.e., p1 > p2. The same holds for p3, p4, and so forth.
Topic Diversification • Taxonomy-based similarity metric • To compute the similarity between product sets based upon their classification.
Topic Diversification • Topic diversification algorithm • Re-ranks the recommendation list by applying topic diversification.
Topic Diversification • The diversification factor ΘF defines the impact that the dissimilarity rank exerts on the eventual overall output. • A large ΘF favors diversification over a’s original relevance order. • The input lists must be considerably larger than the final top-N list.
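The re-ranking described on the slides above can be sketched as a greedy loop. This is an illustrative reading under stated assumptions, not the authors' code: each remaining candidate gets a dissimilarity rank relative to the items already chosen, that rank is merged with its original relevance rank as (1 - ΘF) · original rank + ΘF · dissimilarity rank, and the candidate with the lowest merged score is appended. The Jaccard topic overlap stands in for the taxonomy-based similarity, and all names (`diversify`, `theta_f`, `topics`) are hypothetical.

```python
def jaccard(topics_a, topics_b):
    """Stand-in similarity between two topic sets (assumption of this sketch)."""
    union = topics_a | topics_b
    return len(topics_a & topics_b) / len(union) if union else 0.0

def diversify(candidates, topics, theta_f, n, sim=jaccard):
    """Greedily re-rank `candidates` (ordered by descending relevance) into a top-n list."""
    if not candidates:
        return []
    orig_rank = {b: i for i, b in enumerate(candidates, start=1)}
    result = [candidates[0]]  # the most relevant item always stays first
    remaining = list(candidates[1:])
    while remaining and len(result) < n:
        # Similarity of each candidate to the items already in the list.
        sim_to_list = {b: sum(sim(topics[b], topics[c]) for c in result)
                       for b in remaining}
        # Dissimilarity rank: the least similar candidate gets rank 1.
        by_dissim = sorted(remaining, key=lambda b: (sim_to_list[b], orig_rank[b]))
        dissim_rank = {b: i for i, b in enumerate(by_dissim, start=1)}
        # Merge both ranks; ties fall back to the original relevance order.
        best = min(remaining, key=lambda b: (
            (1 - theta_f) * orig_rank[b] + theta_f * dissim_rank[b],
            orig_rank[b],
        ))
        result.append(best)
        remaining.remove(best)
    return result

# Hypothetical candidate list: three sci-fi books followed by one cookbook.
topics = {"s1": {"scifi"}, "s2": {"scifi"}, "s3": {"scifi"}, "c1": {"cooking"}}
candidates = ["s1", "s2", "s3", "c1"]
print(diversify(candidates, topics, theta_f=0.0, n=3))
print(diversify(candidates, topics, theta_f=0.9, n=3))
```

With `theta_f=0.0` the original order is preserved. With a large factor the cookbook is pulled into the top 3, which is only possible because the candidate list is longer than the final list.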
Recommendation dependency • We assume that recommended products, along with their content descriptions, influence one another. Existing approaches usually require only that the relevance-weight ordering holds for recommendation list items; no other dependencies are assumed. • An item b’s current dissimilarity rank with respect to the preceding recommendations plays an important role and may influence the new ranking.
Empirical analysis • Dataset • BookCrossing (http://www.bookcrossing.com) • 278,858 members • 1,157,112 ratings • 271,379 distinct ISBNs
Data cleaning & condensation • Discarded all books missing taxonomic descriptions. • Kept only community members with at least 5 ratings each. • Resulting dataset: 10,339 users • 6,708 books • 316,349 ratings
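The two filtering steps can be sketched as follows. The tuple layout, the function name `condense`, and the choice to count each member's ratings after the book filter are assumptions; the slide does not state the exact order of operations.

```python
from collections import Counter

def condense(ratings, books_with_taxonomy, min_ratings=5):
    """ratings: list of (user, isbn, value) tuples.
    Step 1: drop ratings for books without a taxonomic description.
    Step 2: keep only users with at least `min_ratings` remaining ratings."""
    kept = [r for r in ratings if r[1] in books_with_taxonomy]
    counts = Counter(user for user, _, _ in kept)
    return [r for r in kept if counts[r[0]] >= min_ratings]

# Hypothetical data: u1 keeps 5 ratings, u2 drops to 4 once book "x" is removed.
books_with_taxonomy = {"a", "b", "c", "d", "e"}
ratings = [("u1", isbn, 1.0) for isbn in "abcde"]
ratings += [("u2", isbn, 1.0) for isbn in "abcdx"]
print(len(condense(ratings, books_with_taxonomy)))
```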
Evaluation Framework Setup • MAE metric values were not computed. • Adopted K-folding (K=4). • We were interested in seeing how accuracy, captured by precision and recall, behaves when increasing ΘF.
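A generic K-fold split with K=4 might look like the sketch below. How the original study partitioned ratings (for example, per user or globally) is not stated on the slide, so the flat split over a list of items, the fixed seed, and the name `k_fold_splits` are assumptions.

```python
import random

def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

splits = list(k_fold_splits(range(8)))
for train, test in splits:
    print(len(train), len(test))
```

Precision and recall would then be computed on each held-out fold and averaged over the four runs.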
Empirical analysis • ΘF = 0 corresponds to the original, undiversified recommendation list and serves as the baseline.
Conclusion • We found that diversification appears detrimental to both user-based and item-based CF along the precision and recall metrics. • Item-based CF seems more susceptible to topic diversification than user-based CF, as backed by results from the precision, recall, and ILS metric analysis.
Conclusion • Diversification factor impact • Human perception • Interaction with accuracy
Related work • Northern Light (http://www.northernlight.com) • Google (http://www.google.com)
Conclusion • An algorithmic framework to increase the diversity of a top-N list of recommended products. • New intra-list similarity metric.