Recommenders for Information Seeking Tasks: Lessons Learned. Michael Yudelson. References. Joseph A. Konstan, Sean M. McNee, Cai-Nicolas Ziegler, Roberto Torres, Nishikant Kapoor, John Riedl: Lessons on Applying Automated Recommender Systems to Information-Seeking Tasks . AAAI 2006
References • Joseph A. Konstan, Sean M. McNee, Cai-Nicolas Ziegler, Roberto Torres, Nishikant Kapoor, John Riedl: Lessons on Applying Automated Recommender Systems to Information-Seeking Tasks. AAAI 2006 • McNee, S. M., Kapoor, N., and Konstan, J. A. 2006. Don't look stupid: avoiding pitfalls when recommending research papers. In Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work (Banff, Alberta, Canada, November 04 - 08, 2006). CSCW '06. ACM Press, New York, NY, 171-180. • Michael Yudelson, AAAI 2006 Nectar Session Notes
Overview • Statement of the Problem • Theories • General Advice • Experiment • Lessons Learned
“There is an emerging understanding that good recommendation accuracy alone does not give users of recommender systems an effective and satisfying experience. Recommender systems must provide not just accuracy, but also usefulness.” J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl, "Evaluating Collaborative Filtering Recommender Systems", ACM Trans. Inf. Syst., vol. 22(1), pp. 5-53, 2004.
Statement of the Problem • User is engaged in an information seeking task (or several) • Movies, Papers, News • Goal of the recommender is to meet user specific needs with respect to • Correctness • Saliency • Trust • Expectations • Usefulness
Theories • Information Retrieval (IR) • Machine Learning (ML) • Human-Recommender Interaction (HRI) • Information Seeking Theories • Four Stages of Information Need (Taylor) • Mechanisms and Motivations Model (Wilson) • Theory of Sense Making (Dervin) • Information Search Process (Kuhlthau)
General Advice • Support multiple information seeking tasks • User-centered design • Shift focus from system and algorithm to potentially repeated interactions of a user with a system • Recommend • Not what is “relevant”, • But what is “relevant for info seeking task X”
General Advice (cont’d) • Choice of the recommender algorithm • Saliency (the emotional reaction a user has to a recommendation) • Spread (the diversity of items) • Adaptability (how a recommender changes as a user changes) • Risk (recommending items based on confidence)
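Spread can be made concrete in more than one way, and the slides do not fix a formula. As a minimal sketch, assuming each item carries a set of topic labels (the `topics` mapping and both function names are hypothetical), spread can be read as the average pairwise Jaccard distance within a recommendation list:

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical (distance 0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_spread(rec_list, topics):
    """Average pairwise topic distance between recommended items.

    rec_list: list of item ids (hypothetical identifiers).
    topics:   dict mapping item id -> set of topic labels (an assumption,
              not something the slides specify).
    Returns 0.0 for lists with fewer than two items.
    """
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(topics[x], topics[y]) for x, y in pairs)
    return total / len(pairs)
```

A value near 0 means the list covers one narrow topic; a value near 1 means the items share almost no topics. Which end is desirable depends on the information seeking task.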
What Can Go Wrong • Possible pitfalls (semantic) • not building user confidence (trust failure) • not generating any recommendations (knowledge failure) • generating incorrect recommendations (personalization failure), and • generating recommendations to meet the wrong need (context failure)
Experiment • Domain - Digital Libraries (ACM) • Information Seeking Tasks • Find references to fit a document • Maintain awareness in a research field • Subjects - 138 • 18 students, 117 professors/researchers, 7 non-computer scientists
Experiment (cont’d) • Tested recommending algorithms • User-Based Collaborative Filtering (CF) • Naïve Bayesian Classifier (Bayes) • Probabilistic Latent Semantic Indexing (PLSI) • Textual TF/IDF-based algorithm (TFIDF)
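To illustrate the first algorithm in the list, here is a minimal sketch of user-based collaborative filtering over binary data. It is not the implementation used in the experiment: the `profiles` structure, the cosine similarity on sets, and the parameter names are all assumptions chosen for brevity. In a citation setting, a "user" could be a paper and its "items" the papers it cites.

```python
import math
from collections import defaultdict


def cosine_sets(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_user_cf(target, profiles, k_neighbors=2, n_items=5):
    """Recommend items liked by the users most similar to `target`.

    profiles: dict mapping user id -> set of item ids (hypothetical data).
    Items the target already has are never recommended.
    """
    seen = profiles[target]
    # Rank all other users by similarity, ties broken by user id.
    neighbors = sorted(
        ((cosine_sets(seen, items), user)
         for user, items in profiles.items() if user != target),
        key=lambda pair: (-pair[0], pair[1]),
    )[:k_neighbors]

    # Each neighbor votes for its unseen items, weighted by similarity.
    scores = defaultdict(float)
    for sim, user in neighbors:
        if sim <= 0:
            continue
        for item in profiles[user] - seen:
            scores[item] += sim

    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:n_items]
```

With `n_items=5` this mirrors the five-item lists shown to subjects, but the neighborhood size and weighting here are illustrative only.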
Experiment (cont’d) • Walkthrough • Seed the document selection (self or others) • Tasks (given seeded documents) • Which other relevant papers in the DL are interesting to read • Which papers would extend the coverage of the field • Compare recommendations of 2 algorithms (each recommends 5 items) • Satisfaction with algorithm A or B on a Likert scale • Preference for algorithm A or B
Experiment (cont’d) • Anticipated Results • CF - gold standard • PLSI - comparable with CF • Bayes - generating more mainstream recommendations, worse personalization • TFIDF - more conservative, yet coherent results • CF + PLSI vs. Bayes + TFIDF
Experiment (cont’d) • Results • Dimensions • Authoritative Work, Familiarity, Personalization • Good Recommendation, Expected, Good Spread • Suitability for Current Task • CF + TFIDF received significantly better feedback than Bayes + PLSI • No significant difference between CF & TFIDF or Bayes & PLSI • Contradicts IR & ML literature
Experiment (cont’d) • What went wrong • Bayes - generated similar recommendations for all users • PLSI - random, “illogical” recommendations • Both Bayes and PLSI • Highly dependent on “connectivity” (co-citation) of papers • Suffered from inconsistency • Didn’t “fail”, but were “inadequate”
Lessons Learned • Understanding the task is more important than achieving high relevancy of recommendation for that task • Understanding whether the searcher knows what s/he’s looking for is crucial • There is no “silver bullet” • People think of recommenders as machine learning systems • modeling what you already know, predicting the past and penalizing for predicting the future
Lessons Learned (cont’d) • Dependence on offline experiments created a disconnect between algorithms that score well on accuracy metrics and algorithms that prove useful for users • Problem of Ecological Validity
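For context, a typical offline accuracy metric looks like the sketch below: hide some items a user is already known to like and count how many reappear in the top k. The function name and arguments are illustrative assumptions, not taken from the talk.

```python
def precision_at_k(recommended, held_out, k=5):
    """Fraction of the top-k recommendations found in the held-out set.

    recommended: ranked list of item ids produced by an algorithm.
    held_out:    set of items hidden from the algorithm but known to be
                 relevant to the user (hypothetical evaluation data).
    Divides by k, so a list shorter than k is penalized.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in held_out)
    return hits / k
```

A metric of this shape only rewards items the user already had. A novel paper that would have been useful scores as a miss, which is one way to read the gap between offline accuracy and usefulness.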
Lessons Learned (cont’d) • 1 good recommendation in a list of 5 • Wins trust of the user • 1 bad recommendation in a list of 5 • Loses trust of the user • If user needs are unclear • Do a user study to elicit them
Thank you! • Questions…