User Evaluation of the NASA Technical Report Server Recommendation Service
Michael L. Nelson, Johan Bollen, Old Dominion University, {mln,jbollen}@cs.odu.edu
JoAnne R. Calhoun, Calvin E. Mackey, NASA Langley Research Center, {joanne.r.calhoun,calvin.e.mackey}@nasa.gov
Outline
• OAI-PMH
• NASA Technical Report Server (NTRS)
• Experimental Methodology
• Results
• Future Work & Conclusions
OAI-PMH Metadata Harvesting Model
[Figure: service providers (harvesters) harvest metadata from data providers (repositories)]
Aggregators
• aggregators allow for:
  • scalability for OAI-PMH
  • load balancing
  • community building
  • discovery
[Figure: an aggregator sits between the data providers (repositories) and the service providers (harvesters)]
NTRS
• OAI-PMH aggregator
• One address serves both as the OAI-PMH baseURL and as the site for human users: http://ntrs.nasa.gov/
• Technology
  • MySQL 4.0.12
  • Virginia Tech OAI-PMH harvester
    • http://oai.dlib.vt.edu/odl/software/harvest/
  • Buckets 1.6.3
• Coverage
  • 837,000+ abstracts
  • approaching a similar number of full-text documents
  • 17 different repositories
    • 13 NASA, 4 non-NASA
• in use since 1995
  • since 2003 as an OAI-PMH service provider
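To make the baseURL concrete, here is a minimal sketch of the request URLs a harvester would build against it. The verbs (`Identify`, `ListRecords`, `GetRecord`) and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 specification; the helper name `oai_request_url` is hypothetical, the identifier is the one shown on the NTRS search results slide, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

BASE_URL = "http://ntrs.nasa.gov/"  # baseURL given on the NTRS slide


def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH request URL (hypothetical helper; sends nothing)."""
    params = {"verb": verb}
    params.update(args)
    return base_url + "?" + urlencode(params)


# Ask the repository to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# Harvest all records as unqualified Dublin Core.
list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Fetch a single record by its OAI identifier.
get_url = oai_request_url(
    BASE_URL,
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="oai:ltrs.larc.nasa.gov:NASA-aiaa-2000-4886",
)
```

The colons in the identifier are percent-encoded by `urlencode`, which is what a server expects in a query string.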
NTRS Search Results
[Figure: search results page. Example full-text link from the results:]
http://ntrs.nasa.gov/?method=display&redirect=http://techreports.larc.nasa.gov/ltrs/PDF/2000/aiaa/NASA-aiaa-2000-4886.pdf&oaiID=oai:ltrs.larc.nasa.gov:NASA-aiaa-2000-4886
Current Recommendation Generation Method
• Based on prior work @ LANL
  • intended for large-scale (10s M) applications
• Identify co-retrieval events from web logs
  • If 2 articles are successively downloaded within time t, increment the weight of co-retrieval
  • t = 1 hour
• Recommendations reflect a community’s preferences, not an individual’s
• The more a file is downloaded, the stronger the recommendations for that file
  • corollary: no downloads, no recommendations
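The co-retrieval rule can be sketched as follows. This is an illustration under stated assumptions, not the LANL or NTRS implementation: downloads are grouped by client address (the slide does not say how a "session" is identified), weights are treated as symmetric, and the log is already parsed into `(client, timestamp_seconds, doc_id)` tuples. All names and the sample events are made up.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # t = 1 hour, as stated on the slide


def co_retrieval_weights(events, window=WINDOW_SECONDS):
    """Count successive downloads of two different documents within `window`.

    events: iterable of (client, timestamp_seconds, doc_id) tuples.
    Returns {frozenset({doc_a, doc_b}): weight}.
    Grouping by client is an assumption made for this sketch.
    """
    by_client = defaultdict(list)
    for client, ts, doc in events:
        by_client[client].append((ts, doc))

    weights = defaultdict(int)
    for downloads in by_client.values():
        downloads.sort()  # chronological order per client
        # Compare each download with the one immediately after it.
        for (t1, d1), (t2, d2) in zip(downloads, downloads[1:]):
            if d1 != d2 and t2 - t1 <= window:
                weights[frozenset((d1, d2))] += 1
    return dict(weights)


# Made-up log events for illustration only.
events = [
    ("10.0.0.1", 0, "A"),
    ("10.0.0.1", 600, "B"),    # A then B within the hour
    ("10.0.0.1", 9000, "C"),   # more than an hour after B: no increment
    ("10.0.0.2", 100, "B"),
    ("10.0.0.2", 200, "A"),    # B then A within the hour
    ("10.0.0.3", 50, "D"),     # a single download produces no pair
]
weights = co_retrieval_weights(events)
```

Document D never appears in `weights`, which mirrors the corollary on the slide: a file with no co-retrievals has nothing to recommend.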
U.S. Government & Web Privacy Policy
• Children's Online Privacy Protection Act of 1998
  • http://www.ftc.gov/ogc/coppa1.htm
• Office of Management & Budget
  • http://www.whitehouse.gov/omb/memoranda/m99-18.html
  • http://www.whitehouse.gov/omb/memoranda/m00-13.html
• NASA
  • http://www.nasa.gov/about/highlights/HP_Privacy.html
• DOJ
  • http://www.usdoj.gov/04foia/privstat.htm
• Excerpt shown on the slide: "(B) but does not include-- (i) matches performed to produce aggregate statistical data without any personal identifiers;"
Recommender Architecture
[Figure: recommender.cs.odu.edu and ntrs.nasa.gov exchange data in four steps]
1. recommender harvests log files from NTRS
2. recommender computes the recommendation matrix
3. NTRS requests recommendations for an OAI id
4. recommender responds with 10 OAI ids
NTRS does a local lookup on the ids & displays results
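Steps 2 through 4 can be sketched as a matrix build followed by a ranked lookup. This is an assumption-laden illustration: the slides do not describe the matrix representation, so a symmetric nested dictionary is used here, and the function names and `oai:example:*` identifiers are hypothetical.

```python
from collections import defaultdict


def build_matrix(pair_weights):
    """Turn {(doc_a, doc_b): weight} into a symmetric nested dict."""
    matrix = defaultdict(dict)
    for (a, b), w in pair_weights.items():
        matrix[a][b] = w
        matrix[b][a] = w
    return matrix


def recommend(matrix, oai_id, limit=10):
    """Return up to `limit` ids, strongest co-retrieval weight first.

    Ties are broken alphabetically so the output is deterministic.
    """
    neighbours = matrix.get(oai_id, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:limit]]


# Made-up weights for illustration only.
pair_weights = {
    ("oai:example:1", "oai:example:2"): 5,
    ("oai:example:1", "oai:example:3"): 2,
    ("oai:example:2", "oai:example:3"): 7,
}
matrix = build_matrix(pair_weights)
recs = recommend(matrix, "oai:example:1")          # request for a known id
no_recs = recommend(matrix, "oai:example:99")      # id with no log history
```

The service side only returns identifiers; resolving them to titles and links stays with NTRS, as in the local lookup described on the slide.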
How Effective is the Log Analysis Method?
• Anecdotally, we knew that the recommendations were well received
  • but are they better than recommendations generated by another method?
• Goal: compare the perceived quality of recommendations generated by:
  • log analysis (Method A)
  • vector space model (Method B)
Call for Volunteers
• Announcements made on the LaRC intranet, mailing lists, etc.
• Four 90-minute sessions held on base in a separate training facility
• Bribed with donuts & soft drinks
Methodology
• Pick 10 papers from the LaRC collection that have recommendations in the log analysis
• Create VSM-based recommendations for all LaRC papers
  • ~ 4100 LaRC papers
• Instructions to volunteers
  • for each of the 10 documents
    • read the abstract
    • score the “good” recommendations generated by log analysis
    • score the “good” recommendations generated by VSM
  • search for their own papers, or papers they know well
    • score the recommendations from log analysis & VSM
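For readers unfamiliar with Method B, the following is a minimal sketch of vector space model recommendations. The slides do not state the term weighting or similarity measure used in the study; TF-IDF weights with cosine similarity are assumed here as the textbook choice, tokenization is a plain whitespace split, and the three abstracts are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {doc_id: text}. Returns {doc_id: {term: tf-idf weight}}."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    df = Counter()  # number of documents containing each term
    for words in tokens.values():
        df.update(set(words))
    n = len(docs)
    vectors = {}
    for d, words in tokens.items():
        tf = Counter(words)
        vectors[d] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def vsm_recommend(vectors, doc_id, limit=10):
    """Return up to `limit` other documents, most similar first."""
    scores = [
        (cosine(vectors[doc_id], vec), other)
        for other, vec in vectors.items()
        if other != doc_id
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for score, other in scores[:limit] if score > 0]


# Invented abstracts for illustration only.
docs = {
    "d1": "wing flutter analysis of a transport aircraft",
    "d2": "flutter suppression for a flexible wing",
    "d3": "digital library metadata harvesting",
}
vectors = tfidf_vectors(docs)
recs = vsm_recommend(vectors, "d1")
```

Unlike log analysis, this method needs no download history, so every paper with an abstract gets recommendations from the start.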
Guidance for Judging Relevance
• Volunteers were encouraged to consider:
  • similarity: documents are obviously textually related
  • serendipity: documents are related in a way that you did not anticipate
  • contrast: documents show competing / alternate approaches, methodology, etc.
  • relation: documents by the same author, from the same conference series, etc.
Results
• Result set
  • 129 comparisons
  • 29 documents
  • 13 volunteers
• ANOVA
  • null hypothesis rejected at p<0.1; means are marginally different
• Spearman correlation coefficients
  • a weak negative correlation between rater knowledge and log analysis ratings (-0.156)
  • no correlation between rater knowledge & VSM (0.12931)
  • positive, significant relationship between log analysis & VSM ratings (0.20100)
    • some documents produce better relationships than others
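As a reminder of what the Spearman coefficients above measure, here is a small self-contained sketch: rank both variables (tied values share their mean rank), then compute the Pearson correlation of the ranks. The input lists are made up to show the two extremes and are not data from the study; the function assumes neither input is constant.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes x and y have equal length and neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Made-up values for illustration only; not data from the study.
knowledge = [1, 2, 3, 4, 5]
rating_up = [2, 4, 5, 7, 9]      # rises with knowledge
rating_down = [9, 7, 5, 4, 2]    # falls with knowledge

rho_up = spearman(knowledge, rating_up)
rho_down = spearman(knowledge, rating_down)
```

Values near +1 or -1 indicate a strong monotonic relationship; the coefficients reported on the slide are all much closer to 0.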
Rating Distribution
[Figure: distribution of ratings for each method; A = log analysis, B = VSM]
We Did Not Find What We Hoped…
• Possible methodological shortcomings
  • we chose the documents randomly; we did not choose the most “mature” documents from the collection
    • positive, significant correlation between number of downloads and preference for the log analysis method (0.201)
    • negative, significant correlation between number of downloads and preference for VSM method (-0.32)
    • positive, significant correlation (0.384) between A/B and # of downloads
  • paradox: the best qualified raters are the least likely to show up…
Document / Volunteer Mismatch?
[Figure annotations:]
• raters not expert in the documents
• raters’ own publications outside of the NTRS core
[Figure notes:]
• Titles from the NASA STI Subject Categories, http://www.sti.nasa.gov/subjcat.pdf
• Organization names ca. March 2004
Robots in the Mist?
[Figure annotation: robot noise?]
Future Work
• More frequent harvesting of logs for more up-to-date recommendations
  • currently monthly granularity
• Minimize robot impact on the logs / recommendations
• Seed log analysis recommendations with VSM results
  • recommendations converge & mature more quickly
• Re-run the experiment with:
  • more mature documents
  • more subjects
    • aerospace engineering graduate students?
    • pay them $$$
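One way the seeding idea could look in practice is sketched below. The slides do not say how VSM results would be combined with log analysis, so this is only one possible reading: log-analysis ids are preferred, and VSM ids fill whatever slots remain. The function name and the sample ids are hypothetical.

```python
def seeded_recommendations(log_recs, vsm_recs, limit=10):
    """Prefer log-analysis ids, then fill remaining slots from VSM.

    Duplicates are dropped, keeping the first (higher-priority) occurrence.
    """
    merged = []
    for doc in list(log_recs) + list(vsm_recs):
        if len(merged) >= limit:
            break
        if doc not in merged:
            merged.append(doc)
    return merged


# A new document with no download history falls back entirely to VSM.
cold_start = seeded_recommendations([], ["a", "b"])

# A document with some history keeps its log-based ids first.
mixed = seeded_recommendations(["x", "a"], ["a", "b", "c"], limit=3)
```

Under this reading, a document's list shifts from VSM-driven to log-driven as its download history accumulates.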
Conclusions
• Slightly disappointing results
  • VSM preferred over log analysis
• …but, VSM had the deck stacked in its favor:
  • significant mismatch between volunteer expertise & article subject
  • articles randomly chosen from the LaRC collection
    • most mature articles not chosen; evidence that log analysis improves with download frequency
• Next steps:
  • scrub logs more to remove robots, other spurious data sources
  • mix VSM & log analysis
  • find a larger, more captive audience