User Evaluation of the NASA Technical Report Server Recommendation Service


  1. User Evaluation of the NASA Technical Report Server Recommendation Service Michael L. Nelson, Johan Bollen Old Dominion University {mln,jbollen}@cs.odu.edu JoAnne R. Calhoun, Calvin E. Mackey NASA Langley Research Center {joanne.r.calhoun,calvin.e.mackey}@nasa.gov

  2. Outline • OAI-PMH • NASA Technical Report Server (NTRS) • Experimental Methodology • Results • Future Work & Conclusions

  3. OAI-PMH Metadata Harvesting Model (diagram: data providers (repositories) expose metadata; service providers (harvesters) collect it)

  4. Aggregators • aggregators allow for: • scalability for OAI-PMH • load balancing • community building • discovery (diagram: an aggregator sits between the data providers (repositories) and the service providers (harvesters))

  5. NTRS • OAI-PMH aggregator • a single URL serves as the OAI-PMH baseURL & the human interface: http://ntrs.nasa.gov/ • Technology • MySQL 4.0.12 • Va Tech OAI-PMH harvester • http://oai.dlib.vt.edu/odl/software/harvest/ • Buckets 1.6.3 • Coverage • 837,000+ abstracts • approaching a similar number of full-text documents • 17 different repositories • 13 NASA, 4 non-NASA • in use since 1995 • an OAI-PMH service provider since 2003
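For context, harvesting the NTRS baseURL follows the standard OAI-PMH request/response cycle. The sketch below is a minimal Python stand-in for the Virginia Tech ODL harvester named above; it assumes the baseURL answers OAI-PMH requests as it did at the time of the talk and uses only the protocol's documented ListRecords verb and resumptionToken paging.

```python
# Minimal OAI-PMH ListRecords harvest loop (a sketch; NTRS itself used the
# Virginia Tech ODL harvester listed above).
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"   # OAI-PMH XML namespace
BASE_URL = "http://ntrs.nasa.gov/"               # baseURL from the slide (ca. 2004)

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumptionTokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        with urlopen(base_url + "?" + urlencode(params)) as resp:
            tree = ET.parse(resp)
        for record in tree.iter(OAI + "record"):
            yield record                         # hand each record to the aggregator
        token = tree.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break                                # last page reached
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for rec in harvest(BASE_URL):
    header_id = rec.find(OAI + "header/" + OAI + "identifier")
    print(header_id.text if header_id is not None else "(no identifier)")
```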

  6. NTRS Contents

  7. NTRS Search Results • example result link: http://ntrs.nasa.gov/?method=display&redirect=http://techreports.larc.nasa.gov/ltrs/PDF/2000/aiaa/NASA-aiaa-2000-4886.pdf&oaiID=oai:ltrs.larc.nasa.gov:NASA-aiaa-2000-4886

  8. Current Recommendation Generation Method • Based on prior work @ LANL • intended for large-scale (tens of millions of documents) applications • Identify co-retrieval events from web logs • If 2 articles are successively downloaded within time t, increment the weight of co-retrieval between them • t = 1 hour • Recommendations reflect a community’s preferences, not an individual’s • The more a file is downloaded, the stronger the recommendations for that file • corollary: no downloads, no recommendations
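A minimal sketch of the co-retrieval weighting described on this slide, assuming a pre-parsed log of (ip, timestamp, OAI id) download events; the field names and per-IP sessionization are illustrative assumptions, while the t = 1 hour window and the "no downloads, no recommendations" behavior follow the slide.

```python
# Sketch of the co-retrieval weighting above: for each client, two successive
# downloads within t = 1 hour increment the link weight between the two OAI ids.
# The (ip, timestamp, oai_id) log format is an assumption for illustration.
from collections import defaultdict
from datetime import timedelta

T = timedelta(hours=1)                 # co-retrieval window from the slide

def co_retrieval_weights(log_entries):
    """log_entries: iterable of (ip, datetime, oai_id), sorted by time."""
    weights = defaultdict(int)         # (prev_id, next_id) -> co-retrieval weight
    last_seen = {}                     # ip -> (datetime, oai_id) of previous download
    for ip, ts, oai_id in log_entries:
        if ip in last_seen:
            prev_ts, prev_id = last_seen[ip]
            if ts - prev_ts <= T and prev_id != oai_id:
                weights[(prev_id, oai_id)] += 1
        last_seen[ip] = (ts, oai_id)
    return weights

def recommend(weights, oai_id, n=10):
    """Top-n co-retrieved documents for one id (no downloads means no recommendations)."""
    scores = defaultdict(int)
    for (a, b), w in weights.items():
        if a == oai_id:
            scores[b] += w
        elif b == oai_id:
            scores[a] += w
    return sorted(scores, key=scores.get, reverse=True)[:n]
```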

  9. U.S. Government & Web Privacy Policy • Children's Online Privacy Protection Act of 1998 • http://www.ftc.gov/ogc/coppa1.htm • Office of Management & Budget • http://www.whitehouse.gov/omb/memoranda/m99-18.html • http://www.whitehouse.gov/omb/memoranda/m00-13.html • NASA • http://www.nasa.gov/about/highlights/HP_Privacy.html • DOJ (Privacy Act) • http://www.usdoj.gov/04foia/privstat.htm • quoted exemption: "(B) but does not include-- (i) matches performed to produce aggregate statistical data without any personal identifiers" (i.e., aggregate, non-identifying log analysis falls outside the restricted category)

  10. Recommender Architecture (recommender.cs.odu.edu & ntrs.nasa.gov) • 1. harvest log files from NTRS • 2. compute recommendation matrix • 3. NTRS requests recommendations for an OAI id • 4. recommender responds with 10 OAI ids • NTRS then does a local lookup on the ids & displays the results
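The talk does not specify the interface between ntrs.nasa.gov and recommender.cs.odu.edu, so the sketch below of steps 3-4 is purely hypothetical: the endpoint path, query parameter names, and the one-id-per-line response format are all assumptions used only to illustrate the request/response shape.

```python
# Hypothetical sketch of steps 3-4 only; the real recommender interface is not
# described in the talk. Endpoint path, parameter names, and the one-id-per-line
# response format are assumptions.
from urllib.parse import urlencode
from urllib.request import urlopen

RECOMMENDER_URL = "http://recommender.cs.odu.edu/recommend"   # hypothetical endpoint

def get_recommendations(oai_id, n=10):
    """Ask the recommender for up to n OAI ids related to oai_id."""
    query = urlencode({"id": oai_id, "n": n})
    with urlopen(RECOMMENDER_URL + "?" + query) as resp:
        return [line.decode().strip() for line in resp if line.strip()][:n]

# NTRS would then do a local metadata lookup on each returned id and display the results.
```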

  11. How Effective is the Log Analysis Method? • Anecdotally, we knew that the recommendations were well received • but are they better than recommendations generated by another method? • Goal: compare the perceived quality of recommendations generated by: • log analysis (Method A) • vector space model (Method B)

  12. Call for Volunteers • Announcements made on the LaRC intranet, mailing lists, etc. • Four 90-minute sessions held on base in a separate training facility • Bribed with donuts & soft drinks

  13. Methodology • Pick 10 papers from the LaRC collection that have recommendations from the log analysis • Create VSM-based recommendations for all LaRC papers (see the sketch below) • ~ 4100 LaRC papers • Instructions to volunteers • for each of the 10 documents • read the abstract • score the recommendations generated by log analysis • score the recommendations generated by VSM • then search for their own papers, or papers they know well • score those recommendations for both log analysis & VSM
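A hedged sketch of Method B over the ~4100 LaRC abstracts, using TF-IDF weighting and cosine similarity from scikit-learn as a stand-in; the slides do not say which vector space implementation or weighting scheme was actually used.

```python
# Sketch of Method B (VSM) recommendations over document abstracts, using
# scikit-learn TF-IDF + cosine similarity as a stand-in for the actual
# vector space implementation (not specified in the slides).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def vsm_recommendations(oai_ids, abstracts, n=10):
    """abstracts[i] is the abstract text of the document identified by oai_ids[i]."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    sims = cosine_similarity(tfidf)              # pairwise document similarity
    recs = {}
    for i, oai_id in enumerate(oai_ids):
        ranked = sims[i].argsort()[::-1]         # most similar first
        recs[oai_id] = [oai_ids[j] for j in ranked if j != i][:n]
    return recs
```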

  14. Guidance for Judging Relevance • Volunteers were encouraged to consider: • similarity: documents are obviously textually related • serendipity: documents are related in a way that you did not anticipate • contrast: documents show competing / alternate approaches, methodology, etc. • relation: documents by the same author, from the same conference series, etc.

  15. User Evaluation Session

  16. Results • Result set • 129 comparisons • 29 documents • 13 volunteers • ANOVA • null hypothesis rejected at p < 0.1; the two methods' mean ratings are marginally different • Spearman correlation coefficients • a weak negative correlation between rater knowledge and log analysis ratings (-0.156) • no significant correlation between rater knowledge & VSM ratings (0.129) • positive, significant correlation between log analysis & VSM ratings (0.201) • some documents produce stronger correlations than others
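For readers who want to run the same style of analysis on their own data, a small illustrative sketch follows (not the authors' actual analysis script): a one-way ANOVA over the two methods' ratings plus Spearman correlations matching the quantities reported above; the input arrays are assumed to be aligned per comparison.

```python
# Illustrative only (not the authors' analysis script): one-way ANOVA over the
# two methods' ratings plus the Spearman correlations reported above.
from scipy.stats import f_oneway, spearmanr

def summarize(ratings_a, ratings_b, knowledge):
    """ratings_a / ratings_b: scores for log analysis / VSM; knowledge: rater familiarity."""
    _, anova_p = f_oneway(ratings_a, ratings_b)      # small p -> mean ratings differ
    rho_know_a, _ = spearmanr(knowledge, ratings_a)  # reported as -0.156
    rho_know_b, _ = spearmanr(knowledge, ratings_b)  # reported as 0.129
    rho_a_b, _ = spearmanr(ratings_a, ratings_b)     # reported as 0.201
    return {"anova_p": anova_p, "knowledge_vs_A": rho_know_a,
            "knowledge_vs_B": rho_know_b, "A_vs_B": rho_a_b}
```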

  17. Rating Distribution (A = log analysis, B = VSM)

  18. We Did Not Find What We Hoped… • Possible methodological shortcomings • we chose the documents randomly; we did not choose the most “mature” documents from the collection • positive, significant correlation between number of downloads and preference for the log analysis method (0.201) • negative, significant correlation between number of downloads and preference for VSM method (-0.32) • positive significant correlation (0.384) between A/B and # of downloads • paradox: the best qualified raters are the least likely to show up…

  19. Document / Volunteer Mismatch? (chart: raters were not expert in the test documents; raters' own publications fall outside of the NTRS core; subject titles from the NASA STI Subject Categories, http://www.sti.nasa.gov/subjcat.pdf; organization names ca. March 2004)

  20. Robots in the Mist? (chart annotated "robot noise?")

  21. Future Work • More frequent harvesting of logs for more up-to-date recommendations • currently monthly granularity • Minimize robot impact on the logs / recommendations • Seed log analysis recommendations with VSM results • recommendations converge & mature more quickly • Re-run the experiment with: • more mature documents • more subjects • aerospace engineering graduate students? • pay them $$$
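One possible approach to the "minimize robot impact" item above, sketched under stated assumptions: drop log hits whose user-agent matches known crawler strings or whose hourly request rate is implausibly high. The agent list and threshold here are illustrative guesses, not values from the talk.

```python
# One possible robot filter for the NTRS logs (assumptions throughout): drop
# hits from user-agents containing known crawler strings, and hits from any
# IP whose per-hour request count exceeds a plausibility threshold.
from collections import Counter

ROBOT_AGENTS = ("googlebot", "slurp", "msnbot", "crawler", "spider")  # illustrative list
MAX_HITS_PER_HOUR = 120                                               # assumed cutoff

def filter_robots(entries):
    """entries: list of (ip, hour_bucket, user_agent, oai_id) log hits."""
    entries = list(entries)
    hits_per_hour = Counter((ip, hour) for ip, hour, _, _ in entries)
    cleaned = []
    for ip, hour, agent, oai_id in entries:
        if any(bot in agent.lower() for bot in ROBOT_AGENTS):
            continue                            # named crawler
        if hits_per_hour[(ip, hour)] > MAX_HITS_PER_HOUR:
            continue                            # implausibly fast client
        cleaned.append((ip, hour, agent, oai_id))
    return cleaned
```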

  22. Conclusions • Slightly disappointing results • VSM preferred over log analysis • …but, VSM had the deck stacked in its favor: • significant mismatch between volunteer expertise & article subject • articles randomly chosen from the LaRC collection • most mature articles not chosen; evidence that log analysis improves with download frequency • Next steps: • scrub logs more to remove robots, other spurious data sources • mix VSM & log analysis • find a larger, more captive audience
