Distributed, Real-Time Computation of Community Preferences. Thomas Lutkenhouse, Michael L. Nelson, Johan Bollen Old Dominion University Computer Science Department Norfolk, VA 23529 USA {lutken,mln,jbollen}@cs.odu.edu HT 2005 - Sixteenth ACM Conference on Hypertext and Hypermedia
6-9 September 2005, Salzburg, Austria
Distributed, Real-Time Computation of Community Preferences • Distributed: no central state • Real-Time: changes are immediate • Computation: it's not CS if you don't compute • Community Preferences: not personalization
Outline • Review of technologies • buckets • Hebbian learning • previous results • Experiment design • Results • Lessons learned • Conclusions
Non-evolution of DL Objects • [figure; surviving labels: SRW, RSS, "!?"]
Buckets • Premise: repositories come and go, but the objects should endure • Began as part of NASA DL research • focus on digital preservation • implementation of the “Smart Objects, Dumb Archives” (SODA) model for digital libraries • CACM 2001, doi.acm.org/10.1145/374308.374342 • D-Lib, dx.doi.org/10.1045/february2001-nelson
Smart Objects • Responsibilities generally associated with the repository are “pushed down” into the stored object • T&C, maintenance, logging, pagination & display, etc… • Aggregate: • metadata • data • methods to operate on the metadata/data • API examples • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=getMetadata&type=all • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listPreference • (cheat) http://www.cs.odu.edu/~mln/teaching/cs595-f03/bucket/bucket.xml
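The API examples above call bucket methods through a `method=` query parameter. The sketch below shows one way such a dispatcher could route those requests to handlers. It is a minimal illustration, not the buckets code: the metadata values, the handler names `get_metadata`/`list_methods`, and the table `METHODS` are all hypothetical; only the `method=getMetadata&type=all` and `method=listMethods` request shapes come from the slide.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical bucket state: the metadata this smart object carries itself.
BUCKET_METADATA = {"title": "Example bucket", "creator": "hypothetical"}

def get_metadata(params):
    # type=all returns everything; any other type returns just that field.
    kind = params.get("type", ["all"])[0]
    if kind == "all":
        return dict(BUCKET_METADATA)
    return {kind: BUCKET_METADATA.get(kind)}

def list_methods(params):
    # The bucket reports which methods it knows how to run.
    return sorted(METHODS)

# Hypothetical method table: method name in the URL -> handler function.
METHODS = {"getMetadata": get_metadata, "listMethods": list_methods}

def dispatch(url):
    """Route a request such as ...?method=getMetadata&type=all to its handler."""
    params = parse_qs(urlsplit(url).query)
    name = params.get("method", [None])[0]
    handler = METHODS.get(name)
    if handler is None:
        raise ValueError(f"unknown method: {name}")
    return handler(params)
```

For example, `dispatch("http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods")` returns `["getMetadata", "listMethods"]`. Because the method table lives inside the object, adding a method changes the bucket, not the archive that stores it.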
Examples • 1.6.X bucket • http://ntrs.nasa.gov/ • http://www.cs.odu.edu/~mln/phd/ • 2.0 buckets • http://www.cs.odu.edu/~mln/teaching/cs595-f03/ • http://www.cs.odu.edu/~lutken/bucket/ • 3.0 buckets (under development) • http://beaufort.cs.odu.edu:8080/ • uses MPEG-21 DIDLs • cf. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
Hebbian Learning • Implementation issues: - gather log files (problematic when they are spread across servers/domains) - determine a time threshold T for session reconstruction (typically 5 min) - compute links & weights - update the network periodically (typically monthly)
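The log-based steps listed above (reconstruct sessions with a threshold T, then compute links) can be sketched as follows. This is an illustration under assumptions: the `(client, timestamp_seconds, object_id)` log record format is hypothetical, and counting consecutive traversals is only the frequency part of link computation.

```python
from collections import defaultdict

SESSION_GAP = 5 * 60  # T = 5 minutes, in seconds

def sessionize(log, gap=SESSION_GAP):
    """Split (client, timestamp_seconds, object_id) records into sessions.

    A new session starts whenever the same client is idle for more than `gap`.
    """
    by_client = defaultdict(list)
    for client, ts, obj in log:
        by_client[client].append((ts, obj))
    sessions = []
    for visits in by_client.values():
        visits.sort()  # order each client's visits by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, obj) in zip(visits, visits[1:]):
            if ts - prev_ts > gap:
                sessions.append(current)
                current = []
            current.append(obj)
        sessions.append(current)
    return sessions

def count_links(sessions):
    """Count consecutive (from, to) traversals across all sessions."""
    counts = defaultdict(int)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:
                counts[(src, dst)] += 1
    return dict(counts)
```

This batch approach needs all logs in one place and a periodic rebuild, which is exactly what the issues above make awkward across servers and domains.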
Previous, Log-Based Recommendation Implementations • LANL Journal Recommendations • collection analysis based on journal readership patterns • D-Lib Magazine, dx.doi.org/10.1045/june2002-bollen • NASA Technical Report Server • compared recommendations with those generated by a vector space model (VSM) • WIDM 2004, doi.acm.org/10.1145/1031453.1031480 • Open Video Project • generated recommendations for videos (little descriptive metadata) • JCDL 2005, doi.acm.org/10.1145/1065385.1065472
Hebbian Learning with Bucket Methods • http://b?method=display&referer=http://b&redirect=http://a?method=display%26redirect=http://c?method=display%26referer=http://b • http://a?method=display&referer=http://a&redirect=http://b?method=display%26referer=http://a
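The second URL above nests one bucket request inside another. Read literally, a click first reaches bucket `a` (which can then credit its a→b link), and `a` redirects to bucket `b` with `referer=http://a`. The nested URL's `&` has to be escaped as `%26` so that it stays inside the `redirect` parameter. The sketch below rebuilds that URL with the standard library. The helper names `display_url` and `link_url` are hypothetical, and the reading of the redirect flow is our interpretation of the slide.

```python
from urllib.parse import quote, urlsplit, parse_qs

def display_url(bucket, referer):
    """Request asking `bucket` to display itself, recording where the user came from."""
    return f"{bucket}?method=display&referer={referer}"

def link_url(source, target):
    """Link rendered inside `source` that leads to `target` via a redirect.

    Only '&' is escaped in the nested URL (safe=':/?='), so it becomes %26
    while the rest of the nested URL stays readable, as on the slide.
    """
    inner = display_url(target, source)
    return (f"{source}?method=display&referer={source}"
            f"&redirect={quote(inner, safe=':/?=')}")
```

Parsing the link back with `parse_qs(urlsplit(url).query)["redirect"]` recovers the unescaped `http://b?method=display&referer=http://a`. The destination bucket can then learn about the traversal from its own request, with no shared log.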
Experiment • Spin Magazine's "Top 50 Rock Bands of All Time" • something other than reports, journals, etc. • harvest allmusic.com for metadata for all LPs by the 50 bands (total = 800 LPs) • Maintain hierarchical arrangement • 1 artist, N albums • Initialize the network of 800 LPs with each LP randomly linked to 5 other LPs • Send out email invitations to browse the network • let users explore, then examine the resulting network • users not informed about the workings of the network
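The random initialization step (800 LPs, each linked to 5 other LPs) might look like the sketch below. It assumes each new link gets the initial weight of 0.5 listed in the deck. The function name `init_network`, the dict-of-dicts layout (one dict of outgoing weights per bucket), and the `seed` parameter are hypothetical choices for illustration.

```python
import random

def init_network(ids, k=5, initial_weight=0.5, seed=None):
    """Link every object to k distinct, randomly chosen *other* objects.

    Returns {object_id: {target_id: weight}}; each inner dict stands for the
    outgoing links one bucket would store about itself.
    """
    rng = random.Random(seed)
    network = {}
    for obj in ids:
        others = [o for o in ids if o != obj]  # no self-links
        network[obj] = {t: initial_weight for t in rng.sample(others, k)}
    return network
```

Calling `init_network(range(800))` yields 800 buckets with 5 links each. This is also the source of the "cold start" noted under lessons learned, since users first see links that no one chose.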
Hierarchical, Weighted Links
    <structural>
      <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/121/">
        <metadata>
          <descriptive>
            <title>Terrapin Station, Capital Centre, Landover, MD, 3/15/90</title>
          </descriptive>
          <administrative/>
        </metadata>
      </element>
      <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/11/">
        <metadata>
          <descriptive>
            <title>Jealousy/Progress</title>
          </descriptive>
          <administrative/>
        </metadata>
      </element>
      <element wt="3" id="~http://www.cs.odu.edu/~lutken/bucket/434/">
        <metadata>
          <descriptive>
            <title>Nevermind</title>
          </descriptive>
          <administrative/>
        </metadata>
      </element>
      <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/130/">
        <metadata>
          <descriptive>
            <title>Technical Ecstasy</title>
          </descriptive>
          <administrative/>
        </metadata>
      </element>
      …….
Weights: - initial: 0.5 - frequency: 1.0 - symmetry: 0.5 - transitivity: 0.3
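A sketch of how the three Hebbian rules could be applied at traversal time with the weights listed above. It assumes, which the slide does not state, that the frequency, symmetry, and transitivity values are additive increments, and that a link not yet present starts from 0. Frequency strengthens src→dst, symmetry strengthens dst→src, and transitivity strengthens prev→dst for a path prev→src→dst. The functions `reinforce` and `on_traversal` are hypothetical names.

```python
# Assumed additive increments, taken from the weights listed on the slide.
FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3

def reinforce(network, src, dst, amount):
    # Each bucket stores only its own outgoing links, so this touches src alone.
    links = network.setdefault(src, {})
    links[dst] = links.get(dst, 0.0) + amount

def on_traversal(network, prev, src, dst):
    """Update weights when a user follows src -> dst after arriving at src from prev."""
    reinforce(network, src, dst, FREQUENCY)         # frequency: src -> dst
    reinforce(network, dst, src, SYMMETRY)          # symmetry: dst -> src
    if prev is not None and prev != dst:
        reinforce(network, prev, dst, TRANSITIVITY)  # transitivity: prev -> dst
```

Each update is a small, local change to one or two buckets per click. That is how the computation can be amortized over individual accesses instead of rebuilt from logs.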
Respondents • August 2004 - October 2004 • 160 respondents • self-identify at the beginning; exit survey at the end • 1200 bucket-to-bucket traversals (7.5 average traversals per session)
How to Evaluate the Resulting Network? • Compute network analysis metrics: • PageRank • Degree Centrality • Weighted Degree Centrality • Compare the results to: • Other “expert” lists (VH1, DigitalDreamDoor, original Spin Magazine list) • Artist / LP best seller according to RIAA • Artist / LP Amazon sales rank
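The three network metrics above can be computed directly from the weighted link structure. The sketch below uses weighted PageRank by power iteration, where each node splits its rank across out-links in proportion to link weight, plus in-degree and weighted in-degree. The slides do not say whether degree centrality was measured on in-links or out-links, or how PageRank handled weights. Using in-links, weight-proportional PageRank, and damping 0.85 are our assumptions.

```python
def _nodes(network):
    # Every node that has out-links or is the target of a link.
    return set(network) | {t for links in network.values() for t in links}

def pagerank(network, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over {src: {dst: weight}}."""
    nodes = _nodes(network)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            links = network.get(v, {})
            total = sum(links.values())
            if total == 0:
                dangling += rank[v]  # nodes without out-links share rank evenly
                continue
            for t, w in links.items():
                new[t] += damping * rank[v] * w / total
        for v in nodes:
            new[v] += damping * dangling / n
        rank = new
    return rank

def degree_centrality(network):
    """Return (in-degree, weighted in-degree) for every node."""
    nodes = _nodes(network)
    deg = dict.fromkeys(nodes, 0)
    wdeg = dict.fromkeys(nodes, 0.0)
    for links in network.values():
        for t, w in links.items():
            deg[t] += 1
            wdeg[t] += w
    return deg, wdeg
```

Sorting the LPs (or artists) by any of these scores gives the ranked list that is then compared against the expert lists and sales data.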
Expert Rankings • No correlation with: • VH1 artist list • DigitalDreamDoor list • original Spin Magazine list (!) (critics don't agree with each other or with the record-buying public)
RIAA Results • RIAA best-seller data covered only: • 51/800 LPs • 14/50 artists (critics don't buy records!) • *RIAA sales caveat • Figure 6. Probability of albums being best-sellers. • Figure 7. Probability of artists being best-sellers.
Amazon Sales Rank • No correlation with individual LP sales rank… • …but correlated with mean artist sales rank • similar to RIAA data • interpretation: popular artists often have obscure LPs
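The comparison above aggregates LP sales ranks per artist before correlating. The slides do not name the correlation statistic. The sketch below assumes Spearman's rank correlation and a simple mean per artist, and all function names are hypothetical. Because a lower Amazon sales rank means more sales, agreement with a popularity score such as PageRank shows up as a negative rho.

```python
from statistics import mean

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def mean_artist_rank(lp_sales_rank, lp_artist):
    """Average the sales rank of each artist's LPs: {artist: mean rank}."""
    by_artist = {}
    for lp, rank in lp_sales_rank.items():
        by_artist.setdefault(lp_artist[lp], []).append(rank)
    return {artist: mean(r) for artist, r in by_artist.items()}
```

Averaging per artist smooths over the obscure LPs that even popular artists have. That is consistent with the observation that the network tracks artists but not individual LPs.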
Lessons Learned • While the subject matter was interesting, it was oriented toward music geeks • i.e., no actual music was delivered to the users (intellectual property considerations) • more traversals needed • Random initial starting points were difficult to overcome • "cold start problem": pre-seed the links according to some criteria? • weights did not decay over time/traversals • Choosing only artists from Spin Magazine may have pre-filtered the response • choose artists from Down Beat (Jazz), Vibe (Urban), Music City News (Country), etc.
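One possible way to address the missing decay, sketched under stated assumptions: each bucket scales its own outgoing weights by a factor on every traversal and drops links that fall below a floor. The factor, the floor, and the name `decay_links` are hypothetical and do not come from the experiment.

```python
DECAY = 0.99  # hypothetical per-traversal decay factor

def decay_links(links, factor=DECAY, floor=0.01):
    """Scale one bucket's outgoing weights by `factor`; drop links below `floor`."""
    for target in list(links):  # copy keys so entries can be deleted while looping
        links[target] *= factor
        if links[target] < floor:
            del links[target]
    return links
```

Because decay only touches the bucket being visited, it keeps the no-central-state property. Random seed links that users never reinforce would then fade out, which also eases the cold start.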
Conclusions • Can build a network of smart objects featuring adaptive, hierarchical links constructed in real time without central state • the network is created without latency, with computations amortized over individual accesses • Experimental testbed with popular music LP metadata was shown to approximate the sales rank of artists, but not of individual LPs