1. Implicit feedback: Good may be better than best Steve Lawrence
3. Web Xanadu (1960)
Improved design that fixes the web's limitations
Essentially unused
The web
Widely used
Disadvantages of the improved design
Extra effort imposed on users
Added complexity in the system
Extended development time
e.g., if link consistency is enforced, users can no longer make information available simply by putting a file in a specific directory
The web has become very popular in part due to its limitations
Good may be better than best
4. Web vs. Xanadu Ted Nelson
Deserves much credit: hypertext, inspiration for the web, Lotus Notes, HyperCard
More to Xanadu not covered here (transclusion, bidirectional links, version management)
According to Nelson:
"On both the desktop and world-wide scale, culturally and commercially, we are poorer for these bad tools [the web]"
"The World Wide Web is precisely what we were trying to prevent"
5. CiteSeer
Metadata not required for submission
Specific citation formats not required
More optimal system?
Require manual submission which specifies title, author, etc. (CORR)
Require citations to be submitted in a specific form (Cameron)
CiteSeer is likely to contain more errors
Error rate on articles not processed is 100%
Value of explicit feedback not obtained is 0
Much lower overhead and complexity for users
6. Implicit vs. explicit feedback Explicit feedback
Overhead for the user
Implicit feedback
No overhead for the user
Implicit feedback may be better than explicit feedback simply because sufficient explicit feedback may be unobtainable
Other issues - accuracy of feedback
7. Good may be better than best Not a binary choice
Often many possible systems
Also
Worse is better
Best is the worst enemy of good
MIT approach vs. New Jersey approach for design (Gabriel)
The increased overhead, complexity and/or cost (for the system and/or the users), and extended development times of more optimal systems may make them far less successful than alternatives
8. Convenience of access 119,924 conference articles (bibliographical data from DBLP)
9. Explicit metadata usage Only 34% of sites use the description or keywords meta tags on their homepage
Analyzed 2,500 random servers
0.3% of sites contained Dublin Core tags
"Attention is the scarce resource." Herb Simon (1967)
Difficult to obtain explicit feedback
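The slides do not show how the survey was run; as a rough illustration only, a homepage can be scanned for description/keywords meta tags and Dublin Core (DC.*) tags with a minimal Python sketch like the following (class and function names are hypothetical, not the actual survey code):

```python
# Illustrative only: scan one homepage for description/keywords meta tags
# and Dublin Core (DC.*) tags. Not the actual survey code.
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaTagScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_desc_or_keywords = False
        self.has_dublin_core = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        name = (dict(attrs).get("name") or "").lower()
        if name in ("description", "keywords"):
            self.has_desc_or_keywords = True
        if name.startswith("dc."):  # Dublin Core convention, e.g. DC.title
            self.has_dublin_core = True


def check_homepage(url):
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    scanner = MetaTagScanner()
    scanner.feed(html)
    return scanner.has_desc_or_keywords, scanner.has_dublin_core
```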
10. Implicit vs. explicit feedback Limitations of implicit feedback
Hard to determine the meaning of a click. If the best link is not displayed, users will still click on something
Click duration may be misleading
People leave machines unattended
Opening multiple windows quickly, then reading them all slowly
Multitasking
Limitations of explicit feedback
Spam
Inconsistent ratings
11. CiteSeer
12. CiteSeer Scientific literature digital library
Over 600,000 documents indexed
Earth's largest free full-text index of scientific literature
(for comparison, the Los Alamos arXiv has about 200,000 papers)
Over 20,000 hosts accessing the site daily
Accesses from over 150 countries per month
Over 10 requests per second at peak times
13. Improving implicit feedback Users have to go to a details page before getting a link to the article
Users have seen the abstract before downloading
Users are shown the context of citations before downloading
14. No download link
15. Document information page
16. Citation context
17. CiteSeer: explicit feedback Document ratings and comments
18. CiteSeer: explicit feedback Allow users to correct errors
Authors may be motivated to correct errors relating to their own work
How many explicit corrections? (About 600,000 papers)
How many explicit ratings? (percentage of document accesses)
19. Explicit feedback Over 300,000 explicit corrections/updates
How many bogus updates?
(We require a validated email address)
Explicit ratings: 0.17% of document accesses
20. Explicit corrections Over 100 bogus correction attempts
21. Comparison of feedback types How well do document access, document downloads, and explicit ratings predict high-citation papers?
Low citation papers (<= 5 citations)
High citation papers (> 5 citations)
Ratio of accesses/downloads/ratings for high- vs. low-citation papers
Accesses ?
Downloads ?
Ratings ?
22. Comparison of feedback types Low citation papers (<= 5 citations)
High citation papers (> 5 citations)
Ratio of accesses/downloads/ratings for high- vs. low-citation papers
Accesses 2.5
Downloads 3.1
Ratings 0.96 (mean rating 2.3 for low-citation papers, 2.2 for high)
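The numbers above are the measured CiteSeer results; a minimal sketch of how such ratios would be computed (field names and sample records are hypothetical):

```python
# Toy sketch of the high/low-citation ratio computation; the sample
# records are hypothetical, the slide's numbers are the real ones.
from statistics import mean


def high_low_ratio(papers, field, threshold=5):
    low = [p[field] for p in papers if p["citations"] <= threshold]
    high = [p[field] for p in papers if p["citations"] > threshold]
    return mean(high) / mean(low)


papers = [
    {"citations": 2, "accesses": 40, "downloads": 5, "rating": 2.3},
    {"citations": 12, "accesses": 100, "downloads": 16, "rating": 2.2},
]
for field in ("accesses", "downloads", "rating"):
    print(field, round(high_low_ratio(papers, field), 2))
# accesses 2.5, downloads 3.2, rating 0.96
```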
23. CiteSeer: user profiling Profiling system not currently active (due to scale)
Profile contains documents, citations, keywords, etc. of interest
User notified of new related documents or citations by email or via the web interface
Both implicit and explicit feedback
Record the actions of a user for recommendations
View
Download
Ignore
26. CiteSeer: user profiling Implicit feedback should be more successful in CiteSeer due to citation context, query-sensitive summaries, document details pages, and the expense of document downloads
Users can better determine the relevance of documents before they request details or download articles
Analyze co-viewed/downloaded documents to recommend documents related to a given document
Similar to one of Amazon's book recommenders
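A minimal sketch of the co-view idea (the general technique, not CiteSeer's implementation): count how often document pairs occur in the same user's session, then recommend the most frequent co-occurrences.

```python
# Sketch of item-to-item co-view recommendation; session data is illustrative.
from collections import Counter, defaultdict
from itertools import combinations


def build_coview_counts(sessions):
    """sessions: iterable of sets of document ids viewed/downloaded together."""
    co = defaultdict(Counter)
    for docs in sessions:
        for a, b in combinations(sorted(docs), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, doc_id, k=5):
    """Documents most often co-viewed with doc_id."""
    return [d for d, _ in co[doc_id].most_common(k)]


sessions = [{"p1", "p2"}, {"p1", "p2", "p3"}, {"p2", "p3"}]
co = build_coview_counts(sessions)
print(recommend(co, "p1"))  # ['p2', 'p3']
```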
27. Profile creation (Pseudo-)documents are added to a user's profile whenever the user performs an action in the profile editor, or on a real document while browsing
Action interestingness a(·):
Explicitly added to profile: very high positive
Downloaded: high positive
Details viewed: moderate positive
Recommendation ignored: low negative
Removed from profile: set to zero
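In code, this mapping might look like the following sketch; the numeric weights are hypothetical stand-ins for the qualitative levels above:

```python
# Hypothetical weights for the qualitative interestingness levels above.
ACTION_INTERESTINGNESS = {
    "added_to_profile": 1.0,          # very high positive
    "downloaded": 0.6,                # high positive
    "details_viewed": 0.3,            # moderate positive
    "recommendation_ignored": -0.1,   # low negative
}


def record_action(profile, doc_id, action):
    """Update the (pseudo-)document's weight in a user's profile dict."""
    if action == "removed_from_profile":
        profile[doc_id] = 0.0  # set to zero rather than deleting
    else:
        profile[doc_id] = profile.get(doc_id, 0.0) + ACTION_INTERESTINGNESS[action]
```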
28. Paper recommendations New papers recommended periodically via email or the web interface
New paper d* is recommended if its interestingness is sufficiently high
Threshold initially set at a small positive value
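The slides do not give the scoring formula; a plausible minimal sketch, assuming interestingness is a weighted sum of relatedness between profile documents and the candidate paper (relatedness() is a placeholder for CiteSeer's measures such as citation or text similarity):

```python
def interestingness(profile, d_star, relatedness):
    """Assumed form: weighted sum of relatedness to profile documents."""
    return sum(w * relatedness(d, d_star) for d, w in profile.items())


def maybe_recommend(profile, d_star, relatedness, threshold=0.1):
    # Threshold initially set at a small positive value (per the slide).
    return interestingness(profile, d_star, relatedness) > threshold
```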
29. Profile adaptation Adaptation occurs via manual adjustment and machine learning
User can explicitly modify a profile by adjusting the weight of pseudo-documents
Browsing actions implicitly modify the weight of corresponding pseudo-documents
User response to the recommendation of a paper d* is used to update the weights that contributed to the recommendation, scaled by a learning rate
30. Weight update rule properties Weights modified according to their contribution to recommendations
Overall precision/recall threshold automatically adapted. Ignoring recommendations raises the threshold for recommending a paper. Explicitly adding papers lowers the threshold
The influence of different relatedness measures is adapted separately
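A sketch of this adaptation, assuming a simple additive update (the slides' exact formula is not reproduced here): eta is the learning rate, and feedback is +1 when a recommendation is accepted or a paper is explicitly added, -1 when it is ignored.

```python
def adapt(profile, d_star, relatedness, feedback, threshold, eta=0.05):
    """Assumed additive update; relatedness() as in the scoring sketch above."""
    # Modify each weight in proportion to its contribution to recommending d_star.
    for d, w in profile.items():
        profile[d] = w + eta * feedback * w * relatedness(d, d_star)
    # Ignoring (feedback = -1) raises the threshold; explicitly adding
    # papers (feedback = +1) lowers it.
    threshold -= eta * feedback
    return threshold
```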
31. REFEREE Recommender framework where outside groups can test recommendation systems live on CiteSeer
Implemented a version of Pennock's Personality Diagnosis recommender for initial testing
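Personality Diagnosis (Pennock et al., 2000) models each user's observed ratings as a true "personality" plus Gaussian noise; a compact sketch with toy data (sigma and the ratings here are illustrative, not REFEREE's code):

```python
# Minimal Personality Diagnosis sketch; data and sigma are illustrative.
import math


def pd_predict(ratings, active, item, scale=(1, 2, 3, 4, 5), sigma=1.0):
    """ratings: {user: {item: rating}}; most probable rating of item for active."""
    def gauss(x, mu):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

    # Likelihood that each other user's ratings are active's true personality.
    weights = {}
    for u, r_u in ratings.items():
        if u == active or item not in r_u:
            continue
        w = 1.0
        for j, r in ratings[active].items():
            if j in r_u:
                w *= gauss(r, r_u[j])
        weights[u] = w

    # P(active rates item = r): mix Gaussians centered on each personality's rating.
    scores = {r: sum(w * gauss(r, ratings[u][item]) for u, w in weights.items())
              for r in scale}
    return max(scores, key=scores.get)


ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 1, "b": 2, "c": 1},
    "u3": {"a": 5, "c": 5},
    "alice": {"a": 5},
}
print(pd_predict(ratings, "alice", "c"))  # 5: alice's ratings resemble u3's
```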
32. REFEREE Statistics on recommender performance available quickly
For evaluation we focus on measuring impact on user behavior
Implicit feedback more effective because users see a lot of information about documents before they can download them
Which recommender works best?
Users who viewed x also viewed?
Exact sentence overlap?
Papers that cite this paper?
Citation similarity?
33. Recommendations followed
34. NewsSeer
35. NewsSeer Primarily a single page with implicit feedback only
Also supports explicit feedback but this is optional
39. NewsSeer statistics About 1 million pageviews
About 10,000 users (>= 5 requests)
5,000 users (>= 10 requests)
How many users rated an article?
What percentage of requests were ratings on the homepage?
What percentage of requests were for the source ratings page?
40. NewsSeer statistics 1,000 users rated an article from the 10,000 with >= 5 requests
About 10%
About 20% of the top 2,500 users
About 30% of the top 1,000 users
20 of the 56 users who made >1,000 requests
10 of the 21 users who made >2,000 requests
Homepage 51% (auto-reloaded)
View article 40%
Keyword query 4% (was not available initially)
Ratings on homepage 5%
Source rating page views 0.2%
41. MusicSeer
42. Music similarity
43. Music similarity Music similarity survey
Erdős game
44. Music similarity
46. MusicSeer Survey
713 users, 10,997 judgments
Game
680 users, 11,313 judgments
47. Summary Implicit feedback may be better because there is much lower overhead
Much greater participation may more than compensate for the less accurate information received
Systems can be structured to maximize the implicit feedback gained
Explicit feedback can be obtained if there is enough incentive, or if it is made easy enough