1 / 30

Evidence from Behavior

Evidence from Behavior. LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007. . . Picture of Relevance Feedback. Initial query. x. x. x. x. o. x. x. x. x. x. x. x. o. x. o. x. x. o. x. o. o. x. x. x. x. Revised query. x non-relevant documents

kael
Download Presentation

Evidence from Behavior

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Evidence from Behavior LBSC 796/INFM 719R Douglas W. Oard Session 7, October 22, 2007

  2.  Picture of Relevance Feedback Initial query x x x x o x x x x x x x o x o x x o x o o x x x x Revised query x non-relevant documents o relevant documents

  3. Rocchio Formula qm = modified query vector; q0 = original query vector; α,β,γ: weights (hand-chosen or set empirically); Dr = set of known relevant doc vectors; Dnr = set of known irrelevant doc vectors

  4. Rocchio Example Typically,  <  0 4 0 8 0 0 0 4 0 8 0 0 Original query (+) 2 4 8 0 0 2 1 2 4 0 0 1 Positive Feedback (-) Negative feedback 8 0 4 4 0 16 2 0 1 1 0 4 New query -1 6 3 7 0 -3

  5. Using Positive Information Source: Jon Herlocker, SIGIR 1999

  6. Using Negative Information Source: Jon Herlocker, SIGIR 1999

  7. Use ratings as to describe objects Personal recommendations, peer review, … Beyond topicality: Accuracy, coherence, depth, novelty, style, … Has been applied to many modalities Books, Usenet news, movies, music, jokes, beer, … Rating-Based Recommendation

  8. Motivations to Provide Ratings • Self-interest • Use the ratings to improve system’s user model • Economic benefit • If a market for ratings is created • Altruism

  9. Explicit Feedback: Assumptions • A1: User has sufficient knowledge for a reasonable initial query • A2: Selected examples are representative • A3: The user will try giving feedback • A4: The user will keep giving feedback

  10. A1: Good Initial Query? • Two problems: • User may not have sufficient initial knowledge • Few or no relevant documents may be retrieved • Examples: • Misspellings (Brittany Speers) • Cross-language information retrieval • Vocabulary mismatch (e.g., cosmonaut/astronaut)

  11. A2: Representative Examples? • There may be several clusters of relevant documents • Examples: • Burma/Myanmar • Contradictory government policies • Opinions

  12. A3: Will People Use It? • Efficiency • Longer queries require more processing time • Understandability • Harder to see why subsequent documents retrieved • Risk • Users are reluctant to provide negative feedback

  13. A4: Self-Interest Decreases! Marginal value to community Value of ratings Marginal cost Marginal value to rater Few Lots Number of Ratings

  14. Solving the Cost vs. Value Problem • Maximize the value • Provide for continuous user model adaptation • Minimize the costs • Use implicit feedback rather than explicit ratings • Minimize privacy concerns through encryption • Build an efficient scalable architecture • Limit the scope to noncompetitive activities

  15. Solution: Reduce the Marginal Cost Marginal value to community Marginal cost Marginal value to rater Few Lots Number of Ratings

  16. View Select Listen Print Bookmark Save Purchase Subscribe Delete Copy / paste Forward Quote Reply Link Cite Mark up Tag Organize Publish Type Edit

  17. Examine Retain Behavior Category Reference Annotate Create View Select Listen Print Bookmark Save Purchase Subscribe Delete Copy / paste Forward Quote Reply Link Cite Mark up Tag Organize Publish Type Edit

  18. Minimum Scope Segment Object Class Examine Retain Behavior Category Reference Annotate Create View Select Listen Print Bookmark Save Purchase Subscribe Delete Copy / paste Forward Quote Reply Link Cite Mark up Tag Organize Publish Type Edit

  19. Estimating Authority from Links Hub Authority Authority

  20. Armonk, NY-based computer giant IBM announced today www.ibm.com Big Blue today announced record profits for the quarter Joe’s computer hardware links Compaq HP IBM Anchor Text Indexing • Sometimes yields unexpected results: • Google ranked Microsoft #1 for “evil empire” Adapted from: Manning and Raghavan

  21. Collecting Click Streams • Browsing histories are easily captured • Make all links initially point to a central site • Encode the desired URL as a parameter • Build a time-annotated transition graph for each user • Cookies identify users (when they use the same machine) • Redirect the browser to the desired page • Reading time is correlated with interest • Can be used to build individual profiles • Used to target advertising by doubleclick.com

  22. 180 160 140 120 Reading Time (seconds) 100 80 60 58 50 43 40 32 20 0 No Interest Low Interest Moderate Interest High Interest Rating Full Text Articles (Telecommunications)

  23. More Complete Observations • User selects an article • Interpretation: Summary was interesting • User quickly prints the article • Interpretation: They want to read it • User selects a second article • Interpretation: another interesting summary • User scrolls around in the article • Interpretation: Parts with high dwell time and/or repeated revisits are interesting • User stops scrolling for an extended period • Interpretation: User was interrupted

  24. 55 42 51 52 No Interest No Interest Low Interest Moderate Interest High Interest Abstracts (Pharmaceuticals)

  25. Recommending w/Implicit Feedback Estimate Rating User Ratings User Model Predicted Ratings User Observations Community Ratings User Ratings Ratings Server Predicted Observations User Model Estimate Ratings Predicted Ratings User Observations Community Observations Observations Server

  26. Critical Issues • Protecting privacy • What absolute assurances can we provide? • How can we make remaining risks understood? • Scalable rating servers • Is a fully distributed architecture practical? • Non-cooperative users • How can the effect of spamming be limited?

  27. Gaining Access to Observations • Observe public behavior • Hypertext linking, publication, citing, … • Policy protection • EU: Privacy laws • US: Privacy policies + FTC enforcement • Statistical assurance of privacy • Distributed architecture • Model and mitigate privacy risks

  28. A: Southeast Asia (Dec 27, 2004)    B: Indonesia (Mar 29, 2005)    C; Pakistan (Oct 10, 2005)    D; Hawaii (Oct 16, 2006)    E: Indonesia (Aug 8, 2007) F: Peru (Aug 16, 2007) Search EngineQuery Logs

  29. Search Engine Query Logs http://hannu.biz/aolsearch/

  30. AOL User 4417749

More Related