
Query Expansion Using Gaze-Based Feedback on the Subdocument Level

Georg Buscher, Andreas Dengel, Ludger van Elst. German Research Center for AI (DFKI), Knowledge Management Department, Kaiserslautern, Germany. SIGIR 08.


Presentation Transcript


  1. Query Expansion Using Gaze-Based Feedback on the Subdocument Level. Georg Buscher, Andreas Dengel, Ludger van Elst. German Research Center for AI (DFKI), Knowledge Management Department, Kaiserslautern, Germany. SIGIR 08

  2. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  3. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  4. Background and Motivation • Relevance feedback à la Rocchio is well understood • Feedback is mostly applied to entire documents • Precision presumably improves when feedback is acquired on the subdocument level • Drawbacks of such fine-grained feedback: • Too much cognitive load for explicit feedback • Too little implicit feedback data from explicit interactions (e.g., highlighting) • [Figure: relevance feedback on the document level vs. relevance feedback on the subdocument level] • Use eye gaze as a source of implicit feedback on the subdocument level

  5. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  6. Eye Tracking • Unobtrusive • Relatively precise (accuracy: 1° of visual angle) • Expensive • Mostly used as a "passive" tool for behavior analysis, e.g., visualized by heatmaps [Figure: heatmap example] • We use eye tracking for immediate implicit feedback, taking into account temporal fixation patterns
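To get a feel for what 1° of visual angle means on screen: an angle θ viewed from distance d subtends an extent s = 2·d·tan(θ/2). The sketch below assumes a viewing distance of 60 cm, a typical desktop setup that the slide does not state; the function name is illustrative.

```python
import math

def visual_angle_to_size(angle_deg: float, distance_cm: float) -> float:
    """On-screen extent (cm) subtended by a visual angle at a given viewing distance."""
    # s = 2 * d * tan(theta / 2)
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Assumed viewing distance of 60 cm (not given on the slide).
error_cm = visual_angle_to_size(1.0, 60.0)
print(f"1 degree at 60 cm is about {error_cm:.2f} cm on screen")
```

Under that assumption, a 1° error is roughly 1 cm on screen. Depending on font size and line spacing, this can span more than one line of body text, so assigning gaze to individual text lines needs some tolerance.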

  7. Reading Detection • Starting point: noisy gaze data from the eye tracker • Fixation detection and saccade classification • Line-by-line detection of reading and skimming (marked red and yellow, respectively, in the slide's figure) • See G. Buscher, A. Dengel, L. van Elst: "Eye Movements as Implicit Relevance Feedback", in CHI '08
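The line-by-line reading/skimming decision can be illustrated with a deliberately simplified sketch. It is not the algorithm from the CHI '08 paper: it assumes fixations have already been detected and assigned to a single text line, classifies each saccade only by its jump length and direction, and labels the line by majority vote. All pixel thresholds and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fixation:
    x: float            # horizontal screen position (pixels)
    y: float            # vertical screen position (pixels)
    duration_ms: float  # fixation duration

# Hypothetical thresholds in pixels; the slides do not state real values.
SAME_LINE_MAX_DY = 10.0  # larger vertical jumps leave the current line
READ_MAX_DX = 60.0       # short forward jumps are typical of reading
SKIM_MAX_DX = 250.0      # longer forward jumps are treated as skimming

def classify_saccade(a: Fixation, b: Fixation) -> str:
    """Classify the saccade from fixation a to fixation b."""
    dx, dy = b.x - a.x, b.y - a.y
    if abs(dy) > SAME_LINE_MAX_DY:
        # A large vertical jump back to the left suggests a move to another line.
        return "line_change" if dx < 0 else "other"
    if 0 < dx <= READ_MAX_DX:
        return "read_forward"
    if READ_MAX_DX < dx <= SKIM_MAX_DX:
        return "skim_forward"
    if dx <= 0:
        return "regression"  # backward jump within the same line
    return "other"           # very long forward jump

def classify_line(fixations: List[Fixation]) -> Optional[str]:
    """Label one text line as 'read', 'skimmed', or None (no forward saccades)."""
    reads = skims = 0
    for a, b in zip(fixations, fixations[1:]):
        kind = classify_saccade(a, b)
        if kind == "read_forward":
            reads += 1
        elif kind == "skim_forward":
            skims += 1
    if reads == 0 and skims == 0:
        return None
    # Majority vote over forward saccades; ties count as reading.
    return "read" if reads >= skims else "skimmed"
```

A line traversed with short, regular forward jumps comes out as "read", while a few long jumps across the same line come out as "skimmed".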

  8. Gaze-Based Document Metadata • Line matching by applying optical character recognition • Store reading information as document annotations in a semantic wiki • See G. Buscher, A. Dengel, L. van Elst, F. Mittag: "Generating and Using Gaze-Based Document Annotations", in CHI '08
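Once OCR has located the text lines on screen, each fixation has to be matched to a line. The sketch below assumes OCR line bounding boxes in screen pixels and a hypothetical vertical tolerance; it assigns a fixation to the line whose vertical center is closest and ignores horizontal position, which a fuller implementation would also check. The `TextLine` type and function name are illustrative, not part of the authors' system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass
class TextLine:
    line_id: int
    top: float     # upper edge of the OCR bounding box (pixels)
    bottom: float  # lower edge of the OCR bounding box (pixels)
    text: str

def match_fixations_to_lines(
    fixations: Iterable[Tuple[float, float]],
    lines: List[TextLine],
    tolerance: float = 8.0,  # hypothetical vertical slack in pixels
) -> Dict[int, List[Tuple[float, float]]]:
    """Assign each (x, y) fixation to the vertically closest OCR line, if any."""
    matched: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for x, y in fixations:
        best, best_dist = None, float("inf")
        for line in lines:
            # Only consider lines whose padded box contains the fixation.
            if line.top - tolerance <= y <= line.bottom + tolerance:
                dist = abs(y - (line.top + line.bottom) / 2)
                if dist < best_dist:
                    best, best_dist = line, dist
        if best is not None:
            matched[best.line_id].append((x, y))
    return dict(matched)
```

Fixations that fall outside every padded box (e.g., on images or margins) are simply dropped; the per-line groups can then be fed to a reading/skimming classifier and stored as annotations.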

  9. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  10. Implicit Relevance Feedback for Query Expansion • Input: documents viewed with one specific task in mind • Find terms that best describe the user's current interest • Use these terms for query expansion • [Figure: task / information need, context, terms describing the user's current interest]
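The expansion step can be sketched as follows: given per-term scores from any of the feedback methods, append the best-scoring terms that are not already in the user's query. The function name and the alphabetical tie-breaking rule are hypothetical; the default k = 50 follows the "top 50 terms" mentioned for result list generation.

```python
from typing import Dict, List

def expand_query(query_terms: List[str], term_scores: Dict[str, float], k: int = 50) -> List[str]:
    """Append the k best-scoring context terms that are not already in the query."""
    present = {t.lower() for t in query_terms}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(term_scores.items(), key=lambda item: (-item[1], item[0]))
    extra = [term for term, _ in ranked if term.lower() not in present][:k]
    return list(query_terms) + extra

scores = {"vision": 2.0, "perception": 3.0, "retina": 1.5, "ear": 0.5}
print(expand_query(["animal", "perception"], scores, k=2))
# -> ['animal', 'perception', 'vision', 'retina']
```

Terms already in the query are skipped rather than counted against k, so the query always gains up to k new terms.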

  11. Three Implicit Feedback Methods to Evaluate • Input: viewed documents • Gaze-Filter: TF × IDF based on read or skimmed passages • Gaze-Length-Filter: Interest(t) × TF × IDF based on the length of coherently read text
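The difference between the Gaze-Filter and the whole-document baseline (listed on slide 15) is only which text enters the TF × IDF computation. The sketch below assumes that IDF = log(N / df) comes from background document frequencies (for example, over the search collection) and that terms missing from those statistics get df = 1; the slides specify neither. Names such as `Passage` and `gaze_filter_scores` are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

@dataclass
class Passage:
    text: str
    label: Optional[str]  # "read", "skimmed", or None if not read or skimmed

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(texts: Iterable[str], doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """TF x IDF over the given texts, with IDF = log(N / df) from background statistics."""
    tf = Counter(tok for text in texts for tok in tokenize(text))
    # Terms missing from the background statistics are treated as df = 1 (assumption).
    return {t: n * math.log(num_docs / doc_freq.get(t, 1)) for t, n in tf.items()}

def baseline_scores(documents: Iterable[str], doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """Baseline: all text of every opened document counts."""
    return tf_idf(documents, doc_freq, num_docs)

def gaze_filter_scores(passages: Iterable[Passage], doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """Gaze-Filter: only passages detected as read or skimmed count."""
    viewed = [p.text for p in passages if p.label in ("read", "skimmed")]
    return tf_idf(viewed, doc_freq, num_docs)
```

A term that appears only in passages the user never read (here, anything outside the labeled passages) contributes to the baseline but not to the Gaze-Filter.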

  12. Gaze-Length-Filter • Long passages are passages containing at least 230 characters (i.e., more than the following two lines). • The heuristic assumes that shorter text parts only rarely convey sophisticated concepts to the reader. • It further assumes that readers are generally not very interested in the contents of short read or skimmed text parts. Therefore all terms contained in short read or skimmed text parts get a lower interest value. • $\mathrm{Interest}(t) = \frac{\#\,\text{long read or skimmed passages containing } t}{\#\,\text{all read or skimmed passages containing } t}$
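The Interest(t) ratio and the resulting Gaze-Length-Filter weight can be sketched directly. This assumes that read or skimmed passages are already available as plain strings, that length is measured in characters including spaces, and that `term` is given in lowercase; the 230-character threshold is the one from the slide, while the function names are hypothetical.

```python
import re
from typing import List, Set

LONG_PASSAGE_MIN_CHARS = 230  # threshold given on the slide

def terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def interest(term: str, viewed_passages: List[str]) -> float:
    """Share of the read/skimmed passages containing `term` that are long passages."""
    containing = [p for p in viewed_passages if term in terms(p)]
    if not containing:
        return 0.0  # term never appeared in a read or skimmed passage
    long_ones = [p for p in containing if len(p) >= LONG_PASSAGE_MIN_CHARS]
    return len(long_ones) / len(containing)

def gaze_length_filter_weight(term: str, viewed_passages: List[str], tf_idf_score: float) -> float:
    """Gaze-Length-Filter weight: Interest(t) x TF x IDF."""
    return interest(term, viewed_passages) * tf_idf_score
```

A term seen once in a long passage and once in a short one gets Interest(t) = 0.5, halving its TF × IDF weight; a term seen only in short passages drops to zero.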

  13. Three Implicit Feedback Methods to Evaluate • Input: viewed documents • Gaze-Filter: TF × IDF based on read or skimmed passages • Gaze-Length-Filter: Interest(t) × TF × IDF based on the length of coherently read text • Reading Speed: ReadingScore(t) × TF × IDF based on read vs. skimmed passages containing term t

  14. Reading Speed • $P_t$ is the set of all read or skimmed passages containing term $t$. • The heuristic assumes that more thoroughly read text parts (and therefore their terms) are more likely to be of interest to the user than cursorily viewed parts. • $\mathrm{ReadingScore}(t) = \frac{1}{|P_t|} \sum_{p \in P_t} r(p)$
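ReadingScore(t) is the mean of r(p) over the passages in $P_t$. The slide does not define the values of r(p), so the sketch below assumes, purely for illustration, r(p) = 1.0 for a read passage and 0.5 for a skimmed one; the names `R_VALUES` and `reading_score` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical per-passage scores r(p); the slide does not give the values.
R_VALUES: Dict[str, float] = {"read": 1.0, "skimmed": 0.5}

def reading_score(term: str, passages: List[Tuple[str, str]]) -> float:
    """ReadingScore(t) = (1 / |P_t|) * sum of r(p) over passages p in P_t."""
    # P_t: all read or skimmed passages whose text contains the term.
    p_t = [label for text, label in passages
           if label in R_VALUES and term in re.findall(r"[a-z]+", text.lower())]
    if not p_t:
        return 0.0
    return sum(R_VALUES[label] for label in p_t) / len(p_t)

def reading_speed_weight(term: str, passages: List[Tuple[str, str]], tf_idf_score: float) -> float:
    """Reading-Speed weight: ReadingScore(t) x TF x IDF."""
    return reading_score(term, passages) * tf_idf_score
```

With these assumed values, a term found in one read and one skimmed passage scores (1.0 + 0.5) / 2 = 0.75, while a term found only in read passages keeps its full TF × IDF weight.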

  15. Three Implicit Feedback Methods to Evaluate • Input: viewed documents • Gaze-Filter: TF × IDF based on read or skimmed passages • Gaze-Length-Filter: Interest(t) × TF × IDF based on the length of coherently read text • Reading Speed: ReadingScore(t) × TF × IDF based on read vs. skimmed passages containing term t • Baseline: TF × IDF based on entire opened documents

  16. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  17. Study Design • Informational task given • 2 different tasks • Task description in a simulated email • Participants had to imagine being journalists • Read pre-selected documents • Email attachments • Document structure carefully chosen • Search for more information on Wikipedia • 3 different queries: main topic, sub-topic, related topic • Give relevance feedback for the first 20 result entries per query • [Figure: procedure, repeated for each of the 2 tasks: read about the topic in the email; look through 4 email attachments to get started with the topic; then, for each of the 3 queries, find more information by querying the search engine and give explicit relevance feedback]

  18. Task Example • Topic: perceptual organs of animals • Pre-selected documents: 4 Wikipedia articles about cats, sharks, dogs, bats • The articles described all facets of the species. • Each article contained several paragraphs dealing with perception-related issues. • 3 different queries • Main topic query: more material about perception • Sub-topic query: more material about visual perception • Related-topic query: perceptual organs for the Earth's magnetic field

  19. Result List Generation • Create basic result list • Create expanded queries (+ top 50 terms) • Re-rank that list for every query expansion variant • Merge the re-ranked result lists in a balanced, ordered way • Present merged list to the participant • [Figure: the user query produces a result list; using the viewed documents, each variation (Baseline, Gaze-Filter, Gaze-Length-Filter, Reading-Speed) builds expanded query 1 to 4 and re-ranked list 1 to 4; the re-ranked lists are merged into one result list for the user]
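One way to read "balanced, ordered" merging is rank-by-rank interleaving: each variant contributes its result at rank 1, then each contributes its result at rank 2, and so on, skipping documents already taken. The sketch below follows that interpretation, which is an assumption rather than the authors' documented procedure; the cutoff of 20 mirrors the 20 rated result entries per query, and the function name is hypothetical.

```python
from typing import Hashable, List, Sequence

def balanced_merge(ranked_lists: Sequence[Sequence[Hashable]], limit: int = 20) -> List[Hashable]:
    """Interleave several rankings rank by rank, dropping duplicates, up to `limit` items."""
    if limit <= 0:
        return []
    merged: List[Hashable] = []
    seen = set()
    depth = max((len(r) for r in ranked_lists), default=0)
    for rank in range(depth):
        # Every variant contributes its result at this rank before going deeper.
        for ranking in ranked_lists:
            if rank < len(ranking) and ranking[rank] not in seen:
                seen.add(ranking[rank])
                merged.append(ranking[rank])
                if len(merged) == limit:
                    return merged
    return merged
```

Interleaving this way keeps any single variant from dominating the top of the merged list, so explicit ratings on the merged list can later be attributed back to each variant's own ranking.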

  20. Outline • Motivation • Reading detection and document annotation technique • Implicit feedback methods • Study design • Results

  21. Overview • 21 participants • 60-80 minutes per participant • 111 issued user queries • 2220 explicit relevance ratings (20 per query) • [Figure: distribution of the relevance ratings]

  22. Precision and Discounted Cumulative Gain (DCG)
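Both measures can be computed per query from the explicit ratings of the top-ranked results. The sketch assumes graded integer ratings (e.g., 0 = not relevant and higher = more relevant), a hypothetical relevance threshold of 1 for precision, and the common DCG form with a log2(i + 1) discount; the slides state neither the rating scale nor the exact DCG variant.

```python
import math
from typing import Sequence

def precision_at_k(ratings: Sequence[int], k: int, threshold: int = 1) -> float:
    """Fraction of the top-k results rated at least `threshold` (missing ranks count as non-relevant)."""
    return sum(1 for r in ratings[:k] if r >= threshold) / k

def dcg_at_k(ratings: Sequence[int], k: int) -> float:
    """DCG@k = sum over i = 1..k of rel_i / log2(i + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(ratings[:k], start=1))

ratings = [2, 0, 1]  # illustrative ratings for ranks 1..3
print(precision_at_k(ratings, 3), dcg_at_k(ratings, 3))
```

For these illustrative ratings, precision@3 is 2/3 and DCG@3 is 2/1 + 0/log2(3) + 1/log2(4) = 2.5; the discount makes a relevant document at rank 1 worth more than the same document further down.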

  23. Mean Average Precision • Strong improvement of all gaze-based variants over the baseline • The Reading-Speed variant is less effective than Gaze-Filter (GF) and Gaze-Length-Filter (GLF) • GLF might be slightly better than GF • Significance: ** p < 0.01, * p < 0.05, (*) p < 0.1 (two-tailed paired t-test)
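MAP and the paired t-test can be sketched as follows. Because only the merged top results were judged, the sketch assumes that average precision is normalized by the number of relevant results found in the judged list rather than by all relevant documents in the collection. It computes only the paired t statistic; `scipy.stats.ttest_rel` returns the statistic together with a two-sided p-value. Function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence

def average_precision(relevant: Sequence[bool]) -> float:
    """AP over one judged ranking: mean of precision@rank at each relevant rank."""
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: average of the per-query AP values."""
    return mean(average_precision(r) for r in rankings)

def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of a paired t-test on per-query scores a and b (needs >= 2 pairs, non-constant differences)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

Pairing by query matters here: each query was rated once on a merged list, so every variant's AP for that query comes from the same user's judgments, and the test compares per-query differences rather than two independent samples.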

  24. Query Type Differentiation (B: Baseline, GF: Gaze-Filter, GLF: Gaze-Length-Filter, RS: Reading-Speed) • Generally similar trend within each query type • MAP consistently decreases from main-topic to sub-topic to related-topic queries • Narrow information needs, especially for related-topic queries • Wikipedia did not contain many relevant pages • The Baseline's MAP decreases much more (-0.25) than that of GF (-0.17) and GLF (-0.18) • Asterisks mark significance of improvement over the baseline

  25. Inappropriate Context • [Figure: the viewed pages are about animal species; the baseline method extracts terms about animal species in general, while the gaze-based methods extract terms about parts of animal perception (e.g., only visual and auditory perception) within the topic of animal perception] • The baseline method extracts terms that might be far away from the user's current topic of interest. • Expanding the query with these terms can lead in a wrong direction that is unpredictable for the user. • The more distant the topic of the user's next query is (i.e., a related-topic query), the more negative the effect of unsuitable expansion terms.

  26. Conclusion • Gaze data can effectively be analyzed and used as a source of implicit feedback • Reading behavior detection on its own provides useful information for query expansion and re-ranking • Precision can be improved just by adding to a query those terms that have been read before • Future Work: • More realistic web search scenarios (e.g., not only on Wikipedia) • More sophisticated heuristics for interpreting gaze-based feedback • Gaze also for long-term implicit feedback (e.g., desktop search)

  27. Interested? • Interested in implicit feedback for personalization? • E.g., scrolling behavior, click-through, mouse movements, eye tracking, EEG, biosensors, emotions, magic, … • Please let me know! • georg.buscher@dfki.de • Workshop?

  28. Thank you for your attention! Special thanks for the travel grant by: • ACM SIGIR • Amit Singhal, made in honor of Donald B. Crouch • Microsoft Research, made in honor of Karen Sparck Jones
