
Information Access I: Interactive Information Search

Information Access I: Interactive Information Search. GSLT, Göteborg, October 2003. Barbara Gawronska, Högskolan i Skövde. 2nd intensive week: Interactivity (Th 8-12 BG, 13-15 MM), Multilingual systems and resources (Fr 8-10 MM, 10-12 BG), Evaluation (Fr 13-15 BG).



  1. Information Access I: Interactive Information Search. GSLT, Göteborg, October 2003. Barbara Gawronska, Högskolan i Skövde

  2. 2nd intensive week: • Interactivity (Th 8-12 BG, 13-15 MM) • Multilingual systems and resources (Fr 8-10 MM, 10-12 BG) • Evaluation (Fr 13-15 BG)

  3. Some repetition: Data Retrieval vs. IR (2) (the German IR Research Group). IR systems have to handle "uncertain knowledge" ("unsicheres Wissen"): • Vague queries; reformulation frequently required • The problem of the user's own understanding of his/her information need • Limitations of knowledge representations. This implies a need for interaction.

  4. A General Model of an IR System (Fuhr 1995:11)

  5. A Basic Model of a Document Retrieval System (Fuhr 1995:11)

  6. A document from different perspectives (Meghini et al. 91, modified)

  7. Different aspects of a search

  8. But where and when is interactivity needed?

  9. How to diagnose the need for interaction refinement? • User studies (still too sparse): • Users in contact with existing systems: • Free task choice • Predefined tasks • Wizard-of-Oz experiments • Relevance feedback ("real" and "pseudo")

  10. Wizard-of-Oz experiments (Dahlbäck, Jönsson...) • Users tend to spontaneously produce a kind of "controlled" language: • written-language syntax (complete sentences, ellipsis avoided) • "repairs" not frequent • pronominal anaphora less frequent than in human-human communication

  11. Wizard-of-Oz experiments (3) • "Controlled" language in users (3) • A psycholinguistic reflection: it is not unlike "baby talk" (i.e. the way of talking to young children or to unskilled/unidiomatic speakers of a language) • This can make human-computer NLP dialogue a less complicated task than e.g. translating human-human dialogue • There seem to be age-related differences in the way of interacting with computer systems

  12. But: • If the system gives the impression of being too smart, the user normally becomes more natural in his/her linguistic behaviour, which causes problems for the system... Should the system's responses remain a little "stupid"???

  13. Now, back from wizards to existing systems. Let’s think about IR-models again.

  14. But where and when is interactivity needed?

  15. Information request level: Common problems: • Spelling errors (recall Hercules' lecture) • Connector interpretation: natural-language conjunctions vs. logical connectors; conjunction symbols in IR systems may be ambiguous: "Food for cats and dogs"
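The "food for cats and dogs" ambiguity can be made concrete with a toy Boolean engine. This is a minimal sketch, not from the lecture: the three-document index and both query functions are invented for illustration.

```python
# Toy inverted view of a collection: document id -> set of index terms.
index = {
    1: {"food", "cat"},        # a document about cat food only
    2: {"food", "dog"},        # a document about dog food only
    3: {"food", "cat", "dog"}, # a document about food for both
}

def boolean_and(terms):
    """Strict Boolean reading: every term must occur in the document."""
    return {d for d, t in index.items() if terms <= t}

def boolean_or_reading(required, alternatives):
    """Natural-language reading of 'cats and dogs':
    food for cats OR food for dogs."""
    return {d for d, t in index.items()
            if required <= t and t & alternatives}

# "food AND cat AND dog": only the document about both survives.
print(boolean_and({"food", "cat", "dog"}))           # {3}
# The reading the user probably intended retrieves all three.
print(boolean_or_reading({"food"}, {"cat", "dog"}))  # {1, 2, 3}
```

The two readings retrieve different document sets, which is exactly why feedback at the request level matters.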

  16. Information request level (2) • Negation (examples inspired by Fuhr 1995): ”Drugs and sedatives without relation to aging” ”Drugs and sedatives, not related to aging” ”Drugs and sedatives, no aging” ”Drugs and sedatives, not age”

  17. Information request (3) • What kind of feedback would be useful on this level? Feedback, definition (Meadow et al. 2000: 246; McGraw-Hill 1971): Feedback = information derived from the output of a process and used to control the process in the future

  18. Possible feedback format on the information request level (?) • Predicate logic? For(food,cat) & for(food,dog), or For(food,cat) or for(food,dog), or For(food,cat) & dog • Generate NLP questions? • Leave everything to the user? • Or? • How to present the feedback? Menu choice?

  19. Between information request level and formal query level. Meadow et al. 2000: 179ff; examples from Dialog: • SSELECT CAT interpreted as: SS (= SELECT SETS) CAT • SELECTiON (wrongly used instead of the standard command SELECT) interpreted as: S (= SELECT) ION. What kind of feedback would be useful on this level?
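A hedged guess at the mechanism behind these misreadings: if the system greedily matches the longest known command name at the start of the input and silently treats the leftover characters as a search term, both Dialog examples fall out. The command table and parser below are invented for illustration and are not Dialog's actual implementation.

```python
# Invented command table; longest names are tried first.
COMMANDS = ["SELECT", "SS", "S"]

def parse(line):
    """Greedy command-prefix matching: match the longest known command
    name at the start of the first token; leftover characters and the
    remaining tokens become search terms."""
    first, *rest = line.upper().split()
    for cmd in sorted(COMMANDS, key=len, reverse=True):
        if first.startswith(cmd):
            leftover = first[len(cmd):]
            terms = ([leftover] if leftover else []) + rest
            return cmd, terms
    return None, [first] + rest

print(parse("SSELECT CAT"))  # ('SS', ['ELECT', 'CAT'])
print(parse("SELECTiON"))    # ('SELECT', ['ION'])
```

Under this sketch, useful feedback would be to echo the parse back to the user ("Did you mean SELECT SETS with terms ELECT, CAT?") before executing it.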

  20. Between the information request/formal query level and database objects If the request/query is ambiguous: • Give some feedback and try to resolve the ambiguity before searching the database, or after the search, before presenting the documents (”Delayed disambiguation”) ? • What search stage is most suitable for feedback/dialog? What factors should be taken into account?

  21. Search stages, or "states" in searches (Penniman & Dominick 1980, Chapman 1981) • Database selection • Exploration of individual terms (looking up terms in a thesaurus or an inverted file in order to decide which terms are to be used in the query) • Record search by term combinations • Record browsing and display • Record evaluation (for possible iteration)

  22. Levels of search activities (Bates 1990, Fuhr 1995) • Strategy (= a plan for an entire information search, e.g. find relevant literature for a course in IA) • Stratagem: e.g. journal run, citation search... • Tactic: one or several moves made to further the search • Move: a single action

  23. Levels of system involvement (Bates 1990) • No system involvement: all search activities human-generated and executed • Displays possible activities: system lists search activities when asked. Some of the activities may be executable by the system, some may not. • Monitors search and recommends search activities: • Only when the searcher asks for suggestions • Always when it identifies a need • Executes desired actions automatically

  24. Relevance feedback and query reformulation

  25. Query modification by relevance feedback (picture from M.A. Hearst, http://www.sims.berkeley.edu/courses/is202/f98/Lecture25/sld005.htm)

  26. How to utilize terms extracted from relevant documents? • The extracted terms may be added to the query • They may be presented to the user, who makes the decision about modification • They can be used for re-weighting the terms in the query
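The first two options can be sketched as a simple frequency-based term suggester. This is a minimal illustration; the function name, toy query, and documents are invented.

```python
from collections import Counter

def suggest_expansion_terms(query_terms, relevant_docs, k=3):
    """Collect the most frequent terms from user-judged relevant
    documents that are not already in the query, so they can be
    shown to the user or appended to the query automatically."""
    counts = Counter(term for doc in relevant_docs for term in doc)
    candidates = [t for t, _ in counts.most_common()
                  if t not in query_terms]
    return candidates[:k]

query = {"drug", "sedative"}
relevant = [
    ["drug", "dosage", "sedative", "interaction"],
    ["sedative", "dosage", "side", "effect"],
]
print(suggest_expansion_terms(query, relevant))
```

Whether the suggestions are applied silently or offered as a menu is exactly the interactivity design question the slides raise.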

  27. A standard method for re-weighting: Rocchio's Algorithm (Rocchio 1971) • Goal: to achieve an optimal query. An optimal query maximizes the difference between the average relevant vector and the average nonrelevant vector.

  28. A standard method for re-weighting: Rocchio's Algorithm (Rocchio 1971; many modifications, e.g. Salton & McGill 1983; picture from Srinivasan 2003, http://mingo.info-science.uiowa.edu:16080/courses/230/Lectures/Vector.html#1c) Q_new = a · Q_old + b · (average relevant vector) - c · (average nonrelevant vector)

  29. Rocchio's Algorithm (2) (Rocchio 1971; many modifications, e.g. Salton & McGill 1983; a more formal way of expressing the same thing, Meadow et al. 2000:258) QW: the initial query vector. QW': the vector of the modified query. R = the number of relevant retrieved documents. N = the number of non-relevant retrieved documents. DW = the document vector. β, γ = coefficients that must be determined experimentally (β often about 0.75, γ about 0.25)
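A minimal sketch of one Rocchio update in the Q_new = a · Q_old + b · (average relevant vector) - c · (average nonrelevant vector) form, with a = 1 and the ballpark coefficient values mentioned above (0.75 and 0.25). The toy vectors are invented, and negative weights are clipped to zero, a common practical convention not prescribed by the original formula.

```python
def rocchio(q_old, relevant, nonrelevant, a=1.0, b=0.75, c=0.25):
    """One Rocchio update over term-weight vectors given as
    equal-length lists of floats."""
    dims = len(q_old)

    def avg(vecs):
        # Component-wise average; the zero vector if the set is empty.
        if not vecs:
            return [0.0] * dims
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]

    r, n = avg(relevant), avg(nonrelevant)
    # Clip negative weights to zero, as is common in practice.
    return [max(0.0, a * q + b * ri - c * ni)
            for q, ri, ni in zip(q_old, r, n)]

q = [1.0, 1.0, 0.0, 0.0]         # query uses terms 0 and 1
rel = [[1.0, 0.0, 1.0, 0.0]]     # relevant doc also contains term 2
nonrel = [[0.0, 1.0, 0.0, 1.0]]  # nonrelevant doc has terms 1 and 3
print(rocchio(q, rel, nonrel))   # [1.75, 0.75, 0.75, 0.0]
```

Note how term 2, absent from the original query, gains weight from the relevant document, while term 1 is demoted by the nonrelevant one.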

  30. Future? According to several studies, Machine Learning methods perform better than different variants of Rocchio’s algorithm. Your experience?

  31. Future? Future users: a preliminary case study (age 12-13). First observations: • most frequent search goals: to DO things, not to read documents. "Download movies", "Subscribe to X", "Translate X", etc.

  32. Future? (young users) • Queries in English dominate (specific to Swedish kids, or not? What does it mean for multilinguality?) • Narrow terms dominate; specific terms more frequent than general ones • Quite aware of the danger of information overload • Short queries, 2-3 words per query • "No point in searching for subcategories" (!)

  33. Future? (young users) Consequences for system design and feedback planning?
