
WIRED Week 3


Presentation Transcript


  1. WIRED Week 3 • Key Concepts in IR • Mozilla & Firefox • Projects & Papers

  2. Key Concepts in IR • Understanding the System • Can’t read users’ minds • Can’t know “about” documents • Evaluation is key • Information Needs • “More like this” • Starting points, guides • Topics, Subjects • Documents • Images • Text, Natural Languages • A query as a text • Not just (simple) question answering

  3. Aboutness & Subject Indexing • What is “aboutness”? • Meaning of a document • Abstract or Topic(s) of a document • How you (or someone else) uses the document • Kinds of questions the document can answer • How can we uncover aboutness? • Author(s), Time, Date, Location, Format • Relationships, Structures, Markup, Metadata • Use, Recall, Popularity

  4. Subtleties of Aboutness • Can we characterize a whole document? • Parts of a document • Each part, different descriptions (& uses) • Do you need the document if you’ve got a good summary? • Not just text summaries • Use & origination data • How do you extract key information? • Understand the context • Frequency & Rarity • NLP, Genres, Keyword indicators • Sentence diagrams to the extreme? • Novelty of information, expectations for education • Politics of description
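
The “Frequency & Rarity” bullet is the intuition behind TF-IDF weighting: a term that is frequent in one document but rare across the collection is a good aboutness indicator. A minimal sketch in Python; the toy corpus and the function name are illustrative, not from the slides:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by frequency within its document and
    rarity across the whole collection."""
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc.split()))
    scores = []
    for doc in docs:
        tf = Counter(doc.split())
        scores.append({term: count * math.log(n / df[term])
                       for term, count in tf.items()})
    return scores

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "aboutness is hard to index"]
for doc_scores in tf_idf(docs):
    # Highest-weighted terms are candidate aboutness keywords.
    print(sorted(doc_scores, key=doc_scores.get, reverse=True)[:2])
```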

  5. Aboutness & the Web • Rapid & broad analysis • Let users define aboutness • Different users = more descriptions • Lots of users, lots to select from • A system to average & rank aboutness descriptions? • World Wide means different cultures • More older documents with many more very new documents • Differences and “it’s like that one” • Internal consistency vs. flexibility & context
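
The question of a system to “average & rank aboutness descriptions” can be made concrete by counting agreement across users’ tags. A naive sketch, with invented tags:

```python
from collections import Counter

def rank_aboutness(tag_lists):
    """Aggregate many users' descriptions of one document and rank
    them by agreement: a naive way to "average" aboutness."""
    return Counter(tag for tags in tag_lists for tag in tags).most_common()

# Three hypothetical users tag the same page.
user_tags = [["browser", "firefox"],
             ["browser", "open source"],
             ["firefox", "browser"]]
print(rank_aboutness(user_tags))
# [('browser', 3), ('firefox', 2), ('open source', 1)]
```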

  6. Testing Index Language Devices • What are the different ways to represent documents? • Systems are faster, but designs differ • Can you represent them in more than one way? • At once? • By audience? • Not just terms, but relationships between terms • What language do you use to represent docs? • Structured & Flexible • Consistent & Understandable (human & computer) • Dewey, LoC, Dublin Core • Data structures, XML, Situational-Temporal • What if you indexed documents by terms & queries? • Can you get too complex? • Good for the user vs. good for the system
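
One index-language device worth making concrete is the inverted index: each document is represented by the terms it contains, and a query is answered by intersecting term postings. A minimal sketch with invented sample documents:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = ["Dewey decimal classification",
        "Dublin Core metadata elements",
        "XML markup for metadata"]
index = build_index(docs)
print(search(index, "metadata"))       # documents 1 and 2
print(search(index, "XML metadata"))   # only document 2
```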

  7. Indexers & Issues • Staff for evaluation • How is the system used? • Card catalogs • Search engine results pages • Natural language queries & NL answers • Vocabulary of document, index or user impacts? • Syntactic indexing • “use of headings which display the relationship between the various elements, as distinct from those which merely show existence of several attributes relevant to the subject indexed.” p 98

  8. Preparation of an Index • Assess document subject • Related to users • Concepts & keywords • Translate assessment into index language • Add to index • Make concept analysis for answering questions • Will users understand & find document • How helpful (ranking) • Match concepts to index (to document) • Rebuild & enable updated index
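
The preparation steps above can be read as a pipeline: assess the subject, translate the concepts into the index language, then post the document in the index. A rough sketch under naive assumptions; the helper names and the tiny controlled vocabulary are illustrative only:

```python
from collections import Counter

def assess_subject(document):
    """Step 1: pick candidate concepts; naively, the most frequent
    words longer than three characters."""
    words = [w for w in document.lower().split() if len(w) > 3]
    return [w for w, _ in Counter(words).most_common(3)]

def translate(concepts, vocabulary):
    """Step 2: map free-text concepts onto the controlled index
    language, dropping anything the vocabulary does not cover."""
    return [vocabulary[c] for c in concepts if c in vocabulary]

def add_to_index(index, doc_id, terms):
    """Step 3: post the document under each assigned index term."""
    for term in terms:
        index.setdefault(term, set()).add(doc_id)

vocabulary = {"retrieval": "Information retrieval",
              "indexing": "Subject indexing"}
index = {}
docs = ["indexing and retrieval of documents since indexing aids retrieval"]
for doc_id, doc in enumerate(docs):
    add_to_index(index, doc_id, translate(assess_subject(doc), vocabulary))
print(index)  # {'Subject indexing': {0}, 'Information retrieval': {0}}
```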

  9. Index Language parts • Controlled vocabulary (p 99) • Specific terms for relevance (p 100) • Measuring for performance • Precision • Recall • (Relevance) • With the Web, we don’t know how many total documents for a subject or how many are correct • With the Web, we don’t know how documents are described or indexed • Metatags • Keywords • Indexing databases • Crawling & updating
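
Precision and recall are simple set ratios once you fix a relevant set; the Web caveat on this slide is exactly that the full relevant set is unknown, so recall can only be estimated. A worked toy example with made-up document ids:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved.
    On the open Web the full relevant set is unknown, so this
    can only ever be estimated."""
    return len(retrieved & relevant) / len(relevant)

retrieved = {1, 2, 3, 4}       # what the system returned
relevant  = {2, 4, 5, 6, 7}    # what actually answers the need
print(precision(retrieved, relevant))  # 2/4 = 0.5
print(recall(retrieved, relevant))     # 2/5 = 0.4
```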

  10. Thesaurus • “Theory of Clumps” • Treasury of words • How deep are the relationships? • Can relationships & relevance be measured? • How specific can one be? • Not just alphabetical, topical • Purposes of a Thesaurus (p 112) • Which are most important? • What’s missing?
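
Asking “how deep are the relationships?” presumes a structure of broader/narrower/related links (the standard BT/NT/RT convention). One crude way to measure depth, sketched over an invented thesaurus:

```python
# A thesaurus entry as broader/narrower/related term links.
thesaurus = {
    "animals": {"NT": ["mammals"]},
    "mammals": {"BT": ["animals"], "NT": ["cats", "dogs"]},
    "cats":    {"BT": ["mammals"], "RT": ["pets"]},
    "dogs":    {"BT": ["mammals"], "RT": ["pets"]},
    "pets":    {"RT": ["cats", "dogs"]},
}

def depth(term, thesaurus):
    """Broader-term hops up to a top term: one crude measure of
    how deep a relationship chain runs."""
    broader = thesaurus.get(term, {}).get("BT", [])
    return 0 if not broader else 1 + max(depth(b, thesaurus) for b in broader)

print(depth("cats", thesaurus))  # 2: cats -> mammals -> animals
```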

  11. Variety of Thesaurus formats • Roget’s • Alphabetical with cross indexing • Subject categories (as numbers) • Ordering • Sub-ordering • Relationships • Language issues, syntax & completeness (phrases) • Shifted, inverted & rotated • “complications -- IR systems”
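
Shifted, inverted, and rotated formats give a multi-word heading one alphabetical access point per word, which is how an entry like “complications -- IR systems” arises. A sketch of rotation:

```python
def rotated_entries(phrase):
    """Generate one access point per word of a multi-word heading,
    so the heading is findable under any of its words."""
    words = phrase.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]

for entry in sorted(rotated_entries("complications IR systems")):
    print(entry)
# IR systems complications
# complications IR systems
# systems complications IR
```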

  12. Terms • Number of terms • Singular, plural • Phrases, quotes, clichés • Descriptive, contextual • Symbols • Homographs & Thesaurofacets • Just a few ways to impose formats & structure • What are some other methods?
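
Singular/plural conflation and homograph handling pull in opposite directions: many forms should file under one entry, while one form may need several qualified entries. A crude sketch; a real system would use a proper stemmer such as Porter’s:

```python
def conflate(term):
    """Fold trivial singular/plural variants into one entry.
    This only trims a trailing 's' from longer words; a real
    system would use a full stemmer."""
    term = term.lower()
    return term[:-1] if term.endswith("s") and len(term) > 3 else term

# Homographs need the opposite move: one spelling, several index
# entries, distinguished by a qualifier.
homographs = {"bank": ["bank (finance)", "bank (river)"]}

print(conflate("Systems"))   # system
print(conflate("gas"))       # gas (too short to trim)
print(homographs["bank"])
```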

  13. Layouts & Display of Thesauri • Most dynamic area • Making it easier to build thesauri • Get whole or specific picture • Expose structure to users • For understanding • For approval • Graphical displays • Browsing • Trees, Flowcharts, Maps • Colors, shapes, sizes

  14. Revising, Adding & Relations • Most issues in the reading are minor in systems now • New problems in issues of scale • Generate new vs. add to existing? • Where do the experts fit in? • Building a set of rules • Beyond formats • Testing for internal consistency • How do you link or merge two thesauri? • Little merges into larger? • More detailed encompasses less? • Can you ever get agreement?
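
Merging a little thesaurus into a larger one can be sketched as a union of relationship lists; as the slide warns, conflicting links simply accumulate, so internal consistency is not guaranteed. The thesauri here are invented:

```python
def merge(larger, smaller):
    """Merge a smaller thesaurus into a larger one: union the
    relationship lists where a term exists in both, copy terms the
    larger one lacks. Conflicts accumulate rather than resolve."""
    merged = {t: {r: list(v) for r, v in rels.items()}
              for t, rels in larger.items()}
    for term, rels in smaller.items():
        entry = merged.setdefault(term, {})
        for rel, targets in rels.items():
            existing = entry.setdefault(rel, [])
            existing.extend(t for t in targets if t not in existing)
    return merged

a = {"cats": {"BT": ["mammals"]}}
b = {"cats": {"BT": ["felines"]}, "felines": {"NT": ["cats"]}}
print(merge(a, b))
# {'cats': {'BT': ['mammals', 'felines']}, 'felines': {'NT': ['cats']}}
```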

  15. Problem Structures & HCI • A call to make IR systems more usable • Let users search systems themselves • Make systems work more like users think they should (for what year?) • Is a search like a dialogue? • Person to person • Person to machine • Multiple questions & answers to get to the point • Understanding language & behavior • “Do what I mean, not what I say” • Identifying the problem • Focusing the question (related to the available documents) • User familiarity with system

  16. Interaction, step 1 for Evaluation • Benchmarks for evaluation • How would a person ask this question? • What kind of answers are received? • How are subtle expectations met? • How long or comprehensive is the question or the answer? • How is this different for Web IR? • What advantages do both physical & virtual search systems have?

  17. Relevance: Review & Framework • Finding the needle in a haystack • A few documents in a collection • Possible that no documents are perfectly relevant • Not just a content match • Dependent on the user & situation

  18. Relevance & the system • Relevance as a point of measurement • Different fields gauge relevance differently • Scientific communication • Communication (Theory) • Psychology • Information Systems • False Drops vs. Completeness • Rarity & value of information • Precision & Recall probabilities of finding relevance • Tests were numerical, binary & structured

  19. Relevance is “no good”? • Very hard to define, should be ignored? • Too human centered • A gradual process moving towards the correct information • Cooper & Utility • Quality, novelty, importance, credibility • Wilson’s Situational Relevance • Psychological & Logical relevance • Matching vs. Satisfying • Situational • “Relevance numbers”

  20. Relevance Future Work • Knowledge and (the) knower • Selection • Inference • Mapping • Dynamics • Association • Redundancy • p 161

  21. How can (Web) IR be better? • Better IR models • Better User Interfaces • More to find vs. easier to find • Scriptable applications • New interfaces for applications • New datasets for applications • Projects and/or Papers Overview
