INFO624 -- Week 9: Effective Information Retrieval • Dr. Xia Lin, Assistant Professor, College of Information Science and Technology, Drexel University
Effective Information Retrieval • System’s perspectives • Fast indexing and retrieval algorithms • Inverted indexing, tree structures, hash tables • Semantic indexing and mapping • Subject indexing • Latent semantic indexing • Intelligent information retrieval • Knowledge representation • Logical inferences
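To make "inverted indexing" concrete, here is a minimal sketch of an inverted index that maps each term to the documents containing it, plus a Boolean AND search over it. The lowercase whitespace tokenizer, the function names, and the sample documents are illustrative assumptions, not a description of any particular system.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive tokenization by lowercasing and splitting on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_and(index, query):
    """Return IDs of documents containing every query term (Boolean AND)."""
    postings = [set(index.get(t, [])) for t in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical sample collection.
docs = {
    1: "relevance feedback improves retrieval",
    2: "user profiles support retrieval",
    3: "browsing and relevance",
}
index = build_inverted_index(docs)
print(index["retrieval"])
print(search_and(index, "relevance retrieval"))
```

Because each term's posting list is looked up directly, a query only touches the documents that share its terms instead of scanning the whole collection.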
Effective Information Retrieval • User’s perspectives • Iteration • Relevance Feedback • Use User's Profiles • Graphical Display of Search Results • Browsing/Interactive Searching • We can’t change the user. We should make the system adapt to the user’s needs
Iteration • Most searches need to be done iteratively • From the user’s point of view • The first query often does not retrieve what the user wants • The user needs to see the output of previous queries to construct the next query • The user often needs to reformulate his/her information needs after reading/browsing search results.
Iteration – User’s strategies • Modify queries repeatedly based on some goals • Starting with high precision • Use a specific query first • Broaden queries to include more relevant documents • "pearl growing" • Starting with high recall • Use a very broad query • Improve precision gradually • "onion peeling" • Starting with known items • Find documents similar to the known items • Browsing/interactive searching
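The "pearl growing" and "onion peeling" strategies can be illustrated with a toy Boolean model in which each document is just a set of terms. This is a hypothetical sketch: the collection, the helper names, and the choice of AND for narrowing and OR for broadening are assumptions made for illustration.

```python
# Hypothetical toy collection: document ID -> set of index terms.
docs = {
    1: {"information", "retrieval", "feedback"},
    2: {"information", "retrieval", "browsing"},
    3: {"information", "visualization"},
    4: {"retrieval", "feedback", "profiles"},
}

def match_all(terms):
    """Documents containing every term (narrow, high-precision query)."""
    return sorted(d for d, words in docs.items() if terms <= words)

def match_any(terms):
    """Documents containing at least one term (broad, high-recall query)."""
    return sorted(d for d, words in docs.items() if terms & words)

# Pearl growing: start with a specific query, then broaden it
# using the terms of a relevant document (the "pearl").
step1 = match_all({"retrieval", "feedback"})
pearl = docs[step1[0]]
step2 = match_any(pearl)

# Onion peeling: start with a broad query, then add required terms
# to improve precision step by step.
step_a = match_any({"information"})
step_b = match_all({"information", "retrieval"})
print(step1, step2, step_a, step_b)
```

Pearl growing moves from few, precise results toward more results; onion peeling moves from many results toward fewer, more precise ones.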
Iteration – System’s strategies • If the system can “learn” from the user’s activities, it can likely retrieve better results that meet the user’s needs. • Relevance feedback • User profiles • The system should provide better output representations to help the user • Browse • Conduct interactive searches.
Relevance Feedback • Feedback: The user provides information that the system can use to modify its next search or next display • Relevance Feedback: • Users let the system know • What documents are relevant to their information needs • What concepts or terms are related to their information needs • What weights they would like the system to put on each relevant document/term
Relevance Feedback – System’s Strategy • The system should invite the user to select relevant documents/terms from the retrieved results before the second retrieval is conducted • The system should use information from the user's feedback to conduct the next search.
Design IR Systems with relevance feedback • Collect relevance feedback through • Binary judgments vs. rating scales • Positive and negative feedback • Apply relevance feedback to • Query • Profile • Document • Retrieval algorithm
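The slides do not name a specific algorithm for applying feedback to the query. One classic option is the Rocchio formula, which moves the query vector toward the documents judged relevant (positive feedback) and away from those judged non-relevant (negative feedback). The sketch below assumes documents and queries are dicts of term weights; the values alpha=1.0, beta=0.75, gamma=0.15 are commonly cited textbook defaults, used here as assumptions.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a modified query vector using the Rocchio formula.

    query, and every element of the relevant/nonrelevant lists,
    is a dict mapping term -> weight.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            # Add the centroid of the relevant documents (positive feedback).
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            # Subtract the centroid of the non-relevant documents (negative feedback).
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:  # terms whose weight drops to zero or below are discarded
            new_query[t] = w
    return new_query

# Hypothetical feedback round: one document marked relevant, one non-relevant.
query = {"retrieval": 1.0}
relevant = [{"retrieval": 1.0, "feedback": 1.0}]
nonrelevant = [{"database": 1.0}]
new_query = rocchio(query, relevant, nonrelevant)
print(new_query)
```

After one round, "feedback" enters the query because it appeared in the relevant document, while "database" is suppressed because it appeared only in the non-relevant one.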
User Profiles • User profiles • Information about the user’s information needs that an IR system can use to modify its search process. • Simple user profiles • A list of terms that the user selects to represent his/her information needs • A list of terms with weights
Extended user profiles • More complex term structures • Information use patterns • Levels of interest • User’s background information • User’s browsing behaviors • What pages the user has visited last week, last month, … • From which page to which page …
Use of User Profiles • Selective Dissemination of Information (SDI) • The system regularly runs the search to get any new information that matches the user’s profiles. • The user can set up several profiles • Once they are set up, the queries are always the same. • The user can set the frequency of the update searches.
SDI • Advantages of SDI • Automatic retrieval of new information for the user • Set up a profile once, use it for retrieval many times. • The user can change the profiles or the search frequency as needed. • Disadvantages of SDI • The query based on the profile is static • Timing problems • Information in need is information indeed. • Something I am very interested in may not arrive at the time I want to read it.
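A single SDI run can be sketched as: take a stored weighted-term profile, score only the documents added since the last run, and deliver those above a threshold. The scoring rule (sum of the weights of matching profile terms), the threshold, and all names and sample data below are hypothetical assumptions; how often the function is scheduled corresponds to the user's update frequency.

```python
from datetime import date

# Hypothetical stored profile: term -> weight chosen by the user.
profile = {"retrieval": 2.0, "visualization": 1.0}

def profile_score(profile, text):
    """Sum the weights of profile terms that occur in the document text."""
    words = set(text.lower().split())
    return sum(w for term, w in profile.items() if term in words)

def sdi_run(profile, documents, last_run, threshold=1.0):
    """Return titles of documents added after last_run that match the profile.

    documents: list of (title, date_added, text) tuples.
    Results are ordered by descending profile score.
    """
    hits = []
    for title, added, text in documents:
        if added > last_run:  # only new information since the previous run
            score = profile_score(profile, text)
            if score >= threshold:
                hits.append((score, title))
    hits.sort(reverse=True)
    return [title for _, title in hits]

# Hypothetical collection with arrival dates.
collection = [
    ("Old IR paper", date(2001, 1, 5), "information retrieval models"),
    ("New IV survey", date(2001, 3, 2), "information visualization survey"),
    ("New IR+IV paper", date(2001, 3, 9), "retrieval and visualization"),
    ("New DB paper", date(2001, 3, 10), "database transactions"),
]
print(sdi_run(profile, collection, last_run=date(2001, 3, 1)))
```

Note that `profile` never changes between runs unless the user edits it, which is exactly the "static query" disadvantage listed above.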
Use profiles during the search • Modify the query • When the user sends a query, the system automatically adds some terms from the user’s profiles to the query. • When the user sends a query, the system checks whether the query terms are in the user’s profile. If they are, it increases the weights of those terms. • Organize the search results • When the user sends a query, the system uses the profile information to organize the search results (e.g., clustering, ranking)
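The two query-modification strategies above (boosting query terms that appear in the profile, and adding profile terms to the query) can be sketched together. This is a hypothetical illustration: the base weight of 1.0, the `boost` and `expand` parameters, and the sample profile are assumptions, not values from the slides.

```python
def modify_query(query_terms, profile, boost=2.0, expand=1):
    """Turn a user's query into a weighted query using the profile.

    - Each query term gets weight 1.0, or `boost` if it is also in the profile.
    - The `expand` highest-weighted profile terms not already in the query
      are added with their profile weights.
    """
    weighted = {}
    for term in query_terms:
        weighted[term] = boost if term in profile else 1.0
    extras = sorted((t for t in profile if t not in weighted),
                    key=lambda t: profile[t], reverse=True)
    for term in extras[:expand]:
        weighted[term] = profile[term]
    return weighted

# Hypothetical profile: term -> weight.
profile = {"retrieval": 0.9, "feedback": 0.6, "agents": 0.3}
print(modify_query(["relevance", "feedback"], profile))
```

Here "feedback" is boosted because the profile already contains it, and "retrieval", the strongest remaining profile term, is added to the query.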
Browsing • Browsing is an act of human information seeking • A mental process of identifying and choosing information • A dynamic process that varies in time and depends on intermediate results. • Part of the process of decision making, problem solving, etc.
Browsing for Information Retrieval • A kind of searching process in which the initial search criteria or goals are only partly defined • general-purpose web browsing • An art of not knowing what one wants until one finds it • visual recognition • content recognition
Browsing for Information Retrieval • A learning activity that emphasizes structures and interactive processes • Exploratory • Movements based on feedback • A process of finding and navigating in an unknown or unfamiliar information space • Becoming aware of new contents • Finding unexpected results
Search or Browse? • Would you like to search using a search engine, or would you like to browse from page to page (or through a hierarchy)? • What does it depend on?
Factors of browsing • Purposes • Fact retrieval • Concept formation or interpretation • Current awareness • Tasks • Well-defined tasks • Ill-defined tasks • number of items to browse
Factors of browsing • Individual characteristics • Motivation • Experience and knowledge • Cognitive styles • Context • Subject disciplines • Organizational schemes • Nature of text/information • Medium • Does the system support browsing?
IR Systems that support browsing • Good navigation tools • Easy to move from one item to another • Links • good structures • fast access • Easy to back track • Correct any errors • make new selections
IR Systems that support browsing • Good displays • easy to read • meaningful orders of retrieval results • graphical presentation • Meaningful content organization • contextual hierarchical structures • Grouping of related items • Contextual landmarks
“why just browse when you can fly?” • HotSauce is an innovative 3D fly-through interface for navigating information spaces. It was developed, largely as a one-man effort, by Ramanathan V. Guha while at Apple Research in the mid-1990s. HotSauce was a specific 3D spatialization of the Meta Content Framework (MCF) also developed by Guha.
Why Surf alone? • What if you had an assistant always looking ahead for you [when browsing the web]…. • The assistant could warn you if the page was irrelevant, could alert you if that link or some other link merited your attention. • The assistant could save you time and frustration. CACM, 44(8), p. 71, 2001
Information Agents • Software that applies user profiles, dynamically and intelligently, to search tasks • Searches distributed, possibly heterogeneous information resources on the user’s behalf. • Gathers and integrates search results using artificial intelligence techniques • Accepts the user’s feedback and uses it to modify the user profiles and search strategies
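When an agent gathers results from several heterogeneous sources, their raw scores are usually not on the same scale. One simple way to integrate them, shown here as an assumption rather than as the technique any particular agent uses, is to min-max normalize each source's scores to [0, 1] and then sum the normalized scores per document. The two "engines" and their scores are hypothetical.

```python
def normalize(results):
    """Min-max normalize one source's scores (doc -> score) to [0, 1]."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        return {doc: 1.0 for doc in results}
    return {doc: (s - lo) / (hi - lo) for doc, s in results.items()}

def merge_sources(sources):
    """Combine per-source result dicts by summing normalized scores.

    Returns document IDs ordered by combined score (ties broken by ID).
    """
    combined = {}
    for results in sources:
        for doc, s in normalize(results).items():
            combined[doc] = combined.get(doc, 0.0) + s
    return sorted(combined, key=lambda d: (-combined[d], d))

# Hypothetical raw scores from two sources on different scales.
engine_a = {"doc1": 12.0, "doc2": 8.0, "doc3": 4.0}
engine_b = {"doc2": 0.9, "doc4": 0.5, "doc3": 0.1}
print(merge_sources([engine_a, engine_b]))
```

Summing rewards documents that several sources agree on: "doc2" is ranked first even though neither source alone placed it above every other document.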
Architecting Browsable Websites • Design site structures • Metaphor Exploration • Organizational metaphors • Functional metaphors • Visual metaphors • Define Navigation • Global navigation • Local navigation • Design Document
Interactive Systems • “When an interactive system is well-designed, the interface almost disappears, enabling users to concentrate on their work, exploration, or pleasure.” • Ben Shneiderman
Design Principles • Offer informative feedback • Relationships between the query and the documents retrieved • Relationships among retrieved documents • Relationships between metadata and documents • Reduce working memory load • Keep track of choices made during the search process • Allow users to return to temporarily abandoned strategies or jump from one strategy to another • Retain information and context across search sessions.
Provide alternative interfaces for novice and expert users. • Simplicity vs. power
Output Presentation for Search engines • Two major issues • What information to present? • How to organize the output items? • Information in the output display • Traditional databases • Document reference numbers (unique number) • Citations (author, title, source) • Document surrogate (citation plus abstract and/or indexing terms) • Full text
On the web • title, url • First few sentences/related sentences/summaries • Dates / page sizes • Degree of relevance • special links • “find similar one” • Types of links • Related categories
What other information might you wish to have in the retrieval output? • Citations (or links from this document)? • Critique or evaluation? • Access information (how many times it was accessed in the last 6 months)? • Links to this document? • Author contact information? • Why were the documents retrieved?
Output organization • Linear • A list of documents • Listed by • Best match • Alphabetical order • Dates • Order of selected fields (authors, titles, web sites)
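The linear orderings listed above are just different sort keys over the same result list. A minimal sketch, assuming each result is a dict with hypothetical `title`, `score`, and ISO-format `date` fields (ISO date strings sort chronologically as plain strings):

```python
# Hypothetical retrieved results.
results = [
    {"title": "Relevance feedback", "score": 0.72, "date": "2001-03-09"},
    {"title": "Browsing strategies", "score": 0.91, "date": "1999-11-20"},
    {"title": "Agents on the web", "score": 0.55, "date": "2001-08-01"},
]

by_best_match = sorted(results, key=lambda r: r["score"], reverse=True)
by_title = sorted(results, key=lambda r: r["title"].lower())       # alphabetical
by_date = sorted(results, key=lambda r: r["date"], reverse=True)   # newest first

for r in by_best_match:
    print(f'{r["score"]:.2f}  {r["title"]}')
```

Each ordering is cheap to generate, which is part of why linear displays are so common; none of them, however, shows how the documents relate to one another.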
Linear display • Practical and most popular • Easy to generate • Users know how to use it • Does not show relationships among documents! • Document relationships are more complex than a linear one
Hierarchical display • Separate data into different levels or branches • Branches can be expanded/collapsed. • Show more data in less space • Show the organization of the data
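A two-level hierarchical display can be sketched by grouping results into branches and rendering expanded branches in full and collapsed branches as a one-line count. The `site` field, the `+`/`-` markers, and the sample data are hypothetical choices for illustration.

```python
from collections import defaultdict

def group_results(results, field):
    """Group result titles into branches keyed by the value of `field`."""
    tree = defaultdict(list)
    for r in results:
        tree[r[field]].append(r["title"])
    return dict(tree)

def render(tree, expanded):
    """Show expanded branches with their items; collapsed ones as a count."""
    lines = []
    for branch in sorted(tree):
        items = tree[branch]
        if branch in expanded:
            lines.append(f"- {branch}")
            lines.extend(f"    {title}" for title in items)
        else:
            lines.append(f"+ {branch} ({len(items)})")
    return lines

# Hypothetical results grouped by web site.
results = [
    {"title": "Course syllabus", "site": "example.edu"},
    {"title": "Faculty page", "site": "example.edu"},
    {"title": "Journal list", "site": "example.org"},
]
tree = group_results(results, "site")
lines = render(tree, expanded={"example.edu"})
print("\n".join(lines))
```

Collapsed branches let the display show the organization of many results in little space, and the user expands only the branches that look promising.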
Graphical displays • Show more complex relationships • Use location, colors, dimensions, etc., to represent documents, terms or concepts. • Provide more interactive functions
What is IV (information visualization)? System-centered View • The use of computer-supported, interactive, visual representations of abstract data • to assist navigation in large information spaces • to reveal complex information structures • to amplify cognition User-centered
IV and IR • Both need to process a large amount of information • Both are tools to assist the cognitive process of finding, learning, and understanding information. • Both face the challenge of “uncertainty” • Not an “exact science” • Both are subject to human interpretation.
VIRI -- Visual Information Retrieval Interfaces • 2-dimensional graphical display • Use graphical objects (icons, dots, etc.) to represent documents • Use spatial relationships to indicate document relationships • Use colors to group/differentiate documents • Use animation to assist interaction
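One way to let location on a 2-D display indicate document relationships is to fix an anchor point for each of a few key terms and draw every document at the weighted average of the anchors of the terms it contains. The sketch below is a hypothetical layout rule under that assumption; the anchor terms, coordinates, and document weights are invented for illustration, and colors and animation are not shown.

```python
# Hypothetical anchor positions for three key terms on a 2-D canvas.
anchors = {"retrieval": (0.0, 0.0), "browsing": (10.0, 0.0), "visualization": (5.0, 8.0)}

def place(doc_weights):
    """Return (x, y): the weighted average of the anchors of the doc's terms.

    doc_weights maps term -> weight; terms without an anchor are ignored.
    Returns None if the document shares no anchor terms.
    """
    total = sum(doc_weights.get(t, 0.0) for t in anchors)
    if total == 0:
        return None
    x = sum(doc_weights.get(t, 0.0) * anchors[t][0] for t in anchors) / total
    y = sum(doc_weights.get(t, 0.0) * anchors[t][1] for t in anchors) / total
    return (x, y)

# Hypothetical documents described by term weights.
docs = {
    "d1": {"retrieval": 1.0},
    "d2": {"retrieval": 1.0, "browsing": 1.0},
    "d3": {"retrieval": 1.0, "browsing": 1.0, "visualization": 2.0},
}
positions = {name: place(w) for name, w in docs.items()}
print(positions)
```

Documents with similar mixes of key terms land near each other, so clusters on the screen suggest groups of related documents.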
Concept Visualization • AltaVista LiveTopic • HiBrowse Interface • SemioMap • Hyperbolic Trees • Visual Thesaurus • Visual Concept Explorer
Topic Maps • Highwire: http://www.highwire.org