Information Search (Shneiderman and Plaisant, Ch. 13). from http://wps.aw.com/aw_shneider_dtui_13. Overview. Introduction “Information search should be a joyous experience” Searching in Textual Documents Multimedia Document Searches Advanced Filtering and Search Interfaces
Information Search (Shneiderman and Plaisant, Ch. 13) from http://wps.aw.com/aw_shneider_dtui_13
Overview • Introduction • “Information search should be a joyous experience” • Searching in Textual Documents • Multimedia Document Searches • Advanced Filtering and Search Interfaces • Information Foraging • Some trees …
Information Search • Critical need to access information, as part of any task • always has been, always will be (ahbawb) • Cultural change, if not evolution, due to amount of information accessible by individual • “Information overload” – ahbawb • What’s new is ubiquity due to massive e-access • Old school “information retrieval” and “end user searching” • Gurus and cost • Genuinely new … • Interest, due to market/user size • E.g., search engines can be profitable • tools, e.g., visualization, due to Moore’s law
Information Search - Words • Old school • Information retrieval, database management • Bibliographic document systems, structured relational db – attributes • New school • Information gathering, seeking, filtering, sensemaking, visual analytics • CS focus • Data mining, data warehouses, data marts • Toward future ends such as • Knowledge networks, semantic webs, … • Range of search elements increases • Cf. Hearst November, 2011 CACM paper, “collaborative search” (on web site)
Search Terminology • Shneiderman’s taxonomy • Task objects • E.g., movies for rent; stored in structured relational databases, textual document libraries, or multimedia document libraries • Structured relational database • Relations and a schema to describe the relations • Relations have items (usually called tuples or records), and each item has multiple attributes (often called fields), each of which has attribute values • Textual document library • Set of collections (typically up to a few hundred collections per library) • Descriptive attributes or metadata about the library • E.g., name, location, owner
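A minimal sketch of the relational terminology above, using Python's built-in sqlite3 module. The `movies` table, its attributes, and its rows are hypothetical and invented only for illustration: the table is a relation, each row is an item (tuple), and each column is an attribute with attribute values.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")

# The schema describes the relation: three attributes (fields).
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, genre TEXT)")

# Each inserted row is one item (tuple/record) with one value per attribute.
conn.executemany(
    "INSERT INTO movies (title, year, genre) VALUES (?, ?, ?)",
    [
        ("Movie A", 1993, "adventure"),
        ("Movie B", 2001, "drama"),
        ("Movie C", 1987, "adventure"),
    ],
)

# A structured search selects items by attribute value.
rows = conn.execute(
    "SELECT title, year FROM movies WHERE genre = ? ORDER BY year",
    ("adventure",),
).fetchall()
print(rows)  # [('Movie C', 1987), ('Movie A', 1993)]
```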
Search Terminology, 2 • Task actions are decomposed into browsing or searching • Examples of task actions: • Specific fact finding (known-item search) • Find the e-mail address of the President of the United States • Extended fact finding • What other books are by the author of “Jurassic Park”? • Exploration of availability • Is there new work on voice recognition in the ACM digital library? • Open-ended browsing and problem analysis • Is there new research on fibromyalgia that might help my patient?
Search Terminology, 3 • Once users have clarified their information needs, the first step towards satisfying those needs is deciding where to search • Supplemental finding aids can help users to clarify and pursue their information needs, e.g. table of contents or indexes • Additional preview and overview surrogates for items and collections can be created to facilitate browsing
Searching Textual Documents • As noted, recent dramatic changes • Historically, Boolean clause search and SQL • Other methods include: • Natural language queries • Form fill-in • Query by example (QBE) • Evidence shows that users perform better and have higher satisfaction when they can view and control the search
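As an illustration of the Boolean clause search mentioned above, here is a minimal sketch of an inverted index with AND, OR, and NOT operators. The three-document collection and all function names are assumptions made for this example; they are not taken from the textbook or from any particular retrieval system.

```python
from collections import defaultdict

# Hypothetical three-document collection, invented for illustration.
DOCS = {
    1: "user interface design for search",
    2: "database query languages and SQL",
    3: "search interface with dynamic query sliders",
}


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Ids of documents that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Ids of documents that contain at least one term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.union(*sets) if sets else set()


def boolean_not(index, docs, term):
    """Ids of documents that do not contain the term."""
    return set(docs) - index.get(term.lower(), set())


INDEX = build_index(DOCS)
print(sorted(boolean_and(INDEX, "search", "interface")))  # [1, 3]
print(sorted(boolean_or(INDEX, "sql", "sliders")))        # [2, 3]
print(sorted(boolean_and(INDEX, "interface") & boolean_not(INDEX, DOCS, "dynamic")))  # [1]
```

The strict set semantics here are what make informal English queries error-prone: a user who types "search and sliders" may mean either operator.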
Ex., Library of Congress • Aids to find bills, etc • “Multiple paths to information items” • (had a look, just for fun) • Not bad
Searching in Textual Documents and Database Querying, 2 • A search for “user interface” powered by Endeca (http://www.lib.ncsu.edu) returns 144 results grouped into 10 pages. The menu at the upper right allows users to sort results by relevance or by date, while on the left a summary of the results organized by Subject, Genre, or Format provides an overview of the results and facilitates further refinement of the search.
Framework for Textual Search • Recall, task delineation for interface design • Shneiderman suggests stages to consider in textual search • Overview below, detail, next slide: • Formulation: expressing the search • Initiation of action: launching the search • Review of results: reading messages and outcomes • Refinement: formulating the next step • Use: compiling or disseminating insight
Multimedia Document Searches • “Multimedia” (non-textual) search is hard • Quickly evolving area • Interface issues essentially undefined • “Hum that tune”, “what did he/she/it look like” • Types: • Image search • Map search • Design or diagram search • Sound search • Video search • Animation search
Image Search • Finding photos with images such as the Statue of Liberty is a challenge • Query-by-Image-Content (QBIC) is difficult • Search by profile (shape of lady), distinctive features (torch), colors (green copper) • Simple drawing tools to build templates or profiles to search with • More success is attainable by searching restricted collections • Search a vase collection • Find a vase with a long neck by drawing a profile of it • Critical searches such as fingerprint matching require a minimum of 20 distinct features • For small collections, effective browsing and lightweight annotation are important
Map Search • On-line maps are plentiful • Search by latitude/longitude is the structured-database solution • Today's maps allow the use of structured aspects and multiple layers • City, state, and site searches • Flight information searches • Weather information searches • Mapquest, Google Maps, etc. • Mobile devices can allow “here” as a point of reference
Other Multimedia Searches • Design/Diagram Searches • Some computer-assisted design packages support search of designs • Allows searches of diagrams, blueprints, newspapers, etc., e.g. search for a red circle in a blue square or a piston in an engine • Document-structure recognition for searching newspapers • Sound Search • Video Search • Provide an overview • Segmentation into scenes and frames • Support multiple search methods • Animation Search • Possible to search for specific animations like a spinning globe • Search for moving text on a black background
Image Search Sketch or image to start
Advanced Filtering & Search Interfaces • Wide range of interface strategies and styles • Filtering with complex Boolean queries • Automatic filtering • Dynamic queries • Faceted metadata search • Query by example • Implicit search • Collaborative filtering • Multilingual searches • Visual field specification
Advanced Filtering and Search Interface Examples, 1 • Alternatives to form fill-in query interfaces: • Filtering with complex Boolean queries • Problem with informal English, e.g. use of ‘and’ and ‘or’ • Venn diagrams, decision tables, etc., have not worked for complex queries • Dynamic Queries • “Direct manipulation” queries • Use sliders and other related controls to adjust the query • Get immediate (less than 100 msec) feedback with data • Dynamic HomeFinder and Blue Nile and (sort of) Realtor.com • Hard to update fast with large databases
Dynamic Queries • Diamond price, rating indicated using sliders, etc.
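A minimal sketch of the dynamic-query idea: every slider movement re-runs a cheap filter over the data so the display can update immediately. The `Diamond` record, the catalog values, and the function name `dynamic_query` are hypothetical and chosen only to mirror the diamond example; a real interface would bind this to slider widgets and, for large databases, to precomputed indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diamond:
    name: str
    price: float  # dollars
    carat: float
    rating: int   # 1 (worst) to 5 (best)


# Hypothetical catalog, invented for illustration.
CATALOG = [
    Diamond("A", 900.0, 0.5, 3),
    Diamond("B", 2400.0, 1.0, 4),
    Diamond("C", 5200.0, 1.5, 5),
    Diamond("D", 1500.0, 0.7, 5),
]


def dynamic_query(items, price_range, min_rating):
    """Return the items matching the current slider positions."""
    low, high = price_range
    return [d for d in items if low <= d.price <= high and d.rating >= min_rating]


# Simulate dragging the upper price slider; each position re-runs the filter.
for price_max in (1000.0, 2500.0, 6000.0):
    hits = dynamic_query(CATALOG, (0.0, price_max), min_rating=4)
    print(price_max, [d.name for d in hits])
```

A linear scan like this is fast enough for small catalogs, which is why the 100 msec feedback target becomes hard to meet as the database grows.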
Advanced Filtering and Search Interface Examples, 2 • Alternatives to form fill-in query interfaces: • Filtering with complex Boolean queries • Problem with informal English, e.g. use of ‘and’ and ‘or’ • Venn diagrams, decision tables, etc., have not worked for complex queries • Dynamic Queries • “Direct manipulation” queries • Use sliders and other related controls to adjust the query • Get immediate (less than 100 msec) feedback with data • Dynamic HomeFinder and Blue Nile and (sort of) Realtor.com • Hard to update fast with large databases • Query previews present an overview that shows users the distribution of the data and helps them eliminate undesired items • Faceted metadata search • Integrates category browsing with keyword searching • Flamenco
Faceted Metadata • Facets include media, location, date, themes
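A minimal sketch of how faceted metadata search can combine keyword matching with category browsing. The item records, the facet names used (`media`, `location`, `date`), and the functions `search` and `facet_counts` are assumptions for illustration; this is not how any specific faceted system is implemented.

```python
from collections import Counter

# Hypothetical collection, invented for illustration.
ITEMS = [
    {"title": "Harbor at dawn", "media": "photo", "location": "Boston", "date": 1998},
    {"title": "City map", "media": "map", "location": "Boston", "date": 2004},
    {"title": "Bridge study", "media": "photo", "location": "Paris", "date": 2004},
    {"title": "Harbor sketch", "media": "drawing", "location": "Paris", "date": 1998},
]


def search(items, keyword="", **facets):
    """Keep items whose title contains the keyword and that match every chosen facet value."""
    keyword = keyword.lower()
    return [
        item
        for item in items
        if keyword in item["title"].lower()
        and all(item[name] == value for name, value in facets.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet (the overview shown beside results)."""
    return Counter(item[facet] for item in items)


hits = search(ITEMS, keyword="harbor")
print([item["title"] for item in hits])
print(dict(facet_counts(hits, "media")))

# Clicking the "photo" facet value refines the same keyword search.
refined = search(ITEMS, keyword="harbor", media="photo")
print([item["title"] for item in refined])
```

The per-facet counts are what let users see where the results are before committing to a refinement.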
Advanced Filtering and Search Interface Examples, 3 • Collaborative Filtering • Groups of users combine evaluations to help in finding items in a large database • User "votes" and info used for rating the item of interest • E.g., a user who rates six restaurants highly is given a list of other restaurants also rated highly by those who agree that the six are good • Multilingual searches • Current systems provide rudimentary translation searches • Prototypes of systems with specific dictionaries and more sophisticated translation • Visual searches • Specialized visual representations of possible values, e.g. dates on a calendar or seats on a plane • On a map the location may be more important than the name • Implicit initiation and immediate feedback
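A minimal sketch of the collaborative filtering idea on this slide: find users who agree with the target user about well-liked items, then suggest other items those users rated highly. The ratings table, the `LIKE` threshold, and the `min_overlap` parameter are hypothetical; production recommenders typically use weighted similarity measures rather than this simple vote count.

```python
# Hypothetical ratings on a 1 to 5 scale, invented for illustration.
RATINGS = {
    "ana": {"r1": 5, "r2": 5, "r3": 4},
    "ben": {"r1": 5, "r2": 4, "r3": 5, "r4": 5},
    "cara": {"r1": 1, "r2": 2, "r5": 5},
    "dev": {"r1": 4, "r2": 5, "r4": 4, "r6": 5},
}
LIKE = 4  # a score at or above this counts as "rated highly"


def liked(ratings, user):
    """Items the user rated highly."""
    return {item for item, score in ratings[user].items() if score >= LIKE}


def recommend(ratings, target, min_overlap=2):
    """Rank unseen items by how many like-minded users rated them highly."""
    mine = liked(ratings, target)
    votes = {}
    for other in ratings:
        if other == target:
            continue
        theirs = liked(ratings, other)
        # Only users who agree on enough liked items get to vote.
        if len(mine & theirs) < min_overlap:
            continue
        for item in theirs - set(ratings[target]):
            votes[item] = votes.get(item, 0) + 1
    return sorted(votes, key=lambda item: (-votes[item], item))


print(recommend(RATINGS, "ana"))  # ['r4', 'r6']
```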
Tree Map of Products (Shneiderman) • Using The Hive Group’s treemap (http://www.hivegroup.com/), users can review all waterproof binoculars in the catalog of Amazon.com products and browse the items in the list, grouped by manufacturer. Each box corresponds to a pair of binoculars, and the size of the box is proportional to its price. Green boxes are best-sellers. Users can filter the results using the dynamic query sliders on the right. Here all the binoculars with fewer than three user reviews have been filtered out, leaving only 61 binoculars to consider.
Cost of Knowledge, Search, Cognition, and Computers • Information systems (computers) and “cost” of acquiring knowledge • A first principle of information system design • “Cognitive information ergonomics” • Efficiency/productivity gain/usability/… • “Economics of cognition and the cognitive cost of knowledge” • There is (and has always been) a cost to acquire information / knowledge • cost = user/worker time +, e.g., machine cost, db access charge, book • Many studies fail to document increased profit directly from implementation of a (single) information system • However, no doubt that worker productivity in the late 20th century dramatically increased • Productivity greatly enhanced by pervasive use of electronic information systems (computers)
Informavores and Information Foraging • That the human quest for information is innate and adaptive is well known • Humans are informavores • George Miller, 1983, “… magic number 7 ± 2” • Organisms that hunger for information about the world and themselves • “A wealth of information creates a poverty of attention and a need to allocate it efficiently” • Herb Simon, AI, Nobel prize, economics, cognition • Consider the analogy between acquiring knowledge and animals seeking food • Pirolli, P. and S. Card (1995). Information Foraging in Information Access Environments, in CHI '95, pp. 51–58 • Pirolli, P. (2007) ….. Book …..
Information Foraging Theory (IFT) • Pirolli and Card – Xerox PARC • “an approach to the analysis of human activities involving information access technologies” • Derives from optimal foraging theory in biology and anthropology • Analyzes adaptive value of food-foraging strategies • Analyzes trade-offs in value of information gained against the costs of performing activity in human-computer interaction tasks • Models and analysis techniques are needed to determine the value added by information access, manipulation, and presentation techniques • Real information system design problem is not how to collect more information, but how to optimize user’s time • Increase relevant information gained per unit time expended • IFT provides a relatively “formal” (quantitative) account
IFT – Time Scales • Considers “adaptiveness of human-system designs in the context of the information ecologies in which tasks are performed” • Ecology, as system, here, information • Time scales of information seeking and sense making activities: • Cognitive band (~100 ms – 10 s) • Rational band (minutes to hours) • Social band (days to months) • Have seen much of cognitive, now others
Time Scales of Analysis • Table columns: time scale (s), psychological domain, user interface domain • 10-1000 s: problem solving, decision making • 1-100 s: visual search, motor behavior • 0.100-1 s: visual attention, perceptual judgment • Figure: the slide also shows a web search result snippet for Pete Pirolli's home page (www.parc.xerox.com/istl/members/pirolli/pirolli.html)
IFT – An Ecological Perspective • Time scales of information seeking and sense making activities • Cognitive band (~100 ms – 10 s) • Rational band (minutes to hours) • Social band (days to months) • As time scale increases, less regard for how internal processing accomplishes linking of actions to goals • Assumes behavior governed by “rational principles and shaped by constraints and affordances of the task environment” • An ecological perspective, i.e., that behavior is “adaptive” in that it accomplishes some goal
IFT – Metaphor and Quantitative • Information Foraging Theory • The name is both a metaphor and a straightforward use of biological “optimal foraging theory” • Metaphor: • Animals adapt behavior and structure through evolution • (humans don’t have to wait that long!) • Animals adapt to increase their rate of energy intake, etc. • To do this they evolve different methods • E.g., wolves hunt prey, spiders build webs and wait • And there are analogies to this • E.g., hunting = active information seeking, waiting = information filtering • Humans (and others) hunt in groups when variance of food is high • Accept lower expected mean to minimize probability of days without food • Also, on social time scale, sharing of information
Optimal Foraging Theory - Biology • Developed in biology for understanding opportunities and forces of adaptation • P&C use elements of the theory to help in understanding existing human adaptations for gaining and making sense of information • Also, aid in task analysis for creating new interactive information system designs • Optimality models include: • Decision assumptions • Which of the problems faced by an agent are to be analyzed • E.g., whether to pursue a particular type of information (or prey) when encountered, how long to spend • Currency assumptions • How choices are to be evaluated, e.g., information value (food value) • Constraint assumptions • Limit and define relationships among decision and currency variables • E.g., from task structure, interface technology, user knowledge
Information Foraging Theory • Information foraging is usually a task embedded in the context of some other task • Value and cost structure defined in relation to the embedding task • Value of external information may be in improvements to outcomes of embedding task • Usually, the embedding task is some ill-structured problem • Additional knowledge is needed to better define goals, available actions, heuristics, etc. • E.g., choosing a graduate school, developing business strategy • Using an optimality model does not imply that human behavior is classically rational • I.e., that people have perfect information and infinite computational resources • Rather, humans exhibit bounded rationality, or make choices based on satisficing
IFT – Information Patch Model • Information patch model – from optimal foraging theory • Rate of currency intake, R = U / (Ts + Th) • U = net amount of currency gained • Ts = time spent searching • Th = time spent exploiting • Net currency gain, U = Uf - Cf • Uf = overall currency intake (gross amount foraged) • Cf = currency expended in foraging • Assume information workers/foragers/consumers encounter information as a linear function of time • Total n items encountered = λTs, where λ is the rate of encounter with items • Average rate of currency intake (per item encountered), u = Uf / (λTs)
IFT – Information Patch Model quickly … • Average cost of handling items: • Let s = search cost per unit time, then total cost of search = sTs • Then, substituting in equation for R, rate of currency intake: • So, can express R in terms of • Average rate of currency intake, u • Search cost per unit time, s • Cost of handling items, h
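The expressions on this slide did not survive as text. The following is a reconstruction under the conventional optimal-foraging assumptions (the form usually called Holling's disc equation), not a recovery of the slide's own formulas. Two points are assumptions: that h is the average handling time per item, h = Th / (λTs), and that the currency expended in foraging consists only of search cost, Cf = sTs. Here λ denotes the rate of encounter with items and u = Uf / (λTs), as defined for the patch model.

```latex
\begin{align}
h &= \frac{T_h}{\lambda T_s}, \qquad U_f = \lambda T_s\, u, \qquad C_f = s\, T_s \\
R &= \frac{U_f - C_f}{T_s + T_h}
   = \frac{\lambda T_s\, u - s\, T_s}{T_s + \lambda T_s\, h}
   = \frac{\lambda u - s}{1 + \lambda h}
\end{align}
```

Under these assumptions Ts cancels, so R depends only on the encounter rate λ, the average intake per item u, the search cost per unit time s, and the handling time per item h.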
IFT – Information Patch Model And so forth …
An Example: Scatter/Gather • Hierarchical clustering of documents • Users see an “overview” of document clusters • Allows users to navigate through clusters and overviews
Scatter/Gather Task Display • Figure: Scatter/Gather Window and Titles Window • Cluster labels shown: Law, Nat. Lang., World News, Robots, AI, Expert Sys., CS, Planning, Medicine, Bayes. Nets
Optimal Foraging Time in a Patch • Figure: information gained (y-axis) plotted against time (x-axis) • gi(t), cumulative gain function • Amount of information gained in time t • gA(t) = random order of encounter • Increase in information equal for all elements • Hence, constant slope • gB(t) and gC(t) = ordered by relevancy • “Relevant” items, those with higher information content, encountered earlier • Hence, highest rate of information increase earlier, and rate decreases • λp, rate of encounter with relevant items • x-axis also shows travel time between patches • RB and RC = rate of return • tC and tB = optimal foraging times • Foraging longer in the “patch” is not optimal
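A minimal numerical sketch of the optimal time in a patch, assuming a diminishing-returns gain function g(t) = G(1 - e^(-kt)) and a fixed travel time between patches. The exponential form, the parameter values, and all function names are assumptions for illustration; the slide's gB(t) and gC(t) curves are only described qualitatively. The rate of return is taken as R(t) = g(t) / (travel time + t), and the optimum is found by a simple grid search.

```python
import math


def gain(t, total=10.0, k=0.5):
    """Assumed cumulative gain: fast early, flattening as the patch is depleted."""
    return total * (1.0 - math.exp(-k * t))


def rate_of_return(t, travel_time, gain_fn=gain):
    """Information gained per unit of total time (travel plus time in patch)."""
    return gain_fn(t) / (travel_time + t)


def optimal_patch_time(travel_time, gain_fn=gain, t_max=60.0, step=0.01):
    """Grid search for the time in patch that maximizes the rate of return."""
    best_t, best_r = 0.0, 0.0
    for i in range(1, int(t_max / step) + 1):
        t = i * step
        r = rate_of_return(t, travel_time, gain_fn)
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r


for travel in (2.0, 8.0):
    t_star, r_star = optimal_patch_time(travel)
    print(f"travel={travel:.1f}  optimal time={t_star:.2f}  rate={r_star:.3f}")
```

With this gain function, staying past the optimum lowers the overall rate, and a longer travel time between patches pushes the optimal time in the patch later.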
IFT - Cost of Knowledge • Foraging Efficiency • Animals minimize energy expenditure to get required gain in sustenance • Humans minimize effort to get necessary gain in information • Again, foraging for food has much in common with seeking information • Like edible plants in wild, useful information items often grouped together, but separated by long distances in an “information wasteland” • Also, information “scent” • Like scent of food, information in current environment that will assist in finding more information clusters • Activities analyzed according to value gained and the cost incurred • Resource costs • Expenditures of time and cognitive effort incurred • Opportunity costs • Benefits that could be gained in engaging in other activities • “Cost of lost opportunity” • E.g., if not gaining information about algorithms (or messing with registration system), could be gaining information about software design
IFT • Information processing systems evolve so as to maximize the gain of valuable information per unit cost • Sensory systems (vision, hearing) • Information access (card catalogs, offices) • maximize ( information value / cost of interaction )
End .