WIRED Week 7 • Quick review of Information Seeking • Readings Review • Questions & Comments • How does this affect IR system use? • How would this change evaluating IR systems? • Topic Discussions • Web search lab game!
What Is Information Seeking? • “a process in which humans purposefully engage in order to change their state of knowledge.” p. 5 • “a process driven by human’s need for information so that they can interact with the environment.” p. 28 • “begins with recognition and acceptance of the problem and continues until the problem is resolved or abandoned” p. 49 (Marchionini) • more than just representation, storage and systematic retrieval
Information Seeking in Context • [Diagram; labels: Learning, Information Seeking, Information Retrieval, Browsing Strategy, Analytical Strategy]
How do we search? • Analytical • careful planning • recall of query terms • iterative query reformulations • examination of results • batched • Browsing • heuristic • opportunistic • recognizing relevant information • interactive (as can be)
Iseek - WebTracker study • Corporate IT and knowledge workers • In work environment • Own browser and network connection • Long-term study (weeks) • Overall Web use analyzed • Bookmarks, printed pages • How sites/pages found • Frequency of page visits
Web Study Methodology • Surveys • Interviews • Web Use Data* • History Files • WebTracker • Server Logs • Bookmarks* • Printouts
Study Elements • Research Design • Field Work • Field Workers • Data Collection 1. Questionnaire survey 2. WebTracker application (and Proxy Server) 3. Personal interviews
Collecting Web Client Data • Modified client • Catledge and Pitkow 1995 • Bookmarks • Chosen Web sites are personal information space • Most valuable data file on user’s system • Automatically organizing bookmarks • History logs • The history mechanism • Most promising source for usage data
Data Analysis • Log files tabulated into spreadsheets • Examined for clusters or patterns of behavior • Selection of episodes of Information Seeking behavior • a highlighting of the episode by the participant during the personal interview; • evidence of the episode having consumed a relatively substantial amount of time and effort; • evidence that the episode was a recurrent activity. • Determined the modes of scanning & moves exercised by the participants
Behavioral Model • Recurring Web behavioral patterns that relate people’s browser actions (Web moves) to their browsing/searching context (Web modes) • Modes of scanning: Aguilar (1967) & Weick & Daft (1983, 1984) • Moves in information seeking behavior: Ellis (1989) & Ellis et al. (1993, 1997)
Behavioral Model Verification • 61 identifiable episodes
Behavioral Model Results • People who use the Web engage in 4 complementary modes of information seeking • Certain browser-based actions & events indicate a particular mode of information seeking • Surprises • No Explicit Instances of Monitoring to Support Formal Searching • Very Few Instances of “Push” Monitoring • Extracting Involved Basic Search Strategies Only
Interview Highlights • Most useful work-related sites: • Resource sites by associations & user groups • News sites • Company sites • Search engines • Most people do not avidly search for new Web sites • The decision to bookmark is largely based on whether a site provides relevant & up-to-date information • Learning about new Web sites: • Search engines • Magazines & newsletters • Other people/colleagues
Survey Highlights • The Web was the 3rd most frequently used source • Participants spent about 20% of their work hours using the Web • Majority looked for technical information on the Web • Quality of Web information was perceived to be “very high” (reliable) • The Web was perceived as being as accessible as other “internal” sources, but less accessible than mass media sources • Few participants deliberately set out to search for new sites
Study 1 Summary • Behavioral model of information seeking on the Web • People who use the Web engage in complementary modes of information seeking • Certain browser-based actions & events indicate particular moves in information seeking • The study suggests: • that a behavioral framework that relates user motivations and Web moves may be helpful in analyzing Web-based Information Seeking • that multiple, complementary methods of collecting qualitative and quantitative data may help compose a richer portrayal of how individuals use Web-based information in their natural work settings
Iseek Expanded Study (2) • Larger Dataset • One Organization • Longer Duration • Open-ended Interviews • IT Survey • More Quantitative Modeling • Glassman (1994); • Catledge & Pitkow (1995); • Tauscher & Greenberg (1997a, 1997b); • Huberman, Pirolli, Pitkow, & Lukose (1998)
New Types of Data Collection • Sources • Modified Logs • Interviews (More Focused) • Survey (Broader Focus) • Field Observation (Cube Work) • Volume • Over 1400 Consistent Users • Over a Month of Web Use • 8+ GB of data
Collecting Web Server Data • Web Server Log Accuracy • Hit - a single file is requested from the Web server • View - all of the information contained on a single Web page • Visit - one series of views at a particular Web site • Proxy Server Logs • Day sampling - stop caching and analyze the data • IP sampling - cancel caching for particular Web users and measure only their results • Continuous sampling - use cookie files to track particular user(s) • KDD
Survey Highlights • Users not motivated to change/update browser versions or startup page • IT made no modifications of browser until recently, primarily for system access testing • Most of the most frequent users were from technical departments • All IT system work now Web-specific
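The hit / view / visit distinction can be made concrete with a minimal sketch over a single site's request log. It assumes, hypothetically, that image/CSS/JS requests are embedded files (hits but not views), that clients are identified by IP address or cookie, and that a new visit starts after a 30-minute gap; none of these rules come from the study itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: these extensions are embedded files, counted as hits but not views.
EMBEDDED = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
# Assumption: a common 30-minute inactivity timeout separates visits.
VISIT_GAP = timedelta(minutes=30)

@dataclass
class Request:
    client: str      # IP address or cookie ID (hypothetical field)
    time: datetime
    path: str

def count_hits_views_visits(log):
    """Return (hits, views, visits) for one site's list of Request records."""
    hits = len(log)  # every file request is a hit
    views = [r for r in log if not r.path.lower().endswith(EMBEDDED)]
    visits = 0
    last_seen = {}   # client -> time of that client's previous view
    for r in sorted(views, key=lambda r: r.time):
        prev = last_seen.get(r.client)
        if prev is None or r.time - prev > VISIT_GAP:
            visits += 1  # first view, or a long gap: start a new visit
        last_seen[r.client] = r.time
    return hits, len(views), visits

# Usage with made-up log entries.
t0 = datetime(2000, 1, 1, 9, 0)
log = [
    Request("10.0.0.1", t0, "/index.html"),
    Request("10.0.0.1", t0, "/logo.gif"),
    Request("10.0.0.1", t0 + timedelta(minutes=5), "/products.html"),
    Request("10.0.0.1", t0 + timedelta(hours=2), "/index.html"),
    Request("10.0.0.2", t0, "/index.html"),
]
result = count_hits_views_visits(log)  # (5, 4, 3)
```

Five file requests yield four page views and three visits: the first client returns after two hours, which the assumed timeout treats as a second visit. Proxy caching hides some requests from such a log, which is why the sampling strategies above switch caching off for sampled days, users, or cookies.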
Interview Highlights • Corporate adoption of Internet access driven by Intranet development • Local portrayals of successful Web work drove rapid adoption • Use of Intranet viewed as both resource conservation and expanded work • Logging of Web use data not a high concern • Open to recommendations to improve Web use • “Webify”ing Everything seen as good
KDD Highlights • Extremely High Data Collection Reliability • Tightly-focused Web Use (business sites) • Very Small (Determinable) Inappropriate Use (<.001%) • Lower than Expected Search Engine Use • Influenced by Startup Page • Internal Search Results Pages Used • Higher than Expected (Average) Use of Intranet
KDD Use Highlights • 40,000+ episodes • 11:15 average episode length • Search term mode of 1 • Not dominantly work-related terms • Use of intranet search results influential
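Figures like these come from simple summary statistics over episode records. The sketch below assumes, as an interpretation only, that "11:15" means minutes:seconds and that "search term mode" is the most common number of whitespace-separated terms per query. The episode durations and queries are invented for illustration.

```python
import statistics
from datetime import timedelta

def format_mm_ss(td):
    """Format a timedelta as minutes:seconds (assumed reading of '11:15')."""
    total = int(td.total_seconds())
    return f"{total // 60}:{total % 60:02d}"

def episode_summary(durations, queries):
    """Return (mean episode length as mm:ss, modal number of terms per query)."""
    mean_secs = statistics.mean(d.total_seconds() for d in durations)
    # Assumption: terms are whitespace-separated; real query parsing may differ.
    term_counts = [len(q.split()) for q in queries]
    return format_mm_ss(timedelta(seconds=mean_secs)), statistics.mode(term_counts)

# Made-up episodes and queries.
durations = [timedelta(minutes=10), timedelta(minutes=12, seconds=30)]
queries = ["intranet", "travel policy", "payroll", "vpn setup guide"]
summary = episode_summary(durations, queries)  # ("11:15", 1)
```

A mode of 1 indicates that single-word queries were the most common, even when the mean query length is higher.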
Updated Behavioral Model • 32,512 identifiable episodes
Other Studies • Tend to focus on server logs, a broad range of Web users, general Web seeking activity, quantitative methods • Glassman (1994): Proxy Study • Catledge & Pitkow (1995): Surveys and Client tool; • Tauscher & Greenberg (1997a, 1997b): The Back button; • Ingwersen (1995 & 1997): Informetrics • Huberman, Pirolli, Pitkow, & Lukose (1998): Information Foraging, “Law of Surfing” • Huberman “Laws of the Web” (2001)
Study 2 Summary • Behavioral Model Scales Up • Server Logs Provide Significant Gains in Quantity • Server Logs Provide Challenges in Deriving Quality • Organizations Provide Focused View of Overall Web Use • Knowledge Workers Collaborate (But Not Enough)
Summary • (New) Methodology • Provide new ideas for data collection & cleaning tools • Verify models of Information Seeking and Web Use • Discover models of Web usage • Find different types of Web users • Gain rich descriptions of perception of Web & Web use • Evoke new system & interface designs
Other Tools for Web Studies • Pete Pirolli, Rob Reeder, Ed Chi, et al. (UIR Group Xerox PARC) Web Logger • Eytan Adar, Bernardo Huberman (Web Ecology Group @ PARC, now HP) • Andy Edmonds – Uzilla.net • Vividence • Web Evaluation Tool (WET) • Eye Tracking (*)
Improving Web Use • Expert Systems - SNLP • Multimedia Databases & Metadata • Display Technology • Better GUIs • Better, More Available Search Engines/query Syntax • Desktop Search • Ranking • Relevance • Help expert users get more expert
Web Activities Taxonomies • What types of activities on the Web have impact? • What we do vs. what seems significant • Purpose of people’s search • Find • Get a fact or document • Download information • Find out about a product • Compare/Choose: 51% • Methods used to find information • Explore, Monitor, Find, Collect: 71% • Content for which they are searching • Medical: 18%, People: 13%, …
Berrypicking & IR Flexibility • IR systems are rational, users aren’t (always) • We don’t search in a linear model • Single query, one good result • We gradually build on what we know, how we find it • Footnote chasing (backward chaining) • Citation searching (forward chaining) • Journal run (favorite sites) • Area scanning (browsing) • Subject searches in bibliographies, abstracts & indices • Author searching • We combine all of these when searching • Interface support for each & combinations
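Footnote chasing and citation searching follow the same citation links in opposite directions. The following sketch uses a small, entirely hypothetical citation graph (paper IDs "A" to "E") to show backward chaining, forward chaining, and one simple way of combining both moves from a single seed document.

```python
# Hypothetical citation graph: paper -> papers it cites (its references/footnotes).
CITES = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["A"],
    "E": ["A", "B"],
}

def backward_chain(paper, graph):
    """Footnote chasing: documents that this paper cites."""
    return sorted(graph.get(paper, []))

def forward_chain(paper, graph):
    """Citation searching: documents that cite this paper."""
    return sorted(p for p, refs in graph.items() if paper in refs)

def berrypick(seed, graph):
    """Combine both moves one hop out from a seed document."""
    return sorted(set(backward_chain(seed, graph)) | set(forward_chain(seed, graph)))

print(backward_chain("A", CITES))  # ['B', 'C']
print(forward_chain("A", CITES))   # ['D', 'E']
print(berrypick("A", CITES))       # ['B', 'C', 'D', 'E']
```

A searcher could repeat these moves from any newly found document, which is how the set of "berries" grows incrementally rather than from a single query.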
Web Search Studies Framework • Web IR is still relatively new • Differences in users & information • Changes in IR systems are rapid • Who doesn’t search now? • “A Web searching study focuses on isolating searching characteristics of searchers using a Web IR system via analysis of data, typically gathered from transaction logs.” p 3 • Studying Search Engine use • AltaVista, Excite • Web Searching Studies • Single & Multiple Web sites
Characterizing Browsing • Modified XMosaic to learn Web browser behavior • Path lengths key (but changed) • Types of users: • Serendipitous browsers – little repetition, short sequences • General purpose browsers – average, repeated actions • Searchers – long navigational sequences
Cognitive Strategies in Web Search • Systems help with: • re-representation - different external representations that have the same abstract structure can make problem-solving easier or more difficult. It also refers to how different strategies and representations vary in their efficiency for solving a problem. • graphical constraining - constrain the kinds of inferences that can be made about the underlying represented concept. • temporal and spatial constraining - different representations make relevant aspects of processes and events more salient when distributed over time and space.
Cognitive Strategies • Searching Conditions • Dispersed or Category Structures • Fact finding • Exploratory searching • Novice & Experienced users • Top-down, bottom-up & mixed
Reading Time, Scrolling & Interaction • Can implicit feedback improve relevancy? • 561 documents, 6 subjects • Read documents & score them • Better than reading, saving & printing? • Measure use now vs. later • Focused on document, not activity • How do you know the user is reading? • Is saving a relevance measure? • No differences noted in scrolling (4.28) • What about following links? • Finding, highlighting, copying?
How do we really use the Web? • People don’t read, they scan Web pages • We move quickly, we know we can go back • Quick experimentation & short memory • Behaviors that work are reinforced & continued • Satisficing makes measures of quality difficult • Web pages as Billboards? • What’s billboard information for IR systems?
Revisitation Patterns on WWW • Mostly Re-Visits (58%) • Continually Visit New Pages • Access Only A Few Pages Frequently • Clusters (Sets) & Short Paths of URLs • Frequency • Recency • “Distance” • Types of Navigation • Hub and Spoke • Depth Searching (lots of links before returning, if at all) • Guided Tour (Tasks)
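A revisit share like the 58% figure is typically computed as a recurrence rate, (total page visits minus distinct pages) divided by total page visits, as in Tauscher & Greenberg's study. The sketch below assumes that definition and also shows frequency- and recency-based ranking of a history list. The URLs are made up, and real history data would also need the cleaning steps discussed earlier.

```python
from collections import Counter

def recurrence_rate(history):
    """Share of page visits that are revisits: (total - distinct) / total."""
    total = len(history)
    return (total - len(set(history))) / total if total else 0.0

def most_frequent(history, n):
    """Top-n pages by number of visits (frequency ranking)."""
    return [url for url, _ in Counter(history).most_common(n)]

def most_recent(history, n):
    """Top-n distinct pages, newest first (recency ranking)."""
    seen = []
    for url in reversed(history):
        if url not in seen:
            seen.append(url)
            if len(seen) == n:
                break
    return seen

# Made-up browsing history, oldest visit first.
history = ["home", "news", "home", "search", "home", "news", "docs"]
rate = recurrence_rate(history)       # 3/7, about 0.43
top = most_frequent(history, 2)       # ['home', 'news']
recent = most_recent(history, 3)      # ['docs', 'news', 'home']
```

Frequency and recency rank pages differently ("docs" is newest but visited once), which is one reason history and navigation tools may need to support both.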
Revisitation Patterns 2 • Back Button Use Affects Everything (Even More Since Study) • Navigation Methods Differ • Reasons for Revisiting • Explore Further • Use Feature (Search or Home Page) • “On the Way” to another Page (IA Problem) • Do Users Not Understand Browser History Very Well, or Do They Misunderstand Page/Site Navigation? • Provide Navigation Support • Work with the Back Button – Don’t Break its Functionality
Web search lab game • Break into groups • Answer a set of questions • Different rules for each search • Search as you would • Talk & decide before each move • No typing this time! • Search as you would again • Fast as possible