Secondary Evidence for User Satisfaction With Community Information Systems Gregory B. Newby, University of North Carolina at Chapel Hill ASIS Midyear Meeting 1999
What do we want to know? • Who are the information seekers and users? • What are their needs? • Are their needs being met? • Context: the goals and missions of the community net
What else do we want to know? • Are people viewing sponsorship information? • Reading policy documents? • Displaying images? • Using search engines or indexes? • Local or remote? • Browsing or reading?
Possible sources of evidence • Content analysis: what’s available on the system(s)? What questions are asked? • Sociological research: talk to people, look at what they use the net for, etc. • Psychological research: evaluate cognitive change in user knowledge, etc. • Market research: broad data collection from multiple potential audiences
More possible sources of evidence • Secondary data: artifacts generated by information system use • Today’s focus: analysis of log file entries • Web usage statistics • Instrumenting online menu systems • Login or call history • Other system logs (email, FTP)
What questions may be asked of secondary data? • What content is accessed, with what frequency? • What paths are followed to content? • Are entry points, policy documents, or other front-end material bypassed? • Is content read, skimmed, or skipped? • What subsets of content are viewed by individuals (patterns of use)?
What’s wrong with Web server logs? • Aggregate-level access to content: not the whole story! • What are SESSIONS like (a sequence of accesses by a single person)? • What are the paths from item to item (transcending a single “referrer” log entry)? • Are data used linearly (following hyperlinks)? • How long is spent on a document?
More analysis is feasible. Sample: Web server logs • Single line entries for each “hit” (HTTP “GET” or similar request) • Separate file for errors, referrers • Sample entry: • 56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] "GET /index.html HTTP/1.0" 200 6353
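The single-line format above is regular enough to parse mechanically. A minimal Python sketch (the regex and field names are my own choices, not from the talk):

```python
import re

# Parse one Common Log Format entry into named fields.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_entry(line):
    """Return a dict of fields for one access-log line, or None on mismatch."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    parts = fields["request"].split()  # e.g. "GET /index.html HTTP/1.0"
    if len(parts) == 3:
        fields["method"], fields["path"], fields["protocol"] = parts
    return fields

entry = parse_entry(
    '56kdial52.absi.net - - [22/May/1999:20:12:45 -0500] '
    '"GET /index.html HTTP/1.0" 200 6353'
)
```

Once each hit is a dict, the session and path questions below reduce to grouping and counting.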
Sources of complexity: • Multiple types of servers might be on a single system (e.g., RealServer, database server, search engine) • A Web page visit might involve many files • Frames and other authoring techniques can confuse • More than one person might use the same remote computer
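One way to cope with the many-files-per-page problem is to discard requests for embedded assets before counting page views. A rough sketch; the extension list is my own heuristic, not from the talk:

```python
# Collapse a page visit's many component files into one page view by
# dropping requests for embedded assets (suffix list is a heuristic).
ASSET_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")

def page_views(paths):
    """Keep only requested paths that look like pages, not embedded files."""
    return [p for p in paths if not p.lower().endswith(ASSET_SUFFIXES)]

hits = [
    "/~gbnewby/inls80/explore2.html",
    "/~gbnewby/inls80/octo.gif",
    "/~gbnewby/inls80/pmail.gif",
]
pages = page_views(hits)
```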
Question: Can we get the “story” of a session? • Yes! Just track through all the “hits” from the same host within a narrow time period • Challenge: how narrow a time period? • Challenge: some hosts support multiple simultaneous users (but not many) • Challenge: lots of files per page might confuse things (but a narrow window of a few seconds either way can help) • Challenge: what is the structure of the site?
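The tracking described above can be sketched as grouping time-sorted hits by host, starting a new session whenever the gap exceeds a cutoff. The 30-minute cutoff below is an arbitrary choice for illustration, not a value from the talk:

```python
SESSION_GAP = 30 * 60  # seconds; the "narrow time period" is a judgment call

def sessionize(hits):
    """Group (host, unix_time, path) tuples, already sorted by time,
    into sessions: consecutive hits from one host within SESSION_GAP."""
    sessions = []
    last = {}  # host -> (time of last hit, index of its current session)
    for host, t, path in hits:
        if host in last and t - last[host][0] <= SESSION_GAP:
            idx = last[host][1]
            sessions[idx].append((host, t, path))
        else:
            idx = len(sessions)
            sessions.append([(host, t, path)])
        last[host] = (t, idx)
    return sessions

hits = [
    ("a.example.net", 0, "/index.html"),
    ("b.example.net", 10, "/index.html"),
    ("a.example.net", 60, "/vita.html"),
    ("a.example.net", 5000, "/index.html"),  # > 30 min later: new session
]
sessions = sessionize(hits)
```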
Sample “GET” might include multiple files • 203.87.57.76 - - [20/May/1999:18:44:48 -0400] "GET /~gbnewby/inls80/explore2.html HTTP/1.1" 200 9681 • 203.87.57.76 - - [20/May/1999:18:44:50 -0400] "GET /~gbnewby/inls80/octo.gif HTTP/1.1" 200 12053 • 203.87.57.76 - - [20/May/1999:18:44:53 -0400] "GET /~gbnewby/inls80/pmail.gif HTTP/1.1" 200 593
Here’s a “story” (gbn’s pages) • 116.33.237.26 - - [08/May/1999:09:30:59 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030 • 116.33.237.26 - - [09/May/1999:00:44:45 -0400] "GET /~gbnewby/index_top.html HTTP/1.0" 200 7030 • 116.33.237.26 - - [09/May/1999:11:43:31 -0400] "GET /gbnewby/forms HTTP/1.0" 301 186 • 116.33.237.26 - - [09/May/1999:12:06:30 -0400] "GET /gbnewby/forms/ HTTP/1.0" 200 1837 • 116.33.237.26 - - [09/May/1999:16:36:06 -0400] "GET /~gbnewby HTTP/1.0" 301 181 • 116.33.237.26 - - [09/May/1999:17:44:47 -0400] "GET /~gbnewby/ HTTP/1.0" 200 1355 • 116.33.237.26 - - [10/May/1999:06:20:22 -0400] "GET /gbnewby/review2.html HTTP/1.0" 200 5178 • 116.33.237.26 - - [10/May/1999:09:33:51 -0400] "GET /gbnewby/vita.html HTTP/1.0" 200 29487 • 116.33.237.26 - - [10/May/1999:13:33:30 -0400] "GET /gbnewby/inls80/explore1.html HTTP/1.0" 200 3977 • 116.33.237.26 - - [11/May/1999:02:43:15 -0400] "GET /gbnewby/inls80/explore2.html HTTP/1.0" 200 9681 • 116.33.237.26 - - [11/May/1999:09:21:56 -0400] "GET /~gbnewby/vita.html HTTP/1.0" 200 29487 • 116.33.237.26 - - [11/May/1999:10:05:31 -0400] "GET /gbnewby/presentations/security.html HTTP/1.0" 200 11270 • 116.33.237.26 - - [11/May/1999:13:35:27 -0400] "GET /gbnewby/index_top.html HTTP/1.0" 200 7030
Question: What are entry points for particular documents? • You’re on easy street with httpd “referrer” logs, but these are often not kept (for efficiency) • Otherwise, you don’t know where someone came from unless it was from YOUR site • By looking through a session “story” you can see the path people take to particular pages. Analyze finding aids!
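Without referrer logs, entry points can still be tallied from session “stories”: the first page of each session is where that visitor came in. A sketch, with hypothetical session data:

```python
from collections import Counter

def entry_points(sessions):
    """sessions: list of lists of request paths, each in time order.
    Returns a count of first-requested pages across sessions."""
    return Counter(s[0] for s in sessions if s)

sessions = [
    ["/index.html", "/directory.html"],
    ["/docsouth/dasmain.html", "/docsouth/search.html"],
    ["/index.html"],
]
counts = entry_points(sessions)
```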
Here’s a path, including searching and reading • 128.22.40.142 - - [20/May/1999:11:08:34 -0400] "GET /docsouth HTTP/1.0" 301 307 • 128.22.40.142 - - [20/May/1999:11:08:45 -0400] "GET /docsouth/dasmain.html HTTP/1.0" 200 2705 • 128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/dasnav.html HTTP/1.0" 200 679 • 128.22.40.142 - - [20/May/1999:11:08:46 -0400] "GET /docsouth/images/greensquare.gif HTTP/1.0" 200 55 • 128.22.40.142 - - [20/May/1999:11:08:56 -0400] "GET /docsouth/search.html HTTP/1.0" 200 3778
(part II. This is via metalab.unc.edu) • 128.22.40.142 - - [20/May/1999:11:08:57 -0400] "GET /docsouth/images/greenarrow.gif HTTP/1.0" 200 113 • 128.22.40.142 - - [20/May/1999:11:19:58 -0400] "GET /docsouth/southlit/southlit.html HTTP/1.0" 200 3685 • 128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitmain.html HTTP/1.0" 200 2583 • 128.22.40.142 - - [20/May/1999:11:20:07 -0400] "GET /docsouth/southlit/southlitnav.html HTTP/1.0" 200 789
(Part III.) • 128.22.40.142 - - [20/May/1999:11:38:40 -0400] "GET /docsouth/neh/neh.html HTTP/1.0" 200 3539 • 128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehmain.html HTTP/1.0" 200 2743 • 128.22.40.142 - - [20/May/1999:11:38:45 -0400] "GET /docsouth/neh/nehnav.html HTTP/1.0" 200 759 • 128.22.40.142 - - [20/May/1999:11:39:21 -0400] "GET /docsouth/neh/specialneh.html HTTP/1.0" 200 16549 • 128.22.40.142 - - [20/May/1999:11:39:51 -0400] "GET /docsouth/neh/texts.html HTTP/1.0" 200 11999 • 128.22.40.142 - - [20/May/1999:11:40:16 -0400] "GET /docsouth/harriet/menu.html HTTP/1.0" 200 2085 • 128.22.40.142 - - [20/May/1999:11:40:27 -0400] "GET /docsouth/harriet/small.gif HTTP/1.0" 200 43701 • 128.22.40.142 - - [20/May/1999:11:41:01 -0400] "GET /docsouth/harriet/harriet.html HTTP/1.0" 200 217418 • 128.22.40.142 - - [20/May/1999:11:41:07 -0400] "GET /docsouth/harriet/harrietcva.gif HTTP/1.0" 200 85180 • 128.22.40.142 - - [20/May/1999:11:41:11 -0400] "GET /docsouth/harriet/harriettpa.gif HTTP/1.0" 200 77742
Question: Where do people go from a particular location? • Again, your “story” logs can track this • Again, caching is a particular challenge. For example, a user might follow hyperlinks, but the logs show discontinuities (because the browser served a cached copy of an intermediate document)
Sample: going from specifics, to index, to sub-index • 4blah18.blahinc.com - - [22/May/1999:00:21:01 -0500] "GET /mrm/father.html HTTP/1.0" 200 1760 • 4blah18.blahinc.com - - [22/May/1999:00:21:03 -0500] "GET /mrm/bluegrass.gif HTTP/1.0" 200 26959 • 4blah18.blahinc.com - - [22/May/1999:00:27:48 -0500] "GET /index.html HTTP/1.0" 200 6216 • 4blah18.blahinc.com - - [22/May/1999:00:27:51 -0500] "GET /beige_pale.gif HTTP/1.0" 200 2085 • 4blah18.blahinc.com - - [22/May/1999:00:27:53 -0500] "GET /pnetlogo.gif HTTP/1.0" 200 3861 • 4blah18.blahinc.com - - [22/May/1999:00:28:07 -0500] "GET /directory.html HTTP/1.0" 302 216 • 4blah18.blahinc.com - - [22/May/1999:00:28:16 -0500] "GET /directory/culture.html HTTP/1.0" 200 2980 • 4blah18.blahinc.com - - [22/May/1999:00:28:18 -0500] "GET /directory/buggy.jpg HTTP/1.0" 200 8213 • 4blah18.blahinc.com - - [22/May/1999:00:28:38 -0500] "GET /prairienations/index.htm HTTP/1.0" 200 9136 • 4blah18.blahinc.com - - [22/May/1999:00:30:23 -0500] "GET /directory/nature.html HTTP/1.0" 200 6865
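Tallying where sessions go next from a given page can be sketched as counting adjacent pairs within each session. As noted above, caching means some real transitions never reach the log, so these counts are a lower bound:

```python
from collections import Counter

def next_pages(sessions, origin):
    """Count which page immediately follows `origin` within each session.
    sessions: list of lists of request paths, each in time order."""
    counts = Counter()
    for s in sessions:
        for a, b in zip(s, s[1:]):
            if a == origin:
                counts[b] += 1
    return counts

sessions = [
    ["/mrm/father.html", "/index.html", "/directory.html"],
    ["/mrm/father.html", "/index.html"],
]
transitions = next_pages(sessions, "/mrm/father.html")
```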
Question: How long is spent on a document? • Easy: inter-click time from a session • You could even make an “average time per document” for some gateway documents (such as user agreements). Or, infer AT/D by tracking those sessions that “seem” to be contiguous. This is challenging: what if someone goes to another site, or takes a nap? • Caching is still a problem
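The inter-click computation can be done directly on a session: each document’s estimated reading time is the gap before the session’s next request, and the last document gets no estimate (we never see the user leave). A sketch, with the nap/other-site caveat above still applying:

```python
def dwell_times(session):
    """session: list of (unix_time, path), sorted by time.
    Returns (path, seconds until next request) for all but the last hit."""
    return [
        (path, t_next - t)
        for (t, path), (t_next, _) in zip(session, session[1:])
    ]

session = [(0, "/agreement.html"), (95, "/index.html"), (110, "/vita.html")]
times = dwell_times(session)
```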
Analysis of other secondary sources of data • See Newby &amp; Bishop 1997 for instrumentation of menu systems • Log choices of menu options • Correlate with basic user demographics (collected online) • Problem: most modern systems are not login-based but Web-based • Access logs: are people coming in from dial-up lines, academic locations, etc.? Dial-up = watch graphics!
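The dial-up vs. academic question can be approximated from hostnames alone; the matching rules below are my own heuristics, not from the talk:

```python
# Rough classification of remote hosts by hostname patterns, to gauge
# how many visitors arrive over dial-up vs. from academic networks.
def classify_host(host):
    h = host.lower()
    if h.endswith(".edu"):
        return "academic"
    if "dial" in h or "ppp" in h:
        return "dialup"
    return "other"  # includes bare IP addresses with no reverse DNS

labels = [classify_host(h) for h in
          ("56kdial52.absi.net", "metalab.unc.edu", "203.87.57.76")]
```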
Conclusions • The “easy” automated tools for Web log analysis are insufficient • They could be extended with some programming effort or utilities • “Eyeballing” the logs is still useful • Be cautious about privacy - both your own site’s policy, and the problems of posting some log data