
William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess

The role of custom transaction log analysis in informing the design and implementation of a locally developed open source metasearch application. William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess. University of Illinois at Urbana-Champaign. October 4, 2009


Presentation Transcript


  1. The role of custom transaction log analysis in informing the design and implementation of a locally developed open source metasearch application • William H. Mischo, Elizabeth M. German, Joshua Bishoff, Mary C. Schlembach, David S. Vess • University of Illinois at Urbana-Champaign • October 4, 2009 • LITA National Forum

  2. OVERVIEW • Background • Easy Search • IMLS / NSDL Grant funded • Search assistance • Transaction Log design • Transaction Log Analysis • Methodology • Summary • System changes • Direct links | Author Redo | Expand/Limit

  3. BACKGROUND Illinois Library Gateway and Transaction Logs

  4. University of Illinois Library Gateway • Gateway Portal introduced in September 2007 • Guide users to appropriate information resources • Recommender system • Integration of resources • Help with search strategy formulation and refinement • Custom Engineering Library portlet with fielded search approach • Powered by metasearch system suite (Easy Search) • Metasearch over 70 targets

  5. Easy Search Features • Recommender system • Transaction logs: 2.3 million user search arguments, 2.5 million clickthroughs • Analysis of search arguments, pattern checking • Result displays influenced by search arguments • AJAX driven display • Links into the native interfaces at the point of completed search • NISO MXG support

  6. Research Focus • IMLS and NSDL Grants  • Focus on two things: • Design and develop search assistance techniques • Better refer users to relevant information resources

  7. Research Questions • Will users find the recommender approach useful? • Can we characterize user information seeking behaviors? • Can we capture user information seeking behavior well enough to provide quality search assistance? • Can useful refinement and navigation services be introduced within the Gateway? • Will search sessions look like web search sessions or OPAC sessions?

  8. Search Assistance Technologies • Based on deep transaction log analysis • Goal is to develop interactive Information Retrieval (IIR): contextual suggestions & links • Improve search strategy refinement by providing navigational assistance • Develop dynamic system to suggest relevant information resources to users

  9. Implementation of Search Assistance • Fall semester 2009 is the first semester with the complete set of search assistance features implemented • Search assistance features were suggested by a deep transaction log analysis, which has also led to a more robust transaction log format that lets us better understand user behavior • This semester we have been monitoring use of search assistance features • The remainder of the presentation reports what we have learned.

  10. Functions and Examples Search assistance

  11. Search Assistance Functions • Stopword removal • Spelling suggestions • Direct link prompts for frequently entered terms, pathfinder topics • Partial term matches • Pattern matching for author search prompts • Suggested limiting to phrase and title word and phrase searches • Dark target searches in background • Direct links to journal title matches • Pattern matching for link to Journal Article Locator (full text article finder) • Context sensitive arrangement of results

  12. Example - Author Search Patterns • Robert A. Smith • Smith, Robert A. • Smith r. a. • Smith RA • Smith, RA • R. Alan Smith • Robert Smith
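The seven name forms above can be recognized with a handful of regular expressions. The sketch below is only an illustration of that idea: the pattern names, the grouping into four shapes, and the function `author_pattern` are hypothetical and are not taken from the Easy Search code. Matching of this kind is heuristic, so any two capitalized words will be treated as a possible author name.

```python
import re

WORD = r"[A-Z][a-z]+"   # capitalized name part, e.g. "Smith"
INIT = r"[A-Za-z]\."    # one initial with a period, e.g. "A." or "r."

# Hypothetical patterns written to cover the seven forms on the slide.
PATTERNS = {
    "first [initial] last": re.compile(rf"^{WORD}(?: {INIT})? {WORD}$"),
    "initial middle last": re.compile(rf"^{INIT} {WORD} {WORD}$"),
    "last, first [initial]": re.compile(rf"^{WORD}, {WORD}(?: {INIT})?$"),
    "last[,] initials": re.compile(
        rf"^{WORD},? (?:[A-Z]{{2,3}}|{INIT}(?: {INIT})*)$"
    ),
}


def author_pattern(query):
    """Return the label of the first matching name shape, or None."""
    normalized = " ".join(query.split())  # collapse runs of whitespace
    for label, pattern in PATTERNS.items():
        if pattern.match(normalized):
            return label
    return None


for example in ["Robert A. Smith", "Smith, Robert A.", "Smith r. a.",
                "Smith RA", "Smith, RA", "R. Alan Smith", "Robert Smith"]:
    print(example, "->", author_pattern(example))
```

A query that matches one of these shapes could trigger a "Search as Author" prompt; a lowercase topical query such as "abraham lincoln" matches none of them in this sketch.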

  13. DLF 2008 Fall Forum

  14. DLF 2008 Fall Forum

  15. DLF 2008 Fall Forum

  16. DLF 2008 Fall Forum

  17. Transaction log design

  18. Transaction Log • Example of an entry from a standard web server log: • 2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp nopval=1&project=native&selection=gen&selection=opac&nopval=1&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search 80 - 128.174.36.95 Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0
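To show what can and cannot be read out of such an entry, here is a minimal parsing sketch. It assumes the fields are space-separated and arrive in the order listed in `FIELDS`; that order is a guess based on the values in the example and is not stated on the slide. The names `FIELDS` and `parse_entry` are illustrative only.

```python
from urllib.parse import parse_qs

# Assumed field order (not given on the slide). Spaces between fields are
# written out here because the transcript ran adjacent fields together.
FIELDS = ["date", "time", "server_ip", "method", "uri_stem", "uri_query",
          "port", "username", "client_ip", "user_agent",
          "status", "substatus", "win32_status"]

LINE = (
    "2009-09-24 18:39:03 128.174.36.99 GET /josh/searchaid3/saresultsug.asp "
    "nopval=1&project=native&selection=gen&selection=opac&nopval=1"
    "&keyword=abraham+lincoln&Bool=all&interp=yes&OPERATE=Perform+Search "
    "80 - 128.174.36.95 "
    "Mozilla/5.0+(Windows;+U;+Windows+NT+6.0;+en-US;+rv:1.9.1.3)"
    "+Gecko/20090824+Firefox/3.5.3+(.NET+CLR+3.5.30729) 200 0 0"
)


def parse_entry(line):
    """Split one log line into named fields and decode its query string."""
    entry = dict(zip(FIELDS, line.split()))
    entry["params"] = parse_qs(entry["uri_query"])  # '+' decodes to a space
    return entry


entry = parse_entry(LINE)
print(entry["params"]["keyword"])    # ['abraham lincoln']
print(entry["params"]["selection"])  # ['gen', 'opac']
```

The search argument and the selected targets are recoverable, but nothing in the line says which earlier search it follows or which result the user clicked afterwards.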

  19. Problems • Web Server logs require extensive post-processing to determine the relationships between user actions. • Web Server logs don't reveal when we refer a user to a vendor database. • Our Approach: Deposit the log in a relational database that can reveal the full dimensions of a client interaction with the server, and develop a solution to log client exits from the UIUC gateway to an outside provider.  Must also log search suggestions made by the server-side program.

  20. Our Log • Began as third-party open-source application called Statcountex, written in ASP.  (http://2enetworx.com/dev/projects/statcountex.asp)   • Modified (heavily) to interact with the main Easy Search processing routine. • We also added the session-tracking functionality built into ASP.

  21. Log Events/ Relationships

  22. Example of table searchstats

  23. Tracking User Sessions • The search result page checks the client browser to see if the client has been there before; if not, it writes a new cookie and logs a new sessionid. • If the client has a current cookie, the server looks up the client's previous search and enters it in the SearchStats table (under column previoussearch). This makes post-processing simpler: all new sessions begin on rows where previoussearch is null; all searches where previoussearch is not null are follow-up searches.

  24. Logging User Actions • Each search submitted generates a unique SearchStatID and  row in the SearchStats table. • Fields captured: referer, IP, sessionid, previoussearch, date/time, catid, useragent. • previoussearch comes from cookie; "suggest" field records server provision of any assistive prompts after query processing
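The session and follow-up logic described on the last two slides can be sketched with an in-memory SQLite table. This is an illustration under assumptions, not the production ASP code: only a subset of the captured fields is modeled, the `keyword` column and the function `log_search` are hypothetical, and the previous search is looked up from the table by sessionid rather than read from a cookie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SearchStats (
        SearchStatID   INTEGER PRIMARY KEY AUTOINCREMENT,
        sessionid      TEXT NOT NULL,
        keyword        TEXT NOT NULL,   -- hypothetical column name
        previoussearch TEXT,            -- NULL marks the start of a session
        suggest        TEXT             -- assistive prompt offered, if any
    )
""")


def log_search(conn, sessionid, keyword, suggest=None):
    """Insert one search row and return its SearchStatID."""
    row = conn.execute(
        "SELECT keyword FROM SearchStats WHERE sessionid = ? "
        "ORDER BY SearchStatID DESC LIMIT 1",
        (sessionid,),
    ).fetchone()
    previous = row[0] if row else None
    cur = conn.execute(
        "INSERT INTO SearchStats (sessionid, keyword, previoussearch, suggest) "
        "VALUES (?, ?, ?, ?)",
        (sessionid, keyword, previous, suggest),
    )
    return cur.lastrowid


log_search(conn, "s1", "abraham lincon", suggest="spell")
log_search(conn, "s1", "abraham lincoln")
log_search(conn, "s2", "gaas")

new_sessions = conn.execute(
    "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NULL"
).fetchone()[0]
follow_ups = conn.execute(
    "SELECT COUNT(*) FROM SearchStats WHERE previoussearch IS NOT NULL"
).fetchone()[0]
print(new_sessions, follow_ups)  # 2 1
```

Because previoussearch is stored on every row, the split between new sessions and follow-up searches is a single WHERE clause rather than a post-processing pass over the raw log.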

  25. Logging User Actions: Clicks • Result page links are dynamic & refer to the primary key of SearchStats • Actual href: http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp?ID=243989&acse--http://www.library.uiuc.edu/proxy/go.php?url=http://search.ebscohost.com/login.aspx?direct=true&db=aph&bquery=(gaas)&type=1&site=ehost-live

  26. Clicks (continued) The URL to results actually contains 3 URLs: • A separate logging file writes SearchStatID, name of resource clicked, & time information to a separate table, Clickstream, in the log database. Clickstream has a foreign key of searchstatid. The file then redirects the user. • User passes through EZproxy; • User arrives at results in vendor interface.
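The three nested URLs can be pulled apart with plain string splitting. The sketch below assumes the href has the shape `logger?ID=<SearchStatID>&<resource code>--<proxy URL>?url=<vendor URL>`, which is inferred from the single example on the previous slide; the function name `split_click_href` and the reading of `acse` as a resource code are assumptions.

```python
HREF = (
    "http://search.grainger.uiuc.edu/searchaidlog2/sourcelog.asp"
    "?ID=243989&acse--http://www.library.uiuc.edu/proxy/go.php"
    "?url=http://search.ebscohost.com/login.aspx"
    "?direct=true&db=aph&bquery=(gaas)&type=1&site=ehost-live"
)


def split_click_href(href):
    """Break a result link into its logging, proxy, and vendor parts."""
    logger_url, _, rest = href.partition("?ID=")
    head, _, target = rest.partition("--")            # text after "--" is the redirect target
    search_stat_id, _, resource = head.partition("&")
    proxy_url, _, vendor_url = target.partition("?url=")
    return {
        "logger_url": logger_url,
        "searchstatid": int(search_stat_id),  # foreign key for Clickstream
        "resource": resource,
        "proxy_url": proxy_url,
        "vendor_url": vendor_url,
    }


click = split_click_href(HREF)
print(click["searchstatid"], click["resource"])  # 243989 acse
print(click["vendor_url"])
```

In a logging script, `searchstatid` and `resource` would be written to the Clickstream table with a timestamp before the user is redirected to the proxy URL.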

  27. TRANSACTION ANALYSIS AND LOG COMPARISONS Search failures and system improvements        

  28. User Studies • Markey’s two papers on End-User Studies - JASIST 2007 • 32 studies • Need for new OPAC studies • Library Portal/Gateway studies needed • Spink and Jansen findings on Web searches • Short search sessions • Average search: 2.3 words • “Advanced features” not being utilized • Users typically look at first page of results only

  29. 2008 – 2009 Searches • 58.4% Follow-ups • 12% are Author • 2.3% Author redo link (12% clicked) • 0.9% from phrase/title links • 3% show Direct suggests (64% clicked) Search Arguments • 10.4% Booleans (12.3% AND, 0.2% OR, 0.1% NOT) • 8.1% Commas • 0.1% Parentheses • 4.2% Quotes • 20.5% Prepositions • 9.8% Spell Suggests (31.5% are clicked) • 0.7% +
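Percentages like the search argument figures above come from scanning each logged query for syntactic features. The following is a rough sketch of such a scan, not the analysis actually used: it assumes Boolean operators are counted only when typed in upper case, it omits the preposition check, and the sample queries are invented for illustration.

```python
import re
from collections import Counter

BOOLEAN = re.compile(r"\b(AND|OR|NOT)\b")  # assumption: upper-case operators only


def argument_features(query):
    """Flag the syntactic features present in one search argument."""
    return {
        "boolean": bool(BOOLEAN.search(query)),
        "comma": "," in query,
        "parentheses": "(" in query or ")" in query,
        "quotes": '"' in query,
        "plus": "+" in query,
    }


def feature_rates(queries):
    """Return the percentage of queries showing each feature."""
    counts = Counter()
    for query in queries:
        counts.update(name for name, hit in argument_features(query).items() if hit)
    return {name: 100 * counts[name] / len(queries)
            for name in argument_features("")}


# Invented sample queries, for illustration only.
sample = ["solar AND wind", "Smith, Robert A.", '"climate change"', "(gaas)"]
print(feature_rates(sample))
```

Run over the full SearchStats table instead of a four-item list, the same function would give one percentage per feature for a semester of searches.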

  30. 2008-2009 Easy Search: 3.758 Words per Query
      Words   Number of Searches   %
      1       33,054               12.3
      2       70,719               26.3
      3       56,584               21.1
      4       44,251               16.5
      5       21,730               8.1
      6       12,430               4.6
      7       7,598                2.8
      8       5,212                1.9
      > 8     16,786               6.3
      Total   268,454              100

  31. Clickthroughs 2008 v. 2009 Direct Links: 1.8%

  32. Search Assistance Response • Spelling suggestions offered for 12.6% of searches • Spell suggestion clicked 34% of times offered • Direct links offered for 3% of searches • Direct links clicked 64% of times offered • Success • "Dark Targets" offered for 27% of searches • Dark Targets clicked 8% • Reduce matches by Title or Phrase searching: 20% • Reduce prompt clicked 5% of times offered

  33. Search Assistance Responses • "Search as Author" prompt offered for 2.3% of searches • Clicked 12% of times offered • Link to Journal & Article Locator service: 2% of searches • Clicked 2% of times offered • Not very successful 

  34. What We Have Learned • Broad continuum of searches being performed: topical, specific item • Users expect sophisticated parsing – mental model • Spell suggestions important • Must accommodate specific item search • Author search and fielded search • Used as Reference tool • Search assistance being utilized

  35. Reacting to Logs with Design • numbers provide a real-time measure of search enhancement success • poor performance can influence changes in prompt language, presentation, triggering event    • Example: Known-item Searching

  36. User Behavior - Known Items • A sample of 3,000 log entries from the single-entry box gateway, taken in semester 1, 2007, was analyzed in detail. • In this sample, fully 49.4% of the searches were “known-item”, “known-person/organization” or specific item searches as opposed to topical searches. These searches were for specific book, journal, or article titles or a specific author name. • Of the 49.4% specific item searches: • 7.4% were author/title; • 28.9% were author; • 40.5% were book/monographic searches; • 6.8% were index/abstract title; • 5.7% were for a specific journal article; and • 11.8% were for a specific journal title. • Overall, 17.96% of the searches contained a name or an organization, although clearly some of these are topical searches.

  37. Reacting to Known-Item Searches Search Assistance introduced: • Search as Title, Author, Phrase: A fielded approach reduces the number of clicks between a user and a Known Item • Exact Journal title match, Exact A & I title match & Direct Link prompts introduced • Assistance methods rooted in observations of logged user behavior

  38. The "Common Query" database • Calculated the most frequently searched terms • Noticed entries like "ebsco", "IEEE" • These "directional" searches were apparently unsuccessful for users •  Developed a database of links to UIUC resources, with user-entered search arguments as its vocabulary • Very successful (users follow direct links 64% of times offered; excellent feedback)
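The two steps on this slide, counting the most frequent search arguments and then mapping them to direct links, can be sketched as follows. The sample log, the link table `DIRECT_LINKS`, and its example.org URLs are all placeholders invented for illustration; they are not entries from the actual UIUC database.

```python
from collections import Counter


def normalize(query):
    """Lower-case a query and collapse whitespace so variants group together."""
    return " ".join(query.lower().split())


def most_common_queries(queries, n=10):
    """Return the n most frequent normalized search arguments with counts."""
    return Counter(normalize(q) for q in queries).most_common(n)


# Hypothetical vocabulary: user-entered terms mapped to resource links.
DIRECT_LINKS = {
    "ebsco": "https://example.org/resources/ebsco",
    "ieee": "https://example.org/resources/ieee",
}


def direct_link(query):
    """Return a direct link for a 'directional' search, or None."""
    return DIRECT_LINKS.get(normalize(query))


log = ["ebsco", "IEEE", "Ebsco ", "gaas", "ieee", "ebsco"]  # invented sample
print(most_common_queries(log, n=2))  # [('ebsco', 3), ('ieee', 2)]
print(direct_link("  IEEE "))         # https://example.org/resources/ieee
```

Using the users' own search arguments as dictionary keys is what lets a query such as "ebsco" resolve straight to the resource instead of being run as a topical search.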

  39. The Future with prototype examples

  40. Future • Guided search module – “Help Getting Started”: encyclopedias, dissertations, e-books, popular journal articles, etc. • Tailored (vertical) search modules: NSDL STEM Education Site, Library and Information Science • Faceted result displays • Return first 10 articles from selected targets to user • Merging of results • Agent approach (software agents that may e.g. return answers rather than citations) • Portability: University of Illinois Springfield
