
Collection-Level User Searches in Federated Digital Resource Environment

Collection-Level User Searches in Federated Digital Resource Environment. Oksana Zavalina IMLS Digital Collections and Content project Graduate School of Library and Information Science University of Illinois at Urbana-Champaign 2007 ASIS&T Annual meeting.


Presentation Transcript


  1. Collection-Level User Searches in Federated Digital Resource Environment • Oksana Zavalina • IMLS Digital Collections and Content project • Graduate School of Library and Information Science • University of Illinois at Urbana-Champaign • 2007 ASIS&T Annual Meeting

  2. IMLS Digital Collections and Content project at UIUC • Started in 2002 with National Leadership Grant • Aggregation of over 200 cultural heritage collections • Collection Registry: • Provides access, services, and additional functionality to a database of collection descriptions • Collection-level schema developed based on DC and RSLP (Research Support Libraries Programme, UK) • Metadata repository: • Harvested metadata aggregated in one location • Acts as a portal to the item-level records for digital content in NLG collections

  3. Participating institutions, 2006

  4. Collection Registry: access through search and browse

  5. Subject Representation in the Registry • Gateway to Educational Materials (GEM) subject headings • Alternative subject headings (e.g., LCSH, AAT, locally developed) • Geographic coverage headings (Getty Thesaurus of Geographic Names)

  6. Top GEM (Gateway to Educational Materials) subjects in Collection Registry • Social Studies (80%): United States history, State history, … • Arts (46%): Visual arts, Photography, … • Science (17%)

  7. Collection records

  8. Collection records

  9. Research questions • What is the distribution of the two major search types (subject and known-item) in the Registry? • What are the typical user search categories in the Registry? Can the FRBR set of 10 entities be used for user search categorization? • What are the quantitative characteristics of a typical user search query in the Registry? • How suitable is the GEM subject scheme for describing diverse collections in the Registry compared to alternative controlled vocabularies? Assessed with semantic similarity measures between user keywords extracted from transaction logs and subject terms in 3 different controlled vocabularies: GEM, Library of Congress Subject Headings (LCSH), and Art and Architecture Thesaurus (AAT)

  10. Dataset • MS Access file (7 months; 19,000 records) • 936 user keyword search query strings • Minimal data processing: manual extraction of keyword query strings; aggregation of repetitive identical queries and morphological variants; no query parsing; stop-word list (prepositions, conjunctions, and articles) • Methods: transaction log analysis, qualitative (subject analysis + similarity measures) and quantitative (basic descriptive statistics)
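
The aggregation step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide names the stop-word classes but not the words, so `STOP_WORDS` is a made-up sample; the function names are hypothetical; and the study merged morphological variants by hand, which this sketch does not attempt.

```python
from collections import Counter

# Assumption: the slide lists only word classes (prepositions, conjunctions,
# articles). This short list is illustrative, not the one used in the study.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "from", "and", "or", "but"}

def normalize_query(raw):
    """Lowercase a logged query, split on '+' or whitespace, drop stop words."""
    tokens = raw.replace("+", " ").lower().split()
    return " ".join(t for t in tokens if t not in STOP_WORDS)

def aggregate_queries(raw_queries):
    """Count how often each normalized query string occurs."""
    return Counter(normalize_query(q) for q in raw_queries)

# Two spellings of the same query collapse into one entry.
counts = aggregate_queries(["Tom+Sawyer", "tom sawyer", "Letters+from+19th+century"])
```

Whether stop words were removed before matching or only when measuring query length is not stated in the slides; the sketch removes them up front for simplicity.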

  11. Methods: search categories • 7 FRBR entities: • work [collection as a work; any intellectual or artistic creation that has a title attribute] • (individual) person • corporate body • concept • object • event • place • FRANAR/FRAD entity: • family • Additional categories: • classes of persons (e.g., “abused children”, “prisoners”) [2% of queries] • ethnic/national group (“Irish Americans”, “Sioux Indian”) [5% of queries] • unknown search category (e.g., “beyond”, “LU+65”)

  12. Operational definition of search types • Searches where the user queries either the title or the author (individual or corporate) of the digital collection belong to the collection-level known-item search type • All other searches in the Registry belong to the collection-level subject search type
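
The operational definition reads as a two-way rule, which a minimal Python sketch can make concrete. The slides do not say how the coding was actually carried out, so everything here is an assumption: the function name `search_type`, the exact-equality test, and the sample titles and authors are all hypothetical.

```python
def search_type(query, collection_titles, collection_authors):
    """Label a query 'known-item' if it equals a known title or author, else 'subject'."""
    # Normalize '+' separators, case, and repeated whitespace before comparing.
    q = " ".join(query.replace("+", " ").casefold().split())
    known = {
        " ".join(s.casefold().split())
        for s in (*collection_titles, *collection_authors)
    }
    return "known-item" if q in known else "subject"

# Hypothetical registry data, for illustration only.
titles = ["Example Historical Photographs Collection"]
authors = ["Example State Historical Society"]
```

Exact equality is a deliberate simplification: a human coder would also recognize partial or misspelled titles, which this rule would label as subject searches.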

  13. Methods: similarity measures • exact matches • synonymous matches (semantic variants) • near-exact matches: • syntactic variants (e.g., “French art” and “Art, French”) • morphological variants (e.g., “automated speech recognition” and “automatic speech recognition”) • acronyms (e.g., “WW1” and “World War, 1914-1918”) • no matches on broader or narrower terms
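
Two of these match types, exact and syntactic-variant, are mechanical enough to sketch in Python. The sketch below is an assumption about how such a check could look, not the procedure used in the study; `tokens` and `match_type` are hypothetical names. Synonyms, morphological variants, and acronyms need a lookup table or human judgment and are left out.

```python
import re

def tokens(term):
    """Split a term into lowercase alphanumeric tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", term.lower())

def match_type(user_term, heading):
    """Return 'exact', 'syntactic variant', or None for a user term and a heading."""
    u, h = tokens(user_term), tokens(heading)
    if u == h:
        return "exact"              # same words in the same order
    if sorted(u) == sorted(h):
        return "syntactic variant"  # same words, different order (inverted heading)
    return None                     # other match types are not handled here
```

With this rule, `match_type("French art", "Art, French")` is a syntactic variant, while `match_type("WW1", "World War, 1914-1918")` returns `None` because acronym expansion is out of scope. Note that "exact" here ignores case and punctuation, which is a choice made for the sketch.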

  14. Findings: search categories • Object, concept, place, and individual person are heavily used • Surprisingly low level of event searching (4%)

  15. Polysemy and search intent ambiguity problem • Concept or Object? “books”, “tools”, “Amusement park”, “Ballrooms”, “Highways”, “interstates”, “detroit+historical+museums”; “Industrial models”, “Lesson+plans”, “dissertations”; “Landscape” • Work or Person or Object? “don+quijote”, “Tom+Sawyer” • Event or Concept? “Civil rights movement”, “Census” • Single or multiple categories/entities? “Letters+from+19th+century” (object? object AND event?), “children+that+are+abused” (class of persons? class of persons AND event?), “henry+fordmuseumand+greenfield+village” (corporate body? corporate body AND person AND place?)

  16. Findings: search types • Prevalence of subject search • Higher than usual level of subject searching: • a general shift towards subject searching in Web 2.0? • a conceptual difference between collection-level and item-level search?

  17. Findings: search query length

  18. Findings: frequency of unique search query use (query popularity)

  19. Findings: user queries by search category

  20. Findings: semantic similarity

  21. Semantic similarity findings at a glance • Weak semantic match between searches and GEM/AAT terms; strong match for LCSH • GEM represents only concepts; AAT represents only concepts and objects

  22. Findings: semantic similarity overlap

  23. Semantic similarity overlap findings at a glance • LCSH on its own (without any overlap with AAT or GEM) covers 48% of the user search terms • Only 12 terms (7%) matched in AAT were not also matched in LCSH • All the terms matched in GEM were also matched in LCSH • 27% of user search terms were not matched in any of the three controlled vocabularies
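
Overlap figures of this kind come down to set arithmetic over the matched terms. The Python sketch below shows one way to compute "matched only in this vocabulary" and "matched nowhere" shares; the function name `coverage_report` is hypothetical and the toy terms are invented for illustration, not taken from the study's data.

```python
def coverage_report(user_terms, matched):
    """Share of user terms matched in exactly one vocabulary, and in none.

    user_terms: set of all user search terms.
    matched: dict mapping vocabulary name -> set of user terms it matched.
    """
    total = len(user_terms)
    matched_anywhere = set().union(*matched.values())
    report = {"unmatched": len(user_terms - matched_anywhere) / total}
    for name, terms in matched.items():
        # Terms matched by any of the other vocabularies.
        others = set().union(*(t for n, t in matched.items() if n != name))
        report[name + " only"] = len(terms - others) / total
    return report

# Toy data: four invented terms, three vocabularies.
toy_terms = {"t1", "t2", "t3", "t4"}
toy_matched = {"LCSH": {"t1", "t2", "t3"}, "AAT": {"t2"}, "GEM": {"t3"}}
report = coverage_report(toy_terms, toy_matched)
```

In the toy data every AAT and GEM match is also an LCSH match, so only LCSH has a non-zero "only" share.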

  24. Semantic similarity map

  25. Conclusions • The level of subject searching is unusually high compared with catalog use and transaction log analysis studies • A traditional library subject scheme, Library of Congress Subject Headings, offers a strong semantic match to user queries • A combination of two or more standardized controlled vocabularies may be beneficial for collection-level subject description in the IMLS DCC Registry • Based on user searches, we recommend updating the FRBR model to cover class of persons and ethnic/national group

  26. Further research • Reasons for subject search prominence (interviews and observations of the Registry users) • User conceptualization of the collection-level search and its possible difference from the concept of the item-level search • Controlled vocabularies that are more flexible than LCSH and that, unlike GEM or AAT, represent a wide variety of search categories

  27. Acknowledgements • This research has been funded by IMLS NLG Research and Demonstration grant LG-02-02-0281 • http://imlsdcc.grainger.uiuc.edu/ • Special thanks to: • Timothy W. Cole – Principal Investigator • Carole L. Palmer – Co-Principal Investigator • Michael Twidale – Co-Principal Investigator • Amy Jackson, Sarah Shreeves, and Jenny Benevento – current and former Project Coordinators

  28. Questions and comments always welcome • Oksana Zavalina • zavalina@uiuc.edu
